Many artificial intelligence (AI) algorithms are designed to identify patterns in order to automate decisions and make people’s lives easier. This kind of technology can recognize a person’s favorite style of music, genre of films, or news topic. However, because they are programmed to create behavioral models, the algorithms can also replicate undesirable behavior, such as racism, misogyny, and homophobia. They often absorb, reproduce, and as a result, strengthen the various forms of discrimination and intolerance seen in society today.
In August 2019, a study by researchers at the Federal University of Minas Gerais (UFMG) presented an example of this vicious cycle that was picked up by several international publications: a political radicalization process on YouTube, in which the AI recommendation algorithm plays an important role. “There were already qualitative studies and reports that showed YouTube to be a breeding ground for the proliferation of obscure communities linked to the alt-right, whose ideas are closely related to white supremacy,” says computer scientist Manoel Horta Ribeiro, currently a PhD student at the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. During his master’s degree at UFMG, where he was supervised by computer scientists Wagner Meira Jr. and Virgílio Almeida, he attempted to understand how this phenomenon occurred.
The group examined 331,849 videos posted by 360 channels of varying political stances and 79 million comments. The volume of data was immense, with the analysis itself only possible thanks to artificial intelligence. “The only manual work involved was classifying the channels according to their political position,” says Ribeiro. The results revealed that viewers of politically conservative channels that publish less radical content frequently migrate to white supremacist channels.
“We tracked the trajectory of users who commented on videos from conservative channels and found that, over time, they began commenting on videos uploaded by more radical channels. There was a consistent migration from the lighter to the more extreme content, but we don’t yet know exactly why this happens,” explains Ribeiro. “I believe there are three reasons for the phenomenon: the media format, through which anyone can create content and viewers interact directly with the creators; the current global political landscape; and the algorithm, which allows users to find or continue to consume extremist content via the recommendation system.”
Research involving YouTube has become more prominent in recent years. According to Virgílio Almeida, professor emeritus at UFMG’s Computer Science Department, the video platform has proven a very interesting topic for science. “The number of users is huge—more than 2 billion worldwide and 70 million in Brazil—and its impact on society is just as big,” says the researcher. His department has become a hub for research on social network phenomena.
Almeida first dedicated himself to the field in 2007. The most impactful research often involves topics related to politics, an area that is currently highly polarized in both the USA and Brazil. In 2018, an analysis of hate speech and discrimination in videos posted on YouTube by American right-wing groups was highlighted at the International ACM Conference on Web Science, held in the Netherlands. The research, carried out by PhD students Raphael Ottoni, Evandro Cunha, Gabriel Magno, and Pedro Bernardina—all from the group led by Wagner Meira Jr. and Virgílio Almeida—was recognized as the best study done by students.
The UFMG researchers used the Linguistic Inquiry and Word Count (LIWC) and Latent Dirichlet Allocation (LDA) methods to analyze transcriptions of YouTube videos and the comments posted on them. With LIWC, words are classified according to their grammatical role (pronouns, verbs, adverbs, etc.) and the emotion they express (joy, sadness, anger, etc.), while LDA identifies the words that best define the main topics of a conversation.
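At its core, a LIWC-style analysis is dictionary-based word counting: each word in a transcript is checked against hand-built category word lists. The sketch below illustrates the idea with a tiny made-up dictionary (the real LIWC lexicon, which the researchers used, has dozens of categories and thousands of words):

```python
# Minimal LIWC-style word counting. The category word lists here are
# illustrative stand-ins, not the actual LIWC dictionary.
CATEGORIES = {
    "pronoun": {"i", "we", "you", "they"},
    "anger":   {"hate", "kill", "destroy"},
    "joy":     {"love", "happy", "great"},
}

def liwc_counts(text):
    """Return the fraction of words in `text` that fall into each category."""
    words = text.lower().split()
    total = len(words)
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

print(liwc_counts("We love this great channel but they hate everyone"))
```

Applied to thousands of video transcripts and comments, these per-category fractions make it possible to compare, for example, how much anger-related vocabulary different channels use.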
“We also used a tool based on a psychological test to observe biases behind the messages,” explains Raphael Ottoni. The tool uses machine-learning techniques to convert the words of a text into numeric vectors, which are then used to calculate the semantic similarity between words: words whose vectors are closer together tend to be related in meaning, revealing the associations a text makes around a given subject. “Words like Christianity appeared to be associated with positive attributes, such as good or honest, while Islam was often associated with terrorism and death,” says Ottoni.
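The distance measure typically used for such comparisons is cosine similarity between word vectors. The sketch below shows the calculation with toy three-dimensional vectors and neutral example words; real embedding models such as word2vec learn vectors with hundreds of dimensions from large corpora, and these values are invented purely for illustration:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity: close to 1.0 = similar meaning, near 0.0 = unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "embeddings" (made-up values, for illustration only).
vectors = {
    "good":   [0.9, 0.1, 0.0],
    "honest": [0.8, 0.2, 0.1],
    "danger": [0.1, 0.9, 0.2],
}

print(cosine(vectors["good"], vectors["honest"]))  # high: related meanings
print(cosine(vectors["good"], vectors["danger"]))  # low: unrelated meanings
```

A bias audit of the kind Ottoni describes compares a target word's average similarity to a set of positive attribute words against its similarity to a set of negative ones.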
The same techniques were applied to Brazil during the 2018 presidential election, with researchers studying videos published on 55 YouTube channels from across the political spectrum, from the extreme left to the extreme right. Hate messages and conspiracy theories were identified most often on the far-right channels—which have also seen the greatest growth in the number of views. The researchers are now finalizing an article in which they will present the results of this study. But even before publication, the study was cited in August 2019 by The New York Times, which published a series of articles about the influence of YouTube in various countries, including Brazil.
According to Almeida, other studies have already found that news and video recommendation algorithms exploit people’s natural attraction to negative news and conspiracy theories to increase user engagement with the platform. “Research by a group from MIT [Massachusetts Institute of Technology] published in the journal Science in March 2019 showed that fear, anger, and other extreme emotions are key factors in the spread of false tweets,” he says.
In the same way that algorithms assimilate music and movie preferences, they also learn a user’s political preferences, which is why content-sharing platforms, such as Facebook, can become almost insurmountable political bubbles. Users only receive information that corroborates their previous opinions.
American computer scientist Christo Wilson, from Northeastern University in Massachusetts, USA, began studying social networks in 2012 precisely to research this phenomenon—inspired by the book The Filter Bubble (Zahar, 2012) by American activist Eli Pariser. “My research originally focused on studying the personalization of algorithms used by search engines, and since then I have expanded to other types of algorithms and contexts,” the researcher told Pesquisa FAPESP. Wilson intends to return to the field of politics in 2020: he is planning a major study on the impact of social networks on the USA’s next presidential election.
Biased algorithms can be found where they are least expected—in cell phone voice assistants, for example. A study carried out by the UFMG group in partnership with the University of Fortaleza (UNIFOR) found that the efficiency of voice assistants, such as Google Assistant and Apple’s Siri, varies according to the user’s accent and level of education. Computer scientist Elizabeth Sucupira Furtado, head of UNIFOR’s Laboratory for the Study of Users and Systems Use Quality, conducted a study with two groups of volunteers: residents of Fortaleza, including some born in other states, and students from an evening class for Youth and Adult Education. “Users born in the south and southeast of Brazil were more likely to be understood by voice assistants than others,” reveals the researcher.
Pronunciation errors (cacoepy), stuttering, repetition of words, and truncations (lack of fluency) also impaired the performance of the AI assistants. According to the researcher, since the systems are trained by users with higher levels of education, they tend to be limited to certain forms of speech. “It is important for companies to realize that part of their audience is not being properly served,” warns Furtado.
Search engines also hide prejudices, as demonstrated by computer scientist Camila Souza Araújo in her master’s dissertation, defended at UFMG in 2017. The researcher searched for the terms “beautiful women” and “ugly women” on Google and Bing, and found prejudice in terms of race and age. The women identified as beautiful were mostly white and young. The bias was reproduced in most of the 28 countries where Bing operates and 41 countries that use Google, even those located in Africa.
By using machine-learning systems, society risks inadvertently perpetuating prejudice, thanks to a common belief that mathematics is always neutral. American data engineer Fred Benenson coined a term to define this risk: mathwashing. The term is based on the concept of greenwashing, which describes how marketing strategies are used by companies to make it look like they are concerned about the environment. Similarly, the idea that algorithms are neutral benefits companies and exempts them from responsibility.
The truth is that AI systems are powered by data, and that data is selected by human beings—who can be driven by prejudice, whether consciously or not. One example of this was described by a study published in the journal Science in October, led by a scientist from the University of California, Berkeley, in the USA. The researchers found that a hospital algorithm responsible for classifying patients most in need of follow-up care—because they are at greatest risk—favored white patients over people of color. The explanation was that the system was based on health plan payments, which are higher for people who have greater access to medical care, and not on the likelihood of a patient suffering a serious or chronic illness. The case shows that algorithm design can be directly responsible for prejudice in the results.
Protecting society from the misinformation and prejudice spread by artificial intelligence is a challenge that we can try to overcome through education. Virgílio Almeida highlights schools in Finland as an example, which encourage children to develop a critical spirit and to identify fake news online. But educating users is not enough; programmers also need to be more aware of the issue. “One of the ways to prevent bias is to train the algorithm with more diverse data,” points out Almeida.
Undergraduate student Bruna Thalenberg, one of the founders of Tecs, a Social Computing Group at the Institute of Mathematics and Statistics of the University of São Paulo (IME-USP), agrees: “The world is constantly changing. These algorithms must not repeat the past.” Founded in 2017 as an extension team, Tecs was born from a dialogue between a group of students from USP and a Brazilian colleague, Lawrence Muratta, who was studying computer science at Stanford University, USA, where a group was already discussing the issue of bias.
“We felt that the computer science course was too far removed from society,” says former student Luiz Fernando Galati, who now works at the Center for Teaching and Research in Innovation at Fundação Getulio Vargas. The group’s initial objective was to promote lectures and debates, but they ended up introducing a new course to the curriculum.
“Our lectures are offered as part of the law and software discipline, under the supervision of professors Daniel Macedo Batista and Fabio Kon,” says Galati. Tecs is also a member of the TechShift Alliance, an organization of 20 university student organizations from North and South America and Asia who come together to debate social issues related to AI.
As well as reflecting on the topic, Tecs takes concrete action through projects that allow marginalized groups to access the digital world. One of these projects is programming logic classes for students at the Socio-Education Center for Adolescents (Fundação CASA). “The first class was given in the second semester of 2018,” says student Jeniffer Martins da Silva, who teaches with the project. Since it was created, more than 40 young people have completed the course.
Artificial intelligence can also offer means of prevention and control. In 2018, researchers from USP and the Federal University of São Carlos (UFSCar) launched the pilot version of a digital tool designed to identify fake news. It is available free of charge online or via WhatsApp. Users simply submit suspicious news articles to the system, and if it finds evidence of falsehoods, it responds: “This news may be fake. Please look for other reliable sources before sharing it.” According to the study’s authors, the system accurately identifies up to 90% of news articles that are either totally false or totally true.
At the University of Campinas (UNICAMP), a group led by computer scientist Anderson Rocha, director of the Institute of Computing, has been working on mechanisms capable of identifying false information in photos and videos. “We use AI techniques to compare the information in a given text with comments and images. We can check these three groups of information and then indicate the possibility of a discrepancy that could lead to the identification of fake news,” says Rocha.
Greater transparency is also needed from the private sector. The term “algorithmic accountability” is being increasingly used in debates on the use of AI. According to lawyer Rafael Zanatta, a specialist in digital law and a member of the Digital Ethics, Technology and Economics research group at USP, there are still no specific laws related to discriminatory algorithms, but steps in this direction are being taken. A bill called the Algorithmic Accountability Act has recently been proposed in the USA. If approved, companies would have to assess whether the algorithms behind their AI systems are biased or discriminatory and whether they pose a privacy or security risk to consumers.
In April 2019, the European Union released ethical guidelines for the use of AI, including measures that hold businesses accountable for its social consequences and the potential for human intervention and supervision of the system.
In Brazil, an attempt was made in 2019 to pass legislation requiring some automated decisions to be reviewed by a human. A citizen who felt harmed by an algorithm’s decision—during a loan application, for example—could demand that a reviewer clarify the criteria behind the decision. The bill, however, was vetoed by the Presidency of the Republic, sensitive to the argument by businesses that such measures would incur high costs.
Déjà vu: Coherence of the time, space, and characteristics of heterogeneous data for integrity analysis and interpretation (nº 17/12646-3); Grant Mechanism Thematic Project; Principal Investigator Anderson de Rezende Rocha (UNICAMP); Investment R$1,385,219.47.
RIBEIRO, M. H. et al. Auditing radicalization pathways on YouTube. arXiv. Aug. 22, 2019.
CAETANO, J. A. et al. Characterizing attention cascades in WhatsApp groups. Proceedings of the 11th ACM Conference on Web Science. pp. 27–36. June 26, 2019.
CAETANO, J. A. et al. Analyzing and characterizing political discussions in WhatsApp public groups. arXiv. Apr. 2, 2018.
OTTONI, R. et al. Analyzing right-wing YouTube channels: Hate, violence and discrimination. Proceedings of the 10th ACM Conference on Web Science. pp. 323–332. May 15, 2018.
RIBEIRO, M. H. et al. Characterizing and detecting hateful users on Twitter. Twelfth International AAAI Conference on Web and Social Media. June 15, 2018.
ARAUJO, C. et al. Identifying stereotypes in the online perception of physical attractiveness. International Conference on Social Informatics. pp. 419–437. Oct. 23, 2016.
LANNA, L. et al. Discrimination analysis of intelligent voice assistants. 18th Brazilian Symposium on Human Factors in Computing Systems. Oct. 22–25, 2019.