{"id":249883,"date":"2017-12-05T17:52:05","date_gmt":"2017-12-05T19:52:05","guid":{"rendered":"http:\/\/revistapesquisa.fapesp.br\/?p=249883\/"},"modified":"2017-12-05T18:55:36","modified_gmt":"2017-12-05T20:55:36","slug":"the-reality-emerging-from-an-avalanche-of-data","status":"publish","type":"post","link":"https:\/\/revistapesquisa.fapesp.br\/en\/the-reality-emerging-from-an-avalanche-of-data\/","title":{"rendered":"The reality emerging from an avalanche of data"},"content":{"rendered":"<p><a href=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_abre.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-249888\" src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_abre-739x1024.jpg\" alt=\"\" width=\"300\" height=\"416\" \/><span class=\"media-credits-inline\">Z\u00e9 Vicente<\/span><\/a>Computers are tools used in the work of researchers in all fields of knowledge, but in the case of the humanities and social sciences community, the digitization of artistic and historical collections and the input of economic and social information into giant databases have opened up new fronts for observing phenomena and analyzing trends. This has rather naturally developed into a closer relationship with computer scientists, whose Big Data research studies have multiplied the ways to organize and analyze information, giving rise to an interdisciplinary field known as the digital humanities. \u201cThe term was coined to define research that uses computational technology to study the humanities, but it also refers to the research that uses the humanities to study digital technology and its influence on culture and society,\u201d explains Brett Bobley, director of the Office of Digital Humanities at the National Endowment for the Humanities (NEH), a U.S. 
government funding agency.\u00a0 This is not a new field, says Bobley, but rather a range of activities that can include the use of aerial photographs by archeologists to scan sites, the development of data analysis techniques that help linguists study old newspapers, and the study of the ethics of the technology by philosophers, to give just a few examples.<\/p>\n<p>One of the NEH-funded projects in digital humanities recovered the field diaries of British explorer David Livingstone (1813-1873). Historical accounts of his 1871 voyage to Central Africa were written on old newspapers because there was no paper available.\u00a0 Over time, the ink faded and the writings in which Livingstone recorded his impressions about the dynamics of the slave trade, among other observations, were rendered illegible.\u00a0 Between 2013 and 2017, a group of humanities and computer science researchers from the United States and the United Kingdom were able to recover the writings by using spectral imaging photographic techniques that permitted retrieval of information invisible to the human eye.<\/p>\n<p>Another example was the collaboration between historians from several parts of the world in organizing records about nearly 36,000 slave ship voyages that took place between 1514 and 1866, carrying more than 12 million slaves from Africa.\u00a0 The effort, begun in the 1990s by American historian David Eltis at Emory University, resulted in the Trans-Atlantic Slave Trade Database, available online since 2007 at slavevoyages.org. 
Analysis of the data, which assembles records in several languages and encompasses the activities of the ports through which the vessels passed, has offered historians new insights into how Africans experienced and resisted deportation and enslavement, and revealed new transatlantic connections in the slave trade.<\/p>\n<p>An initial compilation was released as a CD-ROM in 1999, but the collaborative effort to obtain data about the voyages subsequently put together a more complete picture of the slave trade.\u00a0 In the project\u2019s initial phase, Brazil was estimated to have taken in nearly 3.6 million slaves, but documents showed that the figure was closer to 5 million\u2014for a total of 10.7 million Africans deported to the Americas.\u00a0 The initiative had a considerable impact on research about slavery, says Manolo Florentino, a professor at the Federal University of Rio de Janeiro (UFRJ) in charge of the Brazilian arm of the project. Chief among its effects was that it replaced estimates with solid data obtained from primary sources.\u00a0 Another impact was to show Brazil\u2019s prominence in the slave trade. 
\u201cA large number of the documents obtained through the project are written in Portuguese, a sort of lingua franca of the slave trade,\u201d says Florentino, who in recent years has embarked on efforts to translate the entire site into Portuguese.\u00a0 Florentino says that the collection of data on the deportation and enslavement of the Africans is now providing information for a less-explored line of research involving the paths the slaves took inside Brazil after they arrived in the ports.<\/p>\n<p><strong><a href=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_02.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-medium wp-image-249885\" src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_02-300x153.jpg\" alt=\"\" width=\"300\" height=\"153\" \/><span class=\"media-credits-inline\">Z\u00e9 Vicente<\/span><\/a>A variety of projects<\/strong><br \/>\nThe results of a recent international call for proposals have demonstrated the diversity of the digital humanities. One hundred and eight proposals by interdisciplinary teams from 11 countries were submitted during the fourth edition of what is known as the Digging into Data Challenge, and 14 were approved. The initiative is part of the Trans-Atlantic Platform (T-AP), a collaboration in the humanities and social sciences that is bringing together 16 funding agencies from Europe and the Americas, including FAPESP. \u201cWe saw a noticeable increase in the number of countries taking part, which in previous calls for proposals had numbered only four. The surge in new collaborations is making a big difference,\u201d says Brett Bobley, who devised the idea for the Digging into Data program in 2008. 
Approved projects encompass disciplines such as musicology, linguistics, history, political science and economics, and they will receive investments totaling $9.2 million, equivalent to R$29 million.\u00a0 One of the proposed projects involves researchers from the United States, Germany and The Netherlands and will focus on three databases that make up the written and oral records of folklore from several corners of Europe.\u00a0 The goal is to identify patterns that reappear over time in different places, helping show which beliefs were common in the past, based on the stories told and the spread of legends and tales of supernatural occurrences.<\/p>\n<p>Another example, led by economists and computer scientists from the United States, Canada and The Netherlands, plans to cross-reference information about price variations of products sold on the Internet all over the world, continuously collected by the Billion Prices project at the Massachusetts Institute of Technology (MIT), with economic data that can be used to produce research studies on inflation, purchasing power, and standards of living in several countries.\u00a0 There is also an initiative to analyze 70 years of press coverage of terrorist attacks, in a search for patterns of what could constitute a responsible approach to the problem.\u00a0 Still another project will investigate the melodic structures of jazz recordings, in an attempt to connect them to the development of the historical and social context in which the songs emerged.<\/p>\n<p>To select the 14 projects included, more than 200 experts evaluated the 108 proposals. 
\u201cThe variety of issues covered shows that there is a huge potential to be developed in the field of digital humanities in Brazil,\u201d says Claudia Bauzer Medeiros, a professor at the Institute of Computing at the University of Campinas (Unicamp) and FAPESP representative on the T-AP.\u00a0 Medeiros took part in the entire process, from drafting the call for proposals to selecting the projects.\u00a0 \u201cThe field is under-explored in Brazil because there is still so little collaboration among researchers from the humanities and social sciences and computer science.\u00a0 They\u2019re gradually realizing that this interaction is possible.\u00a0 Researchers in the humanities and social sciences don\u2019t have to understand computing to work well in this field, but they do have to collaborate with experts on aspects of computing,\u201d says the researcher who is also coordinator of the FAPESP Research Program on eScience.<\/p>\n<p>Brazilians are participating in one of the projects selected under the Digging into Data Challenge.\u00a0 It involves a collaboration among researchers from France, Argentina and Brazil studying how opinions spread in society and how the process has changed as a result of advances in information technology.\u00a0 The study will analyze two databases to map the establishment of networks of relationships among groups of individuals\u2014such connections will be represented in visual structures (graphs).\u00a0 In one collection, by the<em> New York Times<\/em> newspaper, the objective will be to analyze reports about Brazil published over the course of 70 years in order to map the relationships between groups of individuals and entities mentioned in the pieces that talked about Brazil. 
\u201cThe plan is to understand where they came from and how the ideas and opinions reproduced in the texts were related, especially those regarding political and economic topics, and how this has changed over time.\u00a0 We also want to determine the possible influence that news by foreign correspondents published in that newspaper had on the formation of public opinion in Brazil,\u201d explains researcher Maria Eunice Quilici Gonzalez, head of the Brazilian group that is taking part in the project and a professor in the Department of Philosophy of the School of Philosophy and Sciences at S\u00e3o Paulo State University (Unesp), Mar\u00edlia campus.<\/p>\n<p>The second database is a collection of Twitter postings on electoral processes. The idea is to show how opinions form and grow stronger in the virtual environment.\u00a0 \u201cWe would like to analyze the dynamics of how opinions spread through social media. The more extensive the relationships, the tighter are the network connections represented in the graphs.\u00a0 The trend is for them to take center stage and inhibit the growth of other connections, thus showing the pathway to how opinions are formed,\u201d Gonzalez reports.\u00a0 One of the group\u2019s interests lies in studying the formation of politically polarizing environments on social media. 
\u201cGroups that once were isolated are now able to reinforce their opinions and gain followers, feeding off communications on social media.\u00a0 This happened recently, for example, with groups for or against impeachment in Brazil.\u201d\u00a0 Besides specific objectives, the project has more general ambitions, including assessing possibilities for creating models to study social practices and investigate the potential ethical consequences of using Big Data analysis on processes of social self-organization, which are those that emerge from spontaneous interactions among various social actors\u2014leaderless and without interference from an organized center.<\/p>\n<p>The project will be carried out in partnership with researchers from the universities of Cergy-Pontoise in France and Buenos Aires in Argentina. The team is critical of the idea that it is possible to shape behaviors or guide how opinions are formed by manipulating trends obtained through analysis of Big Data alone.\u00a0 \u201cIt would be an exaggeration to say that Donald Trump was elected president and that the British voted to leave the European Union solely because the respective campaigns hired the political marketing firm Cambridge Analytica to utilize data and social media tools to manipulate voters\u2019 wishes and fears,\u201d Gonzalez says. 
\u201cThe study of Big Data can identify trends, but it is far from capable of explaining human nature.\u00a0 Its use will only be efficient if it is accompanied by the study of the attitudes of certain groups, which in the case of the United States and the United Kingdom were related to the preponderance of nationalism and an aversion to multiculturalism.\u201d<\/p>\n<p>With an undergraduate degree in physics, a master\u2019s degree in philosophy and a PhD in linguistics and cognitive science, Gonzalez will also contribute to the project, with the support of a team of Brazilian researchers, by providing ideas concerning the ethics involving individuals\u2019 actions on social media.\u00a0 \u201cThe concept of privacy, for example, is changing.\u00a0 Some of the notions of privacy held by my generation do not apply to people on social media who systematically expose their personal details.\u00a0 There is also the issue of individuals who create false profiles, altering their personal characteristics, socioeconomic status and even their gender in an effort to virtually interact with others,\u201d she says. 
In her view, if at home a lot of people have to maintain an identity they do not like, they can live out their fantasies on social media without any apparent family pressures.\u00a0 \u201cTheir identity is fictitious, but the interaction that it provides can to some extent be real.\u00a0 They are able to use it to create a relationship with virtual partners, which in the past was not possible.\u201d\u00a0 To address situations like this, the Brazilian group will think about how Big Data analysis can help in the understanding of new patterns of behavior and the dynamics of formulating public opinion.<\/p>\n<p><strong><a href=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_03.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-249886\" src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_03-300x108.jpg\" alt=\"\" width=\"300\" height=\"108\" \/><span class=\"media-credits-inline\">Z\u00e9 Vicente<\/span><\/a>Topics and advances<\/strong><br \/>\nThe next scheduled edition of the Digital Humanities conference in August 2017, which will bring nearly 1,000 researchers from several countries together in Montreal, Canada, gives us some idea of the scope of the topics and technological advances that have established bridges between computer scientists and professionals in the humanities and social sciences. Workshops will address topics such as research applications in the humanities for computer vision tools, a concept used mainly in robotics through which artificial systems are able to extract information about images, simulating the functioning of the human vision system.\u00a0 \u00a0Or they may raise discussions about ethical and legal problems related to the use of digitized data that could expose an individual\u2019s privacy. 
\u00a0Honored at the conference in Montreal will be those responsible for the Text Encoding Initiative (TEI) project, a consortium which since the 1980s has developed and maintained a standard for the representation of texts in digital format, making them machine readable, and driving studies in the human sciences, especially in linguistics.\u00a0 \u201cIn the last 15 years, we\u2019ve had a qualitative change in the volume of textual data available, which has radically changed the possibilities of research,\u201d says Karina van Dalen-Oskam, chair of the Steering Committee of the Alliance of Digital Humanities Organizations (ADHO), the entity that organizes the conference.\u00a0 A professor of computational literary studies at the University of Amsterdam in The Netherlands, van Dalen-Oskam points to the progress new approaches have made in researching literature, such as distant reading, which analyzes large volumes of data related not only to the work being studied, but also to the entire historical context in which it was produced, and stylometry, which enables attribution of authorship to works of doubtful authenticity. \u00a0\u201cThese approaches allow us to learn more about the development of literary genres and even about factors that make a particular text a best seller or not,\u201d she says.<\/p>\n<p>The growth of this interdisciplinary field is accompanied by criticism that the digital humanities have generated more headlines than solid advances in knowledge and that they compete with traditional humanities in terms of the allocation of research funding.\u00a0 In an article published in <em>The New York Times<\/em> in 2015, Armand Marie Leroi, a professor of evolutionary biology at Imperial College London, calls into doubt digital humanities\u2019 capacity to produce innovative analyses of literature. 
He says that converting art into data does make it possible to look for new meanings in a work through new algorithms.\u00a0 \u201cBut it would have to create a very smart algorithm capable of flagging irony in the work of Jane Austen,\u201d he wrote.\u00a0 \u201cThe truth we talk about in art criticism is not the same as scientific truth.\u201d<\/p>\n<p>Researchers in this field respond with the argument that the digital humanities offer only an extension of traditional methods and skills, and are not intended to replace them.\u00a0 Written by a group of authors, the book <em>Digital Humanities<\/em> (MIT Press, 2012) states in its first chapter that the digital humanities \u201cdo not obliterate the ideas of the past, but rather supplement the commitment by the humanities to academic interpretation, informed research, organized argument and dialogue between the communities that practice it.\u201d<\/p>\n<p>Political scientist Eduardo Marques, a professor at the University of S\u00e3o Paulo School of Philosophy, Literature and Human Sciences (FFLCH-USP), points out that the approaches used by computer science and human and social sciences within the digital humanities come from different sources. 
\u201cThere was a meeting of two movements.\u00a0 One came from the hard sciences, with the development of data mining tools that enabled the production of information about the social world and the generation of new empirical fields.\u00a0 The human sciences, however, made use of existing statistical tools to study social phenomena,\u201d he explains.\u00a0 Since the rationales are different, it is hard to bring them together, Marques notes.\u00a0 \u00a0\u00a0\u00a0\u201cWhile the computer scientists are looking for patterns in large volumes of data in order to raise research questions, the social scientists are working from theoretical assumptions and are using digital tools to test their validity.\u00a0 There is a lot of dialogue, but it is hard to bring together different ways of approaching the issues.\u201d<\/p>\n<p>This dialogue has influenced the training of researchers.\u00a0 In the case of the human and social sciences, courses and disciplines in quantitative methods and analysis are gaining ground.\u00a0 \u201cThis is good news because the social sciences have always had a huge weakness in this field in Brazil, which also extends to qualitative analysis and studies with small samples,\u201d Marques explains, referring to initiatives such as the Summer School in Concepts, Methods and Techniques in Political Science and International Relations offered by the International Political Science Association (IPSA), the Department of Political Science at FFLCH-USP and the Institute of International Relations at USP.\u00a0 Also growing in importance are disciplines on the ethical use of data.\u00a0 \u201cIt is an emerging issue and does not just look at how to prevent the dissemination of confidential patient data or sensitive public safety information,\u201d adds Claudia Bauzer Medeiros. 
There is the risk of producing biased analyses because many computer programs \u201clearn\u201d as the data is processed.\u00a0 Software is being developed to identify long-term patterns and incorporate them into their analytical capacity.\u00a0 \u201cThere have been situations in which the learning inadvertently reproduced biases.\u00a0 In the United States, it was discovered that a program used experimentally by judges in some cities to expedite rulings dealt more stringently with blacks and Latinos because it used as a lesson data from previous rulings.\u201d<\/p>\n<p>The development of computational tools that help analyze large volumes of data about health, demographics and violence is used in studies of social processes that are then applied in public policies.\u00a0 \u201cSocioeconomic and demographic data analyses are often used in urban planning strategies.\u00a0 Digitization of data on migratory waves feeds studies that help understand future trends in immigration,\u201d says the IC-Unicamp researcher.<\/p>\n<p>An example of the growing involvement of the social sciences in Big Data in Brazil can be seen at the Center for Metropolitan Studies (CEM), one of the Research, Innovation and Dissemination Centers (RIDCs) funded by FAPESP. One focus of the center is to produce and disseminate georeferenced data on Brazilian cities.\u00a0 Public agencies generated data that ended up not being made available and the information was appropriated by companies, which charged to provide them.\u00a0 The CEM purchased several databases and digitized others, making them available <a href=\"http:\/\/fflch.usp.br\/centrodametropole\" target=\"_blank\" rel=\"noopener noreferrer\">on its website<\/a>. At first, the collections were not large enough to be associated with the notion of Big Data. 
This changed a few years ago when the center developed a database tailored towards a large research effort on the study of patterns of inequality in the last 60 years.\u00a0 Significant work was required to provide consistency to questionnaires and correct the gaps in a 1960 Census sample whose punch cards had been lost, and to reorganize the information from five later censuses to generate comparable data. \u00a0\u201cThis generated a multi-terabyte database of information, at a volume much larger than what is traditionally seen in Brazil\u2019s social sciences,\u201d says Eduardo Marques, who was CEM director from 2004 to 2009. The effort led to the book entitled <em>Trajet\u00f3rias das desigualdades<\/em> \u2013<em>Como o Brasil mudou nos \u00faltimos 50 anos<\/em> (Editora Unesp, 2015) [Paths of Inequality in Brazil: A Half-Century of Change], edited by current CEM Director, Marta Arretche, containing chapters written by experts on topics such as education and income, demographics, labor markets and political participation. Each chapter required specific processing of data.<\/p>\n<p><strong><a href=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_04.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-249887\" src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-digitais_04-959x1024.jpg\" alt=\"\" width=\"300\" height=\"320\" \/><span class=\"media-credits-inline\">Z\u00e9 Vicente<\/span><\/a>London in the fight against crime<\/strong><br \/>\n<em>Tools explore data on 197,000 trials<\/em><\/p>\n<p>Records on 197,000 trials conducted between 1674 and 1913 by London\u2019s Central Criminal Court, commonly referred to as Old Bailey, which is the name of the street on which the court is located, were made available for consultation on the Internet back in 2003 at oldbaileyonline.org. 
The challenge posed by the task of identifying phenomena and trends buried in a volume of information approaching 127 million words mobilized researchers from the United Kingdom and the United States to develop ways to tap textual data that are much more sophisticated than performing a search of the repository.<\/p>\n<p>The project known as Data Mining with Criminal Intent, funded in 2009 under the initial call for proposals for the Digging into Data project, scoured the records of Old Bailey with the help of a combination of digital tools.\u00a0 One of them is Zotero, which allows for the collection and organization of information, and the other, a portal called TAPoR that helps users analyze writings through a variety of software.\u00a0 The strategy has led to some interesting results.\u00a0 It was possible to see, for example, that the word \u201cpoison\u201d was much more commonly associated with \u201ccoffee\u201d than with \u201cfood,\u201d as an indication of how Londoners were murdered by poisoning.<\/p>\n<p>By the same token, one notes that punishments for bigamists became less severe throughout the 19<sup>th<\/sup> century.\u00a0 According to Stephen Ramsay, a professor of English at the University of Nebraska-Lincoln, one of the leaders of the initiative, the project\u2019s contribution is not limited to obtaining previously unnoticed historical evidence.\u00a0 \u201cThe stories of Old Bailey express the darker motivations behind the human condition, such as revenge, dishonor and loss, which is the raw material of the humanities,\u201d he said, according to <em>The Chronicle of Higher Education<\/em>.<\/p>\n<div id=\"attachment_249884\" style=\"max-width: 310px\" class=\"wp-caption alignright\"><a href=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades_abre_praca.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-249884\" 
src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades_abre_praca-300x197.jpg\" alt=\"\" width=\"300\" height=\"197\" \/><p class=\"wp-caption-text\"><span class=\"media-credits-inline\">Hildegard Rosenthal\/Moreira Salles Institute Collection<\/span><\/a> The city in the 1940s, when it reached its first million inhabitants<span class=\"media-credits\">Hildegard Rosenthal\/Moreira Salles Institute Collection<\/span><\/p><\/div>\n<p><strong>How S\u00e3o Paulo became urbanized<\/strong><br \/>\n<em>Platform will assemble georeferenced data about the transformation of S\u00e3o Paulo\u2019s capital city from 1870 to 1940<\/em><\/p>\n<p>S\u00e3o Paulo became urbanized at a faster rate than other cities, growing from only 30,000 inhabitants in 1870 to one million in 1940. The study of the city\u2019s transformations during this period will be backed up by a platform of georeferenced information supplied by numerous sources, such as theses, reports and maps.\u00a0 Any researcher who has data and can relate it to an address in the S\u00e3o Paulo capital is invited to include it in the Pauliceia 2.0 platform, whose design was opened to suggestions from potential users on April 4, 2017.<\/p>\n<p>The project, which brings together researchers from the Federal University of S\u00e3o Paulo (Unifesp), the National Institute for Space Research (INPE), the S\u00e3o Paulo State Public Archives and Emory University, is funded by the FAPESP research program in eScience. 
\u201cAnyone who has studied S\u00e3o Paulo\u2019s hotels could add information about them to the addresses.\u00a0 Anyone who has studied crimes committed in the city can do the same for that data.\u00a0 Any information that can be referenced in the space can be added to the platform,\u201d says historian Luis Ferla, the Unifesp professor who coordinated the project.<\/p>\n<p>There is one project team that is dedicated to developing a database of the numbering on buildings of that time to ensure that data localization is reliable. \u201cIt is such complex work that it is first being tested in a pilot area, in downtown S\u00e3o Paulo,\u201d Ferla explains. A preliminary version of the platform will be available for testing in July 2018.\u00a0 \u201cAnyone who wants to study this period will find a lot of material on the platform to use in their analyses.\u00a0 The project seeks to curate knowledge about the city\u2019s urbanization.\u201d More information is available at <a href=\"http:\/\/unifesp.br\/himaco\" target=\"_blank\" rel=\"noopener noreferrer\">unifesp.br\/himaco<\/a>.<\/p>\n<div id=\"attachment_249889\" style=\"max-width: 310px\" class=\"wp-caption alignleft\"><a href=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-vieira.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-249889\" src=\"http:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-vieira.jpg\" alt=\"\" width=\"300\" height=\"421\" srcset=\"https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-vieira.jpg 500w, https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-vieira-120x168.jpg 120w, https:\/\/revistapesquisa.fapesp.br\/wp-content\/uploads\/2017\/12\/018-humanidades-vieira-250x351.jpg 250w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><p class=\"wp-caption-text\"><span class=\"media-credits-inline\">Reproduction of oil on canvas, by an early 
18th-century unknown author \/ Wikimedia Commons<\/span><\/a> Writings by Father Ant\u00f4nio Vieira (1608-1697) are part of the collection<span class=\"media-credits\">Reproduction of oil on canvas, by an early 18th-century unknown author \/ Wikimedia Commons<\/span><\/p><\/div>\n<p><strong>A historical corpus of the Portuguese language<\/strong><br \/>\n<em>Database containing 3.3 million words assembles annotations on writings from various eras<\/em><\/p>\n<p>Collaboration with computer scientists has occurred more naturally in some fields of the humanities than in others.\u00a0 One example is the studies about changes in the use of language. Charlotte Galves, a professor at the Institute of Language Studies of the University of Campinas (IEL-Unicamp), often says that she became devoted to the digital humanities long before she knew there was such a thing.\u00a0 In 1998, she began to compile 16<sup>th-<\/sup> to 19<sup>th<\/sup>-century writings to put together a historical corpus of the Portuguese language, a database of texts with morpho-syntactic annotations of words and sentences that had already served as a basis for a series of studies about the history of the Portuguese language in Portugal and Brazil. 
\u201cIt is now possible to observe how the language has changed over the centuries, particularly in Brazil, which has increasingly distanced itself from European Portuguese as a result of its contact with other languages, despite being influenced by it again during the second half of the 19<sup>th<\/sup> century,\u201d says Galves.<\/p>\n<p>The database has continued to grow and now contains 3.3 million words from 76 original documents.\u00a0 Named Corpus Tycho Brahe, in reference to the 16<sup>th<\/sup>-century Danish astronomer who proposed documenting the movement of the planets, the collection relied at first on word-labeling tools developed by computer scientist Marcelo Finger, a professor at the Institute of Mathematics and Statistics of the University of S\u00e3o Paulo (IME-USP). The database grew slowly\u2014corrections to the automatic annotations were made by Galves herself, with the help of postdoctoral researchers and students she advised. \u201cI learned a lot about Big Data, but I couldn\u2019t do without the help of computer scientists,\u201d she says.\u00a0 The next step is to make the database fully accessible on the Internet.\u00a0 The collection can currently be downloaded at <a href=\"http:\/\/tycho.iel.unicamp.br\/corpus\" target=\"_blank\" rel=\"noopener noreferrer\">tycho.iel.unicamp.br\/corpus<\/a>, but it cannot yet be searched online.<\/p>\n<p>The same model of historical Portuguese is now being used by Galves and Filomena Sandalo, also a professor at Unicamp, for a study of an indigenous language, <em>Kadiw\u00e9u<\/em>, spoken by an ethnic group in the Brazilian state of Mato Grosso do Sul. 
Oral accounts by indigenous people were collected and are being converted into annotated texts.\u00a0 \u201cThe idea is to use the same platform to create the corpora for other languages, using the same tools,\u201d Galves explains.<\/p>\n","protected":false},"excerpt":{"rendered":"Analysis of data expands the field of activity for the humanities","protected":false},"author":11,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[156],"tags":[219,214,256,261,265],"coauthors":[98],"class_list":["post-249883","post","type-post","status-publish","format-standard","hentry","category-cover","tag-computation","tag-political-science","tag-public-policies","tag-sociology","tag-urbanism"],"acf":[],"_links":{"self":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts\/249883","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/comments?post=249883"}],"version-history":[{"count":0,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts\/249883\/revisions"}],"wp:attachment":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/media?parent=249883"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/categories?post=249883"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/tags?post=249883
"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/coauthors?post=249883"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}