Digitized files

Recovering knowledge

Digitizing archives brings to light rarities and forgotten documents and helps perfect the work of researchers

Digitization of books at the USP Brasiliana Library: 4,800 volumes from the book collector José Mindlin's collection are now available on-line

Photo: Léo Ramos

The increasing number of projects related to digitizing the collections of libraries, archives and museums is changing the way Brazilian researchers work.  Over the past 15 years, several institutions have begun making documents, photographs and videos that could previously only be consulted during scheduled visits available on-line. The result of this effort is significant. In some cases, the ease of searching and finding items using search engines expanded access to information that would have been difficult to obtain manually, thereby enhancing the quality of research. In other instances, digitization allows researchers to become familiar with a given collection remotely in order to more quickly and efficiently consult it later, in person. “Students and researchers are being trained in this new context. It is a path of no return,” says historian Pedro Puntoni, professor at the University of São Paulo (USP) and researcher in the Digital Culture Department of the Brazilian Analysis and Planning Center (Cebrap). “Libraries and physical collections will always be important, but they are losing ground to the Internet due to the ease of access to documents, images and books, as well as theses and digital magazines available online,” he says.

In mid-March 2015, from a personal computer in his home, professor and researcher Wilton José Marques ran across a forgotten poem by one of the leading names in Brazilian literature, the writer Machado de Assis (1839-1908), known mainly for his short stories and novels. It was not just any poem, but the first, published in the Correio Mercantil newspaper, in Rio de Janeiro on September 9, 1856, entitled “O grito do Ipiranga” (The cry from Ipiranga). Experts on Machado de Assis’s work had believed that his production began in 1858 with the poem “Esperança” (Hope), when the author began working as a proofreader at that newspaper at age 19. Marques’ discovery was made possible by the digitization of the Correio Mercantil by the Brazilian National Library, which established its digital newspaper library in 2009. Marques, a professor in the Languages and Literature Department of the Federal University of São Carlos, says that he found the poem thanks to “a little bit of instinct and a little bit of luck.” He was looking for the author’s first poems for a study on romantic influences in his work. “I was checking sources, looking at each poem from the collections starting in 1858. Out of curiosity, I decided to search for previous years and ‘O grito do Ipiranga’ appeared,” he says. Marques decided to set aside his original research topic in order to work on an article about the poem. “It is a long poem glorifying the cry of independence and Dom Pedro I. One of the features of Machado de Assis’s work is intertextuality: he dialogs with other works and historical references. In this first poem, he constantly compares independence with the Roman republic,” says Marques.

Another of the researcher’s interests is to shed light on Machado de Assis’s youth. “Imagine how, in a slave society, a 17-year-old black youth with a formal education—and we do not know exactly how it was obtained—managed an entrée into the intellectual universe of Rio de Janeiro and worked for an important newspaper.”

The National Library has one of the oldest digitization programs in Brazil. It began in 2006 and today provides 900,000 documents on-line and receives 400,000 virtual queries per month. There are collections of photos, maps and music. Last month, the Brasiliana Photo portal was launched with more than 2,000 historical photos from the library’s collections and from the Moreira Salles Institute. Most of the digital archive consists of Brazilian newspapers. There are 5,000 titles, digitized with funds from the Brazilian Innovation Agency (Finep). “We have the right of legal deposit, which means that we receive a copy of all publications produced in Brazil. Therefore, our collection is the most comprehensive in the country,” says Angela Bettencourt, coordinator of the National Digital Library. The decision to make the newspapers available was also due to practical concerns: the physical collection was the institution’s most frequently consulted archive.

“O grito do Ipiranga,” a lost poem written by Machado de Assis and published in the newspaper Correio Mercantil, was discovered thanks to the Brazilian National Library’s digital newspaper archive

Reproduction: Correio Mercantil

Consulting the newspapers is simple: searching for a word brings up everything related to it. For researchers, their usefulness goes far beyond the hypothetical chance of finding a forgotten poem. Sociologist Benno Warken Alves, 25, completed his master’s thesis at USP in 2014 with a FAPESP grant. His topic was a 20th-century black entrepreneur in Curitiba, Sydnei Lima Santos (1925-2001). Alves obtained references on the history of the businessman’s ancestors by consulting the electronic libraries of newspapers from Sergipe, Rio de Janeiro and Paraná. “If I had had to search microfilm, I would not have had enough time or may not even have found the information,” he says.

While a digital newspaper library is sufficient for finding and checking data, when researching historical documents one must normally access the physical file, although access to a digitized version can accelerate this task. Architect Suely Figueirêdo Puppi has been using the digital collections of the Lina Bo and Pietro Maria Bardi Institute for her PhD dissertation at the Federal University of Rio Grande do Sul on the restoration and design techniques of Lina Bo Bardi (1914-1992), the Italian-Brazilian architect known for designing the São Paulo Museum of Art (MASP). “Bardi’s designs and drawings available on the site do not have the required definition for publication in a book, but for a dissertation they are sufficient,” says Puppi. She has visited the Institute’s headquarters in São Paulo several times since beginning her PhD in 2012 in order to analyze documents. “I do pre-selection on-line so that I will know what I need when I visit,” she affirms.

The cataloging and digitization of the archives were also useful during Bardi’s centennial celebration. “The drawings were very helpful to the curators of exhibitions of her work in Zürich and Munich, as well as two exhibitions in New York in which Bardi was represented. The initial evaluation of the collection was done remotely,” says Renato Anelli, professor of the USP Institute of Architecture and Urban Studies in São Carlos and the researcher responsible for the digitization project. Bardi’s designs and drawings are the strength of the digital collection. “She colored and drew scenes and perspectives on architectural plans. They are fabulous drawings,” says Anelli. The database allows searching of the cataloging data on photos and documents written by the architect. The institute began organizing the collections of Pietro Maria Bardi (1900-1999), who founded MASP. Pietro and Lina were married for 45 years.

Digitization of historical maps in the São Paulo State Public Archives: taking care to provide metadata for documents

Photo: Léo Ramos

The main limitations of digital collections are the inability to provide the sensory experience of seeing or handling a historical document and also the difficulty in providing all necessary information needed to contextualize the circumstances in which each document was produced and stored. “Documents in public collections have a special feature. They make sense within the context in which they were produced. Outside of this context, they cannot be understood in their entirety,” says Marcelo Chaves, director of the Diffusion and Research Support Center of the São Paulo State Public Archives. Extracting more complete information from an archive document, he explains, requires understanding its background, why it was produced and where it was circulated. “Without this, the document loses some of its informative potential,” he explains. The work of historian Bruno de Andréa Roma, who worked at the Archives as an intern for two years, demonstrates this challenge. During his undergraduate studies at USP he produced, under the guidance of Professor Carlos Bacellar, former Archives coordinator, a guide to the sources of everything on the University of São Paulo that can be found in that huge collection. “Guides and inventories are key tools for creating awareness of what is available on a subject in a given collection, in addition to providing information on the history of the documents,” said Roma, who is now working on a master’s degree on photography in the archive environment. “The place of photographs is not as well defined as that of other documents. In the State Archive, the negatives belonging to collections such as that of the newspaper Última Hora are in one location, but the contacts and enlargements, which provide information on their use, are in another, and there is no link between them.”

Marcelo Chaves stresses that digitization has been essential to the democratization of information, and that it was the experience of digitizing that made clear that putting documents on the Internet is not enough to make them truly accessible. “Digitization policies are irreversible, but they must be based on criteria,” he says. Recently, Chaves was tasked by the Archive with advising city governments in São Paulo about digitization and encountered disastrous situations. “Many mayors have been convinced by specialized companies that digitization is a benefit in itself and they end up spending a lot of money converting unorganized physical files into inaccessible digital collections,” he says.

In recent years, the Public Archives of the State of São Paulo decided to make its collection available on the Internet, with the flagship being the archive of the São Paulo State Department of Political and Social Order (DEOPS), the main agency of the São Paulo state political police, abolished in 1983. Two years ago it launched the Memory and Resistance portal, which allows on-line searching of more than 314,000 index cards and 12,800 police records (a total of 1 million images) produced by political surveillance agencies between 1924 and 1999. The archive’s technical staff is analyzing the results to determine which part of this work needs to be corrected or even redone. Because the digitization was done in bulk, important information about the documents is not available to users. This might not be a problem for someone who just wants to find out what appears under their name or that of a family member within the archive. However, for researchers, the missing information could bias results, notes historian Marcelo Quintanilha Martins, director of the archive’s Permanent Collection Center. The DEOPS collection is made up of three distinct archives, from its three specialized police agencies: Social Order, Political Order, and Secret Service. Documents are frequently interrelated, but this is not always apparent in on-line searches. “For example, DEOPS communicated with the United States FBI, from which it received memorandums related to arrest warrants. But this context is not available in the scanned documents.” For the approximately 50,000 images currently scanned by archive staff every month, the so-called metadata, or information describing what can be found in the documents, is catalogued using open-source software called ICA-AtoM, which allows storage of a comprehensive set of data to help prevent the loss of context. The idea is to revisit documents that have already been digitized and fit them into that standard.
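
To make the idea of context-preserving metadata concrete, here is a minimal sketch in Python. It is not ICA-AtoM's data model or API: the `ArchivalRecord` class, its field names and the sample values are all hypothetical, invented only to illustrate how a description can carry a document's originating agency and its links to related documents alongside the scanned image.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArchivalRecord:
    """Hypothetical descriptive record for one scanned document."""
    reference_code: str              # identifier of the item (invented format)
    title: str
    date: str
    creator_agency: str = ""         # which agency produced the document
    scope_and_content: str = ""      # why the document was produced
    related_codes: List[str] = field(default_factory=list)  # linked documents


def missing_context(record: ArchivalRecord) -> List[str]:
    """List the contextual fields that were left empty for this record."""
    gaps = []
    if not record.creator_agency:
        gaps.append("creator_agency")
    if not record.scope_and_content:
        gaps.append("scope_and_content")
    if not record.related_codes:
        gaps.append("related_codes")
    return gaps


# A record as bulk scanning might leave it: an image with a label and little else.
# All values below are invented sample data.
bulk_scan = ArchivalRecord("EXAMPLE-0001", "Index card", "1950")

# The same item after cataloguing, with its context filled in.
catalogued = ArchivalRecord(
    "EXAMPLE-0001", "Index card", "1950",
    creator_agency="Social Order",
    scope_and_content="Card opened after a surveillance report.",
    related_codes=["EXAMPLE-0002"],
)

print(missing_context(bulk_scan))
print(missing_context(catalogued))
```

A report like this could help staff decide which already digitized items to revisit first, although a real cataloguing standard would define many more fields than the three checked here.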

Sesc Pompeia Theater, in a drawing by Lina Bo Bardi: site contains collection of drawings by the modernist architect

Image: Collection of the Instituto Lina Bo e P. M. Bardi

The standardization of data is not a discussion restricted to archives and libraries. Giselle Beiguelman, professor at the USP School of Architecture and Urban Studies and organizer of the book Futuros possíveis—Arte, museus e arquivos digitais (Possible futures—Art, museums and digital archives), published with support from FAPESP, calls attention to the “on-line collector,” the individual who makes documents, images or videos of historical value available on the Internet. “Through generosity and investment of time they provide family photos, personal memories or videos on YouTube with TV shows from the 1970s and historical films from as far back as the early 1920s, which could be useful for researchers, but would require standardization of their cataloging procedures and metadata,” says Beiguelman. She mentions the problems related to “memory corporatization,” which is the archiving of images and documents on private platforms such as Flickr and YouTube. “Each company organizes documents in a different way and there is a chance that they will be taken off-line at any time, depending on the company’s interests. There is no discussion on standardized metadata systems that everyone, including private platforms, could use,” she states. The idea of creating standards could be implemented, she says. “The Internet is the best example. It works based on protocols and the rule-based use of common symbols that everyone follows, such as the @ in e-mail addresses.” One of the book’s articles, written by Monika Fleischmann and Wolfgang Strauss, of the Mars-Exploratory Media Lab, in Germany, cites the concept of a semantic map of knowledge as a possibility for recording and viewing a set of information about a document. “The individual entries for the files are located relationally and semantic relations are displayed,” say the authors.

Digitization of collections has advanced all over the world since the early 2000s. “In the United States, universities are at the helm of the library digitization process,” says Pedro Puntoni, who headed the USP Guita and José Mindlin Brasiliana Library between 2007 and 2014 and developed the Digital Brasiliana Library. “In contrast, the European Union funded a consortium to strengthen its virtual libraries, led by Gallica, the National Library of France.” Brazil, notes Puntoni, pioneered to some extent in the last decade when it cataloged, microfilmed and digitized about 3 million pages related to the first 300 years of Brazilian history belonging to the Overseas Historical Archive, in Portugal. The work was coordinated by Esther Bertoletti, of the Brazilian National Library. But the digitization of collections in Brazil was hampered by the difficulty in transforming projects into permanent programs. “In many cases, teams are contracted for specific time periods and disperse when there are breaks in the flow of funds or changes in the management of the institution,” says Puntoni.

Brazilian flag used during the monarchy, from the digital archives of the Imperial Museum

Image: Imperial Museum Collection

In 2009, FAPESP published a call for proposals for the Research Infrastructure Program, specifically for the Museums and Information, Document and Biological Collection Depositary Centers. Twenty projects were selected, and many of them focused on the organization, digitization and provision of documents on-line. Grants were awarded to the Lina Bo and Pietro Maria Bardi Institute, the São Paulo State Public Archives, the Lasar Segall Museum, the USP Institute of Brazilian Studies, and other institutions. Organizations such as the Brazilian Development Bank (BNDES), Petrobras and Finep also provide resources for digitization projects. A group of institutions committed to digitization policies in Brazil joined together to form the Memorial Network in order to share experiences and help organize collections, as well as develop a search tool that unifies searching of the digitized collections of all national libraries.

“In the last decade a good deal of funding has been made available for digitization projects in Brazil, while at the same time equipment has become less expensive,” says Millard Schisler, a researcher and consultant in the field of digitization and digital preservation. However, in his opinion, the race to provide on-line access to collections created distortions. “A common situation was for an institution to invest heavily in digitization, but little in preserving the original documents, which were stored in a precarious manner,” says Schisler. Today, the search is for sustainable strategies. “Instead of wanting to scan everything, in high definition, it is more reasonable to digitize the most sought-after elements of a collection, putting aside some funds for the preservation of originals. Digitization should be seen as a complement, not as the main strategy.”

Photograph of clothing worn by Emperor Pedro II, image available on-line

Image: Imperial Museum Collection

The Imperial Museum, in Petrópolis, is an example of the challenges involved when digitizing collections. The undertaking began in 2009. Today, 8,000 items are available on the institution’s website, including documents, books and images of museum items, equivalent to 3% of the collection; together they are accessed 2,000 times per month. “The intention is to digitize the entire collection,” says historian Jean Bastardis, coordinator of the team hired by the society of friends of the museum and tasked with searching for the documents, describing them, organizing the database and making the material available on the website. The priority is to scan complete sets of collections in order to facilitate the work of researchers. Despite six years of experience, keeping the digitization process going has been a bumpy path. The team is hired project by project. Every year they must repeat the challenge of obtaining resources from companies or through public announcements of funding opportunities.

One of the most sought-after items on the site is a cookbook from the 19th century, O cozinheiro imperial (The Imperial Cook), which describes dishes typical during that era and includes the menus for banquets. Researchers are more interested in the archives of the Imperial Household, with documents produced in Portugal dating back to the 13th century. The museum’s experience shows that promotion is essential. “When a report on some item is published, the demand for it soars,” says Bastardis.

The United States Library of Congress, which has more than 20 million digital objects available on-line, realized that making this material available electronically is not enough to disseminate knowledge. Therefore, it developed a program under which its technical personnel visit cities in order to teach local professors and librarians how to use its archives. Without proper outreach programs, on-line collections do not reach all of their potential audience. The digitization of 100 hours of drama from the defunct TV Tupi, stored in the Cinemateca Brasileira, made possible studies of soap operas that had not previously been feasible. However, the material, available on the site, is still little explored, says Esther Hamburger, a professor at the USP School of Communications and Arts. She was the researcher responsible for the project that cataloged and digitized the 100 hours of video, and has been studying the material herself.

According to Hamburger, the problem is that the Tupi archives are still kept on a site with low visibility. “A network structure capable of allowing many users to access the material is needed. The database is solid in this respect, but the project needs to be finalized,” she says. The television content needs to be more available on-line, says the researcher. “Countries such as France and Sweden make TV content that has already aired available through their national libraries,” she states. She observes that Brazil still does not have a tradition of historical research using material that aired on TV.

Institutions have also learned that a digitization project, once complete, is not really over. One must invest in maintenance and technological upgrades. “In the case of movies, maintaining a digital archive can cost up to 10 times more than maintaining physical files,” says consultant Millard Schisler, citing the study The digital dilemma, published by the Science and Technology Council of the Academy of Motion Picture Arts and Sciences, Hollywood, which compared the maintenance costs for movies in film and digital formats. Databases must be developed and maintained, software upgraded, and the condition of the documents must be monitored. Every time an archive migrates to another server or digital formats are upgraded, up to 10% of documents may be lost, and an administrator must recover them from the originals.
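
One common way to detect files damaged in a server migration is to record a checksum for each file beforehand and compare it afterwards. The Python sketch below is an assumption about how such a check could look, not a description of any institution's workflow; the function names and sample file names are hypothetical. It applies only to bit-for-bit copies: a format upgrade changes the bytes on purpose, so it would need a different kind of validation.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record one checksum per file before the migration starts."""
    return {name: checksum(data) for name, data in files.items()}


def find_damaged(manifest: Dict[str, str], migrated: Dict[str, bytes]) -> List[str]:
    """Return the names of files that are missing or altered after migration."""
    damaged = []
    for name, expected in manifest.items():
        data = migrated.get(name)
        if data is None or checksum(data) != expected:
            damaged.append(name)
    return sorted(damaged)


# Invented sample data: three scans on the old server, two arrive on the new
# one, and one of those two was corrupted in transit.
old_server = {"map_001.tif": b"scan-a", "map_002.tif": b"scan-b", "map_003.tif": b"scan-c"}
new_server = {"map_001.tif": b"scan-a", "map_002.tif": b"scan-X"}

manifest = build_manifest(old_server)
to_recover = find_damaged(manifest, new_server)
print(to_recover)
```

The resulting list is what an administrator would then have to recover from the originals or from a backup.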

The Guita and José Mindlin Brasiliana Library, inaugurated at USP in March 2013, incorporated the volumes digitized by the USP Brasiliana Project in 2014 and established its digital collection. So far, 4,800 of the 32,000 volumes in the collection have been digitized. They consist of rare books, manuscripts and periodicals collected over a period of 80 years by businessman José Mindlin (1914-2010). The library’s digitization work is now being upgraded. Most of the books available on-line were digitized and then subjected to a process in which color images were converted into just two tones. The process created something similar to a photocopy of the original, in which the text appears against a white background. The decision was made at the time by the USP Brasiliana Project in order to generate cleaner images and smaller files, which would facilitate access by users with slower Internet connections and save ink when users printed the files. Today, explains Jony Fávaro, the library’s digitization specialist, the method is different. The aim now is to digitize the works while preserving their characteristics, such as the yellowing of the pages, signs of use and a faithful reproduction of the cover. “For a user interested only in the text of the work, this might not make a difference,” says Sandra Guardini Vasconcelos, director of the library and professor in the Modern Languages Department of the USP Faculty of Philosophy, Languages and Literature, and Human Sciences. “However, in the eyes of a historian, a faithful photographic reproduction contributes important elements about the publishing history of that work.” The library plans to re-digitize the first works, although the current priority is to make available those works that are still undigitized. “Such concern for the book in its material form, as a physical object, is the view taken by important institutions such as the British Library and the Bibliothèque Nationale de France,” says Vasconcelos.
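
The two-tone conversion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the USP Brasiliana Project's actual pipeline: the fixed threshold of 128, the function names and the sample pixel values are hypothetical, and the grayscale weights are the standard ITU-R BT.601 luma coefficients.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each from 0 to 255


def to_gray(pixel: Pixel) -> float:
    """Convert one RGB pixel to a brightness value using BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_bitonal(page: List[List[Pixel]], threshold: int = 128) -> List[List[int]]:
    """Map every pixel to white (255) or black (0) by comparing with a threshold."""
    return [
        [255 if to_gray(px) >= threshold else 0 for px in row]
        for row in page
    ]


# Invented sample: one row with a yellowed-paper pixel and a dark-ink pixel.
yellowed_paper = (240, 230, 200)
dark_ink = (30, 30, 30)
page = [[yellowed_paper, dark_ink]]

bitonal = to_bitonal(page)
print(bitonal)
```

In this sketch the yellowed-paper pixel becomes pure white, so the file is smaller and cleaner but the tone of the page, which is the kind of detail a historian may care about, is no longer recorded.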

A historic edition of the newspaper about the abolition of slavery: digitized collection will be made available on-line during the second half of 2015

Photo: Léo Ramos

A Redempção (Redemption) will be on-line

The archives of the abolitionist newspaper are recovered

In the second half of 2015 the Public Archives of the State of São Paulo will launch a site with 132 digitized editions of the abolitionist newspaper A Redempção, which circulated in São Paulo from January 2, 1887 until the promulgation of the Emancipation Law on May 13, 1888. It was the publication of the abolitionist movement of the “caifazes,” a group that rescued slaves and moved them to safe places. “It was a radical newspaper, with attacks on farmers, politicians and other newspapers, even abolitionist ones,” says Marcelo Quintanilha Martins, director of the archive’s Permanent Collection Center. The digitized collection came from the Historical and Geographical Institute of São Paulo, and most of the copies were in fragments. The restorers assembled the newspaper fragments using tweezers, even consulting microfilm copies of the issues held in the Lamont Library, Harvard University. Through this process, seven issues that were thought to have been lost were recovered. At the end of 2014, the collection was included as Brazilian Registered Heritage under the Memory of the World Program, sponsored by the United Nations Educational, Scientific and Cultural Organization (UNESCO). That program recognizes documents, archives and libraries of international, regional and national importance as part of the world’s heritage. The goal is to expand the dissemination of collections.

1. Quadruplex archives of the defunct TV Tupi (No. 2009/54923-7); Grant Mechanism: Infrastructure program; Principal investigator: Esther Império Hamburger (USP); Investment: R$ 446,934.77 (FAPESP).
2. Collection of the Instituto Lina Bo and P. M. Bardi: cataloging, scanning and building an on-line database (No. 2009/54901-3); Grant Mechanism: Infrastructure program; Principal investigator: Renato Luiz Sobral Anelli (USP); Investment: R$ 253,269.46 (FAPESP).
3. Preservation and dissemination of public memory: modernization and enlargement of the laboratories of the State of São Paulo Public Archive (No. 2009/54965-1); Grant Mechanism: Infrastructure program; Principal investigator: Carlos de Almeida Prado Bacellar (USP); Investment: R$ 1,692,982.33 (FAPESP).
4. Towards a Digital Brasiliana Library (No. 2007/59783-3); Grant Mechanism: Regular research project; Principal investigator: Pedro Luis Puntoni (USP); Investment: R$ 663,514.35 (FAPESP).