An unprecedented program, able to run on any ordinary computer, promises a better future for those who work with document filing, such as notary offices, public agencies and town halls. The application will automate a large part of today’s costly and complicated process of filing, organizing and updating the certificates, lawsuits, acts and other paper documents that record the lives of citizens, companies and public administration. Starting from a scanner, the new program will capture the papers, mainly old typewritten ones, and display them on the screen of a computer running the popular Microsoft Windows.
Belonging to the family of automatic character recognizers known by the acronym OCR (Optical Character Recognition), the new software is being developed by the São Paulo company Carta Consultoria through a project of FAPESP’s Small Business Innovation Research program (PIPE in the Portuguese acronym). In its first phase, the version that should reach the market by the end of the year will serve the work of notary offices. These establishments are important issuers and custodians of certificates, legal documents and deeds, and they receive many requests for information every day. In the city of São Paulo alone, notary offices carry out an average of 7,000 document searches every day. Over a month, that adds up to 140,000 consultations, a workload that still cannot count on an efficient computerized tool for searching the archives.
To grasp what kind of breakthrough the new program represents, being much more specialized than the similar ones available on the market, consider that the contents of several rooms full of paper could fit on a handful of diskettes. That benchmark is still hard to reach, however, because Carta Consultoria’s OCR program is in no way intended to replace any of these documents. Legally, this is still not possible. What the software is meant to do is organize all of these papers and make them easier to access, both for office clerks and for the general public. Paper documents circulating in these offices could thus be quickly turned into digital documents.
The development of the product was coordinated by Felício Sakamoto, projects manager at Carta. The company is a computer systems consultancy working mainly in documentation, geoprocessing and mapping. While the software was being designed, Carta received important support, in the form of consultancy, from the Department of Computer Science of the Institute of Mathematics and Statistics of the University of São Paulo (IME-USP). “Armed with the information we got from the people at IME, which included a prototype and indications that the product was technically feasible, we began researching the market and also assessing the financial feasibility of launching new software,” says Mr. Sakamoto.
Surveying the market made the company realize that the documents stored in notary offices should be the first to get a version of the product. Although these establishments have recently begun to enjoy the advantages of digital filing, the bulk of their documents remains stored in traditional books, and every time a change to the content was requested, the whole document had to be redone. “The job of retyping is very long,” explains Mr. Sakamoto. But why devote effort to a specialized product when powerful commercial OCR programs exist? “We consulted some of the people in charge of notary offices about the products available on the market, and they stated that the performance of these products was weak and that they had abandoned them for that reason.”
Types of paper
The innovation of Carta’s OCR system lies in its specialization, since ordinary OCR programs do not allow the structure of the documents to be selected in advance. “We chose some categories of papers, and in this way we gained in performance,” explains Routo Terada, a professor of Computer Science at USP and one of the main consultants involved in the project. The new program was thus developed to tackle the shortcomings found in the market. “We made use of IME’s experience and developed a much more versatile tool,” explains Junior Barrera, a professor at IME-USP. The next stage required a long and patient process of programming and testing that, in the end, took up almost 80% of the whole duration of the project, which began in 1997.
From a technical point of view, there are several ways for a computer program to turn images captured by a scanner into text characters. In practice, it is enough to understand that mathematical and geometrical concepts, followed by statistical rules, are used here to map out and process every tiny piece of the previously digitized documents. From the computer’s point of view, any image is no more than a sequence of numbers.
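As a rough illustration of the idea that an image is just numbers a program can process geometrically, the sketch below applies a morphological "opening" to a tiny black-and-white image. This is not Carta’s code: the 3x3 neighbourhood, the sample image and the function names `erode`, `dilate` and `opening` are assumptions made only for this example. It shows how an isolated speck, like a stain on an old page, can be removed while a solid stroke of ink is kept.

```python
# Hypothetical example: 1 = ink, 0 = paper. Not the actual Carta implementation.

NEIGHBOURHOOD = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def _is_ink(image, r, c):
    """Return True if (r, c) is inside the image and holds ink."""
    return 0 <= r < len(image) and 0 <= c < len(image[0]) and image[r][c] == 1


def erode(image):
    """Keep a pixel only if it and all 8 of its neighbours are ink."""
    return [
        [int(all(_is_ink(image, r + dr, c + dc) for dr, dc in NEIGHBOURHOOD))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def dilate(image):
    """Mark a pixel as ink if it or any of its 8 neighbours is ink."""
    return [
        [int(any(_is_ink(image, r + dr, c + dc) for dr, dc in NEIGHBOURHOOD))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def opening(image):
    """Erosion followed by dilation: drops specks smaller than the 3x3 square."""
    return dilate(erode(image))


# A 3x3 block of ink (part of a character) plus one stray speck at row 2, column 5.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

cleaned = opening(page)
for row in cleaned:
    print("".join("#" if pixel else "." for pixel in row))
```

In this toy case the speck disappears and the block of ink survives unchanged. Real scanned pages would of course need grayscale thresholding and much more careful choices of shape and size.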
With old documents, however, the story changes. Faced with their blotches, rubber stamps, stains and the typically imperfect print of old typewriters, even the most highly praised commercial OCR programs failed to achieve satisfactory results. “We carried out some tests with recent, already digitized documents, and the best of the existing software reached 90% and ours 98%. On a second document, much older and produced on a typewriter, the commercial program performed very poorly, at close to 69%, while ours reached 90%,” says Mr. Sakamoto proudly.
In the final stretch of preparing the program, Carta is making the last technical adjustments and working on marketing and on shaping the product for notary offices. The company finds it difficult to put figures on possible sales and revenue. It can, however, easily estimate the potential use. Within the universe of 5,000 municipalities throughout the country, it is a good guess that there is at least one notary’s office in each of them, a market that lacks efficient innovations for filing and organizing documents.
The Application of Computer Learning and Mathematical Morphology in OCR of Documents (nº 97/07325-8); Modality Program of Innovative Technology in Small Companies (PIPE); Coordinator Dr. Felício Sakamoto – Carta; Investment R$ 75,000.00