Numerous national and international assessments highlight the difficulty students have with written expression. It is one of the most vexing education issues in Brazil, and not just in Brazil. Having zeroed in on this issue, Professor Eliseo Reategui, from the Federal University of Rio Grande do Sul (UFRGS) School of Education, began a project in 2010 to address it. With the help of students he created a digital tool that can automatically extract the key concepts of a text and graphically show their level of importance and interrelationships. The process is known as text mining.
The tool, available in Portuguese and English, is called Sobek (the name of an Egyptian deity symbolizing strength, devastation and reconstruction) and can be accessed for free on the university’s website (http://sobek.ufrgs.br/). Any text submitted to the tool is broken down into its main concepts, which are represented as a graph: a diagram made up of nodes (concepts enclosed in boxes) and the lines that connect them. The method is statistical, so the importance of a concept is measured by the number of times the same word is repeated in the text. There are filters, however, that discard frequently used words that carry little meaning in isolation, such as articles and prepositions.
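The general idea described above (count repeated words, filter out articles and prepositions, and link the remaining concepts in a graph) can be illustrated with a minimal Python sketch. This is not Sobek's actual code: the function name extract_concepts, the tiny stop-word list, and the rule that two concepts are linked when they appear in the same sentence are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list (an assumption; a real filter would be far larger).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "by", "for", "with"}


def extract_concepts(text, top_n=5, stop_words=STOP_WORDS):
    """Return (concepts, edges) for a text.

    concepts maps each of the top_n most frequent non-stop words to its count.
    edges maps an alphabetically sorted pair of concepts to the number of
    sentences in which both appear (an assumed linking rule).
    """
    # Split into sentences, lowercase, tokenize, and drop stop words.
    sentences = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in stop_words]
        for s in re.split(r"[.!?]+", text)
    ]

    # Importance is measured purely by how often a word is repeated.
    counts = Counter(w for sentence in sentences for w in sentence)
    concepts = dict(counts.most_common(top_n))

    # Connect concepts that occur together in the same sentence.
    edges = Counter()
    for sentence in sentences:
        present = sorted(w for w in set(sentence) if w in concepts)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return concepts, dict(edges)


sample = (
    "Plants need light. Light energy and water make sugar. "
    "Sugar feeds plants, and plants store sugar."
)
concepts, edges = extract_concepts(sample, top_n=3)
print(concepts)  # nodes of the graph, with their frequencies
print(edges)     # lines connecting the nodes
```

Passing an empty set as stop_words would keep articles and prepositions in the result, which is the kind of adjustment a teacher interested in how students use those words might want.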
The model used to extract the concepts is based on the Schenker algorithm, which was written in 2003. However, the Sobek tool offers a simplified representation that makes reading more concise and accessible. “My development of the tool always had education in mind, and it has been adapted to support reading and writing, even though it may serve other purposes, including commercial ones,” says Reategui. “Its strength lies in identifying the main themes found in the text.” This is the key to discovering deficiencies in cohesion and unity.
In this respect, the Sobek tool creates a useful learning environment for analyzing texts produced by students as well as working with existing texts during reading exercises. The tasks of evaluating and reorganizing concepts can be done by the students themselves, by the teachers or by the two together. These possibilities arise from the tool’s adaptability to different stages of education, from the early years (after the initial phase of literacy) through postgraduate studies.
The student’s interaction with the system can begin before the graph is generated, during the production process, when the user selects and refines basic concepts, contrary to the usual way of working. An English teacher might, for example, want prepositions to be shown in order to evaluate how students are using them and then, in the graph adjustment stage, manually remove and add concepts according to their relevance. This phase allows users to refine the text analysis according to their specific purposes. “From an educational standpoint, if the first graph fully met the expectations of the user, the process would not be so interesting,” says Reategui. “These periods of reflection allow students to understand the text in a deeper way, little by little building the network of relationships necessary for structuring the written text.”
Thus a structured, interactive process is established, allowing a constructivist approach to learning in accordance with the description of human intellectual development introduced by Jean Piaget (1896-1980). To the Swiss psychologist, knowledge is not transmitted. Learning is achieved by examining and reorganizing knowledge when one confronts a new situation. “Sobek tries to be very indirect and is essentially based on the reflective actions of the students, on their perspective and their experiences,” says Reategui.
Text mining
According to Reategui, in the United States the use of diagrams and other graphical methods of organizing texts usually pushes students to focus on structuring and planning the text, in what is known as the prewriting stage. This approach rests on the idea that planning is the main and conceptually most demanding part of the creative work. Reategui believes, however, that the way text mining is typically used in Brazil, via Sobek, is more dynamic, because it causes a “shuttle” back and forth that translates into engagement by students.
The tool has been tested in several areas. Students who belong to or are associated with the Interdisciplinary Center for New Technologies in Education (CINTED) of UFRGS set up pilot projects in public schools. There were activities associated with literacy and the construction of texts, but the tool was also used as a resource for understanding concepts in science classes (describing photosynthesis, for example). “Teachers at these schools become multipliers,” says Reategui. “We have no systematic monitoring of what is being done; that is an important step yet to be taken. We hope to create a community of online teachers to exchange experiences.”
Because it is available without restriction on the Internet, the use of Sobek is also unrestricted. Recently, a student from Spain contacted CINTED to present a version that he developed to mine texts in Spanish. And a Mozambican teacher, in an activity focused on storytelling for young children, had the idea that, instead of working only with generating concepts, the Sobek configuration could also be used to search for pictures on the Internet in real time. “Digital technology is part of everyday life and is fundamental to motivating children,” says Reategui. “Teachers need to constantly employ a variety of strategies to keep their students interested.”
Updates and refinements are also modifying Sobek itself. A group of student programmers is constantly working on this. Recently the entire interface of the environment was redesigned to make it more dynamic, and an application will soon be released so that the tool can be used on mobile devices when they are not connected to the Internet.