What may seem like magic today might become a common sight in a few years. In any given airport, a visually impaired traveler points a smartphone at the arrivals and departures board. The device immediately starts reciting the flights listed on the display. The scene is repeated in train stations and at bus stops that have timetables on display. In corporate environments, blind individuals easily find out what soft drinks, juices, chips, or other snacks are available in a vending machine. It will all be made possible by a new technology designed by IBM Research Brazil, the research laboratory operated in the city of São Paulo by the US-based IT multinational. The app, called Marker-Assisted Recognition of Dynamic Content, uses computer vision, artificial intelligence, and image processing to recognize text and objects in public environments.
“The innovation in relation to similar image recognition apps is the use of markers,” says Andréa Mattos, the young IBM scientist who headed the app’s development. The markers, a set of four stickers depicting different images, are placed on the upper and lower corners of the target object. “They are reference points that enable the app to detect and identify the objects in the scene,” says 28-year-old Mattos.
In an airport, for instance, a blind person would only require assistance locating an arrivals and departures board that has been delimited by the markers. Then, by pointing a smartphone or tablet camera at the board, the traveler would be able to check if a certain flight was on time. The app can only read the visual information and convert it into audio when the board is properly framed in the camera display. If the user has trouble framing it, the app would help out by saying things like “move your camera to the right” or “raise your camera slightly”. “Each marker has a precise position in relation to the others. It is possible to give instructions for correct framing so long as one of the four markers has been captured by the smartphone camera,” Mattos explains.
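The article does not describe how the app turns a detected marker into a spoken instruction, but the idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the corner names, the target positions in `EXPECTED`, the normalized image coordinates (0 to 1, with the origin at the top left), and the `framing_hint` function itself.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical positions where each corner marker should appear in a
# well-framed picture (normalized coordinates, origin at top left).
EXPECTED: Dict[str, Point] = {
    "top_left": (0.1, 0.1),
    "top_right": (0.9, 0.1),
    "bottom_left": (0.1, 0.9),
    "bottom_right": (0.9, 0.9),
}


def framing_hint(detected: Dict[str, Point], tolerance: float = 0.1) -> Optional[str]:
    """Suggest a camera movement from whichever markers were detected.

    Returns None when no marker is visible, since nothing can be inferred.
    """
    if not detected:
        return None

    # Average offset between where the markers are and where they should be.
    dx = sum(detected[n][0] - EXPECTED[n][0] for n in detected) / len(detected)
    dy = sum(detected[n][1] - EXPECTED[n][1] for n in detected) / len(detected)

    if abs(dx) <= tolerance and abs(dy) <= tolerance:
        # Markers are where they belong; if some are missing, the camera
        # is probably too close to the board.
        return "hold still" if len(detected) == 4 else "move back slightly"

    # Content shifted right in the image means the camera must move right
    # to bring it back toward the center, and likewise for the other axes.
    if abs(dx) >= abs(dy):
        return "move your camera to the right" if dx > 0 else "move your camera to the left"
    return "lower your camera slightly" if dy > 0 else "raise your camera slightly"
```

For example, if only the top-left marker is visible and it sits in the middle of the picture's upper edge, `framing_hint({"top_left": (0.6, 0.1)})` suggests moving the camera to the right, because the rest of the board lies beyond the right edge of the frame.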
For the app to work, the objects or texts to be recognized must also be arranged in a specific layout. The messages on a flight board change frequently, as do the products offered by a vending machine. What must remain constant are the positions in which the products or information are displayed. The app automatically searches its memory for a “template” of that scene, a type of mask with fixed positions marking the space where the texts or images to be recognized are located. A vending machine template is simply a diagram of the slots where the products are inserted; a flight board template delimits the spaces where flight information is displayed.
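One way to picture such a template is as a list of named rectangles whose positions are stored as fractions of the area delimited by the four markers. The sketch below is only an illustration under that assumption: the `Slot` class, the `slot_to_pixels` function, and the 2x2 vending machine layout are hypothetical, and the code treats the marked area as an upright rectangle, whereas a real photo taken at an angle would also need perspective correction.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slot:
    """A fixed region of the template, in fractions of the marked area."""
    label: str
    x: float       # left edge, 0.0 to 1.0
    y: float       # top edge, 0.0 to 1.0
    width: float
    height: float


# Hypothetical template for a small vending machine with a 2x2 grid of slots.
VENDING_TEMPLATE: List[Slot] = [
    Slot("row 1, column 1", 0.0, 0.0, 0.5, 0.5),
    Slot("row 1, column 2", 0.5, 0.0, 0.5, 0.5),
    Slot("row 2, column 1", 0.0, 0.5, 0.5, 0.5),
    Slot("row 2, column 2", 0.5, 0.5, 0.5, 0.5),
]


def slot_to_pixels(slot: Slot, left: int, top: int, right: int, bottom: int) -> Tuple[int, int, int, int]:
    """Map a slot to pixel coordinates inside the marker-delimited area.

    (left, top, right, bottom) is the pixel rectangle spanned by the markers.
    Returns the slot's own (left, top, right, bottom) in pixels.
    """
    area_w = right - left
    area_h = bottom - top
    return (
        round(left + slot.x * area_w),
        round(top + slot.y * area_h),
        round(left + (slot.x + slot.width) * area_w),
        round(top + (slot.y + slot.height) * area_h),
    )
```

Because the slots are stored as fractions, the same template applies no matter how large the machine appears in the photo: only the rectangle spanned by the markers changes.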
The software then proceeds to identify and read the content. For vending machines, this is done by comparison. The app’s memory contains an image bank with photographs of every product sold by a particular machine: cans of soft drink A, bags of potato chip B, packets of cookie C, etc. It compares the products captured by the user’s camera with the stored photos and then speaks the names of the items on display. In the case of a board or panel with written information, the app recognizes the letters and numbers and reads them out loud to the user.
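The article does not say how the comparison with the image bank is carried out. As a rough illustration of the general idea, the sketch below assumes that each photo has already been reduced to a short list of numbers (for instance, the share of red, green, and blue in the picture) and picks the stored product whose numbers are closest. The `best_match` function, the feature values, and the distance threshold are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def best_match(
    features: Sequence[float],
    bank: Dict[str, Sequence[float]],
    max_distance: float = 0.5,
) -> Optional[str]:
    """Return the name of the closest product in the bank, or None.

    Closeness is plain Euclidean distance between feature lists; a match
    farther away than max_distance is rejected as unrecognized.
    """
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, reference in bank.items():
        dist = math.dist(features, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical image bank: one made-up feature list per product.
IMAGE_BANK = {
    "soft drink A": [0.9, 0.1, 0.0],
    "potato chip B": [0.1, 0.8, 0.1],
    "cookie C": [0.2, 0.2, 0.6],
}
```

A slot whose features are close to a stored product, such as `best_match([0.85, 0.1, 0.05], IMAGE_BANK)`, is reported as that product; a slot that resembles nothing in the bank returns `None`, so the app could stay silent rather than announce a wrong item.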
Mattos conducted a series of tests on vending machines to prove the technique’s viability. Sixty photographs were taken to measure the app’s efficiency, making up a total of 240 markers, given that there are four markers on each machine. The detection rate was 99.16%. The products in the machines were correctly recognized 89.85% of the time, a high rate according to Mattos, considering the challenges of the task at hand.
Blind or visually impaired
One of the advantages of the innovation, whose development team also included IBM researchers Carlos Cardonha, Diego Gallo, Priscilla Avegliano, Ricardo Herrmann, and Sérgio Borger, is that it gives blind or visually impaired people more independence. The work won an award at the 11th Web for All Conference, which recognizes the world’s best projects relating to accessibility and the internet. The event was held in April 2014 in South Korea. A patent application for the technology has been filed with the United States Patent and Trademark Office (USPTO). It was just one of 19 patent applications filed with the USPTO by IBM Brazil in the first half of 2014.
But this is not the world’s first or only computer vision technology for image recognition. Bar coding is also a promising technique. Bar codes affixed to products can already be read by scanning apps installed on smartphones. But their usefulness is limited when the content is dynamic, such as the arrivals and departures on a flight board, which change constantly.
“Many groups around the world are trying to design apps that can recognize objects, but in the computer vision literature, we found no technologies similar to ours, which can recognize products in uncontrolled environments, that is, those that are subject to variable lighting and a variety of visual interferences,” says Sérgio Borger, IBM Systems of Engagement research manager. “We will conduct new tests to assess the usability of our app,” he adds.