Starting with technology already available on the market, namely the Xbox Kinect sensor released by Microsoft for video games, researchers at the University of Campinas (Unicamp) have developed an experimental system that converts video into sounds that help the visually impaired in their daily activities. The Kinect uses two cameras. One of them emits and captures infrared light, and an algorithm uses that signal to calculate the depth, that is, the distance from the camera, of each point in the image. From this data, objects and people in the environment can be identified without being confused by their outlines or by variations in light and color. The project, developed at Unicamp together with Microsoft researcher Dinei Florêncio, included the assembly of a prototype mounted on a skateboard helmet. A Kinect sensor with the two cameras was placed on top of the helmet; the remaining elements of the original equipment were removed. The sensor was connected to a laptop powerful enough to process the incoming data, as well as to a gyroscope, an accelerometer and a compass that, together, track changes in head orientation.
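To illustrate the principle, the sketch below shows how a per-pixel depth map like the one produced by the Kinect can isolate a nearby object regardless of its color or the lighting. The frame size, the 2-meter threshold and the simulated values are illustrative placeholders, not details of the Unicamp prototype.

```python
import numpy as np

# Simulated depth frame in millimeters (a real Kinect frame would come from the
# sensor driver): background about 4 m away, one object about 1.2 m away.
depth_mm = np.full((480, 640), 4000, dtype=np.uint16)
depth_mm[200:300, 250:400] = 1200

NEAR_THRESHOLD_MM = 2000  # treat anything closer than 2 m as an obstacle
near_mask = (depth_mm > 0) & (depth_mm < NEAR_THRESHOLD_MM)  # 0 = no reading

if near_mask.any():
    rows, cols = np.nonzero(near_mask)
    # Bounding box and mean distance of the nearby region, independent of color
    print(f"object between rows {rows.min()}-{rows.max()}, "
          f"cols {cols.min()}-{cols.max()}, "
          f"about {depth_mm[near_mask].mean() / 1000:.2f} m away")
```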
The laptop is carried by the user inside a backpack. “The environmental data captured by the camera is processed by the computer, which supplies the user with the information via audio,” says Siome Klein Goldenstein, professor at the Unicamp Institute of Computing and coordinator of the project, which is financed by FAPESP and Microsoft under the Research Partnership for Technological Innovation (PITE) Program. The user also wears bone-conduction headphones, which transmit audio through the bones of the skull rather than through the ear canal. This leaves the ears free to hear other environmental sounds in addition to the system’s audio feedback, something that ordinary headphones would block.
Goldenstein explains that various use scenarios for the technology were studied during project development. One of the applications that is already complete is a module for detecting and recognizing people in a closed environment; people have to be registered before the system can recognize them, however. “We used a face localization technique and employed an algorithm to classify every person registered,” explains Laurindo de Sousa Britto Neto, a PhD student at the Unicamp Institute of Computing who is participating in the project. When the user enters a room in which previously registered people are present, the face detection and recognition module passes the information to the 3D audio module, which reproduces a sound representing the spatial location of each person in the room. “When hearing the name of each person, the user will know the exact position of that individual,” explains Goldenstein. “It is as if the sound came out of the head of the person identified by the detection module.” The objective is to allow the visually impaired person to turn her face in the right direction and communicate in the most natural way possible.
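As a rough illustration of how such a pipeline could be wired, the sketch below detects faces with a standard OpenCV classifier and converts each face’s horizontal position in the frame into an azimuth angle that a 3D audio stage could use. The Haar cascade, the assumed field of view and the placeholder recognizer and audio call are assumptions made for this example; the article does not specify which detection or classification techniques the Unicamp group used.

```python
import cv2

HFOV_DEG = 57.0  # assumed horizontal field of view of the Kinect color camera

def faces_to_azimuths(frame_bgr):
    """Detect faces and convert their horizontal position into an azimuth angle
    (0 degrees = straight ahead, negative = left, positive = right)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    width = frame_bgr.shape[1]
    results = []
    for (x, y, w, h) in detections:
        center_x = x + w / 2.0
        azimuth = (center_x / width - 0.5) * HFOV_DEG
        face_roi = gray[y:y + h, x:x + w]  # would be passed to a recognizer
        results.append((face_roi, azimuth))
    return results

# Downstream, each recognized face would be handed to the 3D audio module,
# e.g. speak_name_at(name, azimuth); both the recognizer and that function are
# placeholders here, not the project's actual code.
```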
The 3D audio technique was a chapter of its own in the development of the project because each ear reacts in a different way to external stimuli, according to Goldenstein. The 3D audio research was conducted by Felipe Grijalva Arévalo, an Ecuadorian who moved to Brazil in 2012 to pursue a master’s degree under Prof. Luiz César Martini of the Unicamp School of Electrical Engineering and Computing, who is himself visually impaired and conducts research related to visual impairment. Since each head has a different anatomy, a process called personalization is needed to generate the 3D audio. “The simplest approach is to use known measurements of the anatomy of the ear and, based on those, develop a customized model,” explains Arévalo. He researched the topic for his master’s thesis, co-advised by Goldenstein, and currently continues to work with 3D audio in his PhD dissertation. The tests were originally carried out with blindfolded individuals. In the next stage, after approval by the Unicamp ethics committee, tests will be conducted with the assistance of visually impaired volunteers. Once all the experiments needed to validate the new system have been carried out, it can be shared with the community.
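By way of illustration only, the sketch below places a mono sound at a given azimuth using simple interaural time and level differences (the Woodworth approximation). It is a generic stand-in for the personalized model described above, not the project’s implementation; the head radius, gain curve and sampling rate are assumed values.

```python
import numpy as np

SAMPLE_RATE = 44100
HEAD_RADIUS_M = 0.0875   # average head radius; personalization would adjust this
SPEED_OF_SOUND = 343.0   # m/s

def spatialize(mono, azimuth_deg):
    """Crude stereo placement of a mono signal using interaural time and level
    differences; a simplified stand-in for personalized HRTF rendering."""
    theta = np.radians(azimuth_deg)
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + np.sin(theta))  # seconds
    delay = int(round(abs(itd) * SAMPLE_RATE))                      # samples

    far_gain = 10 ** (-abs(azimuth_deg) / 90.0 * 6 / 20)  # far ear up to ~6 dB down
    near = mono
    far = np.concatenate([np.zeros(delay), mono])[:len(mono)] * far_gain
    left, right = (far, near) if azimuth_deg > 0 else (near, far)
    return np.stack([left, right], axis=1)

# Example: a 440 Hz beep placed 45 degrees to the listener's right.
t = np.arange(int(0.3 * SAMPLE_RATE)) / SAMPLE_RATE
stereo = spatialize(np.sin(2 * np.pi * 440 * t), azimuth_deg=45)
```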
Simplified approach
Another application in the final stages of development is a navigation aid capable of informing the user where obstacles are located in his path. “Kinect, positioned on the head of the wearer, will capture video frames that will be segmented into different planes, such as the floor, doors and stairs, using algorithms,” explains Laurindo Neto. The information is then passed on to the user via audio. Other researchers also collaborated on the project: Vanessa Maike, a student of Maria Cecilia Baranauskas at the Institute of Computing (IC), who carried out experiments on natural interfaces, that is, natural forms of interaction between people and computers; and Anderson Rocha, also at IC, who works in biometrics.
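The article does not name the segmentation algorithm used in the navigation aid, but a common way to find a dominant plane such as the floor in a depth-derived point cloud is RANSAC plane fitting. The sketch below is a generic illustration under that assumption, with toy data standing in for real Kinect frames.

```python
import numpy as np

def ransac_plane(points, iterations=200, tolerance_m=0.03, rng=None):
    """Fit the dominant plane (e.g. the floor) in an Nx3 point cloud with RANSAC.
    A generic illustration, not the prototype's actual segmentation code."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:      # degenerate (collinear) sample, try again
            continue
        normal /= norm
        distances = np.abs((points - sample[0]) @ normal)
        inliers = distances < tolerance_m
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Toy point cloud: a flat "floor" at height 0 plus scattered obstacle points.
floor = np.column_stack([np.random.rand(500) * 4 - 2,
                         np.zeros(500),
                         np.random.rand(500) * 4])
obstacles = np.random.rand(100, 3) * [4, 2, 4] - [2, 0, 0]
mask = ransac_plane(np.vstack([floor, obstacles]))
print(f"{mask.sum()} of {len(mask)} points classified as floor")
```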
The idea of transforming images into sound for the visually impaired is not new, but the technologies used by the Unicamp research group have resulted in a recognition approach that is simpler for the user and better reproduces the natural environment. The best-known device on the market built on this concept is The vOICe, developed by Peter Meijer at the Philips Research Laboratory in the Netherlands. The system consists of an ordinary camera attached to glasses (a smartphone camera can also be used) that captures images from the environment. These are then converted into sounds by a computer and transmitted through headphones. In this case, the camera scans each scene from left to right, and the user receives the corresponding information through the left and right ears in sequence.
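The sketch below illustrates that kind of left-to-right scan in simplified form: each image column becomes a short burst of tones in which the row position sets the pitch and the pixel brightness sets the loudness. The frequency range, scan time and mono output are illustrative choices, not the actual parameters of The vOICe.

```python
import numpy as np

SAMPLE_RATE = 22050

def sonify(image, scan_seconds=1.0, f_min=500.0, f_max=5000.0):
    """Sonify a grayscale image column by column, left to right: higher rows map
    to higher pitch and brighter pixels to louder tones (illustrative values)."""
    rows, cols = image.shape
    samples_per_col = int(scan_seconds * SAMPLE_RATE / cols)
    t = np.arange(samples_per_col) / SAMPLE_RATE
    freqs = np.geomspace(f_max, f_min, rows)        # top row gets the highest pitch

    audio = []
    for c in range(cols):
        column = image[:, c].astype(float) / 255.0  # brightness -> amplitude
        tone = (column[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(0)
        audio.append(tone / max(rows, 1))
    return np.concatenate(audio)

# Example: a bright diagonal line rising from bottom-left to top-right produces
# an ascending sweep over one second of audio.
img = np.eye(64, dtype=np.uint8)[::-1] * 255
signal = sonify(img)
```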
Projects
1. Vision for the blind: translating 3D visual concepts into 3D auditory clues (No. 2012/50468-6); Grant mechanism: Research Partnership for Technological Innovation (PITE); Principal investigator: Siome Klein Goldenstein (Unicamp); Investment: R$32,648.40 (FAPESP) and R$32,648.40 (Microsoft).
2. Machine learning for signal processing applied to spatial audio (No. 2014/14630-9); Grant mechanism: PhD grant; Principal investigator: Luiz César Martini (Unicamp); Grant recipient: Felipe Leonel Grijalva Arévalo (Unicamp); Investment: R$136,000.80.