Hope -- Android Prototype (Under Development)
- An Overview - The prototype will consist of tagging an image with related keywords. When a keyword is spoken, the associated image is displayed on the Android phone. This can assist older people or people with autism in refreshing their memory.
By combining images we can also create sentences, which would help people with speaking difficulties form understandable sentences. The challenge here is to arrange the commonly used icons in such a way that they are easily navigable.
- When a Picture Speaks 1000 Words - A picture can convey a lot of meaning, for example the image shown below.
The above image could mean that a person is feeling hungry, that they want to have dinner or lunch, that they want to go out to eat, and so on. The possibilities here are endless.
- Our Technique - We tag every image with a set of keywords. If the tagged keyword is an action, the associated nouns are displayed on recognition; if the keyword is a noun, a list of possible actions is shown. The keywords need to be semantically understood to create complete sentences. A rough sketch of this keyword-driven lookup is shown below.
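The following is a minimal sketch of the keyword-to-image tagging described above, assuming a simple in-memory map; the class, field, and keyword names are illustrative and not part of the actual prototype code.

```java
import java.util.*;

public class KeywordBoard {
    enum Kind { ACTION, NOUN }

    static class Tag {
        final Kind kind;
        final int imageResId;        // drawable resource id of the tagged image
        final List<String> related;  // nouns for an action keyword, actions for a noun keyword
        Tag(Kind kind, int imageResId, List<String> related) {
            this.kind = kind;
            this.imageResId = imageResId;
            this.related = related;
        }
    }

    private final Map<String, Tag> tags = new HashMap<String, Tag>();

    /** Tags an image with a keyword and its semantically related words. */
    void tag(String keyword, Kind kind, int imageResId, String... related) {
        tags.put(keyword.toLowerCase(Locale.US), new Tag(kind, imageResId, Arrays.asList(related)));
    }

    /** Returns the related keywords to display once a spoken keyword is recognized. */
    List<String> onKeywordRecognized(String spoken) {
        Tag tag = tags.get(spoken.toLowerCase(Locale.US));
        if (tag == null) return Collections.emptyList();
        // Action -> show associated nouns; noun -> show possible actions.
        return tag.related;
    }
}
```

For example, tagging an "eat" image as an action with the nouns "pizza", "rice", and "soup" would make those nouns appear when the user says "eat", and the stored image resource id can then be used to display the picture.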
- Future Enhancements - We would need to understand the semantics behind each keyword. This can be quite tricky since there can be billions of words associated with a keyword. The main challenge is how to represent such semantically related words on handheld devices. One approach is linking the tagged word to a semantic network such as WordNet or a wiki page.
- An Overview - The prototype consists of Icon-to-Speech conversion of human activities. Various activities that a person performs are represented using image icons. When an icon is tapped, a voice saying the activity associated with that icon is generated, using the Text-to-Speech engine available in Android. This application would help people with speaking disabilities use their phone to speak for them. A minimal sketch of this flow is shown below.
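The following is a minimal sketch of the icon-to-speech flow, assuming one ImageView per activity icon; the layout, icon id, and spoken phrase are assumptions made for illustration.

```java
import android.app.Activity;
import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.view.View;
import android.widget.ImageView;

public class IconSpeechActivity extends Activity implements TextToSpeech.OnInitListener {
    private TextToSpeech tts;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);                 // assumed layout holding the activity icons
        tts = new TextToSpeech(this, this);            // Android's built-in Text-to-Speech engine

        ImageView eatIcon = (ImageView) findViewById(R.id.icon_eat);   // hypothetical icon id
        eatIcon.setOnClickListener(new View.OnClickListener() {
            public void onClick(View v) {
                // Speak the activity associated with the tapped icon.
                tts.speak("I want to eat", TextToSpeech.QUEUE_FLUSH, null);
            }
        });
    }

    public void onInit(int status) {
        // Engine initialized; a complete application would check the status and set a Locale here.
    }

    @Override
    protected void onDestroy() {
        tts.shutdown();
        super.onDestroy();
    }
}
```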
- Our Technique -
There are four main categories of activities in the application:
1. Greetings
2. Daily Living Activities
3. Emergency
4. Miscellaneous Activities
- Future Enhancements - 1) A feature can be added which gives the user the flexibility to add his own gesture by drawing it on the mobile screen and associating it with some text which he wants converted to voice (see the sketch after this list).
For example: the user draws the symbol "T" on the screen and associates it with the text "Thank you" by typing it. The next time he opens the application and draws "T", a voice saying "Thank you" should be generated.
2) Categorization can be made better using WordNet.
3) The user interface can be improved.
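The following is a hedged sketch of the proposed gesture feature, using Android's android.gesture package (available since Android 1.6); the layout, overlay id, gesture file name, and score threshold are assumptions for illustration, not part of the prototype.

```java
import android.app.Activity;
import android.gesture.Gesture;
import android.gesture.GestureLibraries;
import android.gesture.GestureLibrary;
import android.gesture.GestureOverlayView;
import android.gesture.Prediction;
import android.os.Bundle;
import android.speech.tts.TextToSpeech;

import java.util.ArrayList;

public class GestureSpeechActivity extends Activity
        implements GestureOverlayView.OnGesturePerformedListener, TextToSpeech.OnInitListener {

    private GestureLibrary library;   // stores user-drawn gestures keyed by their associated text
    private TextToSpeech tts;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.gesture_screen);      // assumed layout containing a GestureOverlayView
        tts = new TextToSpeech(this, this);

        library = GestureLibraries.fromPrivateFile(this, "gestures");
        library.load();

        GestureOverlayView overlay = (GestureOverlayView) findViewById(R.id.overlay);
        overlay.addOnGesturePerformedListener(this);
    }

    /** Called when the user adds a new gesture, e.g. "T" mapped to the text "Thank you". */
    void saveGesture(String text, Gesture gesture) {
        library.addGesture(text, gesture);
        library.save();
    }

    public void onGesturePerformed(GestureOverlayView overlay, Gesture gesture) {
        ArrayList<Prediction> predictions = library.recognize(gesture);
        if (!predictions.isEmpty() && predictions.get(0).score > 1.0) {
            // The gesture's entry name is the text the user typed for it; speak it aloud.
            tts.speak(predictions.get(0).name, TextToSpeech.QUEUE_FLUSH, null);
        }
    }

    public void onInit(int status) {
        // Text-to-Speech engine ready.
    }
}
```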
- An Overview - The prototype consists of face detection and face recognition. Face detection has two parts: static detection and dynamic detection. In static detection mode the user can take an image of his choice with the Android phone; in dynamic detection mode the phone shows a square once it detects a face in the camera preview. On clicking the face recognition button, the application matches the detected image against its dataset, and the user is told via speech whether the image matched or not. This can assist older people or people with low vision who have problems recognizing people.
- Our Technique - We are successfully detecting a face and then recognizing it using eigenfaces (eigenvectors and eigenvalues). The phone sends the bitmap image to a server, which uses its image dataset to recognize the particular face. Once the recognition phase is done, the server sends the output as a string back to the client (the Android phone), where we convert the string to speech using the Text-to-Speech engine available on Android 1.6, so the user hears whether the image matched or not. A rough sketch of the client side of this round trip is shown below.
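The following is an illustrative sketch of the client side of that round trip, assuming a plain TCP socket and a one-line text reply; the host, port, and wire format are assumptions and may differ from the actual prototype.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

import android.graphics.Bitmap;
import android.speech.tts.TextToSpeech;

public class RecognitionClient {
    private static final String SERVER_HOST = "192.168.0.10"; // hypothetical recognition server
    private static final int SERVER_PORT = 5000;              // hypothetical port

    /** Sends the detected face bitmap to the server and speaks the returned result. */
    public void recognize(Bitmap face, TextToSpeech tts) throws Exception {
        // Compress the bitmap so it can be streamed over the socket.
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        face.compress(Bitmap.CompressFormat.JPEG, 90, jpeg);

        Socket socket = new Socket(SERVER_HOST, SERVER_PORT);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(jpeg.toByteArray());
            out.flush();
            socket.shutdownOutput();   // signal the server that the image is complete

            // Assume the server replies with a single line, e.g. "Image matched: Mary" or "No match".
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            String result = in.readLine();
            tts.speak(result != null ? result : "No response from server",
                      TextToSpeech.QUEUE_FLUSH, null);
        } finally {
            socket.close();
        }
    }
}
```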
- Future Enhancements - Face recognition can be optimized and made faster using a new technique or algorithm. Training from the dataset can be arranged so that an image is not matched against the whole set but only against the best candidate sets (preprocessing with the help of metadata). For example, suppose Tom takes a picture of a young lady, Mary, and tries to recognize the image; the current model will match the image of Mary against all types of images (young man, old man, young girl, old woman, children), which consumes time, but with the help of metadata, if we already know the image is of a young girl, we can match it only against young-girl images (see the sketch below). Another future enhancement is measuring distance via the phone camera, which would be helpful for people with low vision and the blind.
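The following is a small illustration of the metadata preprocessing idea, restricting matching to one candidate group before running the eigenface comparison; the Category values and DatasetImage type are hypothetical placeholders, not part of the prototype.

```java
import java.util.ArrayList;
import java.util.List;

public class CandidateFilter {
    enum Category { YOUNG_MAN, OLD_MAN, YOUNG_GIRL, OLD_WOMAN, CHILD }

    static class DatasetImage {
        final String name;
        final Category category;   // metadata attached to each dataset image
        DatasetImage(String name, Category category) {
            this.name = name;
            this.category = category;
        }
    }

    /** Keeps only the images whose metadata matches the query's category. */
    static List<DatasetImage> candidates(List<DatasetImage> dataset, Category queryCategory) {
        List<DatasetImage> matches = new ArrayList<DatasetImage>();
        for (DatasetImage img : dataset) {
            if (img.category == queryCategory) {
                matches.add(img);   // only these are passed on to the eigenface matcher
            }
        }
        return matches;
    }
}
```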
Hope -- Emulator Prototype (No longer under development)
The program uses the Android emulator for prototyping speech recognition on the phone. Currently we are using the Android SDK with CMU Sphinx, since there is no support for speech recognition on the Android emulator.
High Level Architectural Diagram
Above is a high-level architectural solution for the prototype of the application. The reason for using the Speech Recognition and Sound Server is two-fold. The first reason is that, as pointed out earlier, speech recognition is not available on the Android emulator. Secondly, even though the Android emulator supports speech recording, it does not record in the format that CMU Sphinx needs, so instead of adding one more library in between to convert the Android speech format to the Sphinx speech format we have used this approach. The cloud in between depicts that socket programming is used to synchronize actions like record, stop recording, and begin speech recognition between the Android emulator and the server. A sketch of this synchronization is shown below.
Note: We still need to determine the performance of speech recognition on an Android-enabled phone.
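The following is a rough sketch of that socket-based synchronization from the emulator side; the commands ("RECORD", "STOP", "RECOGNIZE") and the port are assumptions made for illustration, and the actual protocol used by the prototype may differ.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class EmulatorSpeechClient {
    private static final String SERVER_HOST = "10.0.2.2"; // the host machine as seen from the Android emulator
    private static final int SERVER_PORT = 4444;          // hypothetical Speech Recognition and Sound Server port

    public String recognizeUtterance() throws Exception {
        Socket socket = new Socket(SERVER_HOST, SERVER_PORT);
        try {
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));

            out.println("RECORD");      // server starts recording in a Sphinx-friendly format
            Thread.sleep(3000);         // allow the user to speak (a real UI would use buttons instead)
            out.println("STOP");        // server stops recording
            out.println("RECOGNIZE");   // server runs CMU Sphinx on the captured audio

            return in.readLine();       // recognized text returned to the emulator
        } finally {
            socket.close();
        }
    }
}
```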
Other Solutions -
Possible WAMI based Solution
Other solution
Solution Available for download