What language do you think in? When you read in your mind, do you "hear" the words loudly or softly? Are you one of those who enjoys acting out scenarios in the mind to find creative solutions?
Whatever you do, all of these engage a process called sub-vocalization: internal speech in which words are "spoken" in the mind without audible sound. There have been multiple attempts to analyze these sub-vocal signals, decode the internal monologue, and translate it into digital commands. These attempts include both invasive and non-invasive methods of studying silent, or sub-auditory, speech, such as when a person silently reads or talks to oneself. Invasive systems include placing sensors on the tongue to measure its movements during silent speech. A permanent magnet articulography (PMA) sensor captures the movement of specific points on the tongue muscles used in speech articulation.
Non-invasive methods take multiple approaches to detecting and recognizing silent speech, including the use of EEG sensors for silent speech recognition. Another non-invasive method under research applies deep learning to video without acoustic vocalization, placing cameras externally to decode language from lip movements. Facial muscle movement is also being deciphered via surface electromyography using a phoneme-based acoustic model, although the user has to explicitly mouth the words with pronounced facial movements.
Today, conversational interfaces exist in various forms. Recent advances in speech recognition have enabled users to interact with computing devices in natural language. This has led to the advent of ubiquitous natural voice interfaces, currently deployed on mobile devices as virtual assistants (e.g. Siri, Alexa, Cortana). These interfaces have also been integrated into smart wearables, dedicated hardware speakers such as Google Home and Amazon Echo, and robots. However, using these devices poses security concerns, as users have to utter commands aloud. Moreover, these are not personal devices: any user within earshot can intentionally or unintentionally send valid voice inputs to them.
Scientists at the National Aeronautics and Space Administration (NASA) have analyzed human silent reading using nerve signals in the throat that control speech. In preliminary experiments, NASA scientists examined nerve signal data collected by small, button-sized sensors attached under the chin and on either side of the laryngeal prominence (Adam's apple). These signals were converted to words by a computer program.
Researchers at the Massachusetts Institute of Technology (MIT) have developed a wearable augmented intelligence headset called AlterEgo, a silent speech interface that enables discreet, seamless, bi-directional communication with a computing device in natural language, without discernible movements or voice input.
Suppose, for instance, you need to know the time. As you sound out the word 'time' in your head, your facial and jaw muscles make micro-movements. Electrodes on the underside of the AlterEgo headset, placed against your face, record these movements and transmit the signals to an external computer via Bluetooth. A neural network processes the signals, much like a speech-to-text program, and responds with the time.
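To make the pipeline concrete, here is a minimal sketch of the signal-window → features → classified-word flow. Everything in it is hypothetical: the "templates", the toy feature extraction, and the nearest-template matcher stand in for AlterEgo's trained neural networks, which are not public.

```python
"""Illustrative silent-speech decoding sketch: a raw signal window is
reduced to a feature vector and matched against per-word templates.
All signals, templates, and words here are invented for illustration."""

import math

def features(window):
    # Toy feature extraction: mean, absolute peak, and RMS of the window.
    n = len(window)
    mean = sum(window) / n
    peak = max(abs(x) for x in window)
    rms = math.sqrt(sum(x * x for x in window) / n)
    return [mean, peak, rms]

# Hypothetical per-word "templates" built from reference signal windows
# (a stand-in for learned model weights).
TEMPLATES = {
    "time": features([0.9, -0.9, 0.8, -0.7]),
    "play": features([0.1, -0.1, 0.2, -0.2]),
}

def decode(window):
    # Nearest-template classification, standing in for the neural network.
    f = features(window)
    def dist(template):
        return sum((a - b) ** 2 for a, b in zip(f, template))
    return min(TEMPLATES, key=lambda word: dist(TEMPLATES[word]))
```

A high-amplitude window decodes to "time" and a low-amplitude one to "play" simply because the toy features separate them; a real system learns far subtler distinctions from electrode data.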
AlterEgo implements modular neural networks in a hierarchical manner so that multiple applications can be accessed simultaneously. Each application is initiated by internally vocalizing a corresponding trigger word, such as 'IoT' for wireless device control through the interface. Vocabulary sets are modeled as n-gram sequences, where recognizing a specific word assigns a probability distribution over the subsequent vocabulary sets. Probability is assigned to the vocabulary sets related to specific applications, and each set is detected by a convolutional neural network. This table shows a hierarchical organization of vocabulary sets:

[Table: hierarchical organization of vocabulary sets — not reproduced here]
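The trigger-word mechanism described above can be sketched as a small state machine: recognizing a trigger word activates that application's vocabulary set, and subsequent words are only matched within the active set. The trigger words and vocabularies below are illustrative assumptions, not AlterEgo's actual word lists.

```python
# Hypothetical hierarchical vocabulary activation: a trigger word selects
# which application-specific vocabulary set is used for later recognition.

VOCABULARIES = {
    "iot":  {"lights on", "lights off", "thermostat up"},   # illustrative
    "math": {"add", "multiply", "equals"},                  # illustrative
}

class HierarchicalDecoder:
    def __init__(self):
        self.active = None  # no application vocabulary active yet

    def recognize(self, word):
        if self.active is None:
            # Only trigger words are meaningful at the top level.
            if word in VOCABULARIES:
                self.active = word
                return f"activated '{word}' vocabulary"
            return "ignored (no vocabulary active)"
        # Within an application, match only against its vocabulary set.
        if word in VOCABULARIES[self.active]:
            return f"command: {word}"
        return "out of vocabulary"
```

Restricting recognition to one small set at a time is what makes the approach tractable: each convolutional detector only has to discriminate among a handful of words, rather than an open vocabulary.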
A patent search turns up a granted patent and a published application on silent-speech commands to computing devices, along with invasive methods to capture the movement of muscles used in speech articulation. Hewlett-Packard owns granted patent US 8836638 B2, which discloses a method to execute a command on a computing device. The computing device receives a first command and a second command; the second command is silent speech detected from the user's lip movements, without sound. The method, however, relies on a device camera to detect the user's silent speech command.
The patent application filed by Georgia Tech Research Corp. and numbered US 20140342324 A1 discloses a system for recording natural tongue movements in a 3D oral space. The method includes the attachment of a small magnetic tracer to the tongue, either temporarily or semi-permanently, along with an array of magnetic sensors around the mouth. Additionally, the system can record tongue trajectories and create an indexed library of such traces. The indexed library can be used as a tongue tracking silent speech interface.
A silent speech interface allows a user to communicate seamlessly with computers, applications, and people. This includes interacting with telecommunication devices, speech-based smart assistants, social robots, and more, all via silent speech. Silent speech interfaces are more private and personal, and they do not conflict with existing verbal communication channels between people. While the technology is still under research, it is certainly a step towards personalization for users, and artificial intelligence/machine learning will play a key role in evolving it into a generalized, user-independent multi-user system.
(Featured image is for representative purpose only and has been sourced from https://pixabay.com/en/brain-thinking-idea-intelligence-494152/)