Traumatic Brain Injury (TBI) is a leading cause of death and disability in young people, with an estimated 69 million people suffering a TBI every year. The initial trauma disrupts brain homeostasis, resulting in metabolic dysfunction. Brain injury leaves many people unable to communicate through speech, typing or gestures, and treating these patients can be extremely difficult because interventions carry risks at several levels.
But would the lives of these people not change dramatically if researchers developed a technology to decode language directly from noninvasive brain recordings? Years of effort by researchers worldwide have now reached a significant milestone in this area.
Meta recently shared that it has developed an AI model that can decode speech from noninvasive recordings of brain activity. The company had previously been attempting to build AI that processes language the way people do, using Deep Learning to help researchers analyse complex brain signals. That approach highlighted where and when perceptual representations of words and sentences are generated in the brain as a volunteer reads or listens to a story.
Previous attempts to decode speech from brain activity have relied on invasive brain-recording techniques. These devices provide clearer signals than noninvasive methods, but they require neurosurgical intervention. The results of this work suggest that decoding speech with noninvasive approaches would provide a safer, more scalable solution that could ultimately benefit many more people.
In the recently published work, Meta overcame the previous challenges by creating a Deep Learning model trained with contrastive learning and then using it to maximally align noninvasive brain recordings and speech sounds. The researchers used the model to identify the complex representations of speech in the brains of volunteers listening to audiobooks.
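To make the idea concrete, here is a minimal sketch of a contrastive (CLIP-style) alignment objective in Python with PyTorch. The function names, batch size and embedding dimensions are illustrative assumptions, not Meta's implementation.

```python
# Minimal sketch of a contrastive objective that aligns brain-recording
# embeddings with speech embeddings. All names and sizes are illustrative.
import torch
import torch.nn.functional as F

def contrastive_loss(brain_emb, speech_emb, temperature=0.07):
    """InfoNCE-style loss: matching (brain, speech) pairs should score
    higher than all mismatched pairs in the batch."""
    brain_emb = F.normalize(brain_emb, dim=-1)
    speech_emb = F.normalize(speech_emb, dim=-1)
    logits = brain_emb @ speech_emb.T / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))               # i-th brain segment matches i-th speech clip
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Example: a batch of 32 three-second segments, each embedded in 512 dimensions.
brain_emb = torch.randn(32, 512)
speech_emb = torch.randn(32, 512)
print(contrastive_loss(brain_emb, speech_emb))
```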
Their results showed that, from three seconds of brain activity, the model could decode the corresponding speech segments with up to 73 per cent top-10 accuracy from a vocabulary of 793 words, i.e., a large portion of the words we typically use on a day-to-day basis.
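For readers unfamiliar with the metric, the short sketch below shows how top-10 accuracy is typically scored: a prediction counts as correct if the true item appears among the model's ten highest-scored candidates. The random scores, sample count and helper name are assumptions for illustration; only the 793-word vocabulary comes from the reported result.

```python
# Illustration of top-k accuracy scoring with made-up similarity scores.
import numpy as np

def top_k_accuracy(scores, targets, k=10):
    """scores: (n_samples, vocab_size) similarity scores; targets: true indices."""
    top_k = np.argsort(scores, axis=1)[:, -k:]            # indices of the k best candidates
    hits = [t in row for t, row in zip(targets, top_k)]
    return np.mean(hits)

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 793))                      # 793-word vocabulary, as in the study
targets = rng.integers(0, 793, size=100)
print(top_k_accuracy(scores, targets, k=10))              # chance level is only 10/793, about 1.3%
```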
For the development of the AI model, Meta researchers focused on two noninvasive technologies: electroencephalography and magnetoencephalography (EEG and MEG for short), which measure the fluctuations of electric and magnetic fields elicited by neuronal activity, respectively. In practice, both systems can take approximately 1000 snapshots of macroscopic brain activity every second, using hundreds of sensors.
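As a rough illustration, a single three-second recording can be thought of as a sensors-by-time matrix. The sensor count below is an assumption about the typical order of magnitude, not the exact hardware used in the study.

```python
# Rough sketch of what a noninvasive recording looks like as an array:
# hundreds of sensors sampled roughly 1,000 times per second.
import numpy as np

sampling_rate_hz = 1000      # ~1,000 snapshots of brain activity per second
n_sensors = 270              # assumed: a few hundred EEG/MEG sensors
segment_seconds = 3          # the three-second windows used for decoding

segment = np.zeros((n_sensors, sampling_rate_hz * segment_seconds))
print(segment.shape)         # (270, 3000): sensors x time samples
```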
The EEG and MEG recordings are fed into the brain model, which consists of a standard deep convolutional network. These recordings vary extensively across individuals because of differences in anatomy. In previous studies, brain decoders were trained on a small number of recordings to predict a limited set of speech features, such as part-of-speech categories or words from a small vocabulary. In this research, a new subject-embedding layer was designed and trained end-to-end to align all brain recordings in a common space. Finally, the architecture learns to align the output of this brain model with the deep representations of the speech sounds presented to the participants.
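The following is a minimal PyTorch sketch of an architecture with this shape: a convolutional brain model conditioned on a learned per-subject embedding, whose output is projected into a shared speech-embedding space. All layer sizes, names and the conditioning scheme are assumptions for illustration, not Meta's code.

```python
# Hypothetical sketch: convolutional brain model with a subject embedding.
import torch
import torch.nn as nn

class BrainModel(nn.Module):
    def __init__(self, n_sensors=270, n_subjects=100, emb_dim=16, out_dim=512):
        super().__init__()
        self.subject_emb = nn.Embedding(n_subjects, emb_dim)   # one learned vector per participant
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors + emb_dim, 256, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.GELU(),
        )
        self.head = nn.Linear(256, out_dim)                    # projects into the speech-embedding space

    def forward(self, recording, subject_id):
        # recording: (batch, n_sensors, time); subject_id: (batch,)
        emb = self.subject_emb(subject_id)                     # (batch, emb_dim)
        emb = emb[:, :, None].expand(-1, -1, recording.size(-1))
        x = torch.cat([recording, emb], dim=1)                 # condition the input on the subject
        x = self.conv(x).mean(dim=-1)                          # pool over time
        return self.head(x)

model = BrainModel()
out = model(torch.randn(4, 270, 3000), torch.tensor([0, 1, 2, 3]))
print(out.shape)                                               # (4, 512)
```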
After training, the system could perform 'zero-shot classification': given a snippet of brain activity, it could determine, from a large pool of new audio clips, which one the person heard.
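In code, this zero-shot step amounts to a nearest-neighbour search in the shared embedding space, as in the sketch below. The random embeddings are stand-ins for the outputs of the trained brain and speech models, and the pool size is an arbitrary example.

```python
# Sketch of the zero-shot retrieval step: rank candidate audio clips by
# similarity to the embedding of one brain-activity snippet.
import torch
import torch.nn.functional as F

def rank_candidates(brain_emb, clip_embs):
    """Return candidate-clip indices sorted from most to least similar."""
    sims = F.normalize(brain_emb, dim=-1) @ F.normalize(clip_embs, dim=-1).T
    return sims.argsort(descending=True)

brain_emb = torch.randn(1, 512)          # embedding of one three-second brain segment
clip_embs = torch.randn(1000, 512)       # embeddings of 1,000 unseen audio clips
ranking = rank_candidates(brain_emb, clip_embs)
print(ranking[0, :10])                   # the model's top-10 guesses for what was heard
```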
The results of the Meta research are encouraging, as they show that self-supervised AI can successfully decode perceived speech from noninvasive brain activity, despite the noise and variability inherent in those data. However, these results are only a first step. Meta's ultimate goal is to enable patient communication, which will require extending this work to speech production. The company believes this could even reach beyond assisting patients to potentially enabling new ways of interacting with computers.