Sphinx Unveiled: Exploring the groundbreaking speech recognition revolution

In 2000, the Sphinx group at Carnegie Mellon University committed to releasing several speech recognizer components to the public, and Sphinx 3 followed in 2001. The speech decoders come with acoustic models and sample applications. The available resources also include software for training acoustic models and compiling language models, as well as cmudict, a public-domain pronunciation dictionary.

Sphinx

Sphinx is a speaker-independent, continuous-speech recognition system that uses hidden Markov model (HMM) acoustic models and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated that continuous-speech, speaker-independent, large-vocabulary recognition was feasible, something that was still disputed at the time (1986). Today Sphinx is of historical interest only, since later versions have surpassed its performance. An archival article describes the system in depth.
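To make the idea of an n-gram statistical language model concrete, here is a minimal sketch of a bigram (n = 2) model with add-one smoothing. It is an illustration only and not Sphinx's own implementation: the toy corpus, the function names, and the choice of add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (hypothetical helper)."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])                 # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    # Words that can be predicted: the vocabulary plus the end-of-sentence marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return histories, bigrams, vocab


def log_prob(words, histories, bigrams, vocab):
    """Add-one smoothed log probability of a word sequence under the bigram model."""
    tokens = ["<s>"] + words + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + v))
    return total


# Toy corpus, assumed purely for illustration.
corpus = [["call", "home"], ["call", "the", "office"], ["go", "home"]]
model = train_bigram(corpus)

seen = log_prob(["call", "home"], *model)    # word order observed in the corpus
unseen = log_prob(["home", "call"], *model)  # word order never observed
```

In a recognizer, a score of this kind is combined with the acoustic model score so that word sequences the language model considers likely are preferred over acoustically similar but implausible ones.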

Sphinx 2

Sphinx 2 focuses on real-time recognition suitable for spoken-language applications. To that end, it includes end-pointing, partial hypothesis generation, dynamic language model switching, and similar features. It is used in dialogue systems and language learning systems, and it can be used in computer-based PBX systems such as Asterisk. Sphinx 2 code has also been incorporated into a number of commercial products. It is no longer under active development beyond routine maintenance; current real-time decoder development takes place in the Pocket Sphinx project. Archival articles describe the system.

Sphinx 3

For acoustic modelling, Sphinx 2 used a semi-continuous representation: a single set of Gaussians is shared by all models, and each model is represented as a weight vector over those Gaussians. Sphinx 3 adopted the prevalent continuous HMM representation and has been used primarily for high-accuracy, non-real-time recognition. Recent advances in algorithms and hardware have brought Sphinx 3 to "near" real-time speed, but it is not yet suitable for critical interactive applications. Sphinx 3 is under active development and, together with SphinxTrain, provides access to several modern modelling techniques that improve recognition accuracy. The article on Speech Recognition describes examples of these techniques.
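The difference between the two representations can be sketched in a few lines of Python. This is a simplified illustration, not code from Sphinx 2 or Sphinx 3: the two-dimensional features, the state names, and every numeric value below are assumptions chosen only to show the structure. In the semi-continuous case all states share one codebook of Gaussians and differ only in their weight vectors; in the continuous case each state owns its Gaussians.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


# Semi-continuous: one codebook of (mean, variance) Gaussians shared by all states.
CODEBOOK = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
    ([-3.0, 2.0], [1.0, 1.0]),
]

# Each state stores only a weight vector over the shared codebook.
SEMI_WEIGHTS = {
    "state_a": [0.8, 0.1, 0.1],
    "state_b": [0.1, 0.8, 0.1],
}


def codebook_densities(x):
    """Evaluate every shared Gaussian once per frame; all states reuse the result."""
    return [gaussian_pdf(x, mean, var) for mean, var in CODEBOOK]


def semi_continuous_likelihood(densities, state):
    """Weighted sum of the shared densities using the state's own weights."""
    return sum(w * d for w, d in zip(SEMI_WEIGHTS[state], densities))


# Continuous: each state owns its (weight, mean, variance) mixture components.
CONTINUOUS = {
    "state_a": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [0.5, -0.5], [0.5, 0.5])],
    "state_b": [(1.0, [3.0, 3.0], [1.0, 1.0])],
}


def continuous_likelihood(x, state):
    """Mixture likelihood computed from the state's private Gaussians."""
    return sum(w * gaussian_pdf(x, mean, var) for w, mean, var in CONTINUOUS[state])


frame = [0.2, -0.1]  # one hypothetical feature vector
shared = codebook_densities(frame)
semi_a = semi_continuous_likelihood(shared, "state_a")
semi_b = semi_continuous_likelihood(shared, "state_b")
cont_a = continuous_likelihood(frame, "state_a")
cont_b = continuous_likelihood(frame, "state_b")
```

The sketch also hints at the trade-off: the shared codebook is evaluated once per frame and reused by every state, which is cheap, while the continuous form spends more parameters and computation per state in exchange for a closer fit to each state's data.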

Sphinx 4

Sphinx 4 is a complete rewrite of the Sphinx engine that aims to provide a more flexible framework for speech recognition research. It is written entirely in Java. Sun Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to the project. Participants included people from MERL, MIT, and CMU. (Currently supported programming languages are C, C++, C#, Python, Ruby, Java, and JavaScript.)
