Jun 28, 2024
Everything you need to know about Hume AI’s voice AI with Emotional Intelligence
EVI does far more than simply chain together transcription, an LLM, and text-to-speech. With a new empathic LLM (eLLM) that analyzes your voice, EVI can generate more empathic language, know when to speak, and intelligently modulate its own tune, rhythm, and timbre.
Published by: Milin Stanly
Hume AI, a New York-based startup founded in 2021 by Alan Cowen, a former researcher at Google DeepMind, recently launched the first voice AI with emotional intelligence, designed to generate conversations that support the emotional well-being of its users.
This Empathic Voice Interface (EVI) is a sophisticated voice AI with emotional intelligence, available as an iOS app. EVI is now powered by Anthropic’s Claude 3.5 Sonnet and ships with a new AI voice named Kora.
EVI is the first voice AI that sounds like it understands you. It can communicate meaning beyond words by changing the tone of its voice, which opens the door to more effective, seamless, and fulfilling AI interactions.
Accessing EVI
The primary way to interact with EVI is over a WebSocket connection, streaming audio to the service and receiving responses in real time. This enables smooth, two-way communication: the user talks, EVI hears and interprets their speech, and EVI replies with empathically intelligent responses. A dialogue begins when the client opens the WebSocket connection and sends the user’s voice input to EVI.
While the user is speaking, the client can also send EVI text to speak aloud; this is intelligently woven into the conversation, as the sketch below illustrates.
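Here is a minimal sketch of such a session using Python’s `websockets` library. The endpoint URL, the `api_key` query parameter, and the `audio_input` / `assistant_input` / `audio_output` / `assistant_end` message shapes are assumptions made for illustration, not confirmed details of Hume’s API; consult Hume’s EVI documentation for the exact schema.

```python
import asyncio
import base64
import json

import websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed endpoint path
API_KEY = "your-hume-api-key"              # placeholder credential

async def chat() -> None:
    # How the key is passed (query parameter vs. header) is an assumption.
    async with websockets.connect(f"{EVI_URL}?api_key={API_KEY}") as ws:
        # Send a chunk of user speech as base64-encoded audio.
        with open("user_speech.wav", "rb") as f:
            audio_b64 = base64.b64encode(f.read()).decode()
        await ws.send(json.dumps({"type": "audio_input", "data": audio_b64}))

        # The client can also inject text for EVI to speak aloud,
        # which EVI weaves into the ongoing conversation.
        await ws.send(json.dumps({"type": "assistant_input",
                                  "text": "Let me check on that."}))

        # Collect EVI's streamed audio reply until the turn ends.
        reply_audio = bytearray()
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "audio_output":
                reply_audio.extend(base64.b64decode(msg["data"]))
            elif msg.get("type") == "assistant_end":
                break

        with open("evi_reply.wav", "wb") as f:
            f.write(bytes(reply_audio))

asyncio.run(chat())
```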
Empathic AI (eLLM) features
- Responds at the right time: Uses your tone of voice for state-of-the-art end-of-turn detection, the real bottleneck to responding quickly without interrupting you.
- Understands users’ prosody: Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with the eLLM.
- Forms its own natural tone of voice: Guided by the user’s prosody and language, the model responds with an empathic, naturalistic tone of voice, matching the user’s nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more.
- Responds to the expression: Powered by its empathic large language model (eLLM), EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice.
- Always interruptible: Stops rapidly whenever the user interjects, listens, and responds with the right context based on where it left off (see the sketch after this list).
- Aligned with well-being: Trained on human reactions to optimize for positive expressions like happiness and satisfaction. EVI continuously learns from users’ responses.
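Because EVI is always interruptible, a client also needs to stop playback the instant the user barges in. Below is a minimal client-side sketch of that behavior; the `audio_output` and `user_interruption` message type names are assumptions for illustration, and the real schema may differ.

```python
import base64
import json
from collections import deque

class Playback:
    """Toy stand-in for a real audio output device with a chunk queue."""
    def __init__(self) -> None:
        self.queue: deque[bytes] = deque()

    def enqueue(self, chunk: bytes) -> None:
        self.queue.append(chunk)

    def flush(self) -> None:
        self.queue.clear()  # drop any unplayed assistant audio at once

async def handle_messages(ws, playback: Playback) -> None:
    # Message type names are assumptions used for illustration.
    async for raw in ws:
        msg = json.loads(raw)
        if msg.get("type") == "audio_output":
            playback.enqueue(base64.b64decode(msg["data"]))
        elif msg.get("type") == "user_interruption":
            # The user interjected: halt playback immediately so EVI can
            # listen, then respond with context from where it left off.
            playback.flush()
```

The key design point is that interruption handling is purely local and immediate: the client flushes its own audio queue rather than waiting for the server to stop streaming.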