AI Voice Agents
Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧
Table of Contents
Project List
Full Stack

| Source | Description | Code | Paper | Model |
| --- | --- | --- | --- | --- |
| Bland AI | A platform for AI phone calling. Its API lets you send or receive phone calls with a programmable voice agent, automating inbound and outbound enterprise calls with AI that sounds human. | | | API |
| GPT-4o | GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. | | | API |
| Retell AI | Build advanced voice AI powered by LLMs. | | | API |

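One common way to build a voice agent from the component categories in this list is a cascade: speech recognition turns caller audio into text, a language model writes a reply, and text-to-speech turns the reply back into audio. The sketch below illustrates that flow only. It is not how any listed product is implemented, and every name in it (`VoiceAgent`, `fake_asr`, `fake_llm`, `fake_tts`) is a hypothetical stand-in, not a real API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real systems would wrap an ASR model,
# an LLM, and a TTS model behind callables like these.
Transcriber = Callable[[bytes], str]   # audio in   -> user text
Responder = Callable[[str], str]       # user text  -> reply text
Synthesizer = Callable[[str], bytes]   # reply text -> audio out


@dataclass
class VoiceAgent:
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: ASR, then LLM, then TTS."""
        user_text = self.transcribe(audio_in)
        reply_text = self.respond(user_text)
        return self.synthesize(reply_text)


def fake_asr(audio: bytes) -> str:
    # Stub: treats the "audio" bytes as UTF-8 text instead of decoding speech.
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    # Stub: echoes the input instead of calling a language model.
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    # Stub: returns the text as bytes instead of synthesized audio.
    return text.encode("utf-8")


agent = VoiceAgent(fake_asr, fake_llm, fake_tts)
reply_audio = agent.handle_turn(b"hello")
```

Keeping each stage behind a plain callable means any one stub can be replaced by a real component without touching the turn-handling logic.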
Text To Speech

| Source | Description | Code | Paper | Model |
| --- | --- | --- | --- | --- |
| ChatTTS | A text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. | GitHub | | Hugging Face |
| CosyVoice | A multilingual large voice generation model with full-stack support for inference, training, and deployment. | GitHub | | |
| ElevenLabs | Text-to-speech and AI voice generator. | | | API |
| Matcha-TTS | A fast TTS architecture with conditional flow matching. | GitHub | arXiv | |
| StyleTTS 2 | Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | GitHub | arXiv | |
| XTTS | A model from 🐸TTS, a library for advanced text-to-speech generation. | GitHub | | |

Automatic Speech Recognition

| Source | Description | Code | Paper | Model |
| --- | --- | --- | --- | --- |
| SenseVoice | A speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). | GitHub | | Hugging Face |
| TeleSpeech-ASR | A large speech model for multi-dialect ASR. | GitHub | | Hugging Face |
| Whisper | A general-purpose speech recognition model. | GitHub | arXiv | Hugging Face |

Audio Generation