ElevenLabs Cheat Sheet
Complete quick reference guide to AI voice generation
This cheat sheet covers ElevenLabs text-to-speech (TTS), voice cloning, API usage, model selection, and advanced techniques for creating natural-sounding AI voices.
Quick Start Guide
Get results in 5 minutes
Complete Working Example
Copy this cURL command, replace YOUR_API_KEY with your own key, and run it to generate your first voice:
    curl -X POST https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM \
      -H "xi-api-key: YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Welcome to ElevenLabs. This is your first generated voice.",
        "model_id": "eleven_flash_v2_5",
        "voice_settings": { "stability": 0.5, "similarity_boost": 0.75 }
      }' \
      --output speech.mp3
Voice ID: 21m00Tcm4TlvDq8ikWAM (Rachel - default female voice)
Get your API key: elevenlabs.io → Developer → API Key
Choose Your Model
- eleven_turbo_v2_5
- eleven_flash_v2_5
- eleven_multilingual_v2
- eleven_v3
Tune Voice Settings
- stability (0.0 - 1.0): high values (0.8) are consistent and predictable
- similarity_boost (0.0 - 1.0): sweet spot is 0.75 for most cases
- style (0.0 - 1.0): start at 0.0, increase if needed
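Since every tunable value shares the 0.0 - 1.0 range, it can help to validate settings before sending them. The following is a minimal sketch; the helper name `make_voice_settings` is hypothetical, and the field names are taken from the request bodies shown elsewhere in this cheat sheet.

```python
def make_voice_settings(stability=0.5, similarity_boost=0.75, style=0.0,
                        use_speaker_boost=True):
    """Build a "voice_settings" dict, rejecting values outside 0.0-1.0.

    Defaults follow the values used in this cheat sheet's examples.
    """
    numeric = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in numeric.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {**numeric, "use_speaker_boost": bool(use_speaker_boost)}
```

The returned dict can be placed under the `voice_settings` key of a text-to-speech request body.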
Models Comparison
Choose the right model for your use case
| Model | Best For | Latency | Languages | Char Limit | Key Features |
|---|---|---|---|---|---|
| Flash v2.5 (eleven_flash_v2_5) | Real-time applications: chatbots, live agents, conversational AI | 75ms (ultra-fast) | 32 languages | 40,000 (~40 min audio) | Lowest latency; optimized for speed; WebSocket streaming; good quality/speed balance |
| Turbo v2.5 (eleven_turbo_v2_5) | Balanced use cases: general purpose, narration, IVR | ~300ms (fast) | 32 languages | 40,000 (~40 min audio) | Better quality than Flash; still fast enough for real-time; more emotional range; best all-rounder |
| Multilingual v2 (eleven_multilingual_v2) | Premium quality: audiobooks, podcasts, videos | ~1-2s (slower) | 29 languages (premium quality) | 5,000 (~5 min audio) | Highest quality; best for long-form content; natural prosody; accent preservation |
| Eleven v3 (eleven_v3) | Emotional and expressive: storytelling, character voices, drama | ~1-2s (slower) | 70+ languages (most supported) | 3,000 (~3 min audio) | Multi-speaker dialogue; audio tags [laughs] [whispers]; most emotional depth; character acting |
| Scribe v1 (scribe_v1) | Speech-to-text: transcription, voice cloning prep | Variable (depends on audio) | 99 languages (most coverage) | Audio file based | Converts speech to text; supports most languages; use before voice cloning; high accuracy |
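The comparison above can be turned into a small selection helper. This is only a sketch: the character limits are copied from the table, while the function name `pick_model` and its `realtime` / `expressive` flags are hypothetical simplifications of the "Best For" column.

```python
# Character limits copied from the comparison table above.
CHAR_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
    "eleven_multilingual_v2": 5_000,
    "eleven_v3": 3_000,
}


def pick_model(text, *, realtime=False, expressive=False):
    """Pick a model ID for `text` and check it against the model's limit.

    realtime   -> lowest latency (Flash v2.5)
    expressive -> audio tags and emotional range (Eleven v3)
    otherwise  -> premium long-form quality (Multilingual v2)
    """
    if realtime:
        model_id = "eleven_flash_v2_5"
    elif expressive:
        model_id = "eleven_v3"
    else:
        model_id = "eleven_multilingual_v2"
    limit = CHAR_LIMITS[model_id]
    if len(text) > limit:
        raise ValueError(
            f"{len(text)} characters exceeds the {limit} limit of {model_id}"
        )
    return model_id
```

Raising on over-long text, rather than silently switching models, keeps the caller in charge of whether to split the text or accept a different quality tier.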
Text Control & Formatting
Control timing, pronunciation, and speech patterns
Pauses & Timing
Sentence. Next.
Pronunciation Control
Madison, actually
Alias Tags
UN → United Nations
Speed & Pacing
0.7 - 1.2
Pronunciation Dictionaries (.PLS)
- Upload custom pronunciation files
- Reusable across multiple requests
- Combine phonemes + aliases
- Case-sensitive matching
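The pause, phoneme, and alias controls in this section can be generated programmatically. The sketch below assumes SSML-style `<break>` and `<phoneme>` tags and the W3C PLS lexicon format for dictionary files. The helper names are hypothetical, the exact tag syntax is an assumption rather than something stated on this page, and tag support may vary by model.

```python
from xml.sax.saxutils import escape, quoteattr


def break_tag(seconds):
    """Return a pause tag such as <break time="1.0s" /> (assumed syntax)."""
    return f'<break time="{seconds:.1f}s" />'


def phoneme_tag(word, ph, alphabet="ipa"):
    """Wrap `word` in a phoneme tag using the given phonetic alphabet."""
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(word)}</phoneme>"
    )


def build_pls(aliases, lang="en-US"):
    """Build a PLS lexicon that maps each grapheme to a spoken alias.

    `aliases` is a dict such as {"UN": "United Nations"}.
    """
    lexemes = "\n".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
        f'    alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        f"{lexemes}\n"
        "</lexicon>\n"
    )
```

For example, `"Sentence. " + break_tag(1.0) + " Next."` inserts a one-second pause between the two sentences, and `build_pls({"UN": "United Nations"})` produces a dictionary file body that can be saved with a `.pls` extension.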
Emotion & Expression
Make AI voices sound human and expressive
Narrative Context
Add emotional context around dialogue for natural expression
"You're leaving?" she asked, her voice trembling with sadness.
"That's it!" he exclaimed triumphantly.
v3 Audio Tags
Special tags for Eleven v3 model only
[laughs] [whispers] [sighs] [exhales] [sarcastic] [curious] [excited] [crying] [gunshot] [applause] [clapping]
Special Tags & Effects
- [strong French accent] Bonjour!
- [sings] Singing voice
- [woo] Exclamation
Punctuation Techniques
- This is AMAZING!
- I don't know... maybe...
- You did what?
Eleven v3 Mode Settings
Choose the right mode for your emotional needs
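Because the bracketed audio tags listed above are described as v3-only, a script that targets several models may want to remove them before sending text to any other model. The sketch below is one way to do that; the helper names are hypothetical, and the premise that other models would treat the tags as literal text is an assumption.

```python
import re

V3_MODEL = "eleven_v3"

# Matches a bracketed tag such as [laughs] plus any whitespace after it.
_AUDIO_TAG = re.compile(r"\[[^\[\]]+\]\s*")


def tagged(tag, text):
    """Prefix `text` with an audio tag, e.g. tagged("whispers", "Hi")."""
    return f"[{tag}] {text}"


def prepare_text(text, model_id):
    """Keep audio tags for eleven_v3; strip them for every other model."""
    if model_id == V3_MODEL:
        return text
    return _AUDIO_TAG.sub("", text).strip()
```

Note that the pattern removes any bracketed span, so text that uses square brackets for other purposes would need a stricter allow-list of tag names.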
Voice Selection & Settings
Choose and configure the perfect voice
Voice Library
- Rachel - Neutral female
- Adam - Deep male
- Bella - Soft female
Voice Cloning Types
Voice Settings Presets
0.6-0.8, 0.7, 0.2-0.4, 0.7-0.9, 0.9, 0.0-0.2, 0.3-0.5, 0.8, 0.7-1.0
Speaker Boost
Selection Tips
WebSocket Streaming
Real-time audio generation with ultra-low latency
Connection Setup
wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input
xi-api-key: YOUR_API_KEY
Message Protocol
    {
      "text": "Hello world",
      "model_id": "eleven_flash_v2_5",
      "voice_settings": { "stability": 0.5, "similarity_boost": 0.75 }
    }
Chunk Management
    ws.send(JSON.stringify({ text: "First chunk. " }))
    ws.send(JSON.stringify({ text: "Second chunk." }))
Flushing & Completion
    ws.send(JSON.stringify({ flush: true }))
Buffering Strategy
Error Handling
- 401 - Invalid API key
- 429 - Rate limited
- 400 - Invalid chunk format
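The status codes above call for different reactions: a 429 is worth retrying after a delay, while 400 and 401 indicate a request that needs fixing. A minimal sketch of that policy follows; the function names and the default delay values are assumptions, not documented ElevenLabs behavior.

```python
import random

RETRYABLE = {429}     # rate limited: retry after a delay
FATAL = {400, 401}    # invalid chunk format or API key: fix the request


def should_retry(status):
    """Return True only for status codes that a later retry might resolve."""
    return status in RETRYABLE


def backoff_delays(attempts, base=1.0, cap=30.0, jitter=0.0):
    """Return exponential delays in seconds: base, 2*base, 4*base, ...

    Each delay is capped at `cap`; `jitter` adds up to that many random
    seconds so that parallel clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)
        delays.append(delay + random.uniform(0, jitter))
    return delays
```

A caller would sleep for each delay in turn while `should_retry(status)` holds, and give up once the list is exhausted.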
Performance Tips
- Use Flash v2.5 model
- Send 200-500 char chunks
- Flush at sentence breaks
- Keep connection alive
- Buffer 300-500ms initially
- Use complete sentences
- Avoid mid-word chunks
- Proper punctuation
- Implement reconnection
- Monitor connection health
- Handle rate limits
- Log failed chunks
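The message flow described in this section (an opening message with settings, text chunks, then a flush) can be prepared ahead of time as a list of JSON strings. The sketch below mirrors the message shapes shown in this section; the function name `build_stream_messages` is hypothetical, and actually sending the messages is left to whichever WebSocket client is in use.

```python
import json


def build_stream_messages(chunks, model_id="eleven_flash_v2_5",
                          stability=0.5, similarity_boost=0.75):
    """Return the JSON messages for one streaming session, in send order.

    The first chunk carries model_id and voice_settings, later chunks carry
    only text, and a final flush message forces out any buffered audio.
    """
    if not chunks:
        raise ValueError("at least one text chunk is required")
    first = {
        "text": chunks[0],
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    messages = [json.dumps(first)]
    messages.extend(json.dumps({"text": chunk}) for chunk in chunks[1:])
    # json.dumps turns Python's True into the lowercase JSON literal true.
    messages.append(json.dumps({"flush": True}))
    return messages
```

With a connected client, each string in the returned list would be sent in order, for example `for message in build_stream_messages(["First chunk. ", "Second chunk."]): ...`.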
API Quick Reference
Essential endpoints and code examples
Text-to-Speech (POST)
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
xi-api-key: YOUR_API_KEY
Content-Type: application/json
    {
      "text": "Your text here",
      "model_id": "eleven_turbo_v2_5",
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": true
      }
    }
Python Example
    import requests

    API_KEY = "YOUR_API_KEY"            # placeholder: replace with your key
    voice_id = "21m00Tcm4TlvDq8ikWAM"   # Rachel, the voice ID from the quick start

    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": API_KEY, "Content-Type": "application/json"}
    data = {"text": "Hello world", "model_id": "eleven_turbo_v2_5"}

    response = requests.post(url, json=data, headers=headers)
    response.raise_for_status()  # stop on 401/429 instead of saving an error body as audio

    # The response body is the generated audio
    with open("output.mp3", "wb") as f:
        f.write(response.content)
JavaScript Example
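    // API_KEY and voiceId are placeholders you define yourself
    const url = `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`;
    const response = await fetch(url, {
      method: 'POST',
      headers: { 'xi-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: 'Hello world' })
    });
    if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);
    // Wrap the audio bytes in a Blob so the browser can play them
    const audioBuffer = await response.arrayBuffer();
    const blob = new Blob([audioBuffer], { type: 'audio/mpeg' });
    const urlObject = URL.createObjectURL(blob);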
List Voices (GET)
GET https://api.elevenlabs.io/v1/voices
List Models (GET)
GET https://api.elevenlabs.io/v1/models
Voice Cloning (POST)
POST https://api.elevenlabs.io/v1/voices/add
- Upload audio files
- Provide descriptive name
- Choose cloning type (IVC/PVC)
- Optional: description, labels
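The GET endpoints in this reference are a convenient way to look up a voice ID by name instead of hard-coding it. The sketch below uses only the standard library; the helper names are hypothetical, and the response shape (a `voices` list whose items have `name` and `voice_id` fields) is an assumption to verify against a real response.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def fetch_json(path, api_key):
    """GET an endpoint such as "/voices" or "/models" and decode the JSON."""
    request = urllib.request.Request(
        API_BASE + path, headers={"xi-api-key": api_key}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def voice_index(payload):
    """Map voice name -> voice_id from a List Voices response.

    Assumes a payload shaped like {"voices": [{"name": ..., "voice_id": ...}]}.
    """
    return {voice["name"]: voice["voice_id"] for voice in payload.get("voices", [])}
```

For example, `voice_index(fetch_json("/voices", API_KEY)).get("Rachel")` would return Rachel's voice ID if the account has access to that voice.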
Language Support
Multilingual voice generation across 70+ languages
Language Coverage by Model
Language Codes
    { "text": "Hola, ¿cómo estás?", "model_id": "eleven_multilingual_v2", "language_code": "es" }
Set language_code for best accuracy
Multilingual Tips
Regional Variants
- Spanish: es-ES, es-MX, es-AR
- Portuguese: pt-PT, pt-BR
- English: en-US, en-GB, en-AU
- French: fr-FR, fr-CA
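The request example in this section passes a two-letter `language_code` ("es"), while the variants above are written as full locale tags. The sketch below derives the two-letter code from a locale tag when building a request body. It assumes the API expects the bare language code, as in that example; the function name `tts_payload` is hypothetical, and how a regional variant is actually selected is not specified here.

```python
def tts_payload(text, model_id, locale=None):
    """Build a text-to-speech request body, optionally with language_code.

    `locale` may be a bare code ("es") or a regional tag ("es-MX"); only
    the language part before the hyphen is sent.
    """
    payload = {"text": text, "model_id": model_id}
    if locale:
        payload["language_code"] = locale.split("-")[0].lower()
    return payload
```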
Code Switching
"Welcome to Mexico City. Bienvenidos a la Ciudad de México."
Asian Language Notes
Chinese
- Simplified (zh-CN)
- Traditional (zh-TW)
- Tone-aware generation
- Character recognition
Japanese
- Hiragana support
- Katakana support
- Kanji recognition
- Pitch accent aware
Korean
- Hangul support
- Hanja recognition
- Formal/informal tones
- Natural intonation
Special Characters
Automatic Detection
Troubleshooting
Common issues and solutions
Audio Quality Issues
- Lower stability (try 0.3-0.5)
- Increase similarity boost
- Add narrative context
- Try different voice
- Increase stability (0.7-0.8)
- Use Turbo v2 instead of v3
- Remove excessive breaks
- Simplify text formatting
- Enable speaker boost
- Reduce style exaggeration
- Fix text punctuation
- Regenerate (use 3x feature)
Voice Cloning Problems
- Use 2+ minutes of audio
- Ensure clean recording
- Single speaker only
- No background noise
- Match audio style to use case
- Adjust similarity boost
- Try PVC instead of IVC
- Use v3 model for IVC
- Check file format (MP3/WAV)
- Max 10 files per upload
- 1-2 min total duration
- Remove silence/pauses
Pronunciation Issues
- Use phoneme tags
- Add pronunciation dictionary
- Provide phonetic hints
- Spell out syllables
- Set language_code
- Use narrative context
- Try accent tags
- Break into syllables
Latency Issues
- Use Flash v2.5
- Shorten text chunks
- Avoid complex tags
- Use WebSocket streaming
- Retry after delay
- Check status page
- Use alternate region
- Batch non-urgent jobs
Emotion Not Working
- Add contextual cues
- Use narrative descriptions
- Lower stability
- Use v3 tags
- Increase stability
- Reduce style exaggeration
- Use calmer language
- Switch to Turbo
Character Limit Issues
- Flash/Turbo: 40k limit
- Multilingual: 5k limit
- v3: 3k limit
- Split by sentences
- Send 200-500 char chunks
- Flush after sections
- Use queue for long form
- Store progress markers
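Splitting by sentences while staying inside a size budget, as suggested above, can be done with a short greedy packer. This is a sketch under simple assumptions: sentences end in `.`, `!`, or `?` followed by whitespace, and the function name `chunk_by_sentence` is hypothetical.

```python
import re


def chunk_by_sentence(text, max_chars=500):
    """Pack whole sentences into chunks of at most `max_chars` characters.

    Sentences are never split, so chunks never break mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) > max_chars:
            raise ValueError("a single sentence exceeds max_chars")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The index of the last chunk sent successfully can serve as a progress marker, so a long job can resume from that chunk after a failure.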
