Scribe model from the voice-AI company
Teams already using ElevenLabs for TTS who want to round-trip audio in the same dashboard.
ElevenLabs entered the ASR race with Scribe, a model that lands competitive WER scores on English and Spanish while inheriting the company's strong diarisation work from voice cloning. Cleanest if you already use ElevenLabs for TTS. Long-tail languages still favour Whisper.
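WER (word error rate) is the yardstick behind claims like these: the number of word substitutions S, deletions D and insertions I needed to turn the model's output into a reference transcript, divided by the reference length N, so WER = (S + D + I) / N. Below is a minimal Python sketch for scoring your own samples. It assumes you have a hand-checked plain-text reference for each clip. `word_error_rate` is a hypothetical helper, not part of any vendor SDK, and its normalisation only lowercases and splits on whitespace. Published benchmarks also strip punctuation and normalise numbers, so treat its results as rough.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance.

    Simplified normalisation (an assumption): lowercase + whitespace split only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))
```

Run the same references through each provider's output and compare the averages. A few minutes of your own audio usually tells you more than a headline benchmark.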
ElevenLabs Scribe is the company's speech-to-text offering, aimed at developers building voice agents and creator tooling on the same vendor stack. Quality is strongest on clean studio audio, where it competes credibly with Deepgram Nova and Whisper-large-v3 on English, and it lands in the same competitive band on Spanish. On long-tail languages and accented English, Whisper still tends to win, and on noisy conversational audio AssemblyAI's Universal models can outperform it.
The honest pitch is platform consistency. If you're already in the ElevenLabs ecosystem for voice cloning or TTS, Scribe slots into the same dashboard with the same billing and the same SDKs, which is real operational value compared with juggling two providers. Diarisation and speaker labels are good out of the box, a benefit of the company's prior work on voice identification. Word-level timestamps are included, which makes Scribe a viable backend for caption-generation tools and transcript editors.
The gaps to watch are language coverage and the lack of a streaming endpoint, which rules it out for live-captioning workloads. Documentation is clean and the API is straightforward to integrate. For podcasters who want a clean studio transcript and already have an ElevenLabs account, it's worth running a comparison test against your current ASR.
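When word-level timestamps and speaker labels arrive together, turning a transcript into captions is mostly a grouping problem. Below is a minimal Python sketch that writes SRT subtitles. It rests on some assumptions. The `Word` dataclass and its field names are hypothetical and do not reflect the ElevenLabs response schema, so map whatever the API actually returns into it first. The speaker strings are illustrative. A new cue starts on a speaker change, after a pause longer than `max_gap` seconds, or once a cue reaches `max_words` words. Both thresholds are arbitrary defaults.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape: adapt from whatever the transcription response contains.
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    speaker: str  # diarisation label, e.g. "speaker_0" (illustrative)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7, max_gap: float = 1.0) -> str:
    """Group words into numbered SRT cues, prefixing each cue with its speaker label."""
    cues: list[list[Word]] = []
    for word in words:
        current = cues[-1] if cues else None
        if (
            current is None
            or word.speaker != current[-1].speaker     # speaker changed
            or word.start - current[-1].end > max_gap  # long pause
            or len(current) >= max_words               # cue is full
        ):
            cues.append([word])
        else:
            current.append(word)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}\n"
            f"[{cue[0].speaker}] {text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        Word("Welcome", 0.0, 0.4, "speaker_0"),
        Word("back", 0.45, 0.7, "speaker_0"),
        Word("Thanks", 1.0, 1.3, "speaker_1"),
    ]
    print(words_to_srt(sample))
```

Breaking on a speaker change keeps each caption attributable to one voice, and that is where good diarisation pays off. A production tool would also cap cue duration and line length.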
ElevenLabs Speech-to-Text is built for teams already using ElevenLabs for TTS who want to round-trip audio in the same dashboard. Its biggest strength is diarisation: speaker labels are solid. It is cleanest if you already use ElevenLabs for TTS.
It is newer than its competitors and less battle-tested, and its non-English depth is limited compared with Whisper. Neither is a deal-breaker on its own, but both are worth knowing before you commit.
There's a free tier, and you can ship work on it before deciding to upgrade. Confirm what's included on their site.
Closest in the same category: Otter.ai, AssemblyAI and Rev. Each has its own shape; see the alternatives page for a side-by-side.