Unified speech model with mid-sentence translation across 60+ languages.
Multilingual voice apps
Soniox is what you reach for when you need mid-sentence language switching in production: code-switching across 60+ languages and two-way translation in the same stream, all included at one price. The async rate is brutally cheap (about $0.10/hr), which makes it competitive with the raw Whisper API. The flip side: it's a developer tool first, and the consumer app is a recent and modest addition.
Soniox is a speech-to-text API built around a unified model that handles transcription, translation, diarization, and timestamps in a single stream, all priced together rather than as add-ons. The differentiating capability is true code-switching: if a speaker flips between English, Spanish, and French in one sentence, Soniox transcribes the mix correctly without you pre-selecting a language. 60+ languages are supported. Real-time streaming returns tokens within milliseconds, which is critical for voice AI agents and live captioning.
Pricing is token-based but works out to roughly $0.10/hour for async file processing and $0.12/hour for real-time streaming, which is competitive with the raw Whisper API and lower than Gladia's Starter tier. A consumer-facing Soniox app exists at $19.99/mo (Pro) for unlimited transcription, translation, summaries, and insights, but the product is clearly API-first and the consumer experience hasn't reached Otter or Notta polish levels.
The cons: token-based pricing takes math to predict (input audio tokens, input text tokens, and output text tokens are all priced separately), the developer community is smaller than the one around Whisper, and the documentation is solid but assumes engineering competence.
Best for developers building voice products with multilingual or code-switching audio, AI meeting tools targeting global markets, and contact centers with mixed-language traffic. Wrong fit for non-technical users who want a finished transcription UI; pick Happy Scribe or Sonix instead.
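Since the pricing takes some math, here is a minimal Python sketch of a budget check built only on the approximate figures quoted in this review ($0.10/hour async, $0.12/hour real-time, $19.99/mo Pro). The function names are made up for illustration, and because actual billing is per token, the results are rough estimates rather than what an invoice would show.

```python
# Approximate rates quoted in this review (USD). Real billing is per token,
# so these hourly figures are estimates, not official per-hour prices.
ASYNC_RATE_PER_HOUR = 0.10
REALTIME_RATE_PER_HOUR = 0.12
CONSUMER_PRO_PER_MONTH = 19.99


def estimate_monthly_cost(async_hours: float, realtime_hours: float) -> float:
    """Rough monthly API spend in USD for the given hours of audio."""
    if async_hours < 0 or realtime_hours < 0:
        raise ValueError("hours must be non-negative")
    cost = (async_hours * ASYNC_RATE_PER_HOUR
            + realtime_hours * REALTIME_RATE_PER_HOUR)
    return round(cost, 2)


def break_even_hours(rate_per_hour: float) -> float:
    """Audio hours per month at which API spend equals the Pro subscription."""
    if rate_per_hour <= 0:
        raise ValueError("rate must be positive")
    return CONSUMER_PRO_PER_MONTH / rate_per_hour


if __name__ == "__main__":
    # 500 async hours plus 100 real-time hours in a month.
    print(estimate_monthly_cost(async_hours=500, realtime_hours=100))  # 62.0
    # Async hours at which the API bill matches the $19.99 Pro plan.
    print(round(break_even_hours(ASYNC_RATE_PER_HOUR), 1))  # 199.9
```

At these approximate rates, the API stays cheaper than the Pro subscription until roughly 200 hours of async audio a month. The two are different products (raw API versus finished app), so the break-even is only a sanity check on scale.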
Soniox is shaped for multilingual voice apps. Its biggest strength is the strongest code-switching across languages available today. The async rate is brutally cheap (about $0
The main downsides: it's API-first with a bare consumer app, and token-based pricing takes math. None of these are deal-breakers on their own, but they're worth knowing before you commit.
It's a paid tool in the $ range. Some plans have a free trial — check the latest on their pricing page.
Closest in the same category: Otter.ai, AssemblyAI, Rev. Each has its own shape — see the alternatives page for a side-by-side.