Voxtral
by Mistral AI
Open-weights text-to-speech model — multilingual, locally runnable ElevenLabs alternative
See https://huggingface.co/mistralai/Voxtral-4B-TTS-2603
Features
- 4B parameter TTS model — compact enough to run locally on consumer hardware
- 9+ language support — multilingual voice generation with natural-sounding output
- Open weights — fully open; self-host for privacy and offline use
- Voice agent compatible — designed for integration into voice agent pipelines
- Hugging Face deployment — standard HF model format; straightforward local install
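As a rough illustration of the local install path, the sketch below fetches the repository files with the `huggingface_hub` library's `snapshot_download`. The repository ID comes from the link above. The `voxtral-weights` directory name and both helper functions are hypothetical, and this assumes the repository can be downloaded without extra access steps. It only covers downloading; the inference runtime is whatever the model card recommends.

```python
from pathlib import Path

# Repository ID taken from the model page linked in this listing.
REPO_ID = "mistralai/Voxtral-4B-TTS-2603"


def model_page_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face model page URL for a repository ID."""
    return f"https://huggingface.co/{repo_id}"


def download_weights(local_dir: str = "voxtral-weights") -> Path:
    """Download the repository files into local_dir and return that path.

    Assumes `pip install huggingface_hub`. The import is deferred so this
    module still loads when the package is missing. The directory name is
    a hypothetical choice, not something the model card prescribes.
    """
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


if __name__ == "__main__":
    print("Model page:", model_page_url())
    # Uncomment to fetch the weights (a multi-gigabyte download):
    # print("Files saved to:", download_weights())
```

If the repository turns out to be gated, logging in with a Hugging Face access token first would likely be needed.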
Superpowers
Voxtral is Mistral’s answer to ElevenLabs: a high-quality, multilingual TTS model released as open weights. Its key advantage over proprietary TTS services (ElevenLabs, OpenAI TTS) is cost and privacy: run locally, it incurs no per-character fees, and no data leaves your infrastructure. At 4B parameters, it can run on a modern GPU without relying on a hosted service. It is particularly valuable for developers building voice agents or content pipelines who want a production-quality voice layer without ongoing API costs or vendor lock-in.
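The cost argument can be made concrete with a little arithmetic. The sketch below compares a per-character API bill with the GPU time needed to synthesize the same text locally. Every number in it is a placeholder chosen for easy arithmetic, not a real ElevenLabs, OpenAI, or Voxtral figure: the per-million-character price, the throughput in characters per second, and the hourly GPU cost are all assumptions to replace with your own measurements.

```python
def api_cost(chars: int, usd_per_million_chars: float) -> float:
    """Cost of synthesizing `chars` characters on a per-character-priced API."""
    return chars / 1_000_000 * usd_per_million_chars


def local_cost(chars: int, chars_per_second: float, gpu_usd_per_hour: float) -> float:
    """GPU-time cost of synthesizing `chars` characters on self-hosted hardware."""
    hours = chars / chars_per_second / 3600
    return hours * gpu_usd_per_hour


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not measured or quoted figures.
    monthly_chars = 36_000_000
    hosted = api_cost(monthly_chars, usd_per_million_chars=30.0)
    local = local_cost(monthly_chars, chars_per_second=200.0, gpu_usd_per_hour=0.5)
    print(f"Hosted API: ${hosted:.2f}   Local GPU time: ${local:.2f}")
```

Self-hosting removes per-character fees rather than all cost: hardware, electricity, and maintenance still apply, so the comparison is only as good as the throughput and GPU price you plug in.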
Pricing
- Open weights — free to self-host
- Hosted API access through Mistral's La Plateforme (check current pricing)