Voxtral

by Mistral AI

Open-weights text-to-speech model — multilingual, locally runnable ElevenLabs alternative

See https://huggingface.co/mistralai/Voxtral-4B-TTS-2603

Features

  • 4B-parameter TTS model: compact enough to run locally on consumer hardware
  • Support for 9+ languages: multilingual voice generation with natural-sounding output
  • Open weights: downloadable model weights; self-host for privacy and offline use
  • Voice agent compatible: designed for integration into voice agent pipelines
  • Hugging Face distribution: published in standard HF model format for local installation

Superpowers

Voxtral is Mistral’s answer to ElevenLabs: a high-quality, multilingual TTS model released as open weights. Its key advantages over proprietary TTS services (ElevenLabs, OpenAI TTS) are cost and privacy: running locally means no per-character fees, only the cost of your own hardware, and no data leaves your infrastructure. At 4B parameters, it can run on a modern GPU without relying on cloud inference. It is particularly valuable for developers building voice agents or content pipelines who want a production-quality voice layer without ongoing API costs or vendor lock-in.
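
To make the "4B parameters on a modern GPU" claim concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at common numeric precisions. It assumes exactly 4 billion parameters (the "4B" in the model name is a rounded figure), and the precisions listed are generic examples, not formats Voxtral is confirmed to ship in. Activations, caches, and framework overhead come on top of these numbers.

```python
# Rough weight-memory estimate for a 4B-parameter model.
# Weights only: activations, caches, and runtime overhead are NOT included.

# Assumption: "4B" taken as exactly 4 billion; the real count may differ.
PARAMS = 4_000_000_000

# Generic precisions for illustration; availability for Voxtral is an assumption.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def estimate_all(params: int = PARAMS) -> dict[str, float]:
    """Map each precision name to its estimated weight size in GB."""
    return {name: weight_memory_gb(params, b) for name, b in BYTES_PER_PARAM.items()}


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name:>10}: {gb:.1f} GB")
```

At 16-bit precision the weights come to roughly 8 GB, so a GPU needs more VRAM than that once runtime overhead is included; lower-precision variants, if available, would shrink the footprint proportionally.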

Pricing

  • Open weights — free to self-host
  • API access through Mistral La Plateforme (check current pricing)