LFM2-Audio (LFM2)
by Liquid AI
End-to-end audio foundation model (1.5B) for low-latency speech-to-speech and unified audio+text workflows
See Liquid AI LFM2 docs and repo
Summary
LFM2-Audio-1.5B (Oct 1, 2025) is an end-to-end audio foundation model that unifies audio and text in a single 1.5B-parameter backbone. It emphasizes low latency and strong ASR/TTS quality, and supports both interleaved and sequential generation modes for real-time and batch audio tasks.
Features
- Unified audio+text token architecture with FastConformer encoder and RQ-Transformer decoding
- 1.5B-parameter model optimized for sub-100 ms latency in short-turn interactions
- Interleaved generation mode for low-latency speech-to-speech
- Sequential generation for ASR/TTS and batch workflows
- Strong ASR benchmark results, with word error rate (WER) competitive with Whisper-large-v3 in some tests
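The RQ-Transformer decoding mentioned above operates over residually quantized audio tokens. The toy sketch below illustrates residual quantization (RQ) in general: each codebook quantizes the residual left by the previous one, so a frame is represented by a short stack of code indices. The random codebooks and sizes here are illustrative stand-ins, not LFM2-Audio's actual audio codec.

```python
import numpy as np

# Toy residual quantization (RQ): each codebook quantizes the residual
# left over by the previous stage. Codebooks are random stand-ins.
rng = np.random.default_rng(0)
num_codebooks, codebook_size, dim = 3, 16, 8
codebooks = rng.normal(size=(num_codebooks, codebook_size, dim))

def rq_encode(x):
    """Return one code index per codebook, quantizing successive residuals."""
    codes, residual = [], x.copy()
    for cb in codebooks:
        # Nearest codebook entry to the current residual.
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rq_decode(codes):
    """Sum the selected entries from each codebook to reconstruct."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))

x = rng.normal(size=dim)
codes = rq_encode(x)
err = np.linalg.norm(x - rq_decode(codes))
```

An RQ-Transformer then predicts these code stacks autoregressively, typically with a small depth-wise module emitting the per-codebook indices for each frame.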
Superpowers
LFM2-Audio removes the need for a separate ASR + LM + TTS pipeline by enabling direct speech-to-speech and multimodal audio interactions with minimal latency.
Known limitations & notes
- New release: expect rapid iteration and model/tooling updates
- Production-scale deployment requires audio-specialized infra and careful latency engineering
Sources / notes:
- Liquid AI release notes and community benchmarks.