LFM2-Audio (LFM2)

by Liquid AI

End-to-end audio foundation model (1.5B) for low-latency speech-to-speech and unified audio+text workflows

See Liquid AI LFM2 docs and repo

Summary

LFM2-Audio-1.5B (released Oct 1, 2025) is an end-to-end audio foundation model that unifies audio and text in a single 1.5B-parameter backbone. It emphasizes low latency and strong ASR/TTS quality, and supports both interleaved and sequential generation modes for real-time and batch audio tasks.

Features

  • Unified audio+text token architecture with FastConformer encoder and RQ-Transformer decoding
  • 1.5B parameter model optimized for sub-100ms latency in short-turn interactions
  • Interleaved generation mode for low-latency speech-to-speech
  • Sequential generation for ASR/TTS and batch workflows
  • Strong ASR benchmark results, with WER competitive with Whisper-large-v3 in some tests
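The two generation modes listed above can be sketched with a toy token stream. This is an illustrative model of the scheduling idea only — the token names (`t0`, `a0`) and helper functions are hypothetical and are not the model's actual API:

```python
def interleave(text_tokens, audio_tokens):
    """Interleaved mode: alternate text and audio tokens so audio
    playback can begin before the full text response is finished."""
    out = []
    for t, a in zip(text_tokens, audio_tokens):
        out.extend([t, a])
    return out

def sequential(text_tokens, audio_tokens):
    """Sequential mode: emit all text first, then all audio — simpler
    for batch ASR/TTS, but audio starts only after the text ends."""
    return list(text_tokens) + list(audio_tokens)

def first_audio_index(stream):
    """Position of the first audio token in the stream — a rough proxy
    for time-to-first-audio (tokens prefixed 'a' are audio here)."""
    return next(i for i, tok in enumerate(stream) if tok.startswith("a"))

text = ["t0", "t1", "t2"]
audio = ["a0", "a1", "a2"]
print(first_audio_index(interleave(text, audio)))   # audio starts at position 1
print(first_audio_index(sequential(text, audio)))   # audio starts only at position 3
```

The point of the toy: with interleaving, the first audio token appears after a single text token, so playback can start almost immediately; with sequential generation it appears only after the entire text has been produced.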

Superpowers

LFM2-Audio removes the need for a separate ASR + LM + TTS pipeline by generating speech-to-speech and mixed audio+text responses directly from a single model, with minimal latency.
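As a back-of-envelope illustration of why collapsing the pipeline helps latency: in a cascaded system the time-to-first-audio is roughly the sum of each stage's first-output latency, while an end-to-end model has a single such latency. All numbers below are invented for intuition, not measurements:

```python
# Hypothetical first-output latencies per cascaded stage, in milliseconds.
# These figures are illustrative only, not benchmarks of any real system.
cascaded_stages_ms = {
    "ASR (transcribe user audio)": 150,
    "LM (first text token)": 200,
    "TTS (first audio chunk)": 120,
}

# Stages run back-to-back, so time-to-first-audio is roughly their sum.
cascaded_ms = sum(cascaded_stages_ms.values())

# An end-to-end model emits audio tokens directly, so there is a single
# (again hypothetical) time-to-first-audio instead of a sum of stages.
end_to_end_ms = 100

print(cascaded_ms)     # 470 with the illustrative numbers above
print(end_to_end_ms)   # 100
```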

Known limitations & notes

  • New release: expect rapid iteration and model/tooling updates
  • Production-scale deployment requires audio-specialized infra and careful latency engineering

Sources / notes:

  • Liquid AI release notes and community benchmarks.