Magenta RT — Infinite Real-Time Music Gen by Google DeepMind (Full Demo)

AI Summary

The video introduces Magenta RT, a real-time music generation system developed by Google DeepMind, open-sourced under the Apache 2.0 license. The host explores how Magenta RT streams music generation and allows real-time modifications using sliders to adjust music features. It generates music referencing the past 10 seconds of audio and has a real-time factor of 1.6, meaning it generates music faster than real-time. The presenter attempts to run it locally but was limited by high memory (VRAM) requirements, suggesting higher hardware needs. The system comprises various models including a discrete audio codec, a contrastive model embedding audio and text, and a transformer generating audio tokens contextually. The video demos the use of the system via a Google Colab TPU instance, producing different music styles such as synth-wave, piano, R&B, and experimenting with various prompts. The host highlights its potential for DJing, practicing musical instruments, and procedural video game music generation. Limitations include generating non-lexical vocalizations but no lyrics, with some risk of explicit content. The video concludes urging viewers to try the open Colab demo and looks forward to local inference and personal fine-tuning capabilities coming soon.