Sakana AI's New Model Sparks an RL Revolution

AI Summary

The video discusses a new approach from Sakana AI that applies reinforcement learning to teaching rather than to the traditional practice of solving problems from scratch. Traditionally, a student model is trained directly, with rewards for correct answers. Sakana AI instead trains a teacher model to generate step-by-step explanations that help the student learn. This "reinforcement-learned teacher" (RLT) is relatively small (around 7 billion parameters), yet it can teach larger student models, and the approach outperforms much larger, traditionally trained models on complex math and science benchmarks. It also dramatically reduces training cost and time, from months to less than a day on a single node, while producing higher-quality reasoning and explanations. According to the video, the approach could transform AI training by making advanced models more accessible, affordable, and efficient.

The video also mentions Sakana AI's earlier projects, such as the Darwin Gödel Machine, as examples of the lab's work on AI self-improvement and recursive learning.
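The teaching-versus-solving contrast described above can be illustrated with a toy Python sketch. It rests on a simplifying assumption that the summary does not state: the teacher is rewarded by the average log-probability a student assigns to the known correct solution after reading the teacher's explanation, while a traditional solver receives a sparse reward of 1 or 0 for its final answer. All function names and probability values are hypothetical, and this is not Sakana AI's implementation.

```python
import math
from typing import Dict, Sequence


def solve_from_scratch_reward(predicted: str, correct: str) -> float:
    """Traditional sparse reward: 1.0 only when the final answer is exactly right."""
    return 1.0 if predicted == correct else 0.0


def teacher_reward(student_token_probs: Sequence[float]) -> float:
    """Hypothetical teacher reward (assumption, not the published method):
    the mean log-probability the student assigns to each token of the known
    correct solution after reading the teacher's explanation.
    Values closer to 0 mean the explanation made the solution easier to follow."""
    if not student_token_probs:
        raise ValueError("need at least one solution token")
    return sum(math.log(p) for p in student_token_probs) / len(student_token_probs)


def pick_best_explanation(candidates: Dict[str, Sequence[float]]) -> str:
    """Return the candidate explanation that earns the highest teacher reward.
    In actual RL training this reward would update the teacher's weights;
    here we only rank candidates to show what the signal prefers."""
    return max(candidates, key=lambda name: teacher_reward(candidates[name]))


if __name__ == "__main__":
    # Hypothetical per-token probabilities the student gives the correct solution.
    candidates = {
        "terse": [0.2, 0.3, 0.25],
        "step_by_step": [0.8, 0.9, 0.85],
    }
    print(solve_from_scratch_reward("41", "42"))  # wrong answer earns nothing
    print(pick_best_explanation(candidates))
```

The design point the sketch tries to capture is that the teacher's signal is dense: every explanation earns a graded score based on how much it helps the student, whereas the solver only learns when it happens to produce a fully correct answer.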