Llama 4 Released - it is not what I expected at all



AI Summary

Summary of the Video ‘Llama 4 Released Today’

  • Model Releases:
    • Llama 4 Scout (the smallest model) releases today.
    • Llama 4 Maverick (the mid-size model) is next in size.
    • Llama 4 Behemoth is still training.
  • Coding Benchmarks:
    • Scout scores 32.88, compared to Gemini 2.0 Flash-Lite at 28.9.
    • Maverick scores 34.5, against Gemini 2.0 Flash’s 43.4.
  • Model Sizes and Parameters:
    • Scout: 109 billion parameters, 10 million token context window.
    • Maverick: 400 billion parameters.
  • Provider Performance Issues:
    • Experienced issues with different providers, notably DeepInfra and Parasail (a sketch for re-running such a test appears after this list).
    • Observed inconsistent results across tests for games (e.g., a pool game simulator).
  • Coding Tasks and Performance:
    • Tests on Llama 4 Maverick led to numerous failures in one-shot coding.
    • Identified that the Llama 4 models struggle with coding tasks, in contrast with Gemini Flash’s performance.
  • Context Window Issues:
    • Challenges faced when attempting to test the 10 million token context window.
  • Future Expectations:
    • Skepticism about the models’ effectiveness in coding. Awaiting insights from the Behemoth model.
  • Open Source Limitations:
    • Mentioned usage restrictions and branding requirements for derivative models under Meta’s Llama license.

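Since the provider-to-provider inconsistency is the easiest part to check yourself, here is a minimal sketch of re-running one of the one-shot coding prompts against an OpenAI-compatible endpoint. The base URL, model ID, and prompt below are assumptions (the endpoint follows DeepInfra's OpenAI-compatible path and the prompt is only a stand-in for the pool-game test mentioned above); swap in whichever provider and listing you are comparing.

```python
# Minimal sketch: send one one-shot coding prompt to an OpenAI-compatible
# provider hosting Llama 4 Maverick. Endpoint and model ID are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # assumed provider endpoint
    api_key="YOUR_API_KEY",
)

# Stand-in for the pool-game simulator test described in the video summary.
prompt = "Write a single-file HTML/JavaScript pool game simulator with basic ball physics."

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",  # assumed model ID
    messages=[{"role": "user", "content": prompt}],
    temperature=0,    # keep runs comparable across providers
    max_tokens=4096,
)

print(response.choices[0].message.content)
```

Running the same prompt and settings against two or three providers is enough to see whether the inconsistency lies with the model itself or with how a given host serves it.
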
Overall, the video expresses disappointment with the Llama 4 models’ performance on coding tasks, contrasts them with other models, and highlights the need for further improvement.