Meta’s Llama 4 is a beast (includes 10 million token context)



AI Summary

Llama 4 Overview

  • Meta released Llama 4, its new open-source large language model family in three sizes: Scout, Maverick, and Behemoth.
  • The family dramatically expands context window size: Scout supports up to 10 million tokens, versus the roughly 128,000 tokens common in current models.

Model Details

  1. Llama 4 Scout
    • Size: Small
    • 17 billion active parameters, 109 billion total parameters
    • Multimodal capabilities (text and images)
    • Efficient resource use: can run on a single Nvidia H100 GPU
    • Context window of up to 10 million tokens
  2. Llama 4 Maverick
    • Size: Medium
    • 400 billion total parameters across 128 experts (mixture-of-experts)
    • Efficient: only 17 billion parameters active per token
    • Cost-efficient: roughly 19 cents per million tokens (blended input/output)
    • Competitive performance in coding and reasoning tasks
  3. Llama 4 Behemoth
    • Size: Large
    • 2 trillion total parameters, 288 billion active parameters
    • Reported to outperform Gemini 2.0 Pro on benchmarks, even while still in training (preview only)
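
The "active vs. total" parameter counts above reflect the mixture-of-experts design: only a subset of the weights runs for each token, so compute cost tracks the active count rather than the total. A quick back-of-the-envelope sketch using the figures from the list above:

```python
# Back-of-the-envelope look at the mixture-of-experts parameter counts:
# the "active" parameters are what actually run per token, so inference
# compute scales with the active count, not the total.

models = {
    # name: (active_params, total_params), in billions, from the list above
    "Scout": (17, 109),
    "Maverick": (17, 400),
    "Behemoth": (288, 2000),
}

for name, (active, total) in models.items():
    frac = active / total
    print(f"{name}: {active}B active of {total}B total "
          f"({frac:.1%} of weights used per token)")
```

Note how Maverick activates barely 4% of its weights per token, which is why a 400-billion-parameter model can be priced so cheaply to serve.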

Advantages of Open Source Models

  • Open-source models give developers more flexibility and control than closed-source platforms.
  • Opportunities for self-hosting and fine-tuning.

Resources to Try Llama 4

  • Meta AI website: meta.ai
  • Model weights can be downloaded from Hugging Face (or requested directly from Meta) if your hardware permits.
  • Groq (chat.groq.com) platform for testing the models interactively.
  • Access through Meta-owned applications like WhatsApp, Messenger, and Instagram.
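
For programmatic testing, Groq also exposes an OpenAI-compatible API. The sketch below is a hypothetical example of assembling a request for it; the endpoint URL and model id are assumptions on my part, so check Groq's documentation for the exact names before relying on them:

```python
import json

# Hypothetical sketch: calling Llama 4 through Groq's OpenAI-compatible
# chat endpoint. The URL and model id below are assumptions -- verify
# them against Groq's docs before use.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint

def build_chat_request(prompt: str,
                       model: str = "meta-llama/llama-4-scout-17b-16e-instruct") -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("In one sentence, what is new in Llama 4?")
body = json.dumps(payload)
# Actually sending it requires an API key, e.g. with urllib.request:
#   req = urllib.request.Request(
#       GROQ_URL, data=body.encode(),
#       headers={"Authorization": f"Bearer {API_KEY}",
#                "Content-Type": "application/json"})
#   resp = urllib.request.urlopen(req)
```

Because the endpoint follows the OpenAI chat-completions shape, existing OpenAI client code should work by swapping the base URL and model name.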