ThirdBrAIn.tech

Tag: model-compression

5 items with this tag.

  • Feb 02, 2025

    https://i.ytimg.com/vi/h7DUpHPasME/hqdefault.jpg

What is LLM Distillation?

    • llm
    • distillation
    • machine-learning
    • artificial-intelligence
    • model-compression
    • natural-language-processing
    • ai-efficiency
    • deep-learning-models
    • knowledge-transfer
    • ai-applications
    • YT/2025/M02
    • YT/2025/W05
  • Dec 28, 2024

    https://i.ytimg.com/vi/K75j8MkwgJ0/hqdefault.jpg

    Optimize Your AI - Quantization Explained

    • AI
    • quantization
    • model-optimization
    • deep-learning
    • memory-reduction
    • neural-networks
    • AI-models
    • context-quantization
    • hardware-efficiency
    • model-compression
    • YT/2024/M12
    • YT/2024/W52
  • Oct 23, 2024

    https://i.ytimg.com/vi/KSltC4TXxZg/hqdefault.jpg

Run LLaMA 3.1 405B on 8GB VRAM

    • large-language-models
    • AI-optimization
    • GPU-memory
    • model-quantization
    • LLaMa-3-1
    • AI-hardware
    • inference-speed
    • model-compression
    • limited-hardware
    • AI-tools
    • YT/2024/M10
    • YT/2024/W43
  • Mar 02, 2024

    https://i.ytimg.com/vi/ZpxQec_3t38/hqdefault.jpg

    The Era of 1-bit LLMs by Microsoft | AI Paper Explained

    • large-language-models
    • AI-research
    • quantization
    • model-efficiency
    • machine-learning
    • neural-networks
    • transformer-models
    • model-compression
    • AI-hardware
    • Microsoft
    • YT/2024/M03
    • YT/2024/W09
  • Aug 18, 2023

    https://i.ytimg.com/vi/y7h_0Rfowz4/hqdefault.jpg

    GGML vs GPTQ in Simple Words

    • AI
    • machine-learning
    • natural-language-processing
    • model-compression
    • quantization
    • GGML
    • GPTQ
    • neural-networks
    • inference-speed
    • hardware-optimization
    • YT/2023/M08
    • YT/2023/W33

Created with Quartz v4.5.0 © 2025