ThirdBrAIn.tech

Tag: inference-speed

4 items with this tag.

  • Oct 23, 2024

    https://i.ytimg.com/vi/KSltC4TXxZg/hqdefault.jpg

Run LLaMA 3.1 405B on 8GB VRAM

    • large-language-models
    • AI-optimization
    • GPU-memory
    • model-quantization
    • LLaMa-3-1
    • AI-hardware
    • inference-speed
    • model-compression
    • limited-hardware
    • AI-tools
    • YT/2024/M10
    • YT/2024/W43
  • Aug 30, 2024

    https://i.ytimg.com/vi/lzzp6gMhJjk/hqdefault.jpg

    Is Groq's Reign Over? Cerebras Sets a New Speed Record!

    • AI
    • artificial-intelligence
    • inference-speed
    • hardware-performance
    • language-models
    • machine-learning
    • GPU-cloud
    • model-benchmarking
    • inference-technology
    • cost-efficiency
    • YT/2024/M08
    • YT/2024/W35
  • Feb 28, 2024

    https://i.ytimg.com/vi/vKWtFVqr6Wc/hqdefault.jpg

Groq API - Make Your AI Applications Lightning Speed

    • AI
    • API
    • machine-learning
    • Python
    • JavaScript
    • real-time-applications
    • language-model
    • inference-speed
    • tutorial
    • cloud-computing
    • YT/2024/M02
    • YT/2024/W09
  • Aug 18, 2023

    https://i.ytimg.com/vi/y7h_0Rfowz4/hqdefault.jpg

    GGML vs GPTQ in Simple Words

    • AI
    • machine-learning
    • natural-language-processing
    • model-compression
    • quantization
    • GGML
    • GPTQ
    • neural-networks
    • inference-speed
    • hardware-optimization
    • YT/2023/M08
    • YT/2023/W33

Created with Quartz v4.5.0 © 2025
