DeepSeek

by DeepSeek

Open-source LLM family from DeepSeek, spanning V3.2-Exp, V4 Flash, and V4 Pro — offering frontier-competitive coding and reasoning at very low API cost.

See DeepSeek · GitHub · HuggingFace


DeepSeek V3.2-Exp

Released: 2025-09-29 (community announcement)

Summary

DeepSeek V3.2-Exp introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces compute while preserving model performance. Aimed at research access to next-generation attention mechanisms with substantial cost reductions for API users.

Features

  • DeepSeek Sparse Attention (DSA) to reduce compute and memory for attention
  • Maintains ~671B parameters on par with V3.1-Terminus
  • CUDA and TileLang kernel implementations for rapid research and optimized inference
  • HuggingFace availability and open-source deployment options
  • Significant API cost reductions (50%+ in many scenarios) due to efficiency and caching

Superpowers

V3.2-Exp gives researchers and engineers access to sparse-attention tech that enables large-context, cost-efficient deployments while maintaining strong performance on reasoning and conversation tasks.

Pricing & access

  • API cost reductions announced; cache-hit pricing and usage-based tiers available via DeepSeek’s API and partners. Self-hosting recommended via HuggingFace or official Docker images for custom infra.

Known limitations & notes

  • Experimental: V3.1-Terminus remained available until Oct 15, 2025 for migration and comparison
  • Community feedback window determined final V3.2 production rollout

Sources: DeepSeek release notes, GitHub, and community benchmarks.


DeepSeek V4

Released: April 2026 (Flash and Pro variants)

Summary

DeepSeek V4 is a major update to the DeepSeek family, shipping two tiers: V4 Flash (fast, lower quality) and V4 Pro (high quality, near-frontier). V4 Pro has been demonstrated producing outputs comparable to Claude Opus 4.7 on landing-page and coding tasks, at dramatically lower cost (~$0.46 for a full test session with OpenRouter).

Specs (claimed/reported)

  • Parameter count: ~284B (claimed)
  • Context window: 1M tokens
  • Variants: V4 Flash (speed-focused), V4 Pro (quality-focused)
  • Access: DeepSeek API, OpenRouter, HuggingFace

Flash vs Pro

  • V4 Flash struggles with complex prompts and multi-step coding tasks; useful for cheap experiments.
  • V4 Pro produces high-fidelity outputs — tested SVG generation, landing pages, and agentic coding via OpenCode + Superpowers plugin. Quality comparable to Claude Opus 4.7 on UI generation tasks.

Benchmark highlights

  • Tested against Claude Opus 4.7 on landing-page output quality: near-indistinguishable results
  • Confirmed working inside OpenCode (via OpenRouter) with Superpowers plugin
  • Used in agentic workflows: DeepSeek V4 + Claude Code enables ~100x cheaper coding pipelines (per Jack Roberts demo)
  • Hypothesis: Several Q2 2026 Chinese open-source models (Kimi K2.6, MiniMax M2.5) may be built on DeepSeek V4 base, raising the open-source baseline quality broadly

What changed from V3.2

  • Substantially larger context window (1M vs ~128K in V3.x)
  • Flash/Pro split offers better cost–quality choice
  • Greater agentic capability demonstrated through OpenCode integration
  • API costs remain very competitive with OpenRouter access

Pricing & access

  • OpenRouter: available via standard OpenRouter API (V4 Flash and Pro)
  • DeepSeek API: usage-based pricing with aggressive caching
  • HuggingFace: open weights available for self-hosting

Sources: