DeepSeek
by DeepSeek
Open-source LLM family from DeepSeek, spanning V3.2-Exp, V4 Flash, and V4 Pro — offering frontier-competitive coding and reasoning at very low API cost.
See DeepSeek · GitHub · HuggingFace
DeepSeek V3.2-Exp
Released: 2025-09-29 (community announcement)
Summary
DeepSeek V3.2-Exp introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces compute while preserving model performance. Aimed at research access to next-generation attention mechanisms with substantial cost reductions for API users.
Features
- DeepSeek Sparse Attention (DSA) to reduce compute and memory for attention
- Maintains ~671B parameters on par with V3.1-Terminus
- CUDA and TileLang kernel implementations for rapid research and optimized inference
- HuggingFace availability and open-source deployment options
- Significant API cost reductions (50%+ in many scenarios) due to efficiency and caching
Superpowers
V3.2-Exp gives researchers and engineers access to sparse-attention tech that enables large-context, cost-efficient deployments while maintaining strong performance on reasoning and conversation tasks.
Pricing & access
- API cost reductions announced; cache-hit pricing and usage-based tiers available via DeepSeek’s API and partners. Self-hosting recommended via HuggingFace or official Docker images for custom infra.
Known limitations & notes
- Experimental: V3.1-Terminus remained available until Oct 15, 2025 for migration and comparison
- Community feedback window determined final V3.2 production rollout
Sources: DeepSeek release notes, GitHub, and community benchmarks.
DeepSeek V4
Released: April 2026 (Flash and Pro variants)
Summary
DeepSeek V4 is a major update to the DeepSeek family, shipping two tiers: V4 Flash (fast, lower quality) and V4 Pro (high quality, near-frontier). V4 Pro has been demonstrated producing outputs comparable to Claude Opus 4.7 on landing-page and coding tasks, at dramatically lower cost (~$0.46 for a full test session with OpenRouter).
Specs (claimed/reported)
- Parameter count: ~284B (claimed)
- Context window: 1M tokens
- Variants: V4 Flash (speed-focused), V4 Pro (quality-focused)
- Access: DeepSeek API, OpenRouter, HuggingFace
Flash vs Pro
- V4 Flash struggles with complex prompts and multi-step coding tasks; useful for cheap experiments.
- V4 Pro produces high-fidelity outputs — tested SVG generation, landing pages, and agentic coding via OpenCode + Superpowers plugin. Quality comparable to Claude Opus 4.7 on UI generation tasks.
Benchmark highlights
- Tested against Claude Opus 4.7 on landing-page output quality: near-indistinguishable results
- Confirmed working inside OpenCode (via OpenRouter) with Superpowers plugin
- Used in agentic workflows: DeepSeek V4 + Claude Code enables ~100x cheaper coding pipelines (per Jack Roberts demo)
- Hypothesis: Several Q2 2026 Chinese open-source models (Kimi K2.6, MiniMax M2.5) may be built on DeepSeek V4 base, raising the open-source baseline quality broadly
What changed from V3.2
- Substantially larger context window (1M vs ~128K in V3.x)
- Flash/Pro split offers better cost–quality choice
- Greater agentic capability demonstrated through OpenCode integration
- API costs remain very competitive with OpenRouter access
Pricing & access
- OpenRouter: available via standard OpenRouter API (V4 Flash and Pro)
- DeepSeek API: usage-based pricing with aggressive caching
- HuggingFace: open weights available for self-hosting
Sources:
- DEEPSEEK V4 + OPENCODE + SUPERPOWERS IS ABSOLUTELY INSANE — Income Stream Surfers, 2026-04-24
- DeepSeekV4 + Claude Code = 100X Cheaper — Jack Roberts, 2026-04-30