DeepSeek is a family of AI models developed by Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. The company prioritizes extreme cost efficiency through algorithmic innovation rather than massive compute, training frontier models at 10-20× lower cost than competitors. The family currently includes 8+ models spanning foundation models, reasoning models, specialized coding/math models, and OCR, with strengths in cost-performance ratio, long-context processing (128K tokens), and transparent reasoning capabilities.
Philosophy and Distinctive Approach
DeepSeek's core philosophy centers on achieving frontier-model quality through architectural efficiency rather than brute-force compute. The company trained DeepSeek-V3 (671B parameters) for approximately $5.5-6 million using constrained H800 GPUs under US export restrictions - driving innovations like fine-grained Mixture-of-Experts with only 5% parameter activation (37B of 671B active per token), Multi-head Latent Attention (MLA), and auxiliary-loss-free load balancing. The V3.2 release introduced DeepSeek Sparse Attention (DSA), reducing long-context inference costs by 50% through two-stage token selection.
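To make the two-stage mechanism concrete, here is an illustrative NumPy sketch: a cheap indexer scores every cached token, only the top-k survive, and full attention runs over just those k tokens per query, which is where the O(L²) to O(Lk) reduction comes from. The dot-product scorer below stands in for the learned lightning indexer; this is a conceptual sketch, not DeepSeek's implementation:

```python
import numpy as np

def sparse_attention(q, keys, values, k=64):
    # Stage 1: lightning indexer -- a lightweight relevance score per cached
    # token (the real indexer is a small learned module; a plain dot product
    # stands in here).
    index_scores = keys @ q                      # (L,)
    top_k = np.argsort(index_scores)[-k:]        # keep only the k best tokens

    # Stage 2: fine-grained selection -- ordinary softmax attention, but
    # restricted to the selected subset.
    d = q.shape[-1]
    logits = keys[top_k] @ q / np.sqrt(d)        # (k,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values[top_k]               # (d_v,)

# Toy usage: an 8K-token cache, but each query attends to only 64 tokens.
L, d = 8192, 128
rng = np.random.default_rng(0)
out = sparse_attention(rng.normal(size=d), rng.normal(size=(L, d)),
                       rng.normal(size=(L, d)))
```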
Founded July 2023 by Liang Wenfeng (co-founder of High-Flyer Capital Management) and self-funded by the hedge fund, DeepSeek operates independently without external venture capital, focusing on long-term AGI research over immediate commercialization. This independence enables aggressive open-source releases under MIT License, with full commercial-use permissions and transparent research publications.
Key Family Strengths
Superior cost-performance: Matches frontier models (GPT-4, Claude 3.5) at 10-30× lower inference costs
Advanced MoE architecture: 256 experts with only 8 active per token, minimizing knowledge redundancy (see the routing sketch after this list)
Long-context capability: 128K context windows across flagship models with efficient sparse attention
Transparent reasoning: R1 series provides visible chain-of-thought for complex problem-solving
Full commercial licensing: MIT License with open weights and self-hosting support
Bilingual support: Native English and Chinese language understanding
High throughput: >50K tokens/second on 8×H800 GPUs, 200K+ pages/day for OCR on a single GPU
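The sparse-activation numbers above are easiest to see in code. Below is a toy top-k router in NumPy, assuming nothing beyond the 256-expert / 8-active figures quoted in this list; it illustrates the routing idea only, not DeepSeek's implementation (which adds shared experts and auxiliary-loss-free load balancing):

```python
import numpy as np

def route_token(x, gate_w, top_k=8):
    """Toy fine-grained MoE routing: score all experts, keep the top-k,
    and renormalize their gate weights. Only the chosen experts' FFNs run
    for this token, so 8 of 256 expert blocks (roughly 5% of parameters
    in the quoted configuration) are active."""
    logits = gate_w @ x                           # (256,) expert affinities
    chosen = np.argsort(logits)[-top_k:]          # indices of the 8 best experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                          # softmax over selected experts
    return chosen, gates

rng = np.random.default_rng(0)
hidden = rng.normal(size=1024)                    # one token's hidden state
experts, gates = route_token(hidden, rng.normal(size=(256, 1024)))
print(experts, gates.round(3))
```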
When to Choose DeepSeek
Select DeepSeek for production applications requiring frontier-model quality at reduced cost, long-context processing up to 128K tokens, transparent reasoning workflows (R1 series), or on-premises deployment with full control. The family excels in code generation across 338 languages (Coder-v2), mathematical reasoning at AIME-level (Math/R1), agentic workflows with tool use, and document processing with optical compression (OCR).
Consider alternatives if you need vision capabilities beyond OCR (no multimodal chat models in current lineup), require guaranteed US-based data processing (DeepSeek routes through Chinese infrastructure), or need models optimized for edge/mobile deployment (flagship models require significant GPU resources).
Platform & Access
Official Platform: DeepSeek Platform at https://platform.deepseek.com provides API access with OpenAI-compatible endpoints, developer dashboard for key management, usage monitoring, and analytics. Web chat interface at https://chat.deepseek.com offers free unlimited access. Mobile apps available for iOS and Android.
Access Model: Dual access with both API and open weights. API available through DeepSeek Platform (OpenAI SDK-compatible), AWS Bedrock, Azure AI Foundry, Together.ai, and OpenRouter. Open weights released under MIT License on HuggingFace (https://huggingface.co/deepseek-ai) for V3.1, V3.2-Exp, R1-0528, Coder-v2, Math, and OCR models. Self-hosting fully supported with inference code, deployment guides, and compatibility with vLLM, SGLang, and TensorRT-LLM.
Pricing Model: Pay-as-you-go token-based pricing with no subscription required.
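Because the endpoints are OpenAI-compatible, a minimal call needs only the OpenAI Python SDK and a key from the developer dashboard. A sketch (model name and base URL per the public API docs; the prompt is arbitrary):

```python
import os
from openai import OpenAI

# Point the stock OpenAI SDK at DeepSeek's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Explain MoE routing in one sentence."}],
)
print(response.choices[0].message.content)
```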
Foundation Models
DeepSeek-V3.1
Parameters: 671B total, 37B activated per token (MoE)
Context Window: 128K tokens
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Enhanced support for up to 128 functions per request, parallel tool calls, code agents with execution/debugging, search agents with multi-turn calling
Structured Output: JSON mode via response_format={'type': 'json_object'}, guided generation through prompt engineering
Notable Features: Dual-mode architecture switchable between thinking (chain-of-thought) and non-thinking (fast direct response) modes via chat template. Terminal-bench: 31.3. UE8M0 FP8 training format. Released August 21, 2025. Open weights ✓ + API access.
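A short JSON-mode sketch using the response_format flag noted above, sent through the deepseek-chat endpoint that serves the current V3-series model. One assumption worth flagging: as with most JSON modes, the prompt itself should name the desired JSON shape, or the model may return valid-but-unexpected structure:

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{
        "role": "user",
        "content": 'Return JSON like {"city": string, "temp_c": number} '
                   "for: It was 31 degrees Celsius in Hangzhou today.",
    }],
    response_format={"type": "json_object"},   # forces syntactically valid JSON
)
print(json.loads(response.choices[0].message.content))
```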
DeepSeek-V3.2-Exp
Parameters: 671B total, 37B activated per token (MoE)
Context Window: 128K tokens
Notable Features: First fine-grained sparse attention mechanism (DeepSeek Sparse Attention/DSA) reducing complexity from O(L²) to O(Lk). Two-stage mechanism: lightning indexer + fine-grained selection. Performance parity with V3.1 across benchmarks. API pricing 50%+ lower. Released September 29, 2025. Open weights ✓ (Base and Instruct variants) + API access.
deepseek-chat (API Endpoint)
Parameters: 671B total, 37B activated per token (currently maps to V3.2-Exp)
Context Window: 128K tokens (default 4K output, max 8K)
General conversational AI for chatbots and virtual assistants with cost efficiency
Customer support automation requiring tool/function integration
Short-to-medium output tasks with concise responses (default 4K output limit)
High-volume production use cases where per-token cost drives economics
Standard reasoning tasks like code generation (Codeforces 2121 rating), content generation, translation
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Full OpenAI-compatible function calling with JSON schema validation, strict mode, multi-turn tool conversations
Structured Output: JSON mode via function calling strict mode, guided generation with required field validation
Notable Features: Standard chat API endpoint pointing to latest model (V3.2-Exp as of Sep 2025). Supports thinking/non-thinking modes via chat template. Context caching with discounted rates. Benchmarks: MMLU-Pro 85.0%, AIME 2025 89.3%, Codeforces 2121, LiveCodeBench 74.1%. Default 4K output, max 8K. Open weights ✓ (underlying V3.2-Exp) + API access.
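A sketch of the tool-calling flow described above. The schema follows the standard OpenAI format; get_order_status is a hypothetical function used purely for illustration:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",   # hypothetical tool, for illustration only
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Where is order 4512?"}],
    tools=tools,
)
# When the model decides a tool is needed, it returns structured tool calls
# instead of free text.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```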
Reasoning Models
DeepSeek-R1-0528
Parameters: 671B total, 37B activated per token (MoE)
Context Window: 128K tokens input, 64K tokens max output
Multi-step mathematical problem decomposition at AIME 2025 level (87.5% pass@1)
Complex competitive programming at Codeforces Div1 level (rating 1930)
Advanced code review and refactoring with comprehensive analysis (57.6% SWE-Verified, 71.6% Aider-Polyglot)
Scientific reasoning and hypothesis validation (81.0% GPQA-Diamond pass@1)
Strategic planning and complex task delegation with transparent reasoning chains
Multi-stage code analysis over large codebases requiring deep understanding
Agentic Capabilities:
Tool Use / Function Calling: Yes - Function calling newly added in 0528 update (enhanced vs previous R1 versions)
Structured Output: JSON mode (newly added), function calling with structured parameters, output format specification
Notable Features: Generates explicit reasoning tokens (average 23K tokens per complex problem). Self-verification of intermediate results, reflection on reasoning process, multi-step planning. System prompt support added in 0528. Auto-thinking mode (no longer requires forcing <think> tag). Maximum output: 64K tokens. Released May 28, 2025. Open weights ✓ + API access through DeepSeek Platform, Together AI, Azure AI Foundry, OpenRouter.
deepseek-reasoner (API Endpoint)
Parameters: 671B total, 37B activated per token (based on DeepSeek-R1)
Context Window: 64K tokens (default maximum output: 32K tokens; configurable up to 64K)
Advanced mathematical problem-solving with multi-step proofs and deductive reasoning
Complex code generation and debugging for full applications with comprehensive test coverage
Scientific reasoning and research including hypothesis validation and analysis
Legal document analysis and contract review requiring careful multi-step evaluation
Strategic problem decomposition into solvable sub-problems with explicit reasoning
Educational tutoring showing work for learning purposes with visible reasoning steps
Agentic Capabilities:
Tool Use / Function Calling: No - Function calling explicitly NOT supported (requests with tools parameter automatically fall back to deepseek-chat)
Structured Output: JSON mode supported, Chat Prefix Completion (Beta), limited guided generation (no function calling schema enforcement)
Notable Features: Outputs a separate reasoning_content field with the full thinking process, making both the CoT and the final answer accessible. Self-verification and reflection. Multi-turn conversations retain only final answers (CoT is not concatenated into history). Reasoning capability developed primarily through large-scale reinforcement learning (the R1-Zero precursor used pure RL with no initial SFT). 27× cheaper than OpenAI o1 ($0.55 vs $15 per million input tokens, $2.19 vs $60 per million output tokens). Released January 20, 2025. API-only (original R1 weights available open-source). Unsupported parameters: temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs.
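A sketch of reading the separate reasoning_content field described above; per the multi-turn note, only the final answer should be appended back into the conversation history:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
msg = response.choices[0].message
print(msg.reasoning_content)   # the full chain of thought
print(msg.content)             # the final answer only
# For multi-turn use, append only msg.content to the history; the CoT is
# not meant to be sent back.
```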
Specialized Models - Coding and Mathematics
DeepSeek-Coder-V2
Parameters: 236B total (21B active) for full model, 16B total (2.4B active) for Lite variant (both MoE)
Context Window: 128K tokens
Multimodal: Text-only (specialized for code and mathematical text)
License: MIT License + DeepSeek Model License (commercial use permitted)
Algorithm implementation with performance optimization (73.7% Aider benchmark)
Agentic Capabilities:
Tool Use / Function Calling: Yes - Supported through API platform, can write and execute programs for problem-solving
Structured Output: Fill-in-Middle (FIM) with Prefix-Suffix-Middle (PSM) format achieving 86.4% mean score, Repository-level code completion, JSON mode via API
Notable Features: Trained on 10.2 trillion tokens (60% source code, 10% math, 30% natural language). 6T additional tokens beyond DeepSeek-V2 base. Supports 338 languages (full list at GitHub). Released June 2024. Open weights ✓ (Base and Instruct for both Lite-16B and Full-236B) + API access. Official chat: https://coder.deepseek.com. Local deployment requires 80GB×8 GPUs for BF16 full model.
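A hedged FIM sketch, under the assumption that the platform's beta completions endpoint (prompt plus suffix, in Prefix-Suffix-Middle order) is used; verify the current base URL and model name against the API docs before relying on it:

```python
import os
from openai import OpenAI

# The /beta base path and the prompt/suffix parameters are assumptions taken
# from DeepSeek's FIM (beta) documentation pattern; confirm before use.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/beta")
response = client.completions.create(
    model="deepseek-chat",
    prompt="def fib(n):\n    a, b = 0, 1\n    ",   # code before the cursor
    suffix="\n    return a\n",                      # code after the cursor
    max_tokens=64,
)
print(response.choices[0].text)                     # the generated middle
```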
DeepSeek-Math
Parameters: 7B (dense, non-MoE architecture)
Context Window: 4K tokens
Multimodal: Text-only (specialized for mathematical text, equations, LaTeX)
License: MIT License + DeepSeek Model License (commercial use permitted)
Step-by-step solutions across algebra, calculus, geometry, number theory, combinatorics, linear algebra, statistics
Competition mathematics at AMC, AIME, IMO, Putnam levels with formal proofs
Word problems and real-world math applications requiring semantic understanding (51.7% MATH benchmark)
Symbolic proofs with logical steps and formal notation
Formula derivation from first principles with intermediate steps
Program-of-Thought solving via Python code execution (60% MATH with tool use, 82.9% GSM8K)
Agentic Capabilities:
Tool Use / Function Calling: Yes - Strong Program-of-Thought (PoT) capabilities, can solve math problems by writing and executing Python code with calculators and symbolic solvers
Structured Output: Chain-of-Thought with step-by-step reasoning, boxed answers in \boxed{} notation, LaTeX output, Program-of-Thought intermixing code with reasoning
Notable Features: Initialized from DeepSeek-Coder-v1.5 7B with continued pre-training on 500B tokens (120B math-related). Three variants: Base, Instruct, RL (reinforcement learning). Outperforms Minerva 540B on competition problems. Released February 2024. Bilingual English/Chinese. Open weights ✓ for all variants. Fits on consumer GPUs. Compatible with Transformers, vLLM, llama.cpp.
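A minimal local-inference sketch for the Instruct variant via HuggingFace Transformers. The step-by-step instruction and the \boxed{} extraction follow the conventions listed above; the specific problem and generation settings are illustrative:

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/deepseek-math-7b-instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user",
             "content": "What is the last digit of 7^2024? "
                        "Please reason step by step, and put your final "
                        "answer within \\boxed{}."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
text = tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True)

# Pull the final answer out of the \boxed{...} notation.
match = re.search(r"\\boxed\{([^}]*)\}", text)
print(match.group(1) if match else text)
```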
Vision and Specialized Utility Models
DeepSeek-OCR
Parameters: ~3.4B total (380M encoder: 80M SAM-base + 300M CLIP-large; 3B decoder: DeepSeek3B-MoE-A570M with 6 of 64 experts active)
Context Window: Multiple vision token modes from 64 tokens (512×512 Tiny) to 400 tokens (1280×1280 Large), dynamic tiling with Gundam modes
Multimodal: Vision + Text (images up to 1280×1280, PDFs, screenshots, forms, tables, handwritten text, charts, formulas, diagrams, multi-page documents)
Document OCR and conversion including PDF to Markdown with layout preservation
Structured data extraction from tables, forms, invoices with field recognition
Formula recognition for mathematical and chemical equations in research papers
Vision-text compression achieving 7-20× token reduction with 97% precision at <10× compression for LLM context optimization
Production document parsing pipelines generating 200K+ pages/day on A100-40G
Multi-page document processing with automatic region splitting and adaptive tiling
OCR for noisy, low-quality, or handwritten text requiring robust recognition
Agentic Capabilities:
Tool Use / Function Calling: No
Structured Output: Yes - Markdown output with preserved structure (headings, tables, lists, inline formatting), grounding tags for spatial awareness (<|ref|>xxx<|/ref|>)
Notable Features: Two-stage encoder-decoder with DeepEncoder (SAM-base for local + CLIP-large for global) and 16× convolutional compressor. Context Optical Compression via 2D visual mapping with tiered compression mimicking memory fade. Outperforms GOT-OCR2.0 using only 100 tokens vs 256, and MinerU2.0 using <800 tokens vs 6000+ average. Released October 20, 2025. Open weights ✓ (self-hosted only, no API). Performance: ~2,500 tokens/sec on A100-40G. Requirements: ≥16GB VRAM, CUDA 11.8+, PyTorch 2.6.0, Flash Attention 2.7.3. Framework support: vLLM, HuggingFace Transformers.
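Since the model is self-hosted with no API, output post-processing falls to the integrator. A small sketch that extracts and strips the grounding tags described above (only the <|ref|> tag named in this entry is assumed; real outputs may carry additional markup):

```python
import re

# Example OCR output carrying a grounding tag; real outputs come from the
# self-hosted model's Markdown decoder.
ocr_markdown = "## Invoice\n<|ref|>Total: $1,280.00<|/ref|>\n| Item | Qty |"

# Collect grounded spans, then produce tag-free Markdown for downstream use.
refs = re.findall(r"<\|ref\|>(.*?)<\|/ref\|>", ocr_markdown, flags=re.S)
clean = re.sub(r"<\|/?ref\|>", "", ocr_markdown)
print(refs)    # ['Total: $1,280.00']
print(clean)
```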
Model Comparison Table
| Model | Context | Parameters | Knowledge Cutoff | Agentic Usage | Best For |
|---|---|---|---|---|---|
| V3.2-Exp | 128K | 671B (37B active) | July 2024 | ⭐⭐⭐⭐⭐ | Ultra-long context at 50% lower cost |
| V3.1 | 128K | 671B (37B active) | July 2024 | ⭐⭐⭐⭐⭐ | Dual-mode reasoning + fast inference |
| R1-0528 | 128K in / 64K out | 671B (37B active) | July 2024 | ⭐⭐⭐⭐ | Complex multi-step reasoning with tool use |
| deepseek-reasoner | 64K | 671B (37B active) | - | ⭐⭐⭐ | Transparent reasoning (no tool calling) |
| deepseek-chat | 128K | 671B (37B active) | Mid-2024 | ⭐⭐⭐⭐⭐ | General chat, cost-effective production |
| Coder-V2 | 128K | 236B (21B active) or 16B (2.4B active) | Early 2024 | ⭐⭐⭐⭐ | 338-language code generation |
| Math | 4K | 7B (dense) | 2023 | ⭐⭐⭐ | Competition-level math with PoT |
| OCR | 64-400 vision tokens | 3.4B | - | ❌ | Document processing, PDF to Markdown |
Key Considerations
Model deprecation and versioning: DeepSeek updates models frequently. API endpoints like deepseek-chat automatically map to latest models (currently V3.2-Exp), while specific version identifiers (e.g., deepseek-chat-v3.1:thinking) lock to particular releases. Check https://api-docs.deepseek.com/news for version updates and migration notices. Open-weight models remain available indefinitely on HuggingFace even after API deprecation.
Function calling limitations: deepseek-reasoner does NOT support function calling - requests with a tools parameter automatically fall back to deepseek-chat. Use R1-0528 or V3.1/V3.2-Exp for reasoning workflows requiring tool use. deepseek-reasoner also lacks support for the temperature, top_p, presence_penalty, frequency_penalty, logprobs, and top_logprobs parameters.
Context window tradeoffs: V3.1/V3.2 offer 128K contexts while deepseek-reasoner provides 64K. R1-0528 supports 128K input but emphasizes 64K maximum output for reasoning chains. Math model limited to 4K context (designed for focused problem-solving). OCR uses vision tokens (64-400 depending on mode) rather than text tokens. Plan context allocation based on use case.
Multimodal gaps: Current deepseek-chat and deepseek-reasoner are text-only. For vision tasks, use DeepSeek-OCR (document/PDF processing). No audio, video, or image generation capabilities in current lineup. No multimodal chat models (vision + conversation) available yet.
Hardware requirements for self-hosting: Full 671B models require ~1,500GB RAM (2× parameters + 10% overhead). Recommended: 8× H100/H200 GPUs with tensor parallelism. Lite variants (Coder-V2-Lite 16B, Math 7B) fit on consumer hardware. Quantized GGUF versions available via community for budget setups. V3.2-Exp optimized for EP/DP mode: DP=8, EP=8, TP=1. OCR requires ≥16GB VRAM (single GPU).
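The sizing rule above reduces to a one-line calculation (assuming 2 bytes per parameter in BF16 plus ~10% overhead; real deployments also need KV-cache headroom that grows with context length):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2,
                     overhead: float = 0.10) -> float:
    """Rough memory footprint: parameters x bytes, plus ~10% overhead."""
    return params_billions * bytes_per_param * (1 + overhead)

print(round(weight_memory_gb(671)))   # ~1476 GB -> multi-GPU H100/H200 territory
print(round(weight_memory_gb(16)))    # Coder-V2-Lite: ~35 GB
print(round(weight_memory_gb(7)))     # Math-7B: ~15 GB, consumer-GPU range
```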
Data residency and privacy: DeepSeek's privacy policy states personal data is sent to servers in China. Government/enterprise restrictions exist in Italy (nationwide ban), Australia (federal devices), Taiwan (government agencies), and multiple US states for official government use due to data privacy and national security concerns. Private individuals generally permitted access. For strict data residency requirements, use self-hosted open-weight deployments or third-party platforms (AWS Bedrock, Azure AI Foundry).
Rate limits and quotas: DeepSeek API documentation does not specify public rate limits for paid tiers (as of November 2025). Free web interface (chat.deepseek.com) has no disclosed usage limits. Third-party platforms (OpenRouter, Together.ai, AWS Bedrock) implement their own rate limiting - consult provider documentation. For high-throughput production, consider self-hosting or enterprise agreements.
Safety and content filtering: Models include content filtering for harmful outputs. Safety training may affect sensitive use cases (medical, legal, controversial topics). For applications requiring unfiltered outputs, use base models (V3.1-Base, V3.2-Exp-Base, Coder-V2-Base) rather than instruct-tuned variants, though base models require careful prompting and may produce inconsistent formats.
Integration with existing workflows: DeepSeek API is OpenAI SDK-compatible - migration from GPT models requires only changing base_url and api_key. LangChain supports native ChatDeepSeek integration. For structured outputs and function calling, use the same schemas as OpenAI (minor compatibility differences may exist; test thoroughly). Context caching automatically optimizes repeated prefixes with no code changes. Streaming responses supported for both standard and reasoning modes.
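A minimal ChatDeepSeek sketch via the langchain-deepseek partner package (class and package names per LangChain's integration docs; it reads DEEPSEEK_API_KEY from the environment):

```python
# pip install langchain-deepseek
from langchain_deepseek import ChatDeepSeek

llm = ChatDeepSeek(model="deepseek-chat", temperature=0)
print(llm.invoke("One-line summary of sparse attention.").content)
```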
Cost optimization strategies: Enable context caching for repeated queries (90% discount on cached inputs). Use non-thinking mode (deepseek-chat) for simple tasks to avoid reasoning token overhead. For very high volume, self-host open-weight models to eliminate per-token costs entirely. Select appropriate model size: Coder-V2-Lite-16B and Math-7B offer 85-95% of full-model quality at dramatically lower inference costs. V3.2-Exp reduces long-context costs 50% vs V3.1 through sparse attention.
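To see how caching changes the economics, here is a rough cost model built only from the figures quoted in this document ($0.55/M input and $2.19/M output for deepseek-reasoner, 90% discount on cache-hit input tokens); verify current pricing before relying on it:

```python
def cost_usd(input_tokens: int, output_tokens: int, cached_frac: float = 0.0,
             in_price: float = 0.55, out_price: float = 2.19,
             cache_discount: float = 0.90) -> float:
    """Estimate request cost in USD; prices are per million tokens."""
    fresh = input_tokens * (1 - cached_frac)
    cached = input_tokens * cached_frac * (1 - cache_discount)
    return ((fresh + cached) * in_price + output_tokens * out_price) / 1e6

print(cost_usd(1_000_000, 100_000))                   # no caching: ~$0.77
print(cost_usd(1_000_000, 100_000, cached_frac=0.8))  # 80% cache hits: ~$0.37
```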