Kamil Józwik

DeepSeek

DeepSeek is a family of AI models developed by Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. The company prioritizes extreme cost efficiency through algorithmic innovation rather than massive compute, training frontier models at 10-20× lower cost than competitors. The family currently includes 8+ models spanning foundation models, reasoning models, specialized coding/math models, and OCR, with strengths in cost-performance ratio, long-context processing (128K tokens), and transparent reasoning capabilities.

Philosophy and Distinctive Approach

DeepSeek's core philosophy centers on achieving frontier-model quality through architectural efficiency rather than brute-force compute. The company trained DeepSeek-V3 (671B parameters) for approximately $5.5-6 million using constrained H800 GPUs under US export restrictions, driving innovations like fine-grained Mixture-of-Experts with roughly 5% parameter activation (37B of 671B active per token), Multi-head Latent Attention (MLA), and auxiliary-loss-free load balancing. The V3.2 release introduced DeepSeek Sparse Attention (DSA), reducing long-context inference costs by 50% through two-stage token selection.

Founded in July 2023 by Liang Wenfeng (co-founder of High-Flyer Capital Management) and self-funded by the hedge fund, DeepSeek operates independently without external venture capital, focusing on long-term AGI research over immediate commercialization. This independence enables aggressive open-source releases under the MIT License, with full commercial-use permissions and transparent research publications.

Key Family Strengths

When to Choose DeepSeek

Select DeepSeek for production applications requiring frontier-model quality at reduced cost, long-context processing up to 128K tokens, transparent reasoning workflows (R1 series), or on-premises deployment with full control. The family excels in code generation across 338 languages (Coder-v2), mathematical reasoning at AIME-level (Math/R1), agentic workflows with tool use, and document processing with optical compression (OCR).

Consider alternatives if you need vision capabilities beyond OCR (no multimodal chat models in current lineup), require guaranteed US-based data processing (DeepSeek routes through Chinese infrastructure), or need models optimized for edge/mobile deployment (flagship models require significant GPU resources).

Platform & Access

Official Platform: DeepSeek Platform at https://platform.deepseek.com provides API access with OpenAI-compatible endpoints, developer dashboard for key management, usage monitoring, and analytics. Web chat interface at https://chat.deepseek.com offers free unlimited access. Mobile apps available for iOS and Android.

Access Model: Dual access with both API and open weights. API available through DeepSeek Platform (OpenAI SDK-compatible), AWS Bedrock, Azure AI Foundry, Together.ai, and OpenRouter. Open weights released under MIT License on HuggingFace (https://huggingface.co/deepseek-ai) for V3.1, V3.2-Exp, R1-0528, Coder-v2, Math, and OCR models. Self-hosting fully supported with inference code, deployment guides, and compatibility with vLLM, SGLang, and TensorRT-LLM.

Pricing Model: Pay-as-you-go token-based pricing with no subscription required.

SDK Support: OpenAI Python SDK compatible (change base_url and api_key only). Native integrations: LangChain (ChatDeepSeek), LlamaIndex, Haystack. Framework support: Transformers, vLLM, SGLang, TensorRT-LLM, llama.cpp.
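
A minimal sketch of that drop-in migration, assuming the documented base URL (https://api.deepseek.com) and a key from the developer dashboard; the prompt is illustrative:

```python
# pip install openai
from openai import OpenAI

# Point the standard OpenAI client at DeepSeek's endpoint.
# Only base_url and api_key change relative to an OpenAI setup.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # from platform.deepseek.com
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # maps to the latest non-thinking model
    messages=[{"role": "user", "content": "Summarize Multi-head Latent Attention in two sentences."}],
)
print(response.choices[0].message.content)
```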

Foundation Models / Flagship LLMs

DeepSeek-V3.1

Primary Use Cases:

Agentic Capabilities:


DeepSeek-V3.2-Exp

Primary Use Cases:

Agentic Capabilities:


deepseek-chat (API Endpoint)

Primary Use Cases:

Agentic Capabilities:

Reasoning Models

DeepSeek-R1-0528

Primary Use Cases:

Agentic Capabilities:


deepseek-reasoner (API Endpoint)

Primary Use Cases:

Agentic Capabilities:

Specialized Models - Coding and Mathematics

DeepSeek-Coder-V2

Primary Use Cases:

Agentic Capabilities:


DeepSeek-Math

Primary Use Cases:

Agentic Capabilities:

Vision and Specialized Utility Models

DeepSeek-OCR

Primary Use Cases:

Agentic Capabilities:

Model Comparison Table

| Model | Context | Parameters | Knowledge Cutoff | Agentic Usage | Best For |
|---|---|---|---|---|---|
| V3.2-Exp | 128K | 671B (37B active) | July 2024 | ⭐⭐⭐⭐⭐ | Ultra-long context at 50% lower cost |
| V3.1 | 128K | 671B (37B active) | July 2024 | ⭐⭐⭐⭐⭐ | Dual-mode reasoning + fast inference |
| R1-0528 | 128K in / 64K out | 671B (37B active) | July 2024 | ⭐⭐⭐⭐ | Complex multi-step reasoning with tool use |
| deepseek-reasoner | 64K | 671B (37B active) | - | ⭐⭐⭐ | Transparent reasoning (no tool calling) |
| deepseek-chat | 128K | 671B (37B active) | Mid-2024 | ⭐⭐⭐⭐⭐ | General chat, cost-effective production |
| Coder-V2 | 128K | 236B (21B active) or 16B (2.4B active) | Early 2024 | ⭐⭐⭐⭐ | 338-language code generation |
| Math | 4K | 7B (dense) | 2023 | ⭐⭐⭐ | Competition-level math with PoT |
| OCR | 64-400 vision tokens | 3.4B | - | - | Document processing, PDF to Markdown |

Key Considerations

Model deprecation and versioning: DeepSeek updates models frequently. API endpoints like deepseek-chat automatically map to the latest model (currently V3.2-Exp), while specific version identifiers (e.g., deepseek-chat-v3.1:thinking) lock to particular releases. Check https://api-docs.deepseek.com/news for version updates and migration notices. Open-weight models remain available indefinitely on HuggingFace even after API deprecation.

Function calling limitations: deepseek-reasoner does NOT support function calling; requests that include a tools parameter automatically fall back to deepseek-chat. Use R1-0528 or V3.1/V3.2-Exp for reasoning workflows that require tool use. deepseek-reasoner also lacks support for the temperature, top_p, presence_penalty, frequency_penalty, logprobs, and top_logprobs parameters.
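
A hedged sketch of routing tool use to deepseek-chat with the standard OpenAI tools schema; the get_weather tool is hypothetical:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

# Standard OpenAI function-calling schema; get_weather is a hypothetical tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",  # not deepseek-reasoner, which rejects tools
    messages=[{"role": "user", "content": "What's the weather in Warsaw?"}],
    tools=tools,
)

# The model returns a tool call rather than a final answer.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```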

Context window tradeoffs: V3.1/V3.2 offer 128K contexts while deepseek-reasoner provides 64K. R1-0528 accepts 128K of input but caps output at 64K tokens for reasoning chains. The Math model is limited to a 4K context (designed for focused problem-solving). OCR uses vision tokens (64-400 depending on mode) rather than text tokens. Plan context allocation based on use case.

Multimodal gaps: Current deepseek-chat and deepseek-reasoner are text-only. For vision tasks, use DeepSeek-OCR (document/PDF processing). No audio, video, or image generation capabilities in current lineup. No multimodal chat models (vision + conversation) available yet.

Hardware requirements for self-hosting: Full 671B models require ~1,500GB RAM (2× parameters + 10% overhead). Recommended: 8× H100/H200 GPUs with tensor parallelism. Lite variants (Coder-V2-Lite 16B, Math 7B) fit on consumer hardware. Quantized GGUF versions available via community for budget setups. V3.2-Exp optimized for EP/DP mode: DP=8, EP=8, TP=1. OCR requires ≥16GB VRAM (single GPU).
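
For the consumer-hardware path, a minimal Transformers sketch loading the Coder-V2-Lite weights from the deepseek-ai HuggingFace org; exact memory needs depend on dtype and quantization, so treat the settings below as a starting point rather than a recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Lite variant of Coder-V2: ~16B total parameters, 2.4B active per token.
model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~32GB of weights; quantize for smaller GPUs
    device_map="auto",           # spread across available GPUs/CPU
    trust_remote_code=True,      # DeepSeek ships custom model code
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```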

Data residency and privacy: DeepSeek's privacy policy states personal data is sent to servers in China. Government/enterprise restrictions exist in Italy (nationwide ban), Australia (federal devices), Taiwan (government agencies), and multiple US states for official government use due to data privacy and national security concerns. Private individuals are generally permitted access. For strict data residency requirements, use self-hosted open-weight deployments or third-party platforms (AWS Bedrock, Azure AI Foundry).

Rate limits and quotas: DeepSeek API documentation does not specify public rate limits for paid tiers (as of November 2025). Free web interface (chat.deepseek.com) has no disclosed usage limits. Third-party platforms (OpenRouter, Together.ai, AWS Bedrock) implement their own rate limiting - consult provider documentation. For high-throughput production, consider self-hosting or enterprise agreements.

Safety and content filtering: Models include content filtering for harmful outputs. Safety training may affect sensitive use cases (medical, legal, controversial topics). For applications requiring unfiltered outputs, use base models (V3.1-Base, V3.2-Exp-Base, Coder-V2-Base) rather than instruct-tuned variants, though base models require careful prompting and may produce inconsistent formats.

Integration with existing workflows: DeepSeek API is OpenAI SDK-compatible - migration from GPT models requires only changing base_url and api_key. LangChain supports native ChatDeepSeek integration. For structured outputs and function calling, use the same schemas as OpenAI (minor compatibility differences may exist; test thoroughly). Context caching automatically optimizes repeated prefixes with no code changes. Streaming responses supported for both standard and reasoning modes.
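
A short streaming sketch, assuming the reasoning_content delta field that DeepSeek's reasoning API exposes alongside the regular content field (the getattr guard keeps it safe if the field is absent):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

# Stream a reasoning-mode response. The chain of thought arrives in a
# separate reasoning_content field before the final answer tokens.
stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Is 9973 prime? Explain briefly."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # reasoning_content carries the visible thinking; content carries the answer.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)
```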

Cost optimization strategies: Enable context caching for repeated queries (90% discount on cached inputs). Use non-thinking mode (deepseek-chat) for simple tasks to avoid reasoning token overhead. For very high volume, self-host open-weight models to eliminate per-token costs entirely. Select appropriate model size: Coder-V2-Lite-16B and Math-7B offer 85-95% of full-model quality at dramatically lower inference costs. V3.2-Exp reduces long-context costs 50% vs V3.1 through sparse attention.
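
To verify caching is actually kicking in, a sketch that reuses a stable prefix across calls and reads the cache counters; the prompt_cache_hit_tokens/prompt_cache_miss_tokens usage fields are assumed from DeepSeek's caching docs:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

# Reuse a long, stable prefix (e.g., a system prompt) so the cache can hit.
system = {"role": "system", "content": "You are a contract-review assistant. <long shared instructions>"}

for question in ["Summarize clause 4.", "List the termination conditions."]:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[system, {"role": "user", "content": question}],
    )
    usage = resp.usage
    # Cached prefix tokens are billed at the discounted rate.
    print(question,
          "| cache hit:", getattr(usage, "prompt_cache_hit_tokens", "n/a"),
          "| cache miss:", getattr(usage, "prompt_cache_miss_tokens", "n/a"))
```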

Resources

Official Documentation: DeepSeek Platform (https://platform.deepseek.com), API docs and release notes (https://api-docs.deepseek.com), web chat (https://chat.deepseek.com)

Model Weights and Code: open weights and inference code under the deepseek-ai organization on HuggingFace (https://huggingface.co/deepseek-ai)

Technical Papers:

Third-Party Platform Access: AWS Bedrock, Azure AI Foundry, Together.ai, OpenRouter