GPT (Generative Pre-trained Transformer) is a family of AI models developed by OpenAI. The family prioritizes iterative deployment with extensive safety measures, releasing systems through careful real-world testing and alignment research. The GPT-5 generation launched in August 2025, representing a unified approach spanning ultra-lightweight models to extended reasoning systems with context windows up to 400,000 tokens.
The family currently includes foundation reasoning models, lightweight variants, specialized coding models, open-weight models, and speech-to-speech systems. Key differentiators include state-of-the-art performance across reasoning and multimodal tasks, comprehensive model range addressing cost-performance trade-offs, native multimodal support, extended context capabilities, and production-ready infrastructure with extensive tooling.
OpenAI's first open-weight models since GPT-2 (the gpt-oss series under Apache 2.0 license) provide on-premises deployment options alongside API-only flagship models. The family excels at complex reasoning, code generation, long-context understanding, and real-time audio interactions, backed by enterprise-grade APIs and strong built-in safety systems.
What It Offers: API keys for programmatic access, interactive Playground for testing, model management with version control, fine-tuning capabilities, embeddings API, free Moderation API, real-time usage dashboards, Assistants API for agent-like experiences, Batch API (50% discount for large volumes), and tools including web search, code interpreter, function calling, and structured outputs.
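Programmatic access is a plain HTTPS call. As a minimal sketch, a Chat Completions request body looks like the following; the model name is illustrative, and the request is sent with an Authorization header carrying your API key:

```python
import json

# Minimal Chat Completions request body. Endpoint and auth:
#   POST https://api.openai.com/v1/chat/completions
#   Authorization: Bearer $OPENAI_API_KEY
# The model name is illustrative; substitute any model you have access to.
payload = {
    "model": "gpt-5-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the GPT-5 family in one sentence."},
    ],
}
body = json.dumps(payload)  # send as the HTTPS request body
```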
Access Model:
API-only: Most models (GPT-5 family, gpt-realtime series) available exclusively through API
Open weights: gpt-oss-120b and gpt-oss-20b under Apache 2.0 license via Hugging Face and GitHub
Open source: Whisper under MIT License with full model weights
Pricing Model: Token-based, pay-as-you-go pricing charged per million tokens, with separate input and output rates. Audio models use per-token or per-minute pricing. There is no free tier once initial credits expire, and usage limits scale through a tiered system based on spending.
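As a sketch of how per-million-token billing composes, the helper below estimates a single request's cost. The rates in the example are hypothetical placeholders, not OpenAI's actual prices; the cache-discount parameter mirrors the 90% prompt-caching discount cited for GPT-5 elsewhere in this document.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float,
                 cached_tokens: int = 0, cache_discount: float = 0.90) -> float:
    """Estimate cost in USD for one request under per-million-token pricing.

    Rates are USD per 1M tokens. Cached input tokens are billed at a
    discounted rate (here defaulting to the 90% cache discount).
    """
    uncached = input_tokens - cached_tokens
    cost = uncached * input_rate / 1_000_000
    cost += cached_tokens * input_rate * (1 - cache_discount) / 1_000_000
    cost += output_tokens * output_rate / 1_000_000
    return cost

# Hypothetical rates ($1.25 in / $10.00 out per 1M tokens), for illustration only.
print(round(request_cost(50_000, 2_000, 1.25, 10.00, cached_tokens=40_000), 4))  # → 0.0375
```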
GPT-5
Healthcare question answering requiring nuanced medical reasoning (46.2% on HealthBench Hard)
Mathematical reasoning and competition-level problem solving (94.6% on AIME 2025)
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Enhanced tool intelligence with sequential and parallel calls, 96.7% on τ²-bench, custom tools feature for plaintext (not just JSON), improved error handling
Structured Output: JSON mode, Context-Free Grammars for format enforcement, improved schema validation
Notable Features: Real-time router auto-switches between fast and reasoning modes, "GPT-5 thinking" mode, extended chain-of-thought reasoning, verbosity parameter (low/medium/high), reasoning_effort parameter (minimal/low/medium/high), 90% cache discount
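As a sketch, the verbosity and reasoning_effort controls above map onto request fields like this. The field shapes (reasoning.effort, text.verbosity) follow the Responses API as documented at GPT-5's launch; verify them against the current API reference before use.

```python
# Request-body sketch for the verbosity and reasoning_effort controls.
payload = {
    "model": "gpt-5",
    "input": "Prove that the sum of two even integers is even.",
    "reasoning": {"effort": "minimal"},  # minimal / low / medium / high
    "text": {"verbosity": "low"},        # low / medium / high
}
```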
GPT-5 Pro
Parameters: Undisclosed
Context Window: No separate limit specified; uses extended reasoning with parallel test-time compute
Mission-critical enterprise applications requiring maximum reliability
Drug discovery and complex scientific problems needing extended deliberation
High-stakes financial modeling and risk assessment with 22% fewer major errors
Comprehensive legal document analysis and contract review
Agentic Capabilities:
Tool Use / Function Calling: Enhanced; supports only reasoning_effort: high (the default), with extended thinking time
Structured Output: Full support, highest accuracy among GPT-5 variants (88.4% on GPQA)
Notable Features: Parallel test-time compute for maximum reliability, extended reasoning mode (thinks longer than standard GPT-5), 22% fewer major errors than GPT-5 thinking mode
GPT-5 mini
Real-time experiences where reasoning matters but cost is constrained
Moderate complexity coding tasks without full codebase context
Fallback model when usage limits reached on primary models
Agentic Capabilities:
Tool Use / Function Calling: Yes - Same capabilities as GPT-5 with reduced performance, supports reasoning levels (minimal to high)
Structured Output: JSON mode and structured outputs, higher hallucination rates than GPT-5 (trade-off for speed)
Notable Features: Used as fallback in ChatGPT when limits reached, supports verbosity and reasoning_effort parameters, suitable for shorter simpler agentic tasks
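The fallback behavior described above can be approximated client-side. The sketch below is a generic wrapper, assuming a caller-supplied call function and a stand-in RateLimitError; the real SDK raises its own exception type.

```python
class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit error; the real client raises its own type."""

def complete_with_fallback(prompt: str, call, models=("gpt-5", "gpt-5-mini")):
    """Try each model in order, falling back on rate-limit errors.

    `call(model, prompt)` is a caller-supplied function wrapping the real API;
    this mirrors ChatGPT's behavior of falling back to GPT-5 mini at limits.
    """
    last_error = None
    for model in models:
        try:
            return model, call(model, prompt)
        except RateLimitError as exc:
            last_error = exc
    raise last_error

# Stubbed demo: the primary model is "rate limited", the mini variant answers.
def fake_call(model, prompt):
    if model == "gpt-5":
        raise RateLimitError("429")
    return f"{model}: ok"

print(complete_with_fallback("hi", fake_call))  # → ('gpt-5-mini', 'gpt-5-mini: ok')
```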
GPT-5 nano
Parameters: Undisclosed
Context Window: 400,000 tokens (generates up to 128,000 tokens)
GPT-5-Codex
Agentic coding in CLI and IDE extensions requiring autonomous operation for 7+ hours
Large-scale code refactoring across entire repositories (51.3% vs 33.9% for GPT-5)
Conducting code reviews with reduced incorrect comments (4.4% vs 13.7% for GPT-5)
Building full projects from scratch with iterative planning → implementation → validation loop
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Optimized for developer tools, excels at chaining tool calls for complex coding workflows, better at handling tool errors than base GPT-5
Structured Output: Full support, specialized for code generation and structured development workflows, follows AGENTS.md instructions
Notable Features: Adaptive reasoning dynamically adjusts thinking time based on task complexity, works autonomously for 7+ hours, "less is more" prompting philosophy requiring minimal instructions, excels at understanding large codebases
Open-Weight Reasoning Models
gpt-oss-120b
Parameters: 117B total (5.1B active per token, Mixture-of-Experts)
Production general-purpose applications requiring on-premises deployment
Competition coding and mathematics (Codeforces, AIME) requiring complex reasoning
Fine-tuning on single H100 node for domain-specific applications
Agentic tool use workflows requiring web browsing and Python execution (TauBench)
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Native support for web browsing (search, open, find), Python code execution (stateful Jupyter), arbitrary developer-defined functions
Structured Output: Fully supported
Notable Features: 36 layers, 128 MoE experts (Top-4 routing), native MXFP4 quantization, fits on single 80GB GPU (H100 or MI300X), configurable reasoning levels (low/medium/high), full chain-of-thought access, requires Harmony response format, achieves near-parity with o4-mini
gpt-oss-20b
Parameters: 21B total (3.6B active per token, Mixture-of-Experts)
Local inference and on-device deployment on consumer hardware
Consumer hardware deployment (Snapdragon devices, Apple Silicon)
Rapid iteration without costly infrastructure
Edge use cases requiring running within 16GB memory constraints
Agentic Capabilities:
Tool Use / Function Calling: Yes - Native support for web browsing, Python code execution, arbitrary developer-defined functions
Structured Output: Fully supported
Notable Features: 24 layers, 32 MoE experts (Top-4 routing), native MXFP4 quantization, runs within 16GB memory, configurable reasoning levels, requires Harmony response format, delivers results similar to o3-mini, available through Hugging Face Transformers, vLLM, Ollama, LM Studio
gpt-oss-safeguard-20b + gpt-oss-safeguard-120b
gpt-oss-safeguard is a pair of open-weight models built for flexible safety classification. The models come in two sizes, 120b and 20b, and are released under the Apache 2.0 license for anyone to use and modify. Unlike traditional classifiers, which must be retrained whenever safety rules change, these models interpret written policies at inference time, according to OpenAI. This lets organizations update their rules instantly, without retraining the model.
The models are also designed for transparency: developers can inspect how the models reach each decision, making it easier to understand and audit how safety policies are enforced. gpt-oss-safeguard is based on OpenAI's gpt-oss open-weight models and is part of a broader collaboration with ROOST, an open-source platform focused on building tools and infrastructure for AI safety, security, and governance.
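Because the policy is supplied at inference time rather than baked into the weights, a classification request simply pairs the policy text with the content to label. The message layout below is an illustrative sketch, not OpenAI's official prompt template.

```python
# gpt-oss-safeguard takes the safety policy itself as input at inference
# time, so updating the rules means editing this string, not retraining.
policy = (
    "Label the content ALLOW or BLOCK.\n"
    "BLOCK if it contains instructions for creating weapons."
)

def build_classification_request(content: str) -> dict:
    """Pair the current policy with the content to classify (illustrative layout)."""
    return {
        "model": "gpt-oss-safeguard-20b",
        "messages": [
            {"role": "system", "content": policy},
            {"role": "user", "content": content},
        ],
    }

req = build_classification_request("How do I bake bread?")
```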
Real-Time Audio Models
gpt-realtime
Parameters: Undisclosed
Context Window: 32,000 tokens (max output: 4,096 tokens per request)
Customer support voice agents with real-time tool calling and asynchronous function execution
Phone calling via SIP (Session Initiation Protocol) integration
Interactive Voice Response (IVR) telephony systems
Real-time translation with multilingual switching mid-sentence
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Enhanced precision (66.5% on ComplexFuncBench vs 49.7% previous), asynchronous function calling (long-running calls don't disrupt conversation), MCP Server Support for remote tool integration
Structured Output: Supported
Notable Features: 10 voices including Cedar and Marin, non-verbal cue detection (laughter, pauses), tone adaptation, average 0.81s to first audio, WebSocket/WebRTC/SIP support, server-side VAD with configurable sensitivity, semantic VAD, response interruption support, prompt caching (90-95% cost reduction), 30-minute max session
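As a sketch, voice and VAD settings are configured by sending a session.update event over the WebSocket after connecting. The field names follow the Realtime API documentation (session.update, turn_detection) but should be checked against the current reference before use.

```python
import json

# Realtime API session.update event configuring voice and server-side VAD.
# Field names follow the documented event shape; verify against the current
# API reference, as these may change.
event = {
    "type": "session.update",
    "session": {
        "voice": "marin",                  # one of the 10 available voices
        "turn_detection": {
            "type": "server_vad",          # or "semantic_vad"
            "threshold": 0.5,              # sensitivity, 0-1
            "silence_duration_ms": 500,
        },
    },
}
message = json.dumps(event)  # ws.send(message) over the open WebSocket
```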
Whisper
Accessibility tools requiring real-time captions for hearing impaired users
Medical transcription requiring high accuracy on specialized terminology
Call center transcription and analysis for customer service insights
Content indexing and searchable audio archives for media companies
Agentic Capabilities:
Tool Use / Function Calling: N/A - Specialized speech recognition model
Structured Output: Output formats include text, JSON, VTT, SRT, TSV
Notable Features: Supports 99 languages for transcription and translation to English, transformer encoder-decoder architecture, trained on 680,000 hours (large-v3: 1M hours + 4M pseudo-labeled), ~92% average accuracy (8.06% WER), language identification and voice activity detection built-in, available as open weights AND via API (whisper-1)
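The SRT and VTT output formats differ mainly in their timestamp convention. A minimal formatter for the SRT flavor, independent of the model itself:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm.
    Note the comma before milliseconds; WebVTT uses a period instead."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3661.5))  # → 01:01:01,500
```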
Model Comparison Table
These are just the latest and most relevant models. See all available models in the OpenAI Model Index.
| Model | Context | Parameters | Knowledge Cutoff | Agentic Usage | Best For |
|---|---|---|---|---|---|
| GPT-5 | 400K | Undisclosed | Sept 2024 | ⭐⭐⭐⭐⭐ | Complex reasoning, long context |
| GPT-5 Pro | 400K+ | Undisclosed | Sept 2024 | ⭐⭐⭐⭐⭐ | Mission-critical, extended reasoning |
| GPT-5 mini | 400K | Undisclosed | May 2024 | ⭐⭐⭐ | Cost-sensitive, moderate complexity |
| GPT-5 nano | 400K | Undisclosed | May 2024 | ⭐⭐ | High-volume, latency-sensitive |
| GPT-5-Codex | 400K | Undisclosed | Based on GPT-5 | ⭐⭐⭐⭐⭐ | Autonomous coding, large refactoring |
| gpt-oss-120b | 128K | 117B (5.1B act.) | June 2024 | ⭐⭐⭐⭐ | On-premises, production deployment |
| gpt-oss-20b | 128K | 21B (3.6B act.) | June 2024 | ⭐⭐⭐ | Edge/local, consumer hardware |
| gpt-realtime | 32K | Undisclosed | Oct 2023 | ⭐⭐⭐⭐ | Real-time voice, complex tool use |
| gpt-realtime-mini | 32K | Undisclosed | Oct 2023 | ⭐⭐ | Cost-effective voice interactions |
| Whisper large | 30s audio | 1,550M | N/A | N/A | High-accuracy transcription, 99 langs |
Key Considerations
GPT-5 Pro Not API-Accessible: Available only through ChatGPT Pro subscription ($200/month), not via standard API. Use GPT-5 with high reasoning_effort for API-based extended reasoning.
Tool Definition Size Limitations: The GPT-5 family has a known issue where tool/function definitions exceeding roughly 300K tokens cause failures even within the 400K-token context limit. Keep tool definitions concise.
gpt-oss Harmony Format Required: Both open-weight models require custom Harmony response format with special tokens for correct operation. Not compatible with standard chat templates.
Rate Limits Are Multi-Dimensional: Limits are enforced across requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), tokens per day (TPD), images per minute (IPM), and batch queue limits; whichever dimension is exhausted first applies. Limits increase automatically through usage tiers based on spending.
Whisper Limitations: 30-second processing window, potential hallucinations due to weak supervision training, turbo model does NOT support translation tasks (only multilingual variants support translation).
Regional Availability: Available in 161 countries. Not available in 20+ countries due to legal/regulatory compliance and U.S. sanctions.
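Given the tool-definition size consideration above, a crude pre-flight check can catch oversized definitions before a request is sent. The 4-characters-per-token heuristic is an assumption for illustration only; use a real tokenizer such as tiktoken for accurate counts.

```python
import json

TOOL_DEF_TOKEN_BUDGET = 300_000  # failure threshold noted above

def approx_tokens(text: str) -> int:
    # Crude heuristic (~4 chars/token for English text); an assumption,
    # not a real tokenizer. Use tiktoken for accurate counts.
    return len(text) // 4

def check_tool_definitions(tools: list) -> int:
    """Return the estimated token size of the serialized tool definitions,
    raising if they exceed the documented failure threshold."""
    size = approx_tokens(json.dumps(tools))
    if size > TOOL_DEF_TOKEN_BUDGET:
        raise ValueError(f"tool definitions ~{size} tokens exceed budget")
    return size

# A small, hypothetical function definition passes the check easily.
tools = [{"type": "function", "name": "get_weather",
          "parameters": {"type": "object",
                         "properties": {"city": {"type": "string"}}}}]
print(check_tool_definitions(tools) < 100)  # → True
```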