GPT (Generative Pre-trained Transformer) is a family of AI models developed by OpenAI. The family prioritizes iterative deployment with extensive safety measures, releasing systems through careful real-world testing and alignment research. The GPT-5 generation launched in August 2025, representing a unified approach spanning ultra-lightweight models to extended reasoning systems with context windows up to 400,000 tokens.
The family currently includes foundation reasoning models, lightweight variants, specialized coding models, open-weight models, and speech-to-speech systems. Key differentiators include state-of-the-art performance across reasoning and multimodal tasks, comprehensive model range addressing cost-performance trade-offs, native multimodal support, extended context capabilities, and production-ready infrastructure with extensive tooling.
OpenAI's first open-weight models since GPT-2 (the gpt-oss series under Apache 2.0 license) provide on-premises deployment options alongside API-only flagship models. The family excels at complex reasoning, code generation, long-context understanding, and real-time audio interactions, backed by enterprise-grade APIs and strong built-in safety systems.
What It Offers: API keys for programmatic access, interactive Playground for testing, model management with version control, fine-tuning capabilities, embeddings API, free Moderation API, real-time usage dashboards, Assistants API for agent-like experiences, Batch API (50% discount for large volumes), and tools including web search, code interpreter, function calling, and structured outputs.
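Programmatic access is a plain HTTPS call. As a minimal sketch, a Chat Completions request body looks like the following; the model name is illustrative, and the request is sent with an Authorization header carrying your API key:

```python
import json

# Minimal Chat Completions request body. Endpoint and auth:
#   POST https://api.openai.com/v1/chat/completions
#   Authorization: Bearer $OPENAI_API_KEY
# The model name is illustrative; substitute any model you have access to.
payload = {
    "model": "gpt-5-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the GPT-5 family in one sentence."},
    ],
}
body = json.dumps(payload)  # send as the HTTPS request body
```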
Access Model:
API-only: Most models (GPT-5 family, gpt-realtime series) available exclusively through API
Open weights: gpt-oss-120b and gpt-oss-20b under Apache 2.0 license via Hugging Face and GitHub
Open source: Whisper under MIT License with full model weights
Pricing Model: Token-based, pay-as-you-go pricing charged per million tokens, with separate input and output rates. Audio models use per-token or per-minute pricing. There is no free tier once initial credits expire, and usage limits scale through a tiered system based on spending.
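As a sketch of how per-million-token billing composes, the helper below estimates a single request's cost. The rates in the example are hypothetical placeholders, not OpenAI's actual prices; the cache-discount parameter mirrors the 90% prompt-caching discount cited for GPT-5 elsewhere in this document.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float,
                 cached_tokens: int = 0, cache_discount: float = 0.90) -> float:
    """Estimate cost in USD for one request under per-million-token pricing.

    Rates are USD per 1M tokens. Cached input tokens are billed at a
    discounted rate (here defaulting to the 90% cache discount).
    """
    uncached = input_tokens - cached_tokens
    cost = uncached * input_rate / 1_000_000
    cost += cached_tokens * input_rate * (1 - cache_discount) / 1_000_000
    cost += output_tokens * output_rate / 1_000_000
    return cost

# Hypothetical rates ($1.25 in / $10.00 out per 1M tokens), for illustration only.
print(round(request_cost(50_000, 2_000, 1.25, 10.00, cached_tokens=40_000), 4))  # → 0.0375
```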
GPT-5
Healthcare question answering requiring nuanced medical reasoning (46.2% on HealthBench Hard)
Mathematical reasoning and competition-level problem solving (94.6% on AIME 2025)
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Enhanced tool intelligence with sequential and parallel calls, 96.7% on τ²-bench, custom tools feature for plaintext (not just JSON), improved error handling
Structured Output: JSON mode, Context-Free Grammars for format enforcement, improved schema validation
Notable Features: Real-time router auto-switches between fast and reasoning modes, "GPT-5 thinking" mode, extended chain-of-thought reasoning, verbosity parameter (low/medium/high), reasoning_effort parameter (minimal/low/medium/high), 90% cache discount
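As a sketch, the verbosity and reasoning_effort controls above map onto request fields like this. The field shapes (reasoning.effort, text.verbosity) follow the Responses API as documented at GPT-5's launch; verify them against the current API reference before use.

```python
# Request-body sketch for the verbosity and reasoning_effort controls.
payload = {
    "model": "gpt-5",
    "input": "Prove that the sum of two even integers is even.",
    "reasoning": {"effort": "minimal"},  # minimal / low / medium / high
    "text": {"verbosity": "low"},        # low / medium / high
}
```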
GPT-5 Pro
Parameters: Undisclosed
Context Window: No separate limit specified; uses extended reasoning with parallel test-time compute
Mission-critical enterprise applications requiring maximum reliability
Drug discovery and complex scientific problems needing extended deliberation
High-stakes financial modeling and risk assessment with 22% fewer major errors
Comprehensive legal document analysis and contract review
Agentic Capabilities:
Tool Use / Function Calling: Enhanced; supports only reasoning_effort: high (the default), with extended thinking time
Structured Output: Full support, highest accuracy among GPT-5 variants (88.4% on GPQA)
Notable Features: Parallel test-time compute for maximum reliability, extended reasoning mode (thinks longer than standard GPT-5), 22% fewer major errors than GPT-5 thinking mode
GPT-5 mini
Real-time experiences where reasoning matters but cost is constrained
Moderate complexity coding tasks without full codebase context
Fallback model when usage limits reached on primary models
Agentic Capabilities:
Tool Use / Function Calling: Yes - Same capabilities as GPT-5 with reduced performance, supports reasoning levels (minimal to high)
Structured Output: JSON mode and structured outputs, higher hallucination rates than GPT-5 (trade-off for speed)
Notable Features: Used as fallback in ChatGPT when limits reached, supports verbosity and reasoning_effort parameters, suitable for shorter simpler agentic tasks
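The fallback behavior described above can be approximated client-side. The sketch below is a generic wrapper, assuming a caller-supplied call function and a stand-in RateLimitError; the real SDK raises its own exception type.

```python
class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit error; the real client raises its own type."""

def complete_with_fallback(prompt: str, call, models=("gpt-5", "gpt-5-mini")):
    """Try each model in order, falling back on rate-limit errors.

    `call(model, prompt)` is a caller-supplied function wrapping the real API;
    this mirrors ChatGPT's behavior of falling back to GPT-5 mini at limits.
    """
    last_error = None
    for model in models:
        try:
            return model, call(model, prompt)
        except RateLimitError as exc:
            last_error = exc
    raise last_error

# Stubbed demo: the primary model is "rate limited", the mini variant answers.
def fake_call(model, prompt):
    if model == "gpt-5":
        raise RateLimitError("429")
    return f"{model}: ok"

print(complete_with_fallback("hi", fake_call))  # → ('gpt-5-mini', 'gpt-5-mini: ok')
```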
GPT-5 nano
Parameters: Undisclosed
Context Window: 400,000 tokens (generates up to 128,000 tokens)
GPT-5-Codex
Agentic coding in CLI and IDE extensions requiring autonomous operation for 7+ hours
Large-scale code refactoring across entire repositories (51.3% vs 33.9% for GPT-5)
Conducting code reviews with reduced incorrect comments (4.4% vs 13.7% for GPT-5)
Building full projects from scratch with iterative planning → implementation → validation loop
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Optimized for developer tools, excels at chaining tool calls for complex coding workflows, better at handling tool errors than base GPT-5
Structured Output: Full support, specialized for code generation and structured development workflows, follows AGENTS.md instructions
Notable Features: Adaptive reasoning dynamically adjusts thinking time based on task complexity, works autonomously for 7+ hours, "less is more" prompting philosophy requiring minimal instructions, excels at understanding large codebases
Open-Weight Reasoning Models
gpt-oss-120b
Parameters: 117B total (5.1B active per token, Mixture-of-Experts)
Production general-purpose applications requiring on-premises deployment
Competition coding and mathematics (Codeforces, AIME) requiring complex reasoning
Fine-tuning on single H100 node for domain-specific applications
Agentic tool use workflows requiring web browsing and Python execution (TauBench)
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Native support for web browsing (search, open, find), Python code execution (stateful Jupyter), arbitrary developer-defined functions
Structured Output: Fully supported
Notable Features: 36 layers, 128 MoE experts (Top-4 routing), native MXFP4 quantization, fits on single 80GB GPU (H100 or MI300X), configurable reasoning levels (low/medium/high), full chain-of-thought access, requires Harmony response format, achieves near-parity with o4-mini
gpt-oss-20b
Parameters: 21B total (3.6B active per token, Mixture-of-Experts)
Local inference and on-device deployment on consumer hardware
Consumer hardware deployment (Snapdragon devices, Apple Silicon)
Rapid iteration without costly infrastructure
Edge use cases requiring running within 16GB memory constraints
Agentic Capabilities:
Tool Use / Function Calling: Yes - Native support for web browsing, Python code execution, arbitrary developer-defined functions
Structured Output: Fully supported
Notable Features: 24 layers, 32 MoE experts (Top-4 routing), native MXFP4 quantization, runs within 16GB memory, configurable reasoning levels, requires Harmony response format, delivers results similar to o3-mini, available through Hugging Face Transformers, vLLM, Ollama, LM Studio
gpt-oss-safeguard-20b + gpt-oss-safeguard-120b
gpt-oss-safeguard is a pair of open-weight models built for flexible safety classification. The models come in two sizes, 120b and 20b, and are released under the Apache 2.0 license for anyone to use and modify. Unlike traditional classifiers, which must be retrained whenever safety rules change, these models interpret written policies at inference time, according to OpenAI. This lets organizations update their rules instantly, without retraining the model.
The models are also designed for transparency: developers can inspect how the models reach each decision, making it easier to understand and audit how safety policies are enforced. gpt-oss-safeguard is based on OpenAI's gpt-oss open-weight models and is part of a broader collaboration with ROOST, an open-source platform focused on building tools and infrastructure for AI safety, security, and governance.
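Because the policy is supplied at inference time rather than baked into the weights, a classification request simply pairs the policy text with the content to label. The message layout below is an illustrative sketch, not OpenAI's official prompt template.

```python
# gpt-oss-safeguard takes the safety policy itself as input at inference
# time, so updating the rules means editing this string, not retraining.
policy = (
    "Label the content ALLOW or BLOCK.\n"
    "BLOCK if it contains instructions for creating weapons."
)

def build_classification_request(content: str) -> dict:
    """Pair the current policy with the content to classify (illustrative layout)."""
    return {
        "model": "gpt-oss-safeguard-20b",
        "messages": [
            {"role": "system", "content": policy},
            {"role": "user", "content": content},
        ],
    }

req = build_classification_request("How do I bake bread?")
```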
Real-Time Audio Models
gpt-realtime
Parameters: Undisclosed
Context Window: 32,000 tokens (max output: 4,096 tokens per request)
Customer support voice agents with real-time tool calling and asynchronous function execution
Phone calling via SIP (Session Initiation Protocol) integration
Interactive Voice Response (IVR) telephony systems
Real-time translation with multilingual switching mid-sentence
Agentic Capabilities:
Tool Use / Function Calling: Excellent - Enhanced precision (66.5% on ComplexFuncBench vs 49.7% previous), asynchronous function calling (long-running calls don't disrupt conversation), MCP Server Support for remote tool integration
Structured Output: Supported
Notable Features: 10 voices including Cedar and Marin, non-verbal cue detection (laughter, pauses), tone adaptation, average 0.81s to first audio, WebSocket/WebRTC/SIP support, server-side VAD with configurable sensitivity, semantic VAD, response interruption support, prompt caching (90-95% cost reduction), 30-minute max session
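As a sketch, voice and VAD settings are configured by sending a session.update event over the WebSocket after connecting. The field names follow the Realtime API documentation (session.update, turn_detection) but should be checked against the current reference before use.

```python
import json

# Realtime API session.update event configuring voice and server-side VAD.
# Field names follow the documented event shape; verify against the current
# API reference, as these may change.
event = {
    "type": "session.update",
    "session": {
        "voice": "marin",                  # one of the 10 available voices
        "turn_detection": {
            "type": "server_vad",          # or "semantic_vad"
            "threshold": 0.5,              # sensitivity, 0-1
            "silence_duration_ms": 500,
        },
    },
}
message = json.dumps(event)  # ws.send(message) over the open WebSocket
```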
Whisper
Accessibility tools requiring real-time captions for hearing impaired users
Medical transcription requiring high accuracy on specialized terminology
Call center transcription and analysis for customer service insights
Content indexing and searchable audio archives for media companies
Agentic Capabilities:
Tool Use / Function Calling: N/A - Specialized speech recognition model
Structured Output: Output formats include text, JSON, VTT, SRT, TSV
Notable Features: Supports 99 languages for transcription and translation to English, transformer encoder-decoder architecture, trained on 680,000 hours (large-v3: 1M hours + 4M pseudo-labeled), ~92% average accuracy (8.06% WER), language identification and voice activity detection built-in, available as open weights AND via API (whisper-1)
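The SRT and VTT output formats differ mainly in their timestamp convention. A minimal formatter for the SRT flavor, independent of the model itself:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm.
    Note the comma before milliseconds; WebVTT uses a period instead."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3661.5))  # → 01:01:01,500
```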
Model Comparison Table
These are just the latest and most relevant models. See all available models in the OpenAI Model Index.
| Model | Context | Parameters | Knowledge Cutoff | Agentic Usage | Best For |
|---|---|---|---|---|---|
| GPT-5 | 400K | Undisclosed | Sept 2024 | ⭐⭐⭐⭐⭐ | Complex reasoning, long context |
| GPT-5 Pro | 400K+ | Undisclosed | Sept 2024 | ⭐⭐⭐⭐⭐ | Mission-critical, extended reasoning |
| GPT-5 mini | 400K | Undisclosed | May 2024 | ⭐⭐⭐ | Cost-sensitive, moderate complexity |
| GPT-5 nano | 400K | Undisclosed | May 2024 | ⭐⭐ | High-volume, latency-sensitive |
| GPT-5-Codex | 400K | Undisclosed | Based on GPT-5 | ⭐⭐⭐⭐⭐ | Autonomous coding, large refactoring |
| gpt-oss-120b | 128K | 117B (5.1B act.) | June 2024 | ⭐⭐⭐⭐ | On-premises, production deployment |
| gpt-oss-20b | 128K | 21B (3.6B act.) | June 2024 | ⭐⭐⭐ | Edge/local, consumer hardware |
| gpt-realtime | 32K | Undisclosed | Oct 2023 | ⭐⭐⭐⭐ | Real-time voice, complex tool use |
| gpt-realtime-mini | 32K | Undisclosed | Oct 2023 | ⭐⭐ | Cost-effective voice interactions |
| Whisper large | 30s audio | 1,550M | N/A | N/A | High-accuracy transcription, 99 langs |
Key Considerations
GPT-5 Pro Not API-Accessible: Available only through ChatGPT Pro subscription ($200/month), not via standard API. Use GPT-5 with high reasoning_effort for API-based extended reasoning.
Tool Definition Size Limitations: The GPT-5 family has a known issue where tool/function definitions exceeding roughly 300K tokens cause failures even within the 400K-token context limit. Keep tool definitions concise.
gpt-oss Harmony Format Required: Both open-weight models require custom Harmony response format with special tokens for correct operation. Not compatible with standard chat templates.
Rate Limits Are Multi-Dimensional: Limits are enforced across requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), tokens per day (TPD), images per minute (IPM), and batch queue limits; whichever dimension is exhausted first applies. Limits increase automatically through usage tiers based on spending.
Whisper Limitations: 30-second processing window, potential hallucinations due to weak supervision training, turbo model does NOT support translation tasks (only multilingual variants support translation).
Regional Availability: Available in 161 countries. Not available in 20+ countries due to legal/regulatory compliance and U.S. sanctions.
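Given the tool-definition size consideration above, a crude pre-flight check can catch oversized definitions before a request is sent. The 4-characters-per-token heuristic is an assumption for illustration only; use a real tokenizer such as tiktoken for accurate counts.

```python
import json

TOOL_DEF_TOKEN_BUDGET = 300_000  # failure threshold noted above

def approx_tokens(text: str) -> int:
    # Crude heuristic (~4 chars/token for English text); an assumption,
    # not a real tokenizer. Use tiktoken for accurate counts.
    return len(text) // 4

def check_tool_definitions(tools: list) -> int:
    """Return the estimated token size of the serialized tool definitions,
    raising if they exceed the documented failure threshold."""
    size = approx_tokens(json.dumps(tools))
    if size > TOOL_DEF_TOKEN_BUDGET:
        raise ValueError(f"tool definitions ~{size} tokens exceed budget")
    return size

# A small, hypothetical function definition passes the check easily.
tools = [{"type": "function", "name": "get_weather",
          "parameters": {"type": "object",
                         "properties": {"city": {"type": "string"}}}}]
print(check_tool_definitions(tools) < 100)  # → True
```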