Gemma is a family of open-weight AI models developed by Google DeepMind. Built from the same research foundation as Gemini, these models prioritize efficient deployment and responsible AI while maintaining competitive performance. The family currently includes 30+ model variants spanning foundation models, edge/mobile optimization, vision-language, code generation, medical/therapeutic AI, and safety classification, with strengths in on-device deployment, multilingual support, and specialized domain applications. With over 200 million downloads, the "Gemmaverse" represents one of AI's most active open-source ecosystems.
Family Overview
Developer: Google DeepMind builds and maintains Gemma models using the same research foundation as Gemini. The family spans variants from 270 million to 27 billion parameters across multiple specializations.
Philosophy: Gemma prioritizes efficient, responsible AI deployment with complete developer control. Unlike closed APIs, these models provide open weights for download, modification, and local deployment without internet connectivity. The family balances cutting-edge capabilities with practical constraints: Gemma 3 27B matches GPT-4-class performance on many tasks while fitting on consumer GPUs. Architectural innovations like Quantization-Aware Training (QAT) reduce memory 3x while preserving quality, and mobile-first designs like Gemma 3n achieve 60 FPS video processing on smartphones.
Key strengths emerge from specialized architectures and training approaches. Gemma 3 processes 128K token contexts across 140+ languages with multimodal vision capabilities. Mobile optimization through Per-Layer Embedding caching enables real-time on-device inference. Domain specialists like MedGemma, CodeGemma, and TxGemma apply transfer learning to healthcare, programming, and therapeutic development. The DataGemma variants ground responses in external statistical databases to reduce hallucinations.
When to choose Gemma: Select this family for local/edge deployment, data privacy requirements, multilingual applications, or specialized domains (medical, code, therapeutics). The open-weight model with custom commercial license suits developers needing control over inference infrastructure. Consider alternatives for consumer-facing factual Q&A - Google explicitly warns these models target developers and researchers, not end-users, and may produce inaccurate or hallucinated content.
The Gemmaverse names the global ecosystem of 60,000+ community fine-tuned variants. Notable examples include AI Singapore's SEA-LION v3 (11 Southeast Asian languages), INSAIT's BgGPT (Bulgarian-first), and Nexa AI's OmniAudio (edge audio-language). This community demonstrates Gemma's adaptability across languages and domains.
Platform & Access
Google AI Studio provides the primary platform - a free interface for prompt testing, API key generation, and model evaluation. Unfortunately, Gemma is not present there anymore. For enterprise needs, Vertex AI offers managed deployment with MLOps capabilities.
Access model: Dual-path access allows both self-hosted and cloud-hosted deployment. Download open weights from Kaggle, Hugging Face, or Ollama for self-hosted deployment with zero API costs. Alternatively, use the Gemini API for cloud-hosted inference with token-based pricing.
Pricing model: Free tier provides 5 requests/minute, 25 requests/day, 32K tokens/minute. Paid tier (requires Cloud Billing) dramatically increases limits. Gemma models via API appear free based on community reports, with only rate limits as constraints. Open-weight self-hosting eliminates all usage fees. New Google Cloud users receive $300 credits; academic researchers can access up to $10,000 through the Gemma 3 Academic Program.
Framework support includes Keras, JAX, PyTorch, TensorFlow, Hugging Face Transformers, Ollama, vLLM, and mobile deployment via Google AI Edge.
Foundation Models
Gemma 3 27B
Parameters: 27 billion
Context Window: 128,000 tokens
Multimodal: Text + images (896×896px, 256 tokens per image)
License: Gemma Terms of Use (custom commercial license)
Visual data analysis combining OCR, charts, and document layout understanding
Agentic Capabilities:
Tool Use / Function Calling: Yes, via prompt engineering - define functions in JSON schema and parse structured outputs
Structured Output: JSON mode through prompting with strong instruction-following
Notable Features: Ranks with GPT-4 on Chatbot Arena despite 10x fewer parameters; runs on single consumer GPU with quantization; trained on 14 trillion tokens
Task-specific fine-tuning for classification (sentiment analysis, entity extraction, query routing)
Unstructured-to-structured text processing at scale
High-volume production systems requiring extreme efficiency
Rapid prototyping with fast fine-tuning on T4 GPUs
Agentic Capabilities:
Tool Use / Function Calling: Limited (text-only, not designed for function calling)
Structured Output: Excellent - designed for text structuring with strong instruction following
Notable Features: Most power-efficient Gemma (0.75% battery for 25 conversations on Pixel 9 Pro); 6 trillion training tokens; 256K vocabulary; not for general conversation
Edge & Mobile Optimized
Gemma 3n E4B
Parameters: ~4B effective (8B total with parameter skipping)
Context Window: 32,000 tokens
Multimodal: Text + images + audio + video (multiple resolutions: 256×256, 512×512, 768×768)
Laptop inference with limited VRAM (12B on RTX 4060 8GB)
High-volume inference with cost constraints
Local AI prioritizing privacy with hardware limitations
Agentic Capabilities:
Tool Use / Function Calling: Same as base models
Structured Output: Same as base models
Notable Features: Quantization-Aware Training reduces memory 3x vs BF16; INT4/Q4_0 formats; 54% less perplexity drop vs post-training quantization; VRAM: 1B ~892MB, 4B ~3.4GB, 12B ~8.7GB, 27B ~21GB; native Ollama support
Notable Features: State-of-the-art OCR on HierText; 94.8% exact match on MolScribe; 99.18 S-TEDS on PubTables-1M; 3 resolutions for efficiency/detail trade-offs; not a chatbot - designed for single-round image+text input
Medical image classification for research and triage support
Clinical decision support systems requiring multimodal context
Electronic Health Record (FHIR) analysis and generation
Agentic Capabilities:
Tool Use / Function Calling: Yes - integrates with agentic systems, FHIR generators/interpreters
Structured Output: Structured medical reports; chat template with system/user/assistant roles
Notable Features: SigLIP encoder pre-trained on de-identified medical data; FHIR-based EHR training; test-time scaling for improved reasoning; not clinical-grade - requires developer validation
MedGemma 27B Text-Only
Parameters: 27 billion (instruction-tuned only)
Context Window: 128,000 tokens (output 8,192)
Multimodal: Text-only
License: Health AI Developer Foundations Terms of Use (gated access)
Notable Features: Trained on 7M examples from Therapeutics Data Commons (66 tasks); outperforms Tx-LLM on 64/66 tasks; achieves state-of-the-art on Chemistry/Biology benchmarks with Agentic-Tx
TxGemma 27B-Predict
Parameters: 27 billion
Context Window: 8,192 tokens
Multimodal: Text-only (molecular structures as text)
License: Health AI Developer Foundations Terms of Use (gated access)
Retrieval Augmented Generation (RAG) pipelines for local AI systems
Semantic search across 100+ languages
Privacy-sensitive applications requiring offline embeddings (personal file search, mobile apps)
Code retrieval and question answering systems
Agentic Capabilities:
Tool Use / Function Calling: Not applicable (embedding model)
Structured Output: 768-dimensional vectors (truncatable to 512, 256, 128 via Matryoshka Representation Learning)
Notable Features: Highest-ranking open multilingual embedding model under 500M parameters on MTEB; sub-15ms inference on EdgeTPU; under 200MB RAM with quantization; shares tokenizer with Gemma 3n
Text-only content moderation for LLM inputs and outputs
Custom safety policy enforcement at inference time
Hate speech and harassment detection
Content filtering for applications with specific safety requirements
Agentic Capabilities:
Tool Use / Function Calling: Not applicable
Structured Output: Probability scores with optional rationale
Notable Features: 4 categories (sexually explicit, dangerous content, hate speech, harassment); LLM-as-a-judge technique; fine-tunable for custom policies
Model Comparison Table
Model
Parameters
Context
Multimodal
Knowledge Cutoff
Tool Use
Best For
Foundation Models
Gemma 3 27B
27B
128K
Text + Images
Aug 2024
⭐⭐⭐⭐
Complex reasoning, long context, multilingual
Gemma 3 12B
12B
128K
Text + Images
Aug 2024
⭐⭐⭐⭐
Balanced multimodal, production
Gemma 3 4B
4B
128K
Text + Images
Aug 2024
⭐⭐⭐
Real-time multimodal, mid-range GPUs
Gemma 3 1B
1B
32K
Text
Aug 2024
⭐⭐
Lightweight text, mobile
Gemma 3 270M
270M
32K
Text
Aug 2024
⭐
Task-specific fine-tuning
Edge/Mobile
Gemma 3n E4B
~4B
32K
Text + Image + Audio + Video
Aug 2024
⭐⭐⭐⭐
Real-time video (60 FPS), mobile
Gemma 3n E2B
~1.91B
32K
Text + Image + Audio + Video
Aug 2024
⭐⭐⭐
Ultra-efficient mobile
Vision
PaliGemma 2 28B
28B
512-4K
Text + Images
N/A
⭐⭐
Document analysis, OCR, research
PaliGemma 2 10B
10B
512-4K
Text + Images
N/A
⭐⭐
Balanced vision tasks
PaliGemma 2 3B
3B
256-4K
Text + Images
N/A
⭐
Lightweight vision
Code
CodeGemma 7B-IT
7B
8K
Text/Code
N/A
⭐
Conversational code generation
CodeGemma 7B
7B
8K
Text/Code
N/A
N/A
Code completion
CodeGemma 2B
2B
8K
Text/Code
N/A
N/A
Fast code infilling (FIM)
Medical
MedGemma 27B MM
27B
128K
Text + Medical Images
N/A
⭐⭐⭐⭐
Radiology, clinical decision support
TxGemma 27B-Chat
27B
8K
Text (molecules)
N/A
⭐⭐⭐⭐⭐
Drug discovery with reasoning
TxGemma 27B-Predict
27B
8K
Text (molecules)
N/A
⭐⭐⭐⭐
High-accuracy therapeutic predictions
Specialized
DataGemma-RAG-27B
27B
Up to 348K
Text
N/A
⭐⭐⭐⭐
Statistical Q&A (research only)
EmbeddingGemma
308M
2K
Text
N/A
N/A
RAG pipelines, semantic search
ShieldGemma 2 4B
4B
N/A
Image + Text
N/A
N/A
Multimodal safety classification
Key Considerations
Content quality limitations: All Gemma models may produce inaccurate, offensive, or hallucinated content. Google explicitly warns against relying on LLMs for medical, legal, financial, or professional advice. These models target developers and researchers, not consumer-facing factual Q&A - review outputs before publication.
Licensing restrictions: The custom "Gemma Terms of Use" differs significantly from standard open-source licenses. Key concerns include remote shutdown clauses (Google can restrict usage remotely for violations), prohibited use policy enforcement on downstream users, required updates to latest versions (impacts production stability), and derivative work restrictions. Some enterprises consider this license unsuitable for certain commercial scenarios - have legal teams review for specific use cases.
Hardware requirements vary dramatically. Gemma 3 270M runs on smartphones with minimal battery drain (0.75% for 25 conversations). Gemma 3 27B requires 60GB BF16 or 15GB with INT4 quantization - fits on consumer GPUs like RTX 3090 using QAT models. VRAM: 1B QAT ~892MB, 4B QAT ~3.4GB, 12B QAT ~8.7GB, 27B QAT ~21GB. Fine-tuning demands significantly higher resources than inference.
Specialized model considerations: MedGemma and TxGemma require Health AI Developer Foundations Terms with gated access and are not clinical-grade - they need developer validation before medical use. DataGemma remains research/academic only, not production-ready. CodeGemma optimizes for completion over conversation (use 7B-IT for chat). PaliGemma 2 is not a chatbot - it handles single-round image+text input only.
Multimodal constraints: Only Gemma 3 4B/12B/27B, Gemma 3n, PaliGemma 2, MedGemma multimodal, and ShieldGemma 2 support images. Fixed resolutions (896×896 for Gemma 3, multiple for PaliGemma 2) may challenge non-square aspect ratios. Gemma 3 270M and 1B remain text-only.
Language support: Gemma 3 supports 140+ languages, but evaluation occurred only with English prompts - quality may vary significantly across languages. Earlier Gemma 1/2 focused primarily on English.
Data privacy: Free tier data may improve Google products per standard service terms. Paid tier data does NOT improve Google products. Self-hosted open weights provide complete data control - ideal for privacy-critical applications.
Platform stability: In October 2024, Gemma temporarily disappeared from AI Studio after producing problematic content, highlighting risks of relying solely on cloud-hosted versions. Save local copies for critical projects.