Mar 25, 2025

Gemma

Gemma is a family of open-weight AI models developed by Google DeepMind. Built from the same research foundation as Gemini, these models prioritize efficient deployment and responsible AI while maintaining competitive performance. The family currently includes 30+ model variants spanning foundation models, edge/mobile optimization, vision-language, code generation, medical/therapeutic AI, and safety classification, with strengths in on-device deployment, multilingual support, and specialized domain applications. With over 200 million downloads, the "Gemmaverse" represents one of AI's most active open-source ecosystems.

Family Overview

Developer: Google DeepMind builds and maintains Gemma models using the same research foundation as Gemini. The family spans variants from 270 million to 27 billion parameters across multiple specializations.

Philosophy: Gemma prioritizes efficient, responsible AI deployment with complete developer control. Unlike closed APIs, these models provide open weights for download, modification, and local deployment without internet connectivity. The family balances cutting-edge capabilities with practical constraints: Gemma 3 27B matches GPT-4-class performance on many tasks while fitting on consumer GPUs. Architectural innovations like Quantization-Aware Training (QAT) reduce memory 3x while preserving quality, and mobile-first designs like Gemma 3n achieve 60 FPS video processing on smartphones.

Key strengths emerge from specialized architectures and training approaches. Gemma 3 processes 128K token contexts across 140+ languages with multimodal vision capabilities. Mobile optimization through Per-Layer Embedding caching enables real-time on-device inference. Domain specialists like MedGemma, CodeGemma, and TxGemma apply transfer learning to healthcare, programming, and therapeutic development. The DataGemma variants ground responses in external statistical databases to reduce hallucinations.

When to choose Gemma: Select this family for local/edge deployment, data privacy requirements, multilingual applications, or specialized domains (medical, code, therapeutics). The open-weight model with custom commercial license suits developers needing control over inference infrastructure. Consider alternatives for consumer-facing factual Q&A - Google explicitly warns these models target developers and researchers, not end-users, and may produce inaccurate or hallucinated content.

The Gemmaverse names the global ecosystem of 60,000+ community fine-tuned variants. Notable examples include AI Singapore's SEA-LION v3 (11 Southeast Asian languages), INSAIT's BgGPT (Bulgarian-first), and Nexa AI's OmniAudio (edge audio-language). This community demonstrates Gemma's adaptability across languages and domains.

Platform & Access

Google AI Studio provides the primary platform - a free interface for prompt testing, API key generation, and model evaluation. Unfortunately, Gemma is not present there anymore. For enterprise needs, Vertex AI offers managed deployment with MLOps capabilities.

Access model: Dual-path access allows both self-hosted and cloud-hosted deployment. Download open weights from Kaggle, Hugging Face, or Ollama for self-hosted deployment with zero API costs. Alternatively, use the Gemini API for cloud-hosted inference with token-based pricing.

Pricing model: Free tier provides 5 requests/minute, 25 requests/day, 32K tokens/minute. Paid tier (requires Cloud Billing) dramatically increases limits. Gemma models via API appear free based on community reports, with only rate limits as constraints. Open-weight self-hosting eliminates all usage fees. New Google Cloud users receive $300 credits; academic researchers can access up to $10,000 through the Gemma 3 Academic Program.

Framework support includes Keras, JAX, PyTorch, TensorFlow, Hugging Face Transformers, Ollama, vLLM, and mobile deployment via Google AI Edge.

Foundation Models

Gemma 3 27B

Parameters: 27 billion
Context Window: 128,000 tokens
Multimodal: Text + images (896×896px, 256 tokens per image)
License: Gemma Terms of Use (custom commercial license)
Knowledge Cutoff: August 2024
Link: https://ai.google.dev/gemma/docs/core/model_card_3

Primary Use Cases:

Processing multi-document analysis with cross-referencing (legal briefs, research compilations)
Multilingual content understanding across 140+ languages in single context
Complex mathematical reasoning requiring extended chain-of-thought
Visual data analysis combining OCR, charts, and document layout understanding

Agentic Capabilities:

Tool Use / Function Calling: Yes, via prompt engineering - define functions in JSON schema and parse structured outputs
Structured Output: JSON mode through prompting with strong instruction-following
Notable Features: Ranks with GPT-4 on Chatbot Arena despite 10x fewer parameters; runs on single consumer GPU with quantization; trained on 14 trillion tokens

Gemma 3 12B

Parameters: 12 billion
Context Window: 128,000 tokens
Multimodal: Text + images (896×896px, 256 tokens)
License: Gemma Terms of Use
Knowledge Cutoff: August 2024
Link: https://ai.google.dev/gemma/docs/core/model_card_3

Primary Use Cases:

Production deployments requiring multimodal with balanced latency
Function calling workflows where response time matters
Real-time document processing with visual elements
Multilingual customer support requiring image context

Agentic Capabilities:

Tool Use / Function Calling: Yes (prompt-based, recommended for balanced performance/speed)
Structured Output: JSON via prompting
Notable Features: Optimal balance for production; 12 trillion training tokens

Gemma 3 4B

Parameters: 4 billion
Context Window: 128,000 tokens
Multimodal: Text + images (896×896px, 256 tokens)
License: Gemma Terms of Use
Knowledge Cutoff: August 2024
Link: https://ai.google.dev/gemma/docs/core/model_card_3

Primary Use Cases:

Real-time low-latency multimodal applications (chatbots, visual Q&A)
Resource-constrained environments requiring vision capabilities
High-throughput inference at scale with image inputs
Multilingual applications on mid-range GPUs

Agentic Capabilities:

Tool Use / Function Calling: Yes (prompt engineering)
Structured Output: JSON mode
Notable Features: Entry point to multimodal Gemma; 4 trillion training tokens; runs on mid-range GPUs

Gemma 3 1B

Parameters: 1 billion
Context Window: 32,000 tokens
Multimodal: Text-only
License: Gemma Terms of Use
Knowledge Cutoff: August 2024
Link: https://ai.google.dev/gemma/docs/core/model_card_3

Primary Use Cases:

Small-footprint general text tasks on constrained hardware
Mobile deployment requiring 32K context
Lightweight assistants for specific domains after fine-tuning
Edge inference where multimodal is unnecessary

Agentic Capabilities:

Tool Use / Function Calling: Yes (prompt-based)
Structured Output: JSON via prompting
Notable Features: 2 trillion training tokens; text-only alternative to 4B

Gemma 3 270M

Parameters: 270 million (170M embedding + 100M transformer)
Context Window: 32,000 tokens
Multimodal: Text-only
License: Gemma Terms of Use
Knowledge Cutoff: August 2024
Link: https://developers.googleblog.com/en/introducing-gemma-3-270m/

Primary Use Cases:

Task-specific fine-tuning for classification (sentiment analysis, entity extraction, query routing)
Unstructured-to-structured text processing at scale
High-volume production systems requiring extreme efficiency
Rapid prototyping with fast fine-tuning on T4 GPUs

Agentic Capabilities:

Tool Use / Function Calling: Limited (text-only, not designed for function calling)
Structured Output: Excellent - designed for text structuring with strong instruction following
Notable Features: Most power-efficient Gemma (0.75% battery for 25 conversations on Pixel 9 Pro); 6 trillion training tokens; 256K vocabulary; not for general conversation

Edge & Mobile Optimized

Gemma 3n E4B

Parameters: ~4B effective (8B total with parameter skipping)
Context Window: 32,000 tokens
Multimodal: Text + images + audio + video (multiple resolutions: 256×256, 512×512, 768×768)
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/gemma-3n/model_card

Primary Use Cases:

Real-time video analysis at 60 FPS on mobile devices
On-device multimodal AI assistants without cloud connectivity
Privacy-first applications processing sensitive visual/audio data locally
Robotics with multimodal understanding and real-time constraints

Agentic Capabilities:

Tool Use / Function Calling: Yes (optimized for on-device agents)
Structured Output: JSON through prompting
Notable Features: MatFormer architecture with nested E2B model; Per-Layer Embedding caching reduces memory; 2x prefill improvement vs Gemma 3 4B; Universal Speech Model encoder (6 tokens/second audio); MobileNet-V5-300M vision encoder

Gemma 3n E2B

Parameters: ~1.91B effective (5B+ total)
Context Window: 32,000 tokens
Multimodal: Text + images + audio + video
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/gemma-3n/model_card

Primary Use Cases:

Ultra-efficient mobile AI with maximum battery preservation
Nested deployment within E4B for flexible resource usage
Low-power IoT devices requiring multimodal understanding
Edge devices with strict memory constraints

Agentic Capabilities:

Tool Use / Function Calling: Yes (on-device agents)
Structured Output: JSON via prompting
Notable Features: Shares architecture with next-gen Gemini Nano; conditional parameter loading skips unused modalities

Quantized Optimization

Gemma 3 QAT (1B, 4B, 12B, 27B)

Parameters: Available for 1B, 4B, 12B, 27B sizes
Context Window: 32K for 1B; 128K for 4B/12B/27B
Multimodal: Same as base models
License: Gemma Terms of Use
Link: https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/

Primary Use Cases:

Consumer GPU deployment (RTX 3090 runs 27B model)
Laptop inference with limited VRAM (12B on RTX 4060 8GB)
High-volume inference with cost constraints
Local AI prioritizing privacy with hardware limitations

Agentic Capabilities:

Tool Use / Function Calling: Same as base models
Structured Output: Same as base models
Notable Features: Quantization-Aware Training reduces memory 3x vs BF16; INT4/Q4_0 formats; 54% less perplexity drop vs post-training quantization; VRAM: 1B ~892MB, 4B ~3.4GB, 12B ~8.7GB, 27B ~21GB; native Ollama support

Vision & Multimodal

PaliGemma 2 28B

Parameters: 28 billion (27B Gemma 2 + 400M SigLIP vision encoder)
Context Window: 512-4,096 tokens (text); 256-4,096 image tokens (resolution-dependent)
Multimodal: Text + images (224×224, 448×448, 896×896px); video frames; output text-only
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/paligemma/model-card-2

Primary Use Cases:

Document analysis requiring OCR, table extraction, and layout understanding
Scientific research with molecular recognition or medical imaging
Automated captioning for accessibility (DOCCI fine-tuned variants generate 7+ sentence descriptions)
E-commerce product detection and spatial reasoning tasks

Agentic Capabilities:

Tool Use / Function Calling: No explicit support (fine-tune for structured outputs)
Structured Output: Bounding boxes (4 location tokens 0-1023 scale), segmentation masks (16 tokens decoded by VQ-VAE)
Notable Features: State-of-the-art OCR on HierText; 94.8% exact match on MolScribe; 99.18 S-TEDS on PubTables-1M; 3 resolutions for efficiency/detail trade-offs; not a chatbot - designed for single-round image+text input

PaliGemma 2 10B

Parameters: 10 billion (9B Gemma 2 + 400M SigLIP)
Context Window: 512-4,096 tokens
Multimodal: Text + images (multiple resolutions)
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/paligemma/model-card-2

Primary Use Cases:

Same as 28B with better latency for production deployments
Visual Q&A systems requiring fast response times
Real-time document processing pipelines
Batch processing of images at scale

Agentic Capabilities:

Tool Use / Function Calling: No explicit support
Structured Output: Bounding boxes, segmentation masks
Notable Features: Optimal balance for vision tasks; DOCCI 448px variant available

PaliGemma 2 3B

Parameters: 3 billion (2B Gemma 2 + 400M SigLIP)
Context Window: 256-4,096 tokens
Multimodal: Text + images (multiple resolutions)
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/paligemma/model-card-2

Primary Use Cases:

Lightweight vision-language applications with tight resource constraints
Mobile vision applications requiring on-device inference
High-throughput image analysis at maximum efficiency
Edge deployment for visual understanding tasks

Agentic Capabilities:

Tool Use / Function Calling: No explicit support
Structured Output: Bounding boxes, segmentation masks
Notable Features: DOCCI 448px variant available; fastest inference in family

Code Specialists

CodeGemma 7B-IT

Parameters: 7 billion (instruction-tuned)
Context Window: 8,192 tokens
Multimodal: Text/code only
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/codegemma/model_card

Primary Use Cases:

Conversational code generation from natural language descriptions
Multi-language programming support (Python, JavaScript, Java, Kotlin, C++, C#, Rust, Go)
Mathematical reasoning in code contexts
Code education with explanatory reasoning

Agentic Capabilities:

Tool Use / Function Calling: Limited (not designed for function calling)
Structured Output: Standard Gemma chat template with turn delimiters
Notable Features: Version 1.1 achieves 60.4% HumanEval; trained on 500-1000B tokens of code/math; provides reasoning explanations

CodeGemma 7B

Parameters: 7 billion (pretrained)
Context Window: 8,192 tokens
Multimodal: Text/code only
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/codegemma/model_card

Primary Use Cases:

Code completion at cursor position in IDEs
Multi-language code generation with repository context
Autocomplete systems requiring fast inference
Code synthesis from partial implementations

Agentic Capabilities:

Tool Use / Function Calling: No
Structured Output: No (completion model)
Notable Features: Optimized for code completion; repository-level context support

CodeGemma 2B

Parameters: 2 billion
Context Window: 8,192 tokens
Multimodal: Text/code only
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/codegemma/model_card

Primary Use Cases:

Fill-in-the-Middle (FIM) code completion in IDEs
Low-latency single-line and multi-line code infilling (up to 2x faster than 7B)
High-volume code completion systems
Lightweight IDE integration with minimal resource usage

Agentic Capabilities:

Tool Use / Function Calling: No
Structured Output: Special FIM tokens (<|fim_prefix|>, <|fim_suffix|>, <|fim_middle|>, <|file_separator|>)
Notable Features: Version 1.1 achieves 37.8% HumanEval; 80-90% FIM training; syntactically natural splitting; repository-level understanding

Medical & Therapeutics

MedGemma 27B Multimodal

Parameters: 27 billion
Context Window: 128,000 tokens (output 8,192)
Multimodal: Text + medical images (896×896px, 256 tokens); chest X-rays, dermatology, ophthalmology, histopathology
License: Health AI Developer Foundations Terms of Use (gated access)
Link: https://developers.google.com/health-ai-developer-foundations/medgemma/model-card

Primary Use Cases:

Radiology report generation from medical imaging
Medical image classification for research and triage support
Clinical decision support systems requiring multimodal context
Electronic Health Record (FHIR) analysis and generation

Agentic Capabilities:

Tool Use / Function Calling: Yes - integrates with agentic systems, FHIR generators/interpreters
Structured Output: Structured medical reports; chat template with system/user/assistant roles
Notable Features: SigLIP encoder pre-trained on de-identified medical data; FHIR-based EHR training; test-time scaling for improved reasoning; not clinical-grade - requires developer validation

MedGemma 27B Text-Only

Parameters: 27 billion (instruction-tuned only)
Context Window: 128,000 tokens (output 8,192)
Multimodal: Text-only
License: Health AI Developer Foundations Terms of Use (gated access)
Link: https://developers.google.com/health-ai-developer-foundations/medgemma/model-card

Primary Use Cases:

Medical question answering for research and education
Clinical document summarization and analysis
Medical knowledge retrieval from literature
Patient interviewing support systems

Agentic Capabilities:

Tool Use / Function Calling: Yes
Structured Output: Medical report generation
Notable Features: Based on Gemma 3 with medical fine-tuning; requires clinical validation before deployment

MedGemma 4B Multimodal

Parameters: 4 billion (pretrained and instruction-tuned variants)
Context Window: 128,000 tokens (output 8,192)
Multimodal: Text + medical images
License: Health AI Developer Foundations Terms of Use (gated access)
Link: https://developers.google.com/health-ai-developer-foundations/medgemma/model-card

Primary Use Cases:

Edge medical applications requiring on-device processing
Privacy-preserving local medical AI systems
Resource-constrained clinical environments
Medical education tools with multimodal support

Agentic Capabilities:

Tool Use / Function Calling: Yes
Structured Output: Medical reports
Notable Features: Pretrained (-pt) supports specific report generation prompts; instruction-tuned (-it) for conversational use

TxGemma 27B-Chat

Parameters: 27 billion
Context Window: 8,192 tokens
Multimodal: Text-only (processes SMILES, amino acid sequences, nucleotide sequences)
License: Health AI Developer Foundations Terms of Use (gated access)
Link: https://developers.google.com/health-ai-developer-foundations/txgemma/model-card

Primary Use Cases:

Drug property prediction (toxicity, BBB permeability, lipophilicity) with reasoning explanations
Drug-target interaction analysis for lead optimization
Clinical trial outcome prediction with interpretable rationale
Conversational drug discovery requiring multi-step reasoning

Agentic Capabilities:

Tool Use / Function Calling: Excellent - integrates with Agentic-Tx system (18 specialized tools), works with Gemini 2.0 Pro for complex reasoning
Structured Output: TDC prompt structure (instructions, context, question, input, answer); classification (A/B/C), regression (numeric), generation (molecules)
Notable Features: Trained on 7M examples from Therapeutics Data Commons (66 tasks); outperforms Tx-LLM on 64/66 tasks; achieves state-of-the-art on Chemistry/Biology benchmarks with Agentic-Tx

TxGemma 27B-Predict

Parameters: 27 billion
Context Window: 8,192 tokens
Multimodal: Text-only (molecular structures as text)
License: Health AI Developer Foundations Terms of Use (gated access)
Link: https://developers.google.com/health-ai-developer-foundations/txgemma/model-card

Primary Use Cases:

High-accuracy therapeutic predictions without conversational overhead
Batch molecular property screening at scale
Integration as prediction tool in larger agentic workflows
Retrosynthesis and molecular classification tasks

Agentic Capabilities:

Tool Use / Function Calling: Yes (as tool in agentic systems)
Structured Output: TDC formats (multiple choice, regression, generation); 66 JSON prompt templates available
Notable Features: Higher accuracy than Chat variant on prediction tasks

TxGemma 9B-Chat / 9B-Predict

Parameters: 9 billion
Context Window: 8,192 tokens
Multimodal: Text-only (molecular structures)
License: Health AI Developer Foundations Terms of Use (gated access)
Link: https://developers.google.com/health-ai-developer-foundations/txgemma/model-card

Primary Use Cases:

Same as 27B variants with better efficiency for most therapeutic tasks
Balanced performance/latency for drug discovery workflows
Resource-constrained research environments
Production deployments requiring fast inference

Agentic Capabilities:

Tool Use / Function Calling: Yes
Structured Output: TDC formats
Notable Features: Optimal balance for most therapeutic applications

TxGemma 2B-Predict

Parameters: 2 billion
Context Window: 8,192 tokens
Multimodal: Text-only
License: Health AI Developer Foundations Terms of Use (gated access)
Link: https://developers.google.com/health-ai-developer-foundations/txgemma/model-card

Primary Use Cases:

Fast therapeutic predictions with limited computational resources
High-throughput molecular screening
Mobile or edge therapeutic applications
Rapid prototyping of drug discovery pipelines

Agentic Capabilities:

Tool Use / Function Calling: Yes
Structured Output: TDC formats
Notable Features: Data-efficient competitive performance; smallest TxGemma variant

Data Grounding & Factuality

DataGemma-RAG-27B-IT

Parameters: 27 billion (Gemma 2 based)
Context Window: Up to 348,000 tokens (via Gemini 1.5 Pro integration); average 38K tokens
Multimodal: Text-only
License: Gemma Terms of Use
Link: https://huggingface.co/google/datagemma-rag-27b-it

Primary Use Cases:

Statistical question answering with authoritative citations
Trend analysis across demographics, economics, health, education
Data journalism requiring verified numerical facts
Policy analysis grounded in government statistics

Agentic Capabilities:

Tool Use / Function Calling: Yes - generates natural language queries for Data Commons API (250+ billion data points from trusted sources)
Structured Output: Generates query lists for Data Commons; creates extended prompts with footnotes and citations
Notable Features: Research/academic only - not production-ready; RAG approach pre-fetches relevant data; 99% accuracy on generated numbers; accesses continuously updated external data

DataGemma-RIG-27B-IT

Parameters: 27 billion (Gemma 2 based)
Context Window: 8,192 tokens
Multimodal: Text-only
License: Gemma Terms of Use
Link: https://huggingface.co/google/datagemma-rig-27b-it

Primary Use Cases:

Same as RAG variant with faster post-processing
Fact-checking numerical claims in generated content
Research applications requiring citation verification
Statistical analysis with lower latency requirements

Agentic Capabilities:

Tool Use / Function Calling: Yes - fine-tuned to recognize when to query Data Commons and annotate statistics
Structured Output: Annotates responses with [__DC__("query") → "statistic"] format for external verification
Notable Features: Research/academic only; RIG approach post-processes outputs by fact-checking; faster than RAG but less detailed

Specialized Architectures

EmbeddingGemma

Parameters: 308 million (100M model + 200M embedding)
Context Window: 2,048 tokens
Multimodal: Text-only
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/embeddinggemma

Primary Use Cases:

Retrieval Augmented Generation (RAG) pipelines for local AI systems
Semantic search across 100+ languages
Privacy-sensitive applications requiring offline embeddings (personal file search, mobile apps)
Code retrieval and question answering systems

Agentic Capabilities:

Tool Use / Function Calling: Not applicable (embedding model)
Structured Output: 768-dimensional vectors (truncatable to 512, 256, 128 via Matryoshka Representation Learning)
Notable Features: Highest-ranking open multilingual embedding model under 500M parameters on MTEB; sub-15ms inference on EdgeTPU; under 200MB RAM with quantization; shares tokenizer with Gemma 3n

VaultGemma 1B

Parameters: 1 billion
Context Window: 1,024 tokens
Multimodal: Text-only
License: Gemma Terms of Use
Link: https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/

Primary Use Cases:

Privacy-preserving NLP research requiring formal guarantees
Training on sensitive data (healthcare, finance, government) with provable privacy
Research baseline for privacy-preserving AI development
Applications requiring zero detectable memorization

Agentic Capabilities:

Tool Use / Function Calling: Not applicable (pre-trained base model)
Structured Output: Standard text generation
Notable Features: World's largest and most capable differentially private LLM; formal privacy guarantee (ε ≤ 2.0, δ ≤ 1.1e-10 at sequence level); trained from scratch using DP-SGD; novel DP scaling laws; performance comparable to GPT-2 1.5B

Safety & Content Moderation

ShieldGemma 2 4B

Parameters: 4 billion (Gemma 3 based)
Context Window: Not applicable (classification task)
Multimodal: Image + text (evaluates image safety with text policy descriptions)
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/shieldgemma

Primary Use Cases:

Input filtering for screening user prompts to vision-language models
Output filtering for evaluating generated visual content
Dataset curation for training data safety
Synthetic and natural image safety classification

Agentic Capabilities:

Tool Use / Function Calling: Not applicable (classifier, not generative)
Structured Output: Probability scores for Yes/No tokens; optional JSON with label + rationale
Notable Features: State-of-the-art vs LlavaGuard 7B, GPT-4o mini; customizable safety policies at inference; 3 categories (sexually explicit, dangerous content, violence/gore); Optimal F1: 88.6% sexually explicit, 93.7% dangerous, 85.0% violence

ShieldGemma 1 (2B, 9B, 27B)

Parameters: 2, 9, 27 billion (Gemma 2 based)
Context Window: Not applicable
Multimodal: Text-only
License: Gemma Terms of Use
Link: https://ai.google.dev/gemma/docs/shieldgemma

Primary Use Cases:

Text-only content moderation for LLM inputs and outputs
Custom safety policy enforcement at inference time
Hate speech and harassment detection
Content filtering for applications with specific safety requirements

Agentic Capabilities:

Tool Use / Function Calling: Not applicable
Structured Output: Probability scores with optional rationale
Notable Features: 4 categories (sexually explicit, dangerous content, hate speech, harassment); LLM-as-a-judge technique; fine-tunable for custom policies

Model Comparison Table

Model	Parameters	Context	Multimodal	Knowledge Cutoff	Tool Use	Best For
Foundation Models
Gemma 3 27B	27B	128K	Text + Images	Aug 2024	⭐⭐⭐⭐	Complex reasoning, long context, multilingual
Gemma 3 12B	12B	128K	Text + Images	Aug 2024	⭐⭐⭐⭐	Balanced multimodal, production
Gemma 3 4B	4B	128K	Text + Images	Aug 2024	⭐⭐⭐	Real-time multimodal, mid-range GPUs
Gemma 3 1B	1B	32K	Text	Aug 2024	⭐⭐	Lightweight text, mobile
Gemma 3 270M	270M	32K	Text	Aug 2024	⭐	Task-specific fine-tuning
Edge/Mobile
Gemma 3n E4B	~4B	32K	Text + Image + Audio + Video	Aug 2024	⭐⭐⭐⭐	Real-time video (60 FPS), mobile
Gemma 3n E2B	~1.91B	32K	Text + Image + Audio + Video	Aug 2024	⭐⭐⭐	Ultra-efficient mobile
Vision
PaliGemma 2 28B	28B	512-4K	Text + Images	N/A	⭐⭐	Document analysis, OCR, research
PaliGemma 2 10B	10B	512-4K	Text + Images	N/A	⭐⭐	Balanced vision tasks
PaliGemma 2 3B	3B	256-4K	Text + Images	N/A	⭐	Lightweight vision
Code
CodeGemma 7B-IT	7B	8K	Text/Code	N/A	⭐	Conversational code generation
CodeGemma 7B	7B	8K	Text/Code	N/A	N/A	Code completion
CodeGemma 2B	2B	8K	Text/Code	N/A	N/A	Fast code infilling (FIM)
Medical
MedGemma 27B MM	27B	128K	Text + Medical Images	N/A	⭐⭐⭐⭐	Radiology, clinical decision support
TxGemma 27B-Chat	27B	8K	Text (molecules)	N/A	⭐⭐⭐⭐⭐	Drug discovery with reasoning
TxGemma 27B-Predict	27B	8K	Text (molecules)	N/A	⭐⭐⭐⭐	High-accuracy therapeutic predictions
Specialized
DataGemma-RAG-27B	27B	Up to 348K	Text	N/A	⭐⭐⭐⭐	Statistical Q&A (research only)
EmbeddingGemma	308M	2K	Text	N/A	N/A	RAG pipelines, semantic search
ShieldGemma 2 4B	4B	N/A	Image + Text	N/A	N/A	Multimodal safety classification

Key Considerations

Content quality limitations: All Gemma models may produce inaccurate, offensive, or hallucinated content. Google explicitly warns against relying on LLMs for medical, legal, financial, or professional advice. These models target developers and researchers, not consumer-facing factual Q&A - review outputs before publication.

Licensing restrictions: The custom "Gemma Terms of Use" differs significantly from standard open-source licenses. Key concerns include remote shutdown clauses (Google can restrict usage remotely for violations), prohibited use policy enforcement on downstream users, required updates to latest versions (impacts production stability), and derivative work restrictions. Some enterprises consider this license unsuitable for certain commercial scenarios - have legal teams review for specific use cases.

Hardware requirements vary dramatically. Gemma 3 270M runs on smartphones with minimal battery drain (0.75% for 25 conversations). Gemma 3 27B requires 60GB BF16 or 15GB with INT4 quantization - fits on consumer GPUs like RTX 3090 using QAT models. VRAM: 1B QAT ~892MB, 4B QAT ~3.4GB, 12B QAT ~8.7GB, 27B QAT ~21GB. Fine-tuning demands significantly higher resources than inference.

Specialized model considerations: MedGemma and TxGemma require Health AI Developer Foundations Terms with gated access and are not clinical-grade - they need developer validation before medical use. DataGemma remains research/academic only, not production-ready. CodeGemma optimizes for completion over conversation (use 7B-IT for chat). PaliGemma 2 is not a chatbot - it handles single-round image+text input only.

Multimodal constraints: Only Gemma 3 4B/12B/27B, Gemma 3n, PaliGemma 2, MedGemma multimodal, and ShieldGemma 2 support images. Fixed resolutions (896×896 for Gemma 3, multiple for PaliGemma 2) may challenge non-square aspect ratios. Gemma 3 270M and 1B remain text-only.

Language support: Gemma 3 supports 140+ languages, but evaluation occurred only with English prompts - quality may vary significantly across languages. Earlier Gemma 1/2 focused primarily on English.

Data privacy: Free tier data may improve Google products per standard service terms. Paid tier data does NOT improve Google products. Self-hosted open weights provide complete data control - ideal for privacy-critical applications.

Platform stability: In October 2024, Gemma temporarily disappeared from AI Studio after producing problematic content, highlighting risks of relying solely on cloud-hosted versions. Save local copies for critical projects.

Resources

Official Documentation:

Gemma Overview: https://ai.google.dev/gemma/docs
DeepMind Gemma Models: https://deepmind.google/models/gemma/
Google Blog: https://blog.google/technology/developers/gemma-3/

Model Cards & Technical Reports:

Gemma 3 Technical Report: https://arxiv.org/abs/2503.19786
Gemma 3 Model Card: https://ai.google.dev/gemma/docs/core/model_card_3
PaliGemma 2 Technical Report: https://arxiv.org/abs/2412.03555
TxGemma Technical Paper: https://arxiv.org/abs/2504.06196

Platform Access:

Google AI Studio: https://aistudio.google.com/
Vertex AI: https://cloud.google.com/vertex-ai
Kaggle Models: https://www.kaggle.com/models/google/gemma-3
Hugging Face Collections: https://huggingface.co/collections

Specialized Variants:

MedGemma Documentation: https://developers.google.com/health-ai-developer-foundations/medgemma
CodeGemma Documentation: https://ai.google.dev/gemma/docs/codegemma
TxGemma Documentation: https://developers.google.com/health-ai-developer-foundations/txgemma
DataGemma Documentation: https://ai.google.dev/gemma/docs/datagemma
ShieldGemma Documentation: https://ai.google.dev/gemma/docs/shieldgemma
EmbeddingGemma Documentation: https://ai.google.dev/gemma/docs/embeddinggemma