Kamil Józwik

Gemma

Gemma is a family of open-weight AI models developed by Google DeepMind. Built on the same research foundation as Gemini, these models prioritize efficient deployment and responsible AI while maintaining competitive performance. The family currently includes 30+ model variants spanning foundation models, edge/mobile optimization, vision-language, code generation, medical/therapeutic AI, and safety classification, with strengths in on-device deployment, multilingual support, and specialized domain applications. With over 200 million downloads, the "Gemmaverse" represents one of the most active open-model ecosystems in AI.

Family Overview

Developer: Google DeepMind builds and maintains Gemma models using the same research foundation as Gemini. The family spans variants from 270 million to 27 billion parameters across multiple specializations.

Philosophy: Gemma prioritizes efficient, responsible AI deployment with complete developer control. Unlike closed APIs, these models provide open weights for download, modification, and local deployment without internet connectivity. The family balances cutting-edge capabilities with practical constraints: Gemma 3 27B matches GPT-4-class performance on many tasks while fitting on consumer GPUs. Architectural innovations like Quantization-Aware Training (QAT) cut memory requirements roughly 3x while preserving quality, and mobile-first designs like Gemma 3n achieve 60 FPS video processing on smartphones.

Key strengths emerge from specialized architectures and training approaches. Gemma 3 processes 128K token contexts across 140+ languages with multimodal vision capabilities. Mobile optimization through Per-Layer Embedding caching enables real-time on-device inference. Domain specialists like MedGemma, CodeGemma, and TxGemma apply transfer learning to healthcare, programming, and therapeutic development. The DataGemma variants ground responses in external statistical databases to reduce hallucinations.

When to choose Gemma: Select this family for local/edge deployment, data privacy requirements, multilingual applications, or specialized domains (medical, code, therapeutics). The open-weight release, under a custom license that permits commercial use, suits developers who need control over their inference infrastructure. Consider alternatives for consumer-facing factual Q&A - Google explicitly warns that these models target developers and researchers, not end-users, and may produce inaccurate or hallucinated content.

The Gemmaverse is the name for the global ecosystem of 60,000+ community fine-tuned variants. Notable examples include AI Singapore's SEA-LION v3 (11 Southeast Asian languages), INSAIT's BgGPT (Bulgarian-first), and Nexa AI's OmniAudio (edge audio-language). This community demonstrates Gemma's adaptability across languages and domains.

Platform & Access

Google AI Studio historically served as the primary platform - a free interface for prompt testing, API key generation, and model evaluation - but Gemma models are no longer available there. For enterprise needs, Vertex AI offers managed deployment with MLOps capabilities.

Access model: Dual-path access allows both self-hosted and cloud-hosted deployment. Download open weights from Kaggle, Hugging Face, or Ollama for self-hosted deployment with zero API costs. Alternatively, use the Gemini API for cloud-hosted inference with token-based pricing.

Pricing model: The free tier provides 5 requests/minute, 25 requests/day, and 32K tokens/minute. The paid tier (requires Cloud Billing) dramatically increases these limits. Community reports indicate that Gemma models served via the API are free, with rate limits as the only constraint. Self-hosting the open weights eliminates all usage fees. New Google Cloud users receive $300 in credits; academic researchers can access up to $10,000 through the Gemma 3 Academic Program.
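
The free-tier limits above can be respected client-side with a sliding-window throttle. The sketch below is illustrative (only the 5 requests/minute figure comes from this document; the class and its API are hypothetical helper code, not part of any Google SDK):

```python
from collections import deque

class SlidingWindowLimiter:
    """Client-side sliding-window throttle: tracks request timestamps
    and reports how long to wait before the next request is allowed."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = deque()

    def wait_time(self, now: float) -> float:
        """Seconds to wait at time `now` (e.g. time.monotonic()) before
        another request fits in the window; 0.0 means go ahead."""
        # Evict timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            return 0.0
        return self.window - (now - self.timestamps[0])

    def record(self, now: float) -> None:
        """Call after actually sending a request."""
        self.timestamps.append(now)

# Free tier: 5 requests per 60-second window.
limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60.0)
```

A caller would pair `limiter.wait_time(time.monotonic())` with `time.sleep()` before each API call; the 25 requests/day and 32K tokens/minute limits would need analogous windows.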

Framework support includes Keras, JAX, PyTorch, TensorFlow, Hugging Face Transformers, Ollama, vLLM, and mobile deployment via Google AI Edge.
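
As one concrete local-deployment path, Ollama lets you pin a Gemma variant and its sampling settings behind a Modelfile. A hedged sketch, assuming Ollama is installed and a `gemma3:4b` tag is available in its model library (tag names may differ over time):

```
# Modelfile - a custom assistant built on a Gemma base model
FROM gemma3:4b
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
SYSTEM "You are a concise technical assistant."
```

Built and run with `ollama create my-gemma -f Modelfile` followed by `ollama run my-gemma`; keeping the Modelfile in version control makes local deployments reproducible.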

Foundation Models

Gemma 3 27B

Primary Use Cases:

Agentic Capabilities:


Gemma 3 12B

Primary Use Cases:

Agentic Capabilities:


Gemma 3 4B

Primary Use Cases:

Agentic Capabilities:


Gemma 3 1B

Primary Use Cases:

Agentic Capabilities:


Gemma 3 270M

Primary Use Cases:

Agentic Capabilities:

Edge & Mobile Optimized

Gemma 3n E4B

Primary Use Cases:

Agentic Capabilities:


Gemma 3n E2B

Primary Use Cases:

Agentic Capabilities:

Quantized Optimization

Gemma 3 QAT (1B, 4B, 12B, 27B)

Primary Use Cases:

Agentic Capabilities:

Vision & Multimodal

PaliGemma 2 28B

Primary Use Cases:

Agentic Capabilities:


PaliGemma 2 10B

Primary Use Cases:

Agentic Capabilities:


PaliGemma 2 3B

Primary Use Cases:

Agentic Capabilities:

Code Specialists

CodeGemma 7B-IT

Primary Use Cases:

Agentic Capabilities:


CodeGemma 7B

Primary Use Cases:

Agentic Capabilities:


CodeGemma 2B

Primary Use Cases:

Agentic Capabilities:

Medical & Therapeutics

MedGemma 27B Multimodal

Primary Use Cases:

Agentic Capabilities:


MedGemma 27B Text-Only

Primary Use Cases:

Agentic Capabilities:


MedGemma 4B Multimodal

Primary Use Cases:

Agentic Capabilities:


TxGemma 27B-Chat

Primary Use Cases:

Agentic Capabilities:


TxGemma 27B-Predict

Primary Use Cases:

Agentic Capabilities:


TxGemma 9B-Chat / 9B-Predict

Primary Use Cases:

Agentic Capabilities:


TxGemma 2B-Predict

Primary Use Cases:

Agentic Capabilities:

Data Grounding & Factuality

DataGemma-RAG-27B-IT

Primary Use Cases:

Agentic Capabilities:


DataGemma-RIG-27B-IT

Primary Use Cases:

Agentic Capabilities:

Specialized Architectures

EmbeddingGemma

Primary Use Cases:

Agentic Capabilities:


VaultGemma 1B

Primary Use Cases:

Agentic Capabilities:

Safety & Content Moderation

ShieldGemma 2 4B

Primary Use Cases:

Agentic Capabilities:


ShieldGemma 1 (2B, 9B, 27B)

Primary Use Cases:

Agentic Capabilities:

Model Comparison Table

| Model | Parameters | Context | Multimodal | Knowledge Cutoff | Tool Use | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| Foundation Models |  |  |  |  |  |  |
| Gemma 3 27B | 27B | 128K | Text + Images | Aug 2024 | ⭐⭐⭐⭐ | Complex reasoning, long context, multilingual |
| Gemma 3 12B | 12B | 128K | Text + Images | Aug 2024 | ⭐⭐⭐⭐ | Balanced multimodal, production |
| Gemma 3 4B | 4B | 128K | Text + Images | Aug 2024 | ⭐⭐⭐ | Real-time multimodal, mid-range GPUs |
| Gemma 3 1B | 1B | 32K | Text | Aug 2024 | ⭐⭐ | Lightweight text, mobile |
| Gemma 3 270M | 270M | 32K | Text | Aug 2024 | — | Task-specific fine-tuning |
| Edge/Mobile |  |  |  |  |  |  |
| Gemma 3n E4B | ~4B | 32K | Text + Image + Audio + Video | Aug 2024 | ⭐⭐⭐⭐ | Real-time video (60 FPS), mobile |
| Gemma 3n E2B | ~1.91B | 32K | Text + Image + Audio + Video | Aug 2024 | ⭐⭐⭐ | Ultra-efficient mobile |
| Vision |  |  |  |  |  |  |
| PaliGemma 2 28B | 28B | 512-4K | Text + Images | N/A | ⭐⭐ | Document analysis, OCR, research |
| PaliGemma 2 10B | 10B | 512-4K | Text + Images | N/A | ⭐⭐ | Balanced vision tasks |
| PaliGemma 2 3B | 3B | 256-4K | Text + Images | N/A | — | Lightweight vision |
| Code |  |  |  |  |  |  |
| CodeGemma 7B-IT | 7B | 8K | Text/Code | N/A | — | Conversational code generation |
| CodeGemma 7B | 7B | 8K | Text/Code | N/A | N/A | Code completion |
| CodeGemma 2B | 2B | 8K | Text/Code | N/A | N/A | Fast code infilling (FIM) |
| Medical |  |  |  |  |  |  |
| MedGemma 27B MM | 27B | 128K | Text + Medical Images | N/A | ⭐⭐⭐⭐ | Radiology, clinical decision support |
| TxGemma 27B-Chat | 27B | 8K | Text (molecules) | N/A | ⭐⭐⭐⭐⭐ | Drug discovery with reasoning |
| TxGemma 27B-Predict | 27B | 8K | Text (molecules) | N/A | ⭐⭐⭐⭐ | High-accuracy therapeutic predictions |
| Specialized |  |  |  |  |  |  |
| DataGemma-RAG-27B | 27B | Up to 348K | Text | N/A | ⭐⭐⭐⭐ | Statistical Q&A (research only) |
| EmbeddingGemma | 308M | 2K | Text | N/A | N/A | RAG pipelines, semantic search |
| ShieldGemma 2 4B | 4B | N/A | Image + Text | N/A | N/A | Multimodal safety classification |

Key Considerations

Content quality limitations: All Gemma models may produce inaccurate, offensive, or hallucinated content. Google explicitly warns against relying on LLMs for medical, legal, financial, or professional advice. These models target developers and researchers, not end-user-facing factual Q&A; review outputs before publication.

Licensing restrictions: The custom "Gemma Terms of Use" differs significantly from standard open-source licenses. Key concerns include remote shutdown clauses (Google can restrict usage remotely for violations), prohibited use policy enforcement on downstream users, required updates to latest versions (impacts production stability), and derivative work restrictions. Some enterprises consider this license unsuitable for certain commercial scenarios - have legal teams review for specific use cases.

Hardware requirements vary dramatically. Gemma 3 270M runs on smartphones with minimal battery drain (0.75% over 25 conversations). Gemma 3 27B needs about 60GB in BF16, or roughly 15GB with INT4 quantization - small enough for consumer GPUs like the RTX 3090 when using the QAT models. Approximate VRAM for the QAT checkpoints: 1B ~892MB, 4B ~3.4GB, 12B ~8.7GB, 27B ~21GB. Fine-tuning demands significantly more resources than inference.
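
The figures above can be sanity-checked with back-of-the-envelope weight-memory arithmetic. This sketch counts weight storage only - KV cache and runtime overhead add several more GB, which is why the deployed totals listed above run higher:

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Raw weight storage: parameters x bits per parameter, in GB (1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

# Gemma 3 27B in BF16 (16 bits) vs INT4 (4 bits):
bf16 = weight_memory_gb(27e9, 16)  # 54.0 GB of weights (~60GB once overhead is added)
int4 = weight_memory_gb(27e9, 4)   # 13.5 GB of weights (close to the ~15GB INT4 figure)
print(f"BF16: {bf16:.1f} GB, INT4: {int4:.1f} GB")
```

The same arithmetic explains the roughly 3x-4x savings QAT models deliver relative to BF16.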

Specialized model considerations: MedGemma and TxGemma require Health AI Developer Foundations Terms with gated access and are not clinical-grade - they need developer validation before medical use. DataGemma remains research/academic only, not production-ready. CodeGemma optimizes for completion over conversation (use 7B-IT for chat). PaliGemma 2 is not a chatbot - it handles single-round image+text input only.

Multimodal constraints: Only Gemma 3 4B/12B/27B, Gemma 3n, PaliGemma 2, MedGemma multimodal, and ShieldGemma 2 support images. Inputs are processed at fixed resolutions (896×896 for Gemma 3; several options for PaliGemma 2), which can distort or crop non-square aspect ratios. Gemma 3 270M and 1B remain text-only.
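
One common workaround for the fixed-resolution constraint is to letterbox images before sending them in: scale to fit the square target, then pad the remainder. A minimal sketch of the geometry - 896 is the Gemma 3 input size mentioned above, but the letterboxing itself is a generic preprocessing choice, not the model's documented internal behavior:

```python
def letterbox_geometry(width: int, height: int, target: int = 896):
    """Scale (width, height) to fit inside a target x target square while
    preserving aspect ratio; return the scaled size plus symmetric padding."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) // 2  # horizontal padding on each side
    pad_y = (target - new_h) // 2  # vertical padding on each side
    return new_w, new_h, pad_x, pad_y

# A 16:9 frame (1920x1080) scaled into an 896x896 canvas:
print(letterbox_geometry(1920, 1080))  # (896, 504, 0, 196)
```

An image library such as Pillow would then do the actual resize and paste onto a padded square canvas using these numbers.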

Language support: Gemma 3 supports 140+ languages, but evaluation occurred only with English prompts - quality may vary significantly across languages. Earlier Gemma 1/2 focused primarily on English.

Data privacy: Free tier data may improve Google products per standard service terms. Paid tier data does NOT improve Google products. Self-hosted open weights provide complete data control - ideal for privacy-critical applications.

Platform stability: In October 2024, Gemma temporarily disappeared from AI Studio after producing problematic content, highlighting risks of relying solely on cloud-hosted versions. Save local copies for critical projects.

Resources

Official Documentation:

Model Cards & Technical Reports:

Platform Access:

Specialized Variants: