Kamil Józwik

Open-weights vs. open-source LLMs

Understanding the key differences between open-weights and open-source models.

llm

The distinction between "open weights" and "open source" matters more than you might think. These terms are often used interchangeably in marketing materials, but they describe fundamentally different levels of access and freedom. Understanding this difference affects what you can do with a model, whether you can understand how it was built, and whether you can reproduce or improve upon it.

What "Open Source" Traditionally Means

In traditional software, "open source" has a well-established meaning defined by the Open Source Initiative. Open source software provides:

  • The source code itself: The human-readable instructions that define how the software works. Not just a compiled binary, but the actual code you can read, modify, and learn from.

  • The right to modify and redistribute: Licenses that explicitly grant you permission to change the software and share your modifications. This is what enables community contributions and derivative works.

  • The build process and dependencies: Information about how to go from source code to a working program. This reproducibility is crucial - you can verify the software does what it claims and build it yourself.

This model of openness has worked well for traditional software because the "source" is a relatively simple concept: text files containing code. With machine learning models, especially large language models, things get more complex.

What Are "Weights" in an LLM?

A language model's weights are the learned parameters that define its behavior. Think of them as the model's "memory" of what it learned during training.

When you train a neural network, you're essentially finding the right values for millions or billions of numerical parameters. For a model like Llama 2 70B, that's 70 billion parameters. These parameters determine how the model processes input and generates output. They're the result of training - feeding vast amounts of text through the model architecture and adjusting these numbers to minimize prediction errors.

The weights are stored as files containing these numerical values. For large models, this might be 100+ GB of floating-point numbers. When you "run" a model, you load these weights into memory and use them with the model's architecture (the code that defines the network structure) to process text.
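
To get a feel for these sizes, some back-of-the-envelope arithmetic helps. The sketch below assumes weights stored as 16-bit floats, a common release format; other formats scale the result up or down.

```python
# Rough size of a weights file: parameter count x bytes per parameter.
# Assumes 16-bit (2-byte) floats, a common storage format for released weights.
n_params = 70e9          # e.g., a 70-billion-parameter model
bytes_per_param = 2      # fp16/bf16; fp32 doubles this, 8-bit quantization halves it

size_gb = n_params * bytes_per_param / 1e9
print(f"~{size_gb:.0f} GB of raw weights")  # ~140 GB
```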

Crucially, the weights alone don't tell you how they were created. You can't look at 70 billion floating-point numbers and understand what training data was used, what hyperparameters were chosen, or what techniques were applied. The weights are the end product, not the recipe.
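
You can see this for yourself by opening a weights file. The sketch below uses the safetensors format that many releases ship in; the file name is hypothetical. All you find inside are named tensors with shapes and dtypes.

```python
# A weights file is just named tensors: shapes, dtypes, and numbers.
# Nothing in it records the training data, hyperparameters, or process.
from safetensors import safe_open

# Hypothetical file name for one downloaded shard of a large model.
with safe_open("model-00001-of-00015.safetensors", framework="pt") as f:
    for name in list(f.keys())[:5]:
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)
# e.g. model.layers.0.self_attn.q_proj.weight (8192, 8192) torch.bfloat16
```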

"Open Weights": What You Get and What You Don't

When a model is described as "open weights" (sometimes called an "open model" or, misleadingly, "open source"), you typically get:

  • The model weights themselves: The trained parameters you can download and use. For a large model, this is often 100+ GB of files containing the model's learned knowledge.

  • The inference code: Software to load and run the model - the architecture definition and code to generate text from prompts. Often this is fairly minimal: a model architecture class and loading utilities (see the sketch after this list).

  • Basic documentation: Information about the model's capabilities, intended uses, and limitations. Usually includes example code for common tasks.

  • A license: Permission to use the model, with varying restrictions. This might allow commercial use, might limit it to research, or might have other constraints.
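
In practice, "weights plus inference code" often looks like the sketch below, which uses the Hugging Face transformers library. The model id is only an example; any open-weights model with a transformers integration works the same way.

```python
# Minimal "open weights" usage: download the weights, load them with the
# published architecture code, and generate text.
# Requires transformers and torch; device_map="auto" also needs accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example id; substitute any open-weights model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype the weights were saved in
    device_map="auto",    # place layers across available GPUs/CPU
)

inputs = tokenizer("Open weights means", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```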

What you typically don't get with open weights:

  • Training data: The text corpus used to train the model. This might be for legal reasons (copyright concerns), practical reasons (it's terabytes of data), or competitive reasons (the dataset is considered proprietary). You know the model was trained on "web data" or "books" but not the specific sources or filtering applied.

  • Training code: The actual scripts and systems used to train the model. This includes data processing pipelines, training loops, distributed training setup, and all the engineering that went into making training work at scale.

  • Training hyperparameters and decisions: The specific choices made during training - learning rates, batch sizes, curriculum decisions, when to stop training, how to handle failures. These details profoundly affect the final model but are rarely shared.

  • Intermediate checkpoints: Models saved at various points during training. These can be valuable for understanding how the model learned and for research purposes.

This means you can use an open weights model, fine-tune it on your data, and deploy it in your applications. What you can't do is fully understand how it was created, reproduce the training process, or meaningfully audit what data influenced its behavior.
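
Fine-tuning is equally accessible. One common approach is parameter-efficient fine-tuning with LoRA adapters via the peft library; the sketch below shows the setup, with the model id again only an example.

```python
# Sketch of fine-tuning an open-weights model with LoRA adapters (peft).
# Small adapter matrices are trained on top of the frozen base weights,
# so you never need to update all of the billions of base parameters.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example id

config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here, train with a standard loop or transformers' Trainer on your data,
# then save just the small adapter with model.save_pretrained("my-adapter").
```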

"Open Source" for LLMs

A truly open source LLM provides everything needed to reproduce the model from scratch:

  • Training data: The complete dataset, or at least detailed documentation of how to reconstruct it. This might mean curated datasets released publicly, or precise descriptions of how public data was filtered and processed.

  • Training code: All scripts, configurations, and systems used during training. This includes data preprocessing, the training loop, distributed training setup, and evaluation code.

  • Model architecture: The code defining the neural network structure. Open weights releases include this too; in an open source release it is one part of a complete, reproducible package.

  • Weights at multiple stages: Not just the final trained model, but checkpoints throughout training. This enables research into how models learn and allows others to continue training from intermediate points (see the sketch after this list).

  • Documentation of decisions: Explanation of why particular choices were made - architecture decisions, hyperparameter selection, training duration, and how problems were solved.
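
Projects that publish intermediate checkpoints often expose them as revisions on the Hugging Face Hub. The sketch below is illustrative: the model id and revision name are hypothetical, not a specific release.

```python
# Loading an intermediate training checkpoint published as a Hub revision.
# Both the model id and the revision name here are hypothetical.
from transformers import AutoModelForCausalLM

mid_training = AutoModelForCausalLM.from_pretrained(
    "example-org/open-model-1b",  # hypothetical fully open model
    revision="step-100000",       # hypothetical mid-training checkpoint
)
# Researchers can study how behavior changed over training, or continue
# training from this point instead of starting from scratch.
```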

The goal is reproducibility: someone else should be able to follow your process and get a similar model. This is the standard that traditional open source software achieves, but it's much harder for LLMs due to the computational costs and data complexities involved.

Why the Distinction Exists

The gap between open weights and open source isn't usually about philosophical disagreement - it's about practical realities and competitive dynamics.

Training data is legally complex: Many LLMs are trained on data scraped from the internet. The legal status of using copyrighted material for training is unsettled and varies by jurisdiction. Companies often prefer not to document exactly what data they used to avoid legal exposure. Even when data is clearly legal to use, organizing and releasing it in a usable form is substantial work.

Training is expensive: Training a frontier LLM costs millions of dollars in compute. Companies that make this investment often view their training methodology as a competitive advantage. Releasing complete training details could help competitors catch up more quickly.

Engineering is complex: The systems built to train LLMs at scale involve substantial engineering - custom distributed training frameworks, data pipelines, monitoring systems. This code may be tightly coupled to internal infrastructure or may contain other proprietary elements.

Open weights provides most of the value users need: For many developers, having the weights is sufficient. You can use the model, fine-tune it, and build applications. The training details don't affect your ability to use the model effectively. Companies can get credit for "openness" while protecting competitive advantages.

Reproducibility is expensive: To make training truly reproducible, you need to document everything carefully, clean up code for external use, and test that others can follow your process. This is significant additional work beyond just releasing weights.

This creates a spectrum rather than a binary: some projects release more information than others, and the definition of "open source" in AI is still being negotiated by the community.

Key Takeaways

Open weights gives you the model: You can use it, fine-tune it, and deploy it. This meets most application development needs.

Open source gives you the process: You can understand how it was built, reproduce it, and verify claims. This enables research and deep understanding.

The gap exists for practical reasons: Training data is legally complex, training is expensive, and engineering is proprietary. Companies balance openness with competitive interests.

Licenses matter as much as technical access: A model with open weights but restrictive licensing may be less useful than one with slightly less technical openness but permissive terms.

Your needs determine what matters: Application developers usually need just weights. Researchers, auditors, and those building critical infrastructure may need more.

When evaluating models, look beyond marketing claims. Check what's actually available, read the license, and decide if it provides what you need. The words "open weights" and "open source" aren't quality judgments - they're descriptions of what you're getting. Choose based on your requirements, not terminology.