Qwen represents Alibaba Cloud's foundational AI model initiative. Launched initially in April 2023, Qwen isn't a single model but a comprehensive suite of LLMs designed to tackle a diverse range of tasks, from natural language understanding and generation to code creation and even processing visual and auditory information.
Built upon the proven transformer architecture, the majority of Qwen models are released under the permissive Apache 2.0 license, readily available on platforms like Hugging Face. This open approach allows us to experiment, customize, and deploy these models locally or within our own infrastructure, complementing the API access provided via Alibaba Cloud.
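For instance, running one of the open-weight checkpoints locally takes only a few lines. Here is a minimal sketch, assuming the `transformers` and `accelerate` packages are installed; the model size and prompt are illustrative, and any Qwen2.5 instruct checkpoint follows the same flow:

```python
# A minimal sketch of running an open-weight Qwen2.5 checkpoint locally
# with Hugging Face transformers; the model size and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the transformer architecture in one paragraph."},
]
# Qwen models ship a chat template, so the conversation can be formatted with it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Drop the prompt tokens before decoding the reply.
reply = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```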
We are all familiar with models like GPT, Claude, and Gemini, but Qwen is something of an underrated gem in the LLM landscape. So let's take a closer look at it.
## Qwen2.5 - the foundation

At the heart of the family lies Qwen2.5, a series of powerful and efficient text-based LLMs. It's a dense, decoder-only, transformer-based LLM with improved capabilities over Qwen2. These models serve as the foundation for many specialized variants and are excellent general-purpose tools.
Here are some key features of Qwen2.5:

- Qwen2.5 boasts strong capabilities in 29+ languages, including English, Chinese, French, Spanish, Arabic, and many others.
- Qwen2.5 can be used with agent frameworks, follow instructions, and generate structured outputs, particularly JSON (see the sketch below).
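As a taste of the structured-output capability, here is a hedged sketch of coaxing JSON out of the model purely through prompting (no constrained decoding); the schema and fallback handling are illustrative, and `generate_reply` would be the generation code shown earlier:

```python
# A minimal sketch of prompt-based JSON output with a Qwen2.5 instruct
# model; the schema is illustrative, and a real reply would come from the
# generation code in the previous snippet.
import json

SYSTEM = (
    "You are a data extraction assistant. Reply ONLY with valid JSON "
    'matching this schema: {"name": string, "year": integer}.'
)

def parse_structured(reply: str) -> dict | None:
    """Parse the model's reply, returning None if it isn't valid JSON."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return None  # in practice: retry, or repair with a follow-up prompt

# Hand-written reply standing in for a real generation:
print(parse_structured('{"name": "Qwen2.5", "year": 2024}'))
```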
## Qwen2.5-Coder and Qwen2.5-Math - specialized experts

Building on the Qwen2.5 foundation, Alibaba has released specialized models fine-tuned for coding and mathematics. Below you can find key details about these models.

- Qwen2.5-Coder: tuned for code generation and assistance across many programming languages.
- Qwen2.5-Math: tuned for mathematical reasoning in English and Chinese. Its standout feature is Tool-Integrated Reasoning (TIR): the model can decide to invoke external tools (like a calculator or symbolic solver) during its reasoning process, incorporating the results to improve accuracy.

Both share the general capabilities of the base Qwen2.5 models, but are optimized for the two domains above.
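To make TIR concrete, here is a deliberately simplified loop. It assumes the model signals tool requests as fenced Python blocks that assign to a `result` variable; that convention, and the `generate` stub, are illustrative assumptions, not Qwen's actual protocol:

```python
# A conceptual sketch of a Tool-Integrated Reasoning (TIR) loop. Assumes
# the model requests computation by emitting a ```python fenced block that
# assigns to `result`; this convention is illustrative only.
import re

def generate(transcript: str) -> str:
    """Placeholder for a call to a Qwen2.5-Math model."""
    raise NotImplementedError

def tir_answer(question: str, max_rounds: int = 3) -> str:
    transcript = question
    reply = ""
    for _ in range(max_rounds):
        reply = generate(transcript)
        match = re.search(r"```python\n(.*?)```", reply, re.DOTALL)
        if match is None:
            return reply  # no tool request: the model answered directly
        # Run the requested computation (sandbox this in a real system!).
        scope: dict = {}
        exec(match.group(1), scope)
        # Feed the tool's output back so the model can continue reasoning.
        transcript += f"{reply}\n[tool output] {scope.get('result')}\n"
    return reply
```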
## Qwen2.5-VL and Qwen2.5-Omni - multimodalities

Qwen extends beyond text, offering models that can understand and interact with visual and auditory information.
### Qwen2.5-VL: understanding images and videos

Qwen2.5-VL is a vision-language model series that extends Qwen2.5 with visual understanding, enabling image and video comprehension alongside text generation. This model excels at analyzing visual content and describing or reasoning about it in text form.
- It can ground objects in an image, returning structured outputs (e.g., bounding boxes as JSON) for detected objects.
- Typical use cases: Visual Question Answering (VQA), Optical Character Recognition (OCR), document/chart analysis, video content analysis, and multimedia chatbots.

Qwen2.5-VL is an open-weight model, but there are also two proprietary models available via the Alibaba Cloud API:

- `qwen-vl-max`: enhanced visual reasoning and instruction-following capabilities compared with `qwen-vl-plus`; best for complex tasks.
- `qwen-vl-plus`: enhanced detail and text recognition capabilities, supporting images with over one million pixels of resolution and any aspect ratio; exceptional performance across various visual tasks.
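To give a feel for the workflow, here is a hedged sketch of single-image question answering with the open-weight checkpoint, following the pattern from the model card. It assumes a recent `transformers` release with Qwen2.5-VL support plus the `qwen-vl-utils` helper package; the image URL and question are placeholders:

```python
# A hedged sketch of image Q&A with Qwen2.5-VL; requires transformers with
# Qwen2.5-VL support and `pip install qwen-vl-utils`. URL is a placeholder.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/receipt.jpg"},
        {"type": "text", "text": "What is the total amount on this receipt?"},
    ],
}]

# Build the text prompt and collect the referenced images/videos.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```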
### Qwen2.5-Omni: real-time multimodal interaction

Qwen2.5-Omni is Qwen's premier end-to-end multimodal model, capable of perceiving and generating across text, vision, and audio modalities in real time. It introduces a novel Thinker-Talker architecture for simultaneous understanding and response generation, built on a Qwen2.5-7B backbone. You can check out this demo to see Qwen2.5-Omni in action.
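The Thinker-Talker idea is easiest to see as two cooperating streams. The sketch below is pure illustration with stand-in functions (none of these names are the real Qwen2.5-Omni interface): the "Talker" consumes the "Thinker's" text stream incrementally, so speech can start before the full reply is finished:

```python
# A purely conceptual illustration of the Thinker-Talker split; every name
# here is a stand-in, not the actual Qwen2.5-Omni API.
from typing import Iterator

def thinker(prompt: str) -> Iterator[str]:
    """Stands in for the Thinker: streams text tokens as they are decoded."""
    for token in ["The", " capital", " of", " France", " is", " Paris", "."]:
        yield token

def talker(text_stream: Iterator[str]) -> Iterator[bytes]:
    """Stands in for the Talker: turns text tokens into audio chunks on the fly."""
    for token in text_stream:
        yield f"<audio:{token.strip()}>".encode()  # placeholder "audio" chunk

# Because the Talker consumes the Thinker's stream incrementally, audio
# playback can begin before the full text response exists.
for chunk in talker(thinker("What is the capital of France?")):
    print(chunk)
```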
## QVQ and QwQ - advanced reasoning models

Beyond standard multimodal capabilities, Qwen offers models specifically enhanced for complex reasoning, both visually and textually.
### QVQ (QVQ-Preview and QVQ-Max): deep visual reasoning

QVQ is a vision-language model series focused on Visual Question Answering and reasoning with visual evidence. It builds upon Qwen2.5-VL but emphasizes explicit reasoning steps ("thinking") about images and videos. The initial release was QVQ-72B-Preview, demonstrating the concept of a model that can not only describe an image but also reason about it to solve complex tasks.
QVQ-Max is the successor to QVQ-Preview and is accessible via API only. Like the preview, it builds on Qwen2.5-VL, and it employs optimizations like MoE for enhanced scalability and efficiency. Since its weights are not open, you can fall back to the open-weight Qwen2.5-VL if needed. Here is another demo prepared by the Qwen team, showing the capabilities of QVQ-Max.
### QwQ: reinforced textual reasoning

QwQ (Qwen with Questions) is a specialized model in the Qwen family focused on improving reasoning via reinforcement learning. Based on the Qwen2.5-32B model, QwQ underwent intensive training (including multi-stage RL) to enhance its performance on challenging reasoning tasks across domains like math and coding. The result is a model that can tackle complex questions with deeper thinking and better accuracy than the base model. Its architecture and context window are inherited from Qwen2.5-32B.
## Qwen2.5-Max - the flagship

Qwen2.5-Max is the large-scale Mixture-of-Experts (MoE) version of Qwen, representing Alibaba's most advanced LLM in the Qwen2.5 generation. It scales model capacity dramatically (hundreds of billions of parameters) while routing through experts to keep inference efficient. Qwen2.5-Max is positioned to compete with top-tier, GPT-4-class systems in capability.
## Qwen-Plus and Qwen-Turbo

On Alibaba Cloud Model Studio we can find two more flagship models - Qwen-Plus and Qwen-Turbo. There is not much public information about them, but they are positioned as lighter and faster variants of the Qwen2.5-Max model, and both are available via API only. You can find them on OpenRouter as well: Qwen-Plus, Qwen-Turbo.
Here is the key information available about them:

- Qwen-Plus: built on the Qwen2.5 base model.
- Qwen-Turbo: the lighter and faster option, also built on the Qwen2.5 base model.
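Since OpenRouter exposes an OpenAI-compatible endpoint, the standard `openai` client works for both. A minimal sketch; the model slug is an assumption, so check OpenRouter's catalog for the exact ID:

```python
# A minimal sketch of calling Qwen-Plus via OpenRouter's OpenAI-compatible
# API; the model slug and API key are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="qwen/qwen-plus",  # assumed slug; try "qwen/qwen-turbo" for the lighter variant
    messages=[{"role": "user", "content": "Summarize the Qwen model family in two sentences."}],
)
print(response.choices[0].message.content)
```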
## Qwen3 - the next generation

Building upon the success of its predecessors, Alibaba introduced Qwen3, the latest generation of large language models in the Qwen series. Announced in late April 2025, Qwen3 offers a comprehensive suite of dense and MoE models, delivering advancements in reasoning, instruction following, agent capabilities, and multilingual support.
Qwen3 continues the Transformer-based architecture but introduces significant enhancements and a diverse lineup:

- Dense models: six sizes, from 0.6B to 32B parameters.
- MoE models:
  - Qwen3-30B-A3B: a compact MoE model with 30B total parameters but only 3B active during inference, offering good performance for its computational footprint.
  - Qwen3-235B-A22B: the flagship MoE model with 235B total parameters and 22B active, designed to compete at the highest level.
- Text-only: Qwen3 models currently focus on text input and text output (including code); they do not natively support image or audio modalities in this release series.
- Multilingual: Qwen3 significantly expands its linguistic reach, supporting 119 languages and dialects.
- Agentic: Qwen3 models are optimized for coding and agentic tasks, with strengthened support for tool calling and MCP.
- Hybrid thinking modes: a key innovation in Qwen3. In thinking mode the model reasons step by step (Chain-of-Thought) before answering; in non-thinking mode it replies directly for speed. See the sketch below.
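The mode can be toggled per request. A hedged sketch, assuming the `enable_thinking` chat-template flag documented on the Qwen3 model cards; the model size and prompt are illustrative:

```python
# A hedged sketch of toggling Qwen3's hybrid thinking mode via the chat
# template's enable_thinking flag; model size and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 50?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # False skips the reasoning phase for faster replies
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```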
Qwen3 models are suitable for a wide range of applications, building on the strengths of Qwen2.5 but with enhanced performance, particularly in coding, math, and multilingual scenarios. Their improved agentic capabilities make them excellent choices for building sophisticated AI agents and tool-using applications. The hybrid thinking feature adds a layer of control for developers who need to balance response quality and speed. You can also check out this nice video overview of Qwen3 here.
This table summarizes the key characteristics of the main Qwen models discussed (excluding the Qwen-Plus and Qwen-Turbo models):
| Model | Parameter Sizes | Primary Modality | Context Window | Key Feature / Specialization | License | Multilingual |
|---|---|---|---|---|---|---|
| Qwen2.5 (Base) | 0.5B - 72B (Dense) | Text | 128K (input) / 8K (output) | Foundational, General Purpose | Apache 2.0 | 29+ Languages |
| Qwen2.5-Coder | 1.5B, 7B (Dense) | Text (Code) | ~Tens of thousands | Code Generation & Assistance | Apache 2.0 | Many Prog. Langs + Eng/Chi |
| Qwen2.5-Math | 1.5B, 7B, 72B (Dense) | Text (Math) | ~128K | Mathematical Reasoning, Tool Use (TIR) | Apache 2.0 | English & Chinese |
| Qwen2.5-VL | 3B - 72B (Dense) | Image/Video -> Text | Long video + large text | Vision-Language Understanding | Apache 2.0 | Yes (Eng/Chi focused) |
| Qwen2.5-Omni | 7B (Dense) | Text/Image/Audio/Video -> Text/Speech | Streaming / Real-time | End-to-End Multimodal Interaction | Apache 2.0 | Yes (Speech Eng/Chi) |
| QVQ-Max | ~72B (Optimized/MoE?) | Image/Video -> Text (w/ Reasoning) | Extended visual & text | Deep Visual Reasoning (CoT) | Proprietary API | Yes (Eng/Chi focused) |
| QwQ | 32B (Dense) | Text (Reasoning) | ~32K | Reinforced Reasoning (Math/Logic/Code) | Apache 2.0 | Yes (Eng/Chi) |
| Qwen2.5-Max | ~325B (MoE) | Text | 32K | Flagship Scale & Performance (MoE) | Proprietary API | Yes (Broad) |
| Qwen3 (Dense) | 0.6B, 1.7B, 4B, 8B, 14B, 32B | Text | 32K (≤4B), 128K (≥8B) | Hybrid Thinking, Strong Coding/Math | Apache 2.0 | 119 Languages |
| Qwen3-30B-A3B (MoE) | 30B Total / 3B Active | Text | 128K | Efficient MoE, Strong Coding/Math | Apache 2.0 | 119 Languages |
| Qwen3-235B-A22B (MoE) | 235B Total / 22B Active | Text | 128K | Flagship MoE, Top Performance | Apache 2.0 | 119 Languages |
(Note: Context window sizes can sometimes vary based on specific implementation or fine-tuning. The table provides typical or maximum advertised values.)
Qwen Chat is a web chat interface for the Qwen family of models. It allows users to interact with the models conversationally, making it easy to test and explore their capabilities for free.
If you want to try out the models locally, you can find them on Hugging Face and Ollama.
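For example, after pulling a model with Ollama, a couple of lines are enough to chat with it. This sketch assumes the `ollama` Python client and a locally pulled `qwen3` tag:

```python
# A quick sketch of chatting with a locally pulled Qwen model through the
# Ollama Python client (pip install ollama); assumes `ollama pull qwen3`
# has already been run and the Ollama server is up.
import ollama

response = ollama.chat(
    model="qwen3",
    messages=[{"role": "user", "content": "Write a haiku about open-weight models."}],
)
print(response["message"]["content"])
```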
The Qwen family from Alibaba Cloud presents a compelling and versatile suite of large language models. With a strong emphasis on open-source releases for many core and specialized models, we have access to really powerful AI capabilities. No matter what you are building, there is likely a Qwen model well-suited for your application needs 🤞