OpenAI Debuts gpt-oss-120b and gpt-oss-20b, Defining Open-Weight Language Model Innovation
OpenAI is debuting two state-of-the-art, open-weight language models—gpt-oss-120b and gpt-oss-20b—now freely available for download on Hugging Face. Engineered to deliver strong, real-world performance at a low cost, gpt-oss-120b and gpt-oss-20b “push the frontier of open-weight reasoning models,” according to OpenAI.
The gpt-oss models, trained with OpenAI’s most advanced pre-training and post-training techniques, outperform similarly sized open models on reasoning tasks. With a particular emphasis on reasoning, efficiency, and real-world usability across a range of deployment environments, OpenAI’s latest models are well suited to on-device use cases, local inference, and fast iteration without costly infrastructure.
OpenAI’s gpt-oss models are the company’s first open-weight language models since GPT-2, representing a significant advancement in the space. Based on OpenAI’s evaluations across standard academic benchmarks for areas such as coding, competition math, health, and agentic tool use, the company found that:
- gpt-oss-120b outperforms OpenAI o3-mini and matches or exceeds OpenAI o4-mini on competition coding, general problem solving, and tool calling.
- gpt-oss-120b exceeds o4-mini’s performance on health-related queries.
- Despite its small size, gpt-oss-20b matches or exceeds OpenAI o3-mini on these same evaluations, even outperforming it on competition mathematics and health.
The gpt-oss models are compatible with the Responses API, OpenAI's most advanced interface for generating model responses. According to OpenAI, the models are meant to be used within agentic workflows thanks to their exceptional instruction following, tool use (such as web search or Python code execution), and reasoning capabilities. Additionally, the models are fully customizable, provide full chain-of-thought (CoT), and support Structured Outputs, an OpenAI feature that ensures model responses adhere to a JSON schema.
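Structured Outputs constrains a model's reply to a caller-supplied JSON schema. As an illustration only (the schema, field names, and sample replies below are invented for this sketch, not taken from OpenAI's documentation), a minimal check that a JSON reply conforms to a simple schema might look like this:

```python
import json

# Hypothetical schema: the field names and types are illustrative,
# not drawn from any OpenAI example.
SCHEMA = {
    "type": "object",
    "required": ["answer", "confidence"],
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
}

# Map JSON-schema type names to Python types for the check below.
TYPE_MAP = {"string": str, "number": (int, float), "object": dict}

def conforms(reply_text: str, schema: dict) -> bool:
    """Minimal structural check: valid JSON, required keys present, types match."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, TYPE_MAP[schema["type"]]):
        return False
    for key in schema.get("required", []):
        if key not in data:
            return False
    for key, spec in schema.get("properties", {}).items():
        if key in data and not isinstance(data[key], TYPE_MAP[spec["type"]]):
            return False
    return True

# A well-formed reply passes; one missing a required field does not.
print(conforms('{"answer": "42", "confidence": 0.9}', SCHEMA))  # True
print(conforms('{"answer": "42"}', SCHEMA))                     # False
```

In the hosted feature the model is constrained to emit conforming JSON in the first place; a check like this only illustrates what "adhering to a JSON schema" means for the caller.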
Both models come natively quantized in MXFP4, enabling gpt-oss-120b to run within 80GB of memory and gpt-oss-20b within a mere 16GB. OpenAI has partnered with leading deployment platforms, including Azure, Hugging Face, vLLM, Ollama, and llama.cpp, so the models are flexible and easy to run anywhere: locally, on-device, or via third-party inference providers.
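The memory figures above are consistent with a back-of-envelope calculation. MXFP4 stores weights as 4-bit values plus small shared block scales; assuming roughly 4.25 effective bits per parameter (an assumption for this sketch, not an OpenAI-published figure) and taking the nominal parameter counts from the model names, the weights alone fit under the stated limits with headroom left for activations and KV cache:

```python
# Rough MXFP4 weight-memory estimate.
# Assumption: ~4.25 effective bits per parameter (4-bit values plus
# shared block scales); parameter counts are the nominal 120B and 20B
# implied by the model names.
BITS_PER_PARAM = 4.25

def weight_gb(params: float, bits: float = BITS_PER_PARAM) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9

print(f"gpt-oss-120b: ~{weight_gb(120e9):.1f} GB of weights (under 80GB)")
print(f"gpt-oss-20b:  ~{weight_gb(20e9):.1f} GB of weights (under 16GB)")
```

Under these assumptions the weights come to roughly 64 GB and 11 GB respectively, which is why a single 80GB accelerator or a 16GB consumer device can host the models.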
Additionally, NVIDIA has optimized OpenAI’s new open-weight gpt-oss models for NVIDIA GPUs, delivering smart, fast inference from the cloud to the PC. These new reasoning models enable agentic AI applications such as web search, in-depth research, and more.
“For developers who want fully customizable models they can fine-tune and deploy in their own environments, gpt-oss is a great fit,” commented OpenAI. “Releasing gpt-oss-120b and gpt-oss-20b marks a significant step forward for open-weight models. At their size, these models deliver meaningful advancements in both reasoning capabilities and safety. Open models complement our hosted models, giving developers a wider range of tools to accelerate leading edge research, foster innovation and enable safer, more transparent AI development across a wide range of use cases.”
“These open models also lower barriers for emerging markets, resource-constrained sectors, and smaller organizations that may lack the budget or flexibility to adopt proprietary models. With powerful, accessible tools in their hands, people around the world can build, innovate, and create new opportunities for themselves and others. Broad access to these capable open-weight models created in the US helps expand democratic AI rails,” OpenAI continued.
To learn more, please visit https://openai.com/.