Finally! OpenAI Enters the Open-Source Arena with Two New Models

Good morning, everyone! Dimitri Bellini here, and welcome back to Quadrata. For a while now, I've been waiting for something genuinely new to discuss in the world of artificial intelligence. The on-premise, open-source scene has been buzzing, but largely dominated by excellent models from the East. I was waiting for a major American player to make a move, and finally, the moment has arrived. OpenAI, the minds behind ChatGPT, have released not one, but two completely open-source models. This is a big deal, and in this post, I’m going to break down what they are, what they can do, and put them to the test myself.

What's New from OpenAI? A Revolution in the Making

OpenAI has released two "open-weight" models. To be precise, this means the trained weights themselves are published for anyone to download, inspect, and run locally, though the training data and training code remain closed. It's still fantastic news for developers, researchers, and hobbyists like us, as it allows for deep customization. The two new models are:

  1. GPT-OSS-120B, the large flagship model

  2. GPT-OSS-20B, a smaller model aimed at more modest hardware

This move is a significant step, especially with a permissive Apache 2.0 license, which allows for commercial use. You can build on top of these models, fine-tune them with your own data, and deploy them in your applications without the heavy licensing restrictions we often see.

Key Features That Matter

So, what makes these models stand out? Here are the highlights:

  1. A permissive Apache 2.0 license, so you can use them commercially and fine-tune them freely

  2. Adjustable reasoning effort (low, medium, or high), with full access to the model's chain of thought

  3. Native agentic capabilities, including function calling and tool use

  4. A mixture-of-experts architecture with MXFP4 quantization, which keeps memory requirements low for the parameter count

  5. A long context window of around 128k tokens

Choosing Your Model: Hardware and Performance

The two models cater to very different hardware capabilities.

The Powerhouse: GPT-OSS-120B

This is the star of the show, with performance that OpenAI says is comparable to its proprietary `o4-mini` model. However, running it is no small feat: you'll need serious hardware, like an NVIDIA H100 GPU with 80GB of VRAM. This is not something most of us have at home, but it's a game-changer for businesses and researchers with the right infrastructure.

The People's Model: GPT-OSS-20B

This is the model most of us can experiment with. It's designed to be more "human-scale" and offers performance roughly equivalent to the `o3-mini` model. The hardware requirements are much more reasonable:

  1. Around 16GB of VRAM or unified memory, thanks to its MXFP4 quantization

  2. A consumer GPU (for example, an RTX 3090 or 4090) or a recent Apple Silicon Mac

  3. Day-one support in local runtimes such as Ollama and LM Studio

This is the model I’ll be focusing my tests on today.

My Hands-On Test: Putting GPT-OSS-20B to Work with Zabbix

Benchmarks are one thing, but real-world performance is what truly counts. I decided to throw a few complex, Zabbix-related challenges at the 20B model to see how it would handle them. I used LM Arena to compare its output side-by-side with another strong model of a similar size, Qwen3.

Test 1: Zabbix JavaScript Preprocessing

My first test was a niche one: I asked the model to write a Zabbix JavaScript preprocessing script to modify the output of a low-level discovery rule by adding a custom user macro. This isn't a simple "hello world" prompt; it requires an understanding of Zabbix's specific architecture, LLD, and JavaScript context.

The Result: I have to say, both models did an impressive job. They understood the context of Zabbix, preprocessing, and discovery macros. The JavaScript they generated was coherent and almost perfect. The GPT-OSS model's code needed a slight tweak: it wrapped the code in a function, which isn't necessary because Zabbix already executes the preprocessing script body as a function, and it made a small assumption about input parameters. However, with a minor correction, the code worked. Not bad at all for a model running locally!
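To make the task concrete, here's a minimal sketch of the kind of script I was asking for. The macro name and value are placeholders, and the code is wrapped in a named function only so it can be run and tested outside Zabbix; inside a preprocessing step you would paste just the body, since Zabbix supplies `value` and wraps the script in a function for you.

```javascript
// Sketch of a Zabbix LLD preprocessing step: add a custom user macro to
// every discovered row. In Zabbix itself, only the function body is needed
// and `value` is the raw LLD JSON passed in by the server.
function addCustomMacro(value) {
    var raw = JSON.parse(value);
    // LLD output can be a bare array of rows or wrapped in {"data": [...]}
    var rows = Array.isArray(raw) ? raw : raw.data;
    rows.forEach(function (row) {
        row["{#CUSTOM.MACRO}"] = "custom-value"; // placeholder macro and value
    });
    return JSON.stringify(raw);
}
```

Handling both the bare-array and `{"data": [...]}` shapes is exactly the kind of "small assumption about input parameters" that tripped up the generated code.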

Test 2: Root Cause Analysis of IT Events

Next, I gave the model a set of correlated IT events with timestamps and asked it to identify the root cause. The events were:



  1. Filesystem full on a host

  2. Database instance down

  3. CRM application down

  4. Host unreachable

The Result: This is where the model's reasoning really shone. It correctly identified that the "Filesystem full" event was the most likely root cause. It reasoned that a full disk could cause the database to crash, which in turn would bring down the CRM application that depends on it. It correctly identified the chain of dependencies. Both GPT-OSS and Qwen3 passed this test with flying colors, demonstrating strong logical reasoning.
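The core of that reasoning can be approximated in a few lines: once you know a set of events is correlated, the earliest one on the timeline is the natural root-cause candidate, and the later events form the dependency chain. A toy sketch, with invented timestamps, purely for illustration:

```javascript
// Toy root-cause heuristic: among a set of correlated events, pick the
// earliest one as the root-cause candidate. Real event correlation in a
// monitoring tool is of course far more involved than a timestamp sort.
function rootCause(events) {
    return events
        .slice() // copy, so the caller's array isn't reordered
        .sort(function (a, b) { return a.ts - b.ts; })[0];
}

// The event chain from the test, deliberately out of order
var incident = [
    { ts: 190, name: "CRM application down" },
    { ts: 100, name: "Filesystem full on host" },
    { ts: 240, name: "Host unreachable" },
    { ts: 160, name: "Database instance down" }
];
```

What makes the LLM result interesting is that it did not need the dependency graph spelled out; it inferred the disk-to-database-to-CRM chain from the event names alone.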

Test 3: The Agentic Challenge

For my final test, I tried to push the "agentic" capabilities. I provided the model with a tool to interact with the Zabbix API and asked it to fetch a list of active problems. Unfortunately, this is where it stumbled. While it understood the request and even defined the tool it needed to use, it failed to actually execute the API call, instead getting stuck or hallucinating functions. This shows that while the potential for tool use is there, the implementation isn't quite seamless yet, at least in my initial tests.
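For context, the tool in question was essentially a thin wrapper around the Zabbix API's `problem.get` method, which returns currently active problems. Here's a hedged sketch of what a correct call looks like; the URL and token are placeholders, and the `auth` field in the request body is the classic authentication style (recent Zabbix versions also accept a bearer token in the `Authorization` header):

```javascript
// Build the JSON-RPC request for Zabbix's problem.get. Keeping the payload
// builder separate from the network call makes it easy to test offline.
function buildProblemGet(token) {
    return {
        jsonrpc: "2.0",
        method: "problem.get",
        params: {
            output: "extend",
            recent: true,            // include recently resolved problems too
            sortfield: ["eventid"],
            sortorder: "DESC"
        },
        auth: token, // placeholder API token
        id: 1
    };
}

// Sending it (apiUrl is a placeholder such as the server's /api_jsonrpc.php)
async function getActiveProblems(apiUrl, token) {
    const res = await fetch(apiUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(buildProblemGet(token))
    });
    return (await res.json()).result;
}
```

This is the step the model failed to reach: it defined a tool in roughly this shape but never got as far as issuing the actual request.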

Conclusion: A Welcome and Necessary Step Forward

So, what's my final verdict? The release of these open-source models by OpenAI is a fantastic and much-needed development. It provides a powerful, transparent, and highly customizable alternative from a Western company in a space that was becoming increasingly dominated by others. The 20B model is a solid performer, capable of impressive reasoning and coding, even if it has some rough edges with more advanced agentic tasks.

For now, it stands as another great option alongside models from Mistral and others. The true power here lies in the community. With open weights and an open license, I'm excited to see how developers will improve, fine-tune, and build upon this foundation. This is a very interesting time for local and on-premise AI.

What do you think? Have you tried the new models? What are your impressions? Let me know your thoughts in the comments below!




Stay Connected with Me: