Finally! OpenAI Enters the Open-Source Arena with Two New Models

Good morning, everyone! Dimitri Bellini here, and welcome back to Quadrata. For a while now, I've been waiting for something genuinely new to discuss in the world of artificial intelligence. The on-premise, open-source scene has been buzzing, but largely dominated by excellent models from the East. I was waiting for a major American player to make a move, and finally, the moment has arrived. OpenAI, the minds behind ChatGPT, have released not one, but two completely open-source models. This is a big deal, and in this post, I’m going to break down what they are, what they can do, and put them to the test myself.

What's New from OpenAI? A Revolution in the Making

OpenAI has released two "open-weight" models. To be precise, this means the trained weights themselves are published for anyone to download, inspect, and run locally, though the training data and training code remain closed. It's still fantastic news for developers, researchers, and hobbyists like us, as it allows for deep customization. The two new models are:

  1. GPT-OSS-120B, the large flagship model

  2. GPT-OSS-20B, a smaller model aimed at more modest hardware

This move is a significant step, especially with a permissive Apache 2.0 license, which allows for commercial use. You can build on top of these models, fine-tune them with your own data, and deploy them in your applications without the heavy licensing restrictions we often see.

Key Features That Matter

So, what makes these models stand out? Here are the highlights:

  1. A permissive Apache 2.0 license, so you can use them commercially and fine-tune them freely

  2. Adjustable reasoning effort (low, medium, or high), with full access to the model's chain of thought

  3. Native agentic capabilities, including function calling and tool use

  4. A mixture-of-experts architecture with MXFP4 quantization, which keeps memory requirements low for the parameter count

  5. A long context window of around 128k tokens

Choosing Your Model: Hardware and Performance

The two models cater to very different hardware capabilities.

The Powerhouse: GPT-OSS-120B

This is the star of the show, with performance that OpenAI says is comparable to its proprietary `o4-mini` model. However, running it is no small feat: you'll need serious hardware, like an NVIDIA H100 GPU with 80GB of VRAM. This is not something most of us have at home, but it's a game-changer for businesses and researchers with the right infrastructure.

The People's Model: GPT-OSS-20B

This is the model most of us can experiment with. It's designed to be more "human-scale" and offers performance roughly equivalent to the `o3-mini` model. The hardware requirements are much more reasonable:

  1. Around 16GB of VRAM or unified memory, thanks to its MXFP4 quantization

  2. A consumer GPU (for example, an RTX 3090 or 4090) or a recent Apple Silicon Mac

  3. Day-one support in local runtimes such as Ollama and LM Studio

This is the model I’ll be focusing my tests on today.

My Hands-On Test: Putting GPT-OSS-20B to Work with Zabbix

Benchmarks are one thing, but real-world performance is what truly counts. I decided to throw a few complex, Zabbix-related challenges at the 20B model to see how it would handle them. I used LM Arena to compare its output side-by-side with another strong model of a similar size, Qwen3.

Test 1: Zabbix JavaScript Preprocessing

My first test was a niche one: I asked the model to write a Zabbix JavaScript preprocessing script to modify the output of a low-level discovery rule by adding a custom user macro. This isn't a simple "hello world" prompt; it requires an understanding of Zabbix's specific architecture, LLD, and JavaScript context.

The Result: I have to say, both models did an impressive job. They understood the context of Zabbix, preprocessing, and discovery macros. The JavaScript they generated was coherent and almost perfect. The GPT-OSS model's code needed a slight tweak: it wrapped the code in a function, which isn't necessary because Zabbix already executes the preprocessing script body as a function, and it made a small assumption about input parameters. However, with a minor correction, the code worked. Not bad at all for a model running locally!
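To make the task concrete, here's a minimal sketch of the kind of script I was asking for. The macro name and value are placeholders, and the code is wrapped in a named function only so it can be run and tested outside Zabbix; inside a preprocessing step you would paste just the body, since Zabbix supplies `value` and wraps the script in a function for you.

```javascript
// Sketch of a Zabbix LLD preprocessing step: add a custom user macro to
// every discovered row. In Zabbix itself, only the function body is needed
// and `value` is the raw LLD JSON passed in by the server.
function addCustomMacro(value) {
    var raw = JSON.parse(value);
    // LLD output can be a bare array of rows or wrapped in {"data": [...]}
    var rows = Array.isArray(raw) ? raw : raw.data;
    rows.forEach(function (row) {
        row["{#CUSTOM.MACRO}"] = "custom-value"; // placeholder macro and value
    });
    return JSON.stringify(raw);
}
```

Handling both the bare-array and `{"data": [...]}` shapes is exactly the kind of "small assumption about input parameters" that tripped up the generated code.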

Test 2: Root Cause Analysis of IT Events

Next, I gave the model a set of correlated IT events with timestamps and asked it to identify the root cause. The events were:



  1. Filesystem full on a host

  2. Database instance down

  3. CRM application down

  4. Host unreachable

The Result: This is where the model's reasoning really shone. It correctly identified that the "Filesystem full" event was the most likely root cause. It reasoned that a full disk could cause the database to crash, which in turn would bring down the CRM application that depends on it. It correctly identified the chain of dependencies. Both GPT-OSS and Qwen3 passed this test with flying colors, demonstrating strong logical reasoning.
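The core of that reasoning can be approximated in a few lines: once you know a set of events is correlated, the earliest one on the timeline is the natural root-cause candidate, and the later events form the dependency chain. A toy sketch, with invented timestamps, purely for illustration:

```javascript
// Toy root-cause heuristic: among a set of correlated events, pick the
// earliest one as the root-cause candidate. Real event correlation in a
// monitoring tool is of course far more involved than a timestamp sort.
function rootCause(events) {
    return events
        .slice() // copy, so the caller's array isn't reordered
        .sort(function (a, b) { return a.ts - b.ts; })[0];
}

// The event chain from the test, deliberately out of order
var incident = [
    { ts: 190, name: "CRM application down" },
    { ts: 100, name: "Filesystem full on host" },
    { ts: 240, name: "Host unreachable" },
    { ts: 160, name: "Database instance down" }
];
```

What makes the LLM result interesting is that it did not need the dependency graph spelled out; it inferred the disk-to-database-to-CRM chain from the event names alone.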

Test 3: The Agentic Challenge

For my final test, I tried to push the "agentic" capabilities. I provided the model with a tool to interact with the Zabbix API and asked it to fetch a list of active problems. Unfortunately, this is where it stumbled. While it understood the request and even defined the tool it needed to use, it failed to actually execute the API call, instead getting stuck or hallucinating functions. This shows that while the potential for tool use is there, the implementation isn't quite seamless yet, at least in my initial tests.
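For context, the tool in question was essentially a thin wrapper around the Zabbix API's `problem.get` method, which returns currently active problems. Here's a hedged sketch of what a correct call looks like; the URL and token are placeholders, and the `auth` field in the request body is the classic authentication style (recent Zabbix versions also accept a bearer token in the `Authorization` header):

```javascript
// Build the JSON-RPC request for Zabbix's problem.get. Keeping the payload
// builder separate from the network call makes it easy to test offline.
function buildProblemGet(token) {
    return {
        jsonrpc: "2.0",
        method: "problem.get",
        params: {
            output: "extend",
            recent: true,            // include recently resolved problems too
            sortfield: ["eventid"],
            sortorder: "DESC"
        },
        auth: token, // placeholder API token
        id: 1
    };
}

// Sending it (apiUrl is a placeholder such as the server's /api_jsonrpc.php)
async function getActiveProblems(apiUrl, token) {
    const res = await fetch(apiUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(buildProblemGet(token))
    });
    return (await res.json()).result;
}
```

This is the step the model failed to reach: it defined a tool in roughly this shape but never got as far as issuing the actual request.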

Conclusion: A Welcome and Necessary Step Forward

So, what's my final verdict? The release of these open-source models by OpenAI is a fantastic and much-needed development. It provides a powerful, transparent, and highly customizable alternative from a Western company in a space that was becoming increasingly dominated by others. The 20B model is a solid performer, capable of impressive reasoning and coding, even if it has some rough edges with more advanced agentic tasks.

For now, it stands as another great option alongside models from Mistral and others. The true power here lies in the community. With open weights and an open license, I'm excited to see how developers will improve, fine-tune, and build upon this foundation. This is a very interesting time for local and on-premise AI.

What do you think? Have you tried the new models? What are your impressions? Let me know your thoughts in the comments below!




Stay Connected with Me: