Introducing Axiom 1, the Best Computer Use Model in the World.
Aug 19, 2025
Computer use foundation models are still in their infancy. Today’s systems fail often, act too slowly, and are not reliable enough for most real-world tasks. At Induction Labs, our core focus is building better computer use models: models that can see software, plan, and act directly.
Today, we are announcing Axiom 1, the strongest computer use model in the world.

Figure 1: OSWorld-Verified performance. For each external model, we report the higher of our internal reproduction and the publicly reported result. "GPT-5 w/ Custom Framework" and "o3 w/ Custom Framework" refer to those models running in the previous state-of-the-art Agent S2.5 framework.
Axiom 1 achieves state-of-the-art results on OSWorld-Verified, the industry-standard benchmark for computer use models. Our model surpasses OpenAI’s GPT-5 and o3, and Anthropic’s Claude 4 Sonnet. The core technical innovation behind Axiom 1 is learning directly from human actions at scale, rather than from text-based reasoning: Axiom 1 is trained on demonstrations of experts completing real work tasks on their computers.
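Induction Labs has not published Axiom 1's training format, but the general shape of learning from demonstrations can be sketched. In the hypothetical sketch below, every record pairs what the expert saw with the low-level action they took; all class and field names are illustrative assumptions, not the actual schema.

```python
from dataclasses import dataclass

# Hypothetical sketch only: Axiom 1's real data schema is not public.
# It illustrates learning from (observation, action) pairs rather than
# from text-based reasoning traces.

@dataclass
class Step:
    screenshot_png: bytes  # what the expert saw at this moment
    action: dict           # what the expert did, e.g.
                           # {"type": "click", "x": 412, "y": 97}
                           # {"type": "type", "text": "hello"}

@dataclass
class Demonstration:
    task: str          # natural-language description of the work task
    steps: list[Step]  # ordered trajectory of observations and actions

# A model trained on such trajectories learns to predict the next action
# directly from pixels, with no intermediate text-reasoning stage.
```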

Figure 2: Average time per action on OSWorld-Verified. Requests to OpenAI were made with the priority processing service tier.
One of the main blockers for deploying computer use models in production is latency. Models that take tens of seconds per action are unusable for real work. Axiom 1 outperforms GPT-5 while spending 16x less time per action.
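For reference, per-action latency numbers like those in Figure 2 can be produced by timing only the model call inside the agent loop. A minimal sketch, assuming a hypothetical `agent.act(obs)` interface and an OSWorld-style environment (neither is the actual harness):

```python
import time

def mean_time_per_action(agent, env, episodes: int = 10) -> float:
    """Average wall-clock seconds the agent spends deciding each action.

    `agent` and `env` are hypothetical stand-ins: `agent.act(obs)` maps a
    screenshot to an action, and `env.step(action)` returns the next
    screenshot plus a done flag.
    """
    total, n = 0.0, 0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            start = time.perf_counter()
            action = agent.act(obs)  # time only the model's decision
            total += time.perf_counter() - start
            n += 1
            obs, done = env.step(action)
    return total / n
```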
Over time, our goal is to push latency low enough to enable streaming video perception, where models see the interface continuously, just as humans do. This would unlock applications that current models cannot handle, such as motion design, time-sensitive trading, or playing video games end-to-end.
We are releasing example completions from Axiom 1 to demonstrate its behavior.
Video (6x speed): Fill in all the blank cells with the value above it

Video (6x speed): Help me to set up an initial web extension project with help of the web tool, tagging it "happy-extension v0.0.1". Leave description blank for now. Include a background script and browser action, while other features are not required. Remember to unzip the auto-generated folder into "~/Projects".
A unified model for seeing, planning, and acting
Unlike current state-of-the-art systems such as GPT-5 (Agent S2.5, as reported in the OSWorld chart), which rely on multi-stage pipelines, Axiom 1 is a single unified model: it sees the screen, performs planning, and outputs actions in one step. We believe this architecture avoids inductive biases that emerge from decomposing the problem into vision-language perception plus external bounding-box or tool-based control.
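Neither system's internals are public, so the contrast is best read as an interface sketch; every name below is a hypothetical stand-in.

```python
def pipeline_step(screenshot, perceiver, planner, grounder):
    """Multi-stage pipeline, roughly in the Agent S2.5 style."""
    elements = perceiver(screenshot)   # vision model -> UI bounding boxes
    plan = planner(elements)           # text-based reasoning over elements
    return grounder(plan, elements)    # map the plan back onto click/type

def unified_step(screenshot, model):
    """Unified model, roughly in the Axiom 1 style."""
    return model(screenshot)           # sees, plans, and acts in one pass
```

Each hand-off in the pipeline fixes an intermediate interface (boxes, text descriptions) that the downstream stage must work within; a unified model can instead learn its own intermediate representations end-to-end.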
A key benefit of this design is reliable planning under failure. In practice, Axiom 1 can recover gracefully when unexpected outcomes occur, instead of cascading into errors. Comparative demonstrations (Axiom 1 vs. GPT-5) highlight this robustness.
Task: Please search the Internet for a tutorial on adding absolute line numbers in Vim and setting it as default for my local Vim

Videos: Axiom 1 (6x speed) vs. GPT-5 (60x speed)
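Mechanically, this recovery behavior falls out of a closed loop: each action is predicted from the actual current screen, so an unexpected outcome simply becomes the next observation. A hedged sketch, with all interface names hypothetical:

```python
def run_task(model, env, task: str, max_steps: int = 100) -> bool:
    """Closed-loop execution: re-plan from the real screen every step."""
    obs = env.reset(task)
    for _ in range(max_steps):
        # If the previous action misfired (wrong menu, surprise dialog),
        # the model sees the result and plans from there rather than
        # continuing a stale multi-step plan.
        action = model.act(task, obs)
        if action["type"] == "done":
            return True
        obs = env.step(action)
    return False
```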
Agents that use all software
We believe computer use models should serve as general agents, not as tools confined behind text interfaces. Axiom 1 has learned to solve tasks with the same tools humans use. For example, we observe that for some tasks it learned to use ChatGPT through a web browser.
Video (6x speed): Please help me modify VS Code setting to hide all "__pycache__" folders in the explorer view
This shift makes it possible for agents to work across the full range of human software, not just predefined tools.
Current Limitations
Despite this progress, Axiom 1 is far from perfect. Our model still struggles with complex UI elements and extremely long-horizon tasks. Scoring 60.2% on OSWorld marks an important milestone for unified computer use models, but OSWorld alone can’t capture the full spectrum of capabilities needed for reliable agents.
That’s why we’re funding external research on evaluations and welcome collaboration on developing more comprehensive benchmarks for computer use. Our focus remains on alignment and research until these models and their evaluations reach the level of robustness required for safe and productive use.
* * *
Axiom 1 demonstrates that unified computer use models can outperform the best existing systems while achieving practical latency. At Induction Labs, we are committed to scaling these models, building the infrastructure to support them, and pursuing the research necessary to make them safe and reliable. If this work seems interesting to you, send us a message at hiring@inductionlabs.com.