Tired: Agent orchestration; Wired: Agent evolution.

Palo Alto, California-based Hexo Labs has plenty of ambition. The start-up is focused on creating what it calls “superintelligence,” an intelligence so mighty it will easily surpass even Artificial General Intelligence (AGI) itself.

How will such a capability be built? By bootstrapping itself, obviously. 

The company has released, as open source, a multi-agent framework called “Self-Improving AI” (SIA), which the company claims can learn by itself, and at a rate that far surpasses humans.

In Hexo’s own tests, SIA showed a 350x performance improvement over a setup with a human in the loop to smash the thumbs up/thumbs down button, as measured by OpenAI’s MLE-bench benchmark. This benchmark measures how well an agent performs at machine learning engineering.

The MLE-bench leaderboard is temporarily offline for maintenance, so you will have to take Hexo’s word on this. 

SIA operates in continuous loops of learning and adaptation. It generates a hypothesis, runs an experiment, and evaluates the outcome. If the outcome is successful, it updates its approach. Rinse, lather, repeat, repeat, repeat.

“Superintelligence will not emerge from static models. SIA learns from itself through execution and compounds its capability with every cycle,” said Kunal Bhatia, CEO and Co-Founder of Hexo Labs, in a statement.  

Hexo Labs has partnered with scientists at Stanford University, the University of Oxford, and the University of California, Santa Barbara to investigate how SIA could be used to speed up scientific discovery. The company has also started a grant program for other researchers to explore their own domains with SIA.

Hexo is not alone in its pursuit of self-learning agents.

Andrej Karpathy has worked on what he calls a second brain, a series of agent actions that compile the knowledge they have learned so it can be used later.

Likewise, researchers have cobbled together Memento Skills, built on “reflective” read-write loops. The Massachusetts Institute of Technology developed a way for agents to fact-check their own work.

Hexo expands on this approach by building an agent system that not only modifies its own harness, but can rewrite the model it uses as well. 

Two Models to Run

The SIA framework is built on three continuously interoperating agents, working in an iterative fashion. A meta-agent reads the task and builds an agent to do the job. This task agent executes the task to the best of its abilities and records the results. Reading the logs of actions taken, a third agent reviews the performance and looks for ways to improve the task. It then updates the meta-agent with these new findings.
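The cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not Hexo’s code: the function names (`meta_agent`, `task_agent`, `review_agent`, `run_loop`), the `RunLog` record, and the stand-in scorer `toy_evaluate` are all hypothetical, and no real model is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunLog:
    """What the task agent records after one attempt (hypothetical shape)."""
    instructions: str
    score: float


def meta_agent(task: str, findings: List[str]) -> str:
    # Builds the task agent's instructions from the task plus reviewer findings.
    return "\n".join([task] + [f"- {f}" for f in findings])


def task_agent(instructions: str, evaluate: Callable[[str], float]) -> RunLog:
    # Executes the task and records the result.
    return RunLog(instructions, evaluate(instructions))


def review_agent(log: RunLog, target: float, cycle: int) -> Optional[str]:
    # Reads the log; proposes a new finding only when the score misses the target.
    if log.score >= target:
        return None
    return f"finding from cycle {cycle}"


def run_loop(task: str, evaluate: Callable[[str], float],
             target: float, max_cycles: int) -> List[RunLog]:
    findings: List[str] = []
    history: List[RunLog] = []
    for cycle in range(max_cycles):
        log = task_agent(meta_agent(task, findings), evaluate)
        history.append(log)
        finding = review_agent(log, target, cycle)
        if finding is None:
            break  # target reached, stop iterating
        findings.append(finding)  # feed the finding back to the meta-agent
    return history


def toy_evaluate(instructions: str) -> float:
    # Stand-in scorer (an assumption): each finding line adds 0.25, capped at 1.0.
    return min(1.0, 0.25 * instructions.count("\n- "))


history = run_loop("Classify passengers", toy_evaluate, target=0.75, max_cycles=10)
scores = [log.score for log in history]
print(scores)
```

In this toy run the score climbs by one step per cycle until the target is met. A real system would replace `toy_evaluate` with an actual experiment and the reviewer with a model reading the execution logs.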

This posse of agents consults one of two different models, each performing a different task. One builds the scaffolding and rewrites the instructions for better results. For this role, the meta-agent and the evaluation agent run Claude Sonnet by default, but users can also swap in Claude Haiku or Claude Opus.

The task agents, however, must run on a second, open-weight model (either locally or on a private cloud) that specializes in the user’s domain. This model can be molded by new information. The underlying weights can be jiggled to address a failed task. This implementation uses OpenAI’s open-weight gpt-oss-120b, though developers can swap in their own model.
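One way to picture the split is as a mapping from agent role to model. The sketch below is an assumption for illustration, not SIA’s actual configuration format: `ModelConfig`, `model_for`, and the role names are hypothetical, and the Claude strings are plain labels rather than real API model identifiers.

```python
from dataclasses import dataclass

# Labels for the harness-side options named in the article (not API model ids).
HARNESS_MODELS = {"claude-sonnet", "claude-haiku", "claude-opus"}


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical role-to-model routing for the two-model setup."""
    harness_model: str = "claude-sonnet"  # default for meta and evaluation agents
    task_model: str = "gpt-oss-120b"      # open-weight model; swappable

    def __post_init__(self) -> None:
        if self.harness_model not in HARNESS_MODELS:
            raise ValueError(f"unsupported harness model: {self.harness_model}")

    def model_for(self, role: str) -> str:
        # Meta and evaluation agents share the harness model;
        # task agents use the separately trainable open-weight model.
        if role in ("meta", "evaluation"):
            return self.harness_model
        if role == "task":
            return self.task_model
        raise ValueError(f"unknown role: {role}")


default = ModelConfig()
custom = ModelConfig(harness_model="claude-opus", task_model="my-local-model")
print(default.model_for("meta"), default.model_for("task"))
print(custom.model_for("evaluation"), custom.model_for("task"))
```

Keeping the two roles in separate fields mirrors the article’s point that the harness-side model and the task-side weights are changed independently.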

The two-model approach is essential for allowing the agent to improve itself. 

“These two silos operate in isolation. Harness work leaves the model fixed; test-time training leaves the harness fixed,” they write in their explanatory paper. A single model doing all the thinking would just result in confusion, they argue.

The software package includes four built-in tasks to test SIA’s mettle across a variety of different workloads. GPQA (Graduate-Level Google-Proof Q&A) tests how well the agent does at multi-step reasoning in complicated fields. There are also pre-built tasks for measuring reasoning capabilities in law (Lawbench), chess (LongCot Chess) and data science analysis (Spaceship Titanic).

The Future Should be Open Source

Time will tell if this two-model approach will be the secret sauce that lets AI evolve itself into an intelligence that makes AGI look like last year’s edition of Siri. But if superintelligence is a possibility, then Hexo wants to ensure that it is built with open source.

For one, it is too dangerous for so much power to be concentrated in a handful of firms, Bhatia explained in a TFiR videocast. “I do think that it’s not the best option for humanity to just have it be concentrated in the hands of a few. These systems should be available to everybody to be able to own in a sovereign way,” he said.

In a VMBlog interview, Bhatia elaborated on the danger of corporate monopolization, noting that the frontier labs are working on similar capabilities. 

“The skew and balance of power is a concern. That is why we have open-sourced it, where more people can have access to this technology and build with it,” he said. “The ability for many people to contribute to the technology will have a benefit.”