Google Launches a 'Universal Format' for Karpathy's LLM Wiki

AI pioneer Andrej Karpathy proposed using the wiki format as an easy way to capture everything that agents learn in the course of doing their work. Now, two Google Cloud tech leads have created a new metadata specification based on Karpathy’s LLM-wiki pattern that could capture enterprise data in a standard, reusable way.

The Open Knowledge Format (OKF) formalizes Karpathy’s suggested workflow into a portable, interoperable format, so that as wiki-based knowledge management systems proliferate, agents can use data from multiple sources without translation.

“Instead of using models to search the same documents for the same facts over and over, you can give your agents a shared markdown library that grows more useful over time,” wrote Google Cloud tech leads Sam McVeety and Amir Hormati in an introductory blog post.

Although pitched for general usage, OKF takes aim at the enterprise knowledge management systems from Notion, Atlassian Confluence, and Collibra, which lock up enterprise data in proprietary formats not easily understood by external agents.

Evolving Knowledge

Karpathy is perhaps best known as one of the founding research scientists of OpenAI. He later helped Tesla solve some of its autonomous driving challenges. Last month, he joined Anthropic to help better pretrain LLMs.

In April, Karpathy floated the idea of building personal knowledge bases that can provide essential material for agents and user prompts.

On their own, LLMs retain nothing from one session to the next (much like the amnesiac protagonist in the movie Memento). It’s up to the supporting application layer to keep notes of an ongoing conversation with the user or agent, and these notes have to be reintroduced to the LLM with each new query or task.

Karpathy proposed, and demonstrated, a wiki that an LLM can use to keep notes on previous tasks: “a structured, interlinked collection of markdown files” recording what the LLM has learned in completing its work.

Ward Cunningham introduced the idea of a wiki in 1995 as a set of writable text files that can be altered by third parties over the Web, using a markup language to set the formatting. Originally, wikis were used as a collaboration tool, but Karpathy found wikis to be the perfect format for LLMs to use as scratch pads.

The LLM can log all the information it learns during a session, using the wiki as the scratch pad. Then later, when that LLM, or another, is asked to execute a similar task, it can consult the wiki.

“When you add a new source, the LLM doesn’t just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query,” Karpathy wrote.

Beset by issues of accuracy and the increasing cost of tokens, the AI community seized upon Karpathy’s idea of the LLM-Wiki to improve agent performance. The community around Obsidian, a knowledge management application, quickly built a “Karpathy LLM Wiki” plug-in. AI start-up Nimbalyst built a framework to automate the bookkeeping needed to maintain an LLM Wiki-based knowledge management system. The open source Hermes adopted the format for its own internal operations.

Standardization Helps the Agent

Now, Google wants to standardize all these efforts.

OKF “represents knowledge as a directory of markdown files with YAML front matter,” the Google duo wrote. The files can reside on any file system, be stored in a repository such as Git, and be easily transported as a tarball.
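To make the "directory of markdown files" idea concrete, here is a minimal Python sketch that writes two markdown files with YAML front matter into a directory and packs it into a tarball. It is an illustration only, not the OKF reference implementation: the bundle name, file names, and field values are hypothetical, and the exact fields and layout that OKF v0.1 requires are not taken from the specification.

```python
import tarfile
import tempfile
from pathlib import Path


def write_concept(bundle_dir: Path, name: str, fields: dict[str, str], body: str) -> Path:
    """Write one markdown file with a YAML front matter header.

    Assumes simple one-line string values that need no YAML quoting.
    """
    header = "\n".join(f"{key}: {value}" for key, value in fields.items())
    path = bundle_dir / f"{name}.md"
    path.write_text(f"---\n{header}\n---\n\n{body}\n", encoding="utf-8")
    return path


def pack_bundle(bundle_dir: Path, archive_path: Path) -> list[str]:
    """Pack the directory into a gzipped tarball and return the sorted member names."""
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(bundle_dir, arcname=bundle_dir.name)
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())


def demo() -> list[str]:
    """Build a tiny hypothetical bundle in a temp directory and list the tarball contents."""
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp) / "sales_bundle"  # hypothetical bundle name
        bundle.mkdir()
        write_concept(
            bundle,
            "orders_table",
            {
                "title": "Orders table",
                "resource": "warehouse.sales.orders",  # hypothetical identifier
                "description": "One row per customer order.",
            },
            "Feeds the [revenue metric](revenue_metric.md).",
        )
        write_concept(
            bundle,
            "revenue_metric",
            {
                "title": "Revenue",
                "resource": "metrics.revenue",  # hypothetical identifier
                "description": "Sum of order totals.",
            },
            "Computed from the [orders table](orders_table.md).",
        )
        return pack_bundle(bundle, Path(tmp) / "sales_bundle.tar.gz")


if __name__ == "__main__":
    for member in demo():
        print(member)
```

Because the result is plain files, the same directory could just as easily be committed to Git or copied to a shared drive; nothing in the sketch depends on an SDK.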

Standardization would certainly ease operations for agents. Today, agents must rifle through metadata catalogs, each with its own API. They must dig for data squirreled away in code comments or notebook cells, or hidden on shared drives, with little or no metadata to guide navigation.

“Karpathy’s wiki and your team’s wiki and a vendor’s catalog export may all look alike (markdown, frontmatter, cross-links), but none of them are intentionally designed to cooperate. There is no agreed-upon answer to what fields every document should carry, or what filenames mean what,” the tech leads wrote. “As a result, the knowledge encoded in wikis remains siloed within the original teams, leading to redundant effort whenever a new agent is built.”

A Simple Specification for All

Like markdown and the wiki format itself, OKF was designed to be intentionally simple. An OKF artifact can be created without specialized knowledge or an SDK.

The base element of OKF is “a bundle” of concepts. A concept is anything the user wants to record, including tables, datasets, metrics, playbooks, runbooks, and APIs. Each concept is a single file with a YAML-based header that gives a basic description of the content, using attributes such as title, resource, and description.
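As a rough illustration of what reading such a concept file might involve, the Python sketch below splits a file into its header and body using only the standard library. It handles only flat `key: value` lines, not full YAML, and the example file is hypothetical. The `missing_fields` check assumes that title, resource, and description are all expected; the article names those attributes, but whether the specification makes them mandatory is an assumption here.

```python
# Hypothetical concept file; field values are invented for illustration.
EXAMPLE = """---
title: Orders table
resource: warehouse.sales.orders
description: One row per customer order.
---

Joined to [customers](customers_table.md) on customer_id.
"""

# Assumption: these three attributes are treated as expected for every concept.
EXPECTED_FIELDS = ("title", "resource", "description")


def parse_concept(text: str) -> tuple[dict[str, str], str]:
    """Split a concept file into (header fields, markdown body).

    Only flat `key: value` lines are supported; this is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing front matter opening '---'")
    end = None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            end = index
            break
    if end is None:
        raise ValueError("missing front matter closing '---'")

    header: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a key: value line: {line!r}")
        header[key.strip()] = value.strip().strip("\"'")

    body = "\n".join(lines[end + 1:]).strip()
    return header, body


def missing_fields(header: dict[str, str]) -> list[str]:
    """Return the expected attributes that the header does not define."""
    return [name for name in EXPECTED_FIELDS if name not in header]


if __name__ == "__main__":
    header, body = parse_concept(EXAMPLE)
    print(header["title"], "->", header["resource"])
    print("missing:", missing_fields(header))
```

A production reader would more likely hand the header to a real YAML library; the hand-rolled parser here just keeps the example dependency-free.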

Concepts can be related to each other with normal markdown links. A directory of concepts is then inherently captured as a graph of relationships.
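One way to see how links turn a directory into a graph is to scan each file for markdown links that point at other `.md` files. The Python sketch below does this over an in-memory mapping of hypothetical file names to contents. It assumes a flat directory and relative links, and it ignores external URLs; OKF's actual cross-linking rules may differ.

```python
import re

# Matches [label](target.md) and [label](target.md#anchor); captures the file part.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)\s]*)?\)")


def build_link_graph(concepts: dict[str, str]) -> dict[str, set[str]]:
    """Map each concept file name to the set of concept files it links to."""
    graph: dict[str, set[str]] = {}
    for filename, text in concepts.items():
        targets: set[str] = set()
        for match in LINK_PATTERN.finditer(text):
            target = match.group(1)
            if "://" in target:
                continue  # external URL, not a link to another concept
            targets.add(target)
        graph[filename] = targets
    return graph


# Hypothetical concept files, reduced to their markdown bodies.
CONCEPTS = {
    "orders_table.md": (
        "See [customers](customers_table.md) and "
        "[revenue](revenue_metric.md#definition)."
    ),
    "revenue_metric.md": (
        "Computed from [orders](orders_table.md). "
        "Background: [docs](https://example.com/page.md)"
    ),
    "customers_table.md": "No outgoing links.",
}

if __name__ == "__main__":
    for source, targets in build_link_graph(CONCEPTS).items():
        print(source, "->", sorted(targets))
```

An agent could walk this adjacency map to find, for example, every concept that depends on the orders table, without any catalog-specific API.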

The full OKF specification (currently at v0.1) includes conformance criteria, cross-linking rules, and a set of reserved filenames.

The Google team also provided a set of reference implementations for building a knowledge catalog, a discovery agent and a data enrichment agent.

Of course, the most difficult part of creating a standard is adoption. The company has updated its own, recently introduced, Google Knowledge Catalog so that this enterprise-focused query service can understand the format. But given Google’s reputation for purloining data from the web to feed its own AI-based services, the current generation of knowledge management system providers may be reluctant to accept any format from Google, however simple it may be.
