Diffbot today revealed it has developed a large language model (LLM) that makes it possible to invoke the knowledge graph technology it has developed via a natural language interface.
The company previously developed an open source knowledge graph that has collected more than one trillion structured facts from across the Web. Unlike platforms such as Open AI, the knowledge graph can update the index of data it relies on in real time.
The LLM being added to the knowledge graph makes it possible to query all that index – including examples of code and images – without having to directly master the programming language that DiffBot developed to enable organizations to both index and query that data.
The LLM is based on Llama, originally developed by Meta, which Diffbot trained to create natural language queries that can be executed by the programming language it created for its knowledge graph. “It’s an expert user of our tools,” says Diffbot CEO, Mike Tung.
Diffbot also makes available an application programming interface (API), based on specifications defined by OpenAI, the developer of the ChatGPT AI model, that IT teams can use to integrate the Diffbot knowledge graph into their applications.
The result is an alternative approach to, for example, being able to launch a query that generates summarizations of all the latest information available, including citations, relating to a particular subject, says Tung. In effect, Diffbot has created the first retrieval-augmented generation tool for a knowledge graph that is tied to an index that continuously crawls the Web to retrieve fresh data, he added.
In addition to providing more accurate response to queries, the Diffbot knowledge graph uses orders of magnitude less compute and energy resources to aggregate that content, said Tung.
Organizations can also self-host the knowledge graph versus relying on AI models that are invoked using application programming interfaces (APIs) to create input and output tokens that result in fees being assessed for every query created.
Ultimately, the core reasoning capability required to generate more accurate responses to queries can be distilled down to less than one billion parameters, he noted. That approach also eliminates the tendency LLMs have to generate hallucinations when they don’t have access to enough data to generate an accurate response, added Tung. Each response provided by the knowledge graph also includes the sources used to enable end users to verify them, said Tung.
Additionally, the open source platform also provides access to all the code and weights that were used to create those responses, which IT organizations can tailor as they best see fit, he noted.
Diffbot has set up an instance of Diffy Chat, a decentralized messaging software based on a blockchain framework, to foster a secure ecosystem around its knowledge graph that it is encouraging application developers and data scientists to join.
It’s not likely that a natural language interface for a knowledge graph is going to replace generative AI platforms in every possible use case. However, in many of the use cases that involve simple retrieval and aggregation of data, it may provide a much more efficient alternative that is far less costly to implement, integrate and maintain.