An open source project from AI observability company Traceloop, OpenLLMetry focuses the cloud native OpenTelemetry observability toolset on the task of understanding agentic AI workflows.

OpenTelemetry is “a great protocol for observability in general, so why not extend it to support agentic AI?” said Nir Gazit, CEO of Traceloop, speaking in a virtual session of the Cloud Native Computing Foundation’s Artificial Intelligence Technical Community Group earlier this month.

The software can capture interactions with large language models (LLMs), vector databases, and AI frameworks, giving organizations a more detailed picture of how well their LLM-based applications perform. How effective are the prompts they’ve been given? Which parts of the templates are being used? What are the resource hogs?

Without telemetry, you don’t know what you don’t know, Gazit reminded the audience.

Built on OpenTelemetry

The second most popular project of the CNCF (after Kubernetes itself), OpenTelemetry offers a standard protocol for logging, gathering metrics, and tracing processes across cloud native services.

Because OpenTelemetry is an open protocol, most observability platforms have adopted it. BMC, Datadog, Dynatrace, Google Cloud, Honeycomb, New Relic, and Splunk, among many others, all support OpenTelemetry.

OpenLLMetry uses OpenTelemetry for its primitives, but it was written to understand AI-speak. It is conversant in the language of completion tokens and model versions.

Developers can add OpenLLMetry to their own applications through software development kits for Python (the predominantly supported language), as well as JavaScript/TypeScript, Ruby on Rails, and Go. Each works by adding a single line of code to the application’s entry point.

The project also includes a wide array of instrumentations that can be added in front of model interfaces, vector databases and frameworks. To instrument each resource, you install a separate module that captures the interactions with that resource.

OpenLLMetry supports more than 30 LLMs, including the latest releases from Anthropic, Google Gemini and OpenAI. While the models themselves are largely black boxes, the project has “monkey patched” their client libraries to intercept outgoing requests (hopefully without impacting latency).
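To illustrate what monkey patching a client library means in practice, here is a minimal Python sketch. It is not OpenLLMetry’s actual code: `FakeLLMClient`, `instrument`, and `captured_spans` are hypothetical names invented for this example, and the recorded fields are assumptions about what an instrumentation might capture.

```python
import functools
import time


class FakeLLMClient:
    """Hypothetical stand-in for a vendor client library (not a real SDK)."""

    def complete(self, model, prompt):
        # Pretend to call a model and report token usage in the response.
        return {
            "model": model,
            "text": prompt.upper(),
            "usage": {
                "prompt_tokens": len(prompt.split()),
                "completion_tokens": 3,
            },
        }


# In a real setup these records would be exported as OpenTelemetry spans.
captured_spans = []


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        response = original(self, *args, **kwargs)  # the untouched client call
        captured_spans.append({
            "name": f"{cls.__name__}.{method_name}",
            "model": response["model"],
            "prompt_tokens": response["usage"]["prompt_tokens"],
            "completion_tokens": response["usage"]["completion_tokens"],
            "duration_s": time.perf_counter() - start,
        })
        return response

    setattr(cls, method_name, wrapper)


# Patch once at startup; application code keeps calling the client as before.
instrument(FakeLLMClient, "complete")
reply = FakeLLMClient().complete("example-model", "hello there world")
```

The application code never changes: the wrapper sits between the caller and the original method, which is why this approach works even when the model behind the client is a black box.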

It can also instrument a number of the most popular vector databases, such as Pinecone, ChromaDB, and Elasticsearch. For instance, if you are using Pinecone, you can track the queries going in, the indexing taking place, and the data and performance metrics associated with each interaction.

There are also instrumentations for various AI frameworks, such as LangChain and Haystack. Frameworks are used to build LLM-based applications and provide them with the short-term memory to capture context. OpenLLMetry can be used to monitor MCP tools, visualize agentic loops, and pinpoint trouble with Retrieval-Augmented Generation (RAG) processes.

Understanding the AI Space

One of the most valuable potential uses of OpenLLMetry would be tracing, Gazit told the attendees. Agentic workflows are inherently event-driven, usually involving multiple services. OpenLLMetry has a way to hand off a trace identifier from one MCP server to another, allowing the user to follow the process through all its steps.
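The session summary does not spell out how OpenLLMetry hands off the identifier. As a rough sketch of the general idea, the Python below passes a trace ID between two services using the W3C Trace Context `traceparent` header format, which OpenTelemetry commonly uses for propagation. The function names are hypothetical and exist only for this example.

```python
import re
import secrets

# W3C traceparent: version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent():
    """Start a new trace: fresh trace ID, fresh span ID, sampled flag set."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def inject(headers, traceparent):
    """Return a copy of the outgoing headers carrying the trace context."""
    outgoing = dict(headers)
    outgoing["traceparent"] = traceparent
    return outgoing


def continue_trace(headers):
    """On the receiving service: keep the trace ID, mint a new span ID.

    Returns (traceparent_for_this_service, parent_span_id). If the incoming
    header is missing or malformed, a new trace is started instead.
    """
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return new_traceparent(), None
    trace_id, parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}", parent_span_id


# Service A calls service B; both steps end up under the same trace ID.
first = new_traceparent()
request_headers = inject({"content-type": "application/json"}, first)
second, parent_span = continue_trace(request_headers)
```

Because every hop keeps the same trace ID and records the previous span as its parent, a tracing backend can stitch the separate steps back into one end-to-end view.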

An organization could use OpenLLMetry to track costs, based on prompt tokens and completion tokens. Those costs can even be attributed to a specific user, feature, or team.
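The arithmetic behind this kind of cost attribution is simple once token counts are captured per request. The Python sketch below is illustrative only: the prices, model name, record fields, and the `cost_by` helper are all made-up assumptions, not OpenLLMetry’s data model or any vendor’s real pricing.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens (not real vendor pricing).
PRICES = {
    "example-model": {"prompt": Decimal("0.50"), "completion": Decimal("1.50")},
}

# Hypothetical per-request usage records, as telemetry might report them.
USAGE = [
    {"user": "alice", "team": "search", "model": "example-model",
     "prompt_tokens": 1000, "completion_tokens": 500},
    {"user": "bob", "team": "support", "model": "example-model",
     "prompt_tokens": 2000, "completion_tokens": 1000},
    {"user": "alice", "team": "search", "model": "example-model",
     "prompt_tokens": 200, "completion_tokens": 100},
]


def cost_by(records, key):
    """Sum the cost of all records, grouped by the given field."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for record in records:
        price = PRICES[record["model"]]
        cost = (
            record["prompt_tokens"] * price["prompt"]
            + record["completion_tokens"] * price["completion"]
        ) / 1000
        totals[record[key]] += cost
    return dict(totals)


per_user = cost_by(USAGE, "user")
per_team = cost_by(USAGE, "team")
```

Prompt and completion tokens are priced separately here because providers often charge different rates for each, and `Decimal` avoids floating-point rounding in the running totals.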

Traceloop itself offers a commercial platform that supports OpenLLMetry for enterprise use, including features such as LLM evaluations, a prompt registry and drift detection. The company closed $6.1 million in seed funding last May.