You have a mountain of documents. Most of them are PDFs, slide decks, or internal reports. They’re great for human eyes, but for an AI application they’re basically a brick wall: readable in principle, yet hard work to ingest. The goal isn’t just to make this data readable; you need to turn it into a high-utility asset that follows the “write once, read many” philosophy. If you do this right, you build a single, secure pipeline that feeds every AI application you ever decide to launch. Brian Martin from Signal65 presented the results of his work with Gadget Software at AI Infrastructure Field Day 4 in Santa Clara.
Most companies start with a specific use case. They want an HR chatbot, so they scrape some documents and store them in a vector database. Then they want a BI tool for the sales team, so they repeat the process. It’s inefficient and messy, and you end up with fragmented data silos that don’t talk to each other. Instead of building one-off solutions, you should be building a centralized, reusable, compute-ready data asset. This is the foundation, a permanent repository where all information is cleaned and enriched, ready for any model to consume.
Reusable Compute-Ready Data
The “write once” part is where the heavy lifting happens. You take your raw files and put them through a transformation layer. This isn’t just about turning a PDF into text. It’s about semantic decomposition. You break the information down into its core ideas while keeping the context intact. If a paragraph mentions a “project,” the system needs to know exactly which project is being discussed, even if the name was mentioned only three pages earlier. You’re essentially prepending context to every snippet of data.
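To make the idea of prepending context concrete, here is a minimal sketch in Python. It is not the pipeline from the presentation: real semantic decomposition would typically use a model to resolve references, whereas this stand-in simply carries the most recent heading forward. The `Chunk` class, the `decompose` function, and the convention that a block starting with `# ` is a heading are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str      # the original paragraph, unchanged
    context: str   # context carried forward from earlier in the document
    page: int      # 1-based page the paragraph came from

    def for_embedding(self) -> str:
        # The context travels with the snippet, so it still makes sense in isolation.
        return f"[{self.context}] {self.text}"


def decompose(title: str, pages: List[str]) -> List[Chunk]:
    """Split pages into paragraph chunks, prepending document and section context.

    Assumption: blocks are separated by blank lines and a block starting
    with '# ' is a section heading (hypothetical input convention).
    """
    chunks: List[Chunk] = []
    section = ""
    for page_no, page in enumerate(pages, start=1):
        for block in page.split("\n\n"):
            block = block.strip()
            if not block:
                continue
            if block.startswith("# "):
                # Remember the heading; it applies until the next heading appears.
                section = block[2:].strip()
                continue
            context = f"{title} > {section}" if section else title
            chunks.append(Chunk(text=block, context=context, page=page_no))
    return chunks


pages = [
    "# Project Atlas\n\nKickoff is planned for Q3.",
    "The project budget was approved.",
]
chunks = decompose("2024 Plan", pages)
print(chunks[1].for_embedding())
```

The second paragraph never names the project, but its chunk still carries “Project Atlas” from the previous page, which is the behavior described above.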
Once the data is decomposed, you enrich it. This is where you use an LLM to generate metadata. You create summaries, extract key entities, and assign categories. You do this once and store the results for reuse. By the time the data hits your storage, it’s already been “understood” by an AI. This enriched state enables the “read many” capability. Because the data is so well-structured and context-heavy, you can point a chatbot at it today and a complex reasoning agent at it tomorrow without re-processing a single byte.
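The “enrich once, store, reuse” step can be sketched as a small cache keyed by a content hash. This is an illustration under stated assumptions, not the product’s implementation: `EnrichmentStore` is a hypothetical name, the in-memory dictionary stands in for whatever durable storage you use, and `toy_enrich` is a trivial placeholder for the LLM call that would generate real summaries, entities, and categories.

```python
import hashlib
from typing import Callable, Dict

Metadata = Dict[str, object]


class EnrichmentStore:
    """Runs the (expensive) enrichment step at most once per unique chunk of text."""

    def __init__(self, enrich: Callable[[str], Metadata]):
        self._enrich = enrich
        self._records: Dict[str, Metadata] = {}  # stand-in for durable storage
        self.enrich_calls = 0                    # counts how often enrichment actually ran

    def get(self, text: str) -> Metadata:
        # Identical text always hashes to the same key, so it is enriched only once.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._records:
            self.enrich_calls += 1
            self._records[key] = self._enrich(text)
        return self._records[key]


def toy_enrich(text: str) -> Metadata:
    # Placeholder for an LLM call (assumption): crude summary, entities, category.
    words = text.split()
    return {
        "summary": " ".join(words[:8]),
        "entities": sorted({w.strip(".,") for w in words if w[:1].isupper()}),
        "category": "finance" if "budget" in text.lower() else "general",
    }


store = EnrichmentStore(toy_enrich)
first = store.get("The Atlas budget was approved by Finance.")
second = store.get("The Atlas budget was approved by Finance.")  # served from the store
print(store.enrich_calls, first["category"])
```

Any number of downstream readers can call `get` on the same text; the enrichment function runs only on the first request.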
Security and attribution have to be baked into this pipeline from the start. In a professional environment, “the AI said so” isn’t a valid answer. You need to know exactly where a piece of information came from. When you build a compute-ready asset, every chunk of data carries its lineage with it. If the model generates a response, it can point directly to the source document and the specific page. This creates a level of trust that you just can’t get with raw, unorganized data dumps.
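One way to picture lineage travelling with the data is a chunk record that cannot exist without its source. The sketch below is a hedged illustration: `SourcedChunk`, `retrieve`, and the file names are hypothetical, and the word-overlap scoring is a deliberately naive stand-in for a real retrieval step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourcedChunk:
    text: str
    source_doc: str  # file the text was extracted from
    page: int        # page within that file

    def citation(self) -> str:
        return f"{self.source_doc}, p. {self.page}"


def retrieve(query: str, chunks: List[SourcedChunk], k: int = 2) -> List[SourcedChunk]:
    """Return up to k chunks sharing at least one word with the query (toy scoring)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    hits = [(score, c) for score, c in scored if score > 0]
    # Sort on the score only; the sort is stable, so ties keep their input order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in hits[:k]]


corpus = [
    SourcedChunk("The travel budget is capped at 5000 dollars", "travel-policy.pdf", 3),
    SourcedChunk("Office hours are nine to five", "handbook.pdf", 12),
]
for hit in retrieve("travel budget", corpus):
    # Every retrieved snippet arrives with the document and page it came from.
    print(f"{hit.text} ({hit.citation()})")
```

Because the source and page are fields on the chunk itself, any application reading from the asset can cite its answer without a separate lookup.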
There’s also the question of where this processing happens, and the answer is usually as close to the data as possible. If you’re sending every single document to a cloud API for enrichment, you’re looking at a massive bill and unpredictable latency. Using local GPUs for the transformation layer is often the smarter play. It keeps data within your perimeter, which improves security, and it provides consistent throughput.
You also have to consider the long-term utility of this asset. Technology moves fast, so the model you use today will be obsolete in eighteen months. However, if your data is already decomposed, cleaned, and enriched with high-quality metadata, switching models is straightforward. You don’t have to reindex your entire company’s knowledge base every time a new version of GPT or Claude is released. You just plug the new model into your existing, compute-ready asset.
It’s easy to get caught up in the excitement of the “read” side of things. Everyone wants the shiny chatbot or the automated report generator. But those tools are only as good as the data they consume. If you’re feeding them raw, poorly structured text, they’re going to hallucinate or miss the point entirely. You’re back to “garbage in, garbage out.”
Building a compute-ready data asset is about shifting your focus and effort to the “write” side. It’s an investment in the infrastructure of your information. When you treat your data as a high-value asset rather than a collection of files, you unlock a different level of performance. There is a lot of work upfront to build AI-ready data. Getting the decomposition right and setting up the enrichment pipeline takes time. But once it’s running, it’s a force multiplier. You have a single source of truth that is secure, attributable, and ready for whatever the next wave of AI brings. That’s how you move from just experimenting with AI to actually running it at scale.
Ultimately, the goal is to simplify data access for new or updated applications. By focusing on a “write once, read many” pipeline, you ensure that every piece of information in your company is prepped, primed, and ready to go. It’s about being prepared. It’s about being efficient. It’s just good business.
Watch Brian’s presentation on the Tech Field Day website, or find all the research on the Signal65 website.

