Synopsis: In this AI Leadership Insights video interview, Mike Vizard speaks with Etan Ginsberg, Co-CEO of Martian, about large language models and the need for routing and orchestration layers.

Mike Vizard: Hello and welcome to the latest edition of the TechStrong.ai video series. I’m your host Mike Vizard, and today we’re with Etan Ginsberg, who’s co-CEO of Martian. They just raised some funding for what’s known as routing in the land of large language models, which of course are key to building out generative AI. Etan, welcome to the show.

Etan Ginsberg: Thanks, Mike. It’s great to be here. I’m excited to chat with you.

Mike Vizard: If you would, explain the issue you guys are addressing. We understand that ultimately there will be multiple LLMs, and eventually I’ll need to connect them. Is that the case, and if so, how?

Etan Ginsberg: Yeah, that’s exactly the idea. So my co-founder and I were previously doing research in academia where we built applications of AI, including the first tutors powered by LLMs. This was back in 2020. And what we noticed even back then, and it’s become even more true now after ChatGPT and Llama, is that different LLMs have different strengths and weaknesses.
So a lot of people know OpenAI. There are other companies like Anthropic and Google and xAI, and they’re all creating these models. But different models are good at different things, and the difference between models in terms of performance and cost is quite dramatic. Just to give an example, GPT-4 is 30 times more expensive than GPT-3.5 Turbo, OpenAI’s other model, and it’s up to 900 times more expensive than some open source models like DeciCoder.
So you can have huge differences in the cost of using AI, depending on what models you use. And there are also different models which are trained on specialized data. For example, Bloomberg has a model, BloombergGPT, which is trained specifically on their proprietary financial data. And many different enterprises have their own proprietary data sets, so they’re going to be looking to train LLMs on their data sets.
Say, for example, you have a domain specific language. These LLMs will be trained on data that your typical closed source LLM isn’t trained on, so they’ll be much better at specific tasks. And in the same way that you wouldn’t want to give a finance person an HR task or vice versa, you want to be able to effectively delegate the tasks that you have to different LLMs. And that’s essentially what we’re creating at Martian: an orchestration layer that routes each individual request to the best LLM based on the expected performance and cost.
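
To make the routing idea concrete, here is a minimal sketch of cost- and performance-aware model selection. It is illustrative only, not Martian’s actual implementation; the model names, prices and quality scores are hypothetical placeholders.

```python
# Minimal sketch of cost/performance-aware routing (illustrative only).
# Model names, prices and quality scores are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float   # USD, hypothetical pricing
    quality: dict               # task -> expected quality score in [0, 1]

MODELS = [
    Model("large-closed-model", 0.03,   {"code": 0.95, "summarize": 0.93}),
    Model("small-closed-model", 0.001,  {"code": 0.80, "summarize": 0.85}),
    Model("open-source-coder",  0.0001, {"code": 0.88, "summarize": 0.60}),
]

def route(task: str, min_quality: float = 0.85) -> Model:
    """Pick the cheapest model whose expected quality clears the bar."""
    eligible = [m for m in MODELS if m.quality.get(task, 0.0) >= min_quality]
    if not eligible:  # fall back to the best available model
        return max(MODELS, key=lambda m: m.quality.get(task, 0.0))
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)

print(route("code").name)        # -> open-source-coder
print(route("summarize").name)   # -> small-closed-model
```

A production router would presumably estimate quality per request rather than reading it from a fixed table, but the trade-off is the same: clear a quality bar, then minimize cost.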

Mike Vizard: Where does that orchestration layer sit? Is it in the cloud or will it be in an on-premise environment? Because these LLMs are going to be everywhere I think.

Etan Ginsberg: That’s a good question. So we’re offering support for both. For smaller developers and mid-market companies, you can route through our infrastructure in the cloud and for larger enterprises, we’ll work to deploy into your VPC.

Mike Vizard: Who is going to set this up? Is this part of the function of a data science team, or is there somebody else starting to emerge within IT organizations who’s going to be in charge of the routing of the, for lack of a better phrase, calls to the LLMs?

Etan Ginsberg: Yeah, that’s a good question. Actually, I know some organizations have now created a chief innovation officer title that’s specifically focused on AI. It’s really only in the last year that generative AI has started to be adopted by enterprises, and I think what we’re going to see is more changes in organizational structure as this becomes increasingly important. So you’ll see more chief innovation officers, sometimes it’s the CTO, depending on the organization, sometimes it’s the traditional data science teams.

Mike Vizard: In my experience so far, which like everybody else’s is somewhat limited, it seems like people are extending LLMs. They’re putting data into a vector database and showing that to the LLM and hoping that the LLM doesn’t take that data and run away with it. But the core idea is that I’m going to basically extend it. When would people actually customize an LLM or decide to build their own LLM? What drives which motion, and when?

Etan Ginsberg: That’s a good question. I think one of the biggest drivers is if you have some kind of domain specific language that the LLM wouldn’t be exposed to. It’s important to keep in mind that the LLMs are only as good as the data that they’re trained on. And there’s actually some recent research papers that have come out that have further investigated this. So even though LLMs can generalize, it’s limited specifically to domains where they’ve been exposed to lots and lots of data. So if you have your own domain specific language or some specialized set of knowledge about your enterprise application that’s not widely available on the internet, there’s a very good chance that the top LLMs are not actually trained on that data. And that’s a sign that you either want to incorporate some kind of fine-tuning or retrieval augmented generation, which is what you’re alluding to there. So typically it comes from domain specific languages and specialized data sets.
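
As a rough illustration of the retrieval augmented generation pattern Etan mentions, here is a toy sketch: retrieve the most relevant internal documents and prepend them to the prompt. The documents, the naive bag-of-words embedding and the similarity search are all stand-ins; a real system would use a vector database and a trained embedding model.

```python
# Toy sketch of retrieval augmented generation (RAG): retrieve the most
# relevant in-house documents and prepend them to the prompt. The embedding
# here is a deliberately naive bag-of-words stand-in.
from collections import Counter
import math

DOCS = [
    "Our internal DSL uses the keyword RECONCILE to merge ledger entries.",
    "Quarterly close requires sign-off from the regional controller.",
    "The HR portal resets passwords every 90 days.",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    scored = sorted(DOCS, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
    return scored[:k]

question = "What keyword does our DSL use to merge ledger entries?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # the grounded prompt an LLM would then receive
```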

Mike Vizard: How will I operationalize all this? And I ask the question because I may build an app today that calls one LLM, but a year from now I may want to call a different LLM, and today we have a DevOps workflow for managing that process. How will that look in the age of AI and LLMs?

Etan Ginsberg: Yeah, that’s a good question. I think this is also a future that we’re building at Martian. We see a future in which there are going to be many, many different LLMs being used in production. Right now, being able to use many different LLMs is an extremely difficult process. You have to manage different API keys, different specifications for calling each API, and you have to figure out how to integrate all of these different models, and there’s no good solution right now. Whereas with Martian, we can integrate all of these different APIs together behind our own specification, so you only have to worry about integrating with Martian.
And then from that you get access to all of these different LLMs, and that includes both the general closed source models, so this is like your OpenAIs, Anthropics, Coheres, as well as open source models like Llama. And even if you fine-tune your own models, you can add those to our router so we can route specifically for your models. I think that’s one piece of this: we’re essentially going to be indexing all of these different models, like Google. And in the same way that when you go to Google, you don’t have to worry about supporting all of the different websites that are out there, Google handles that, we would handle using all of these different LLMs together in production with a single orchestration layer. And I think the other piece is around working directly with enterprises to improve the router for their use cases.
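
The “integrate once” idea might look something like the following sketch: a single client interface hides per-provider API differences, and the routing decision happens behind it. The provider classes and method names here are placeholders, not any vendor’s real SDK or Martian’s API.

```python
# Sketch of a single integration point over multiple LLM backends.
# Class and method names are placeholders, not a real SDK.
from abc import ABC, abstractmethod

class Provider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class ClosedSourceProvider(Provider):
    def complete(self, prompt: str) -> str:
        # In practice: call the vendor's API with its own key and payload format.
        return f"[closed-source answer to: {prompt}]"

class SelfHostedProvider(Provider):
    def complete(self, prompt: str) -> str:
        # In practice: hit an internal endpoint serving a fine-tuned open model.
        return f"[self-hosted answer to: {prompt}]"

class RouterClient:
    """One integration point; the router decides which backend handles each call."""
    def __init__(self, providers: dict[str, Provider]):
        self.providers = providers

    def complete(self, prompt: str, task: str = "general") -> str:
        backend = "self_hosted" if task == "code" else "closed_source"
        return self.providers[backend].complete(prompt)

client = RouterClient({
    "closed_source": ClosedSourceProvider(),
    "self_hosted": SelfHostedProvider(),
})
print(client.complete("Summarize this memo.", task="general"))
```
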
So we’ve created this general purpose router that performs very well on a variety of tasks, but for very specialized applications that enterprises may have, there may also be some additional tuning that’s needed, and this is something that we can work directly with these enterprises to do. So let’s say, for example, you have an app and you care a lot about user engagement. You want to make sure that your users are spending as long as possible in a given session in your application. You can send that kind of data to us, how long did each session take, and then we can use that as a reward signal to further tune and calibrate our router. So we can actually work directly with enterprises to build them a custom router that performs especially well for their use case.
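
One way to picture the feedback loop Etan describes is a simple bandit that treats session length as the reward. This is purely illustrative; the model names and reward numbers are invented, and a production router would be considerably more sophisticated.

```python
# Sketch of calibrating a router from a business reward signal such as
# session length, using a plain epsilon-greedy bandit for illustration.
import random

class FeedbackRouter:
    def __init__(self, models, epsilon=0.1):
        self.epsilon = epsilon
        self.reward_sum = {m: 0.0 for m in models}
        self.count = {m: 0 for m in models}

    def choose(self) -> str:
        if random.random() < self.epsilon:          # explore occasionally
            return random.choice(list(self.count))
        # exploit: highest average observed reward so far
        return max(self.count, key=lambda m: self.reward_sum[m] / max(self.count[m], 1))

    def record(self, model: str, session_seconds: float) -> None:
        self.reward_sum[model] += session_seconds
        self.count[model] += 1

router = FeedbackRouter(["model-a", "model-b"])
for _ in range(1000):
    m = router.choose()
    # Hypothetical: model-b drives longer sessions on this workload.
    router.record(m, random.gauss(300 if m == "model-b" else 240, 30))
print(router.choose())  # usually "model-b" once enough feedback accumulates
```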

Mike Vizard: We’re also going to see the LLMs will eventually get updated. Sometimes that will be easier than others, but how frequently do you think that process might occur?

Etan Ginsberg: That’s a good question. And it’s something we see constantly. For example, just last week OpenAI announced GPT-4 Turbo, and they’re adding other multimodal components to their LLMs, like vision. You saw Llama 2 come out a few months ago. And basically every week, if you follow this space, there’s some new model or fine-tuned variant that comes out from the various companies. So honestly, on a weekly basis we’ll see models come out that outperform previous benchmarks. So there’s constantly a need to update models, and you run the risk, if you don’t stay up to date, of falling behind and using second-in-class models. That’s a big part of why we wanted to create Martian: so companies don’t have to staff a whole ML engineering team that’s continuously testing these models. And then if a new model outperforms a previous model but has a completely different integration, well, now you have to worry about changing your entire backend infra.
You might have to change your prompt. And that becomes a really difficult process and it is very difficult to keep up with. So models come out basically every week. And the other thing that I would also highlight here is there’s often a gap between the open source models and the closed source providers. So your open source models like Llama, they’ll tend to lag behind OpenAI, which is closed source. And so it’s often useful to be able to use the open source models if you have, for example, certain requirements around privacy or security. But you may want to use the closed source models for certain tasks. And that’s what’s also useful about this idea of routing, is we can segment and use both of them.
You can use the cutting edge from the closed source models for specific tasks where that’s needed, but you can also still support your own open source models or fine-tuned variants. But you can imagine it would be very frustrating if every time you built your own custom model, it became out of date because some new closed source model came out with much better performance. And it’s like, “Now I have to go and retrain my whole system, re-experiment with my prompt.” That becomes really, really difficult to deal with, and it’s only going to increase as AI takes off more and more.
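
A constraint like the one Etan describes can be sketched as a thin policy check layered on top of routing: requests flagged as sensitive stay on self-hosted open models, while other traffic is free to use closed-source models. The model names and the `contains_pii` flag below are hypothetical.

```python
# Sketch of a privacy/security policy layered on top of model routing.
# Model names are placeholders, not recommendations.
def pick_model(task: str, contains_pii: bool) -> str:
    if contains_pii:
        return "self-hosted-llama-finetune"   # sensitive data stays inside the VPC
    if task == "complex_reasoning":
        return "frontier-closed-model"        # use the cutting edge where allowed
    return "cheap-closed-model"

print(pick_model("summarize", contains_pii=True))           # self-hosted-llama-finetune
print(pick_model("complex_reasoning", contains_pii=False))  # frontier-closed-model
```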

Mike Vizard: The other thing I’m not so sure people are wrapping their heads around is there are foundational models, but then I may take a slice of that and create something that feels like a parent-child relationship between LLMs. And then in other cases I may have a bunch of LLMs that do different functions that are, for lack of a better phrase, daisy-chained together to perform some functions. So is this all going to get a little more complicated?

Etan Ginsberg: Yeah, I think the direction the space is moving is using LLMs in more agent-like applications. So you’ll have many different LLMs being called. And it’s not just going to be, “Oh, I’m going to call an LLM to summarize this passage,” but actually I want it to summarize and then take certain actions and gain other insights, and it can plug in with your applications. So it’s going to become more and more interdisciplinary for sure. I think that’s also one of the areas where routing is very important: the more interdisciplinary your tasks, the more you find that certain models could be better suited for different pieces of those tasks.
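
As a sketch of what per-step routing in an agent-style pipeline could look like, each stage of a multi-step task is sent to whichever model is assumed to be best suited for it. The stage-to-model mapping below is invented purely for illustration.

```python
# Illustrative agent-style pipeline where each step is routed separately.
# The stage-to-model assignments are hypothetical.
STAGE_MODEL = {
    "summarize": "cheap-general-model",
    "extract_actions": "frontier-closed-model",
    "draft_reply": "fine-tuned-internal-model",
}

def call_llm(model: str, prompt: str) -> str:
    # Placeholder for an actual API call to whichever backend serves `model`.
    return f"[{model} output for: {prompt[:40]}...]"

def run_pipeline(document: str) -> str:
    summary = call_llm(STAGE_MODEL["summarize"], f"Summarize: {document}")
    actions = call_llm(STAGE_MODEL["extract_actions"], f"List action items from: {summary}")
    return call_llm(STAGE_MODEL["draft_reply"], f"Draft a reply covering: {actions}")

print(run_pipeline("Long customer email about a delayed shipment..."))
```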

Mike Vizard: All right, folks, you heard it here. My LLMs are going to have LLMs. So essentially I got to figure out how to operationalize all this stuff and it’s going to be a new and interesting discipline. Etan, thanks for being on the show.

Etan Ginsberg: For sure. Thanks for having me.

Mike Vizard: And thank you all for watching the latest episode of TechStrong.ai. You can find this one and all our others on our website. Until then, we’ll see you next time.