Synopsis: In this AI Leadership Insights video interview, Mike Vizard speaks with Vaibhav Nivargi, founder and CTO of Moveworks, about the need for a framework for employing copilots.

Mike Vizard: Hello and welcome to the latest edition of the video series. I’m your host, Mike Vizard. Today we’re with Vaibhav Nivargi, CTO for Moveworks, and we’re talking about the need for a framework for how you employ copilots these days. There are a lot of them and a lot of different processes. Vaibhav, welcome to the show.

Vaibhav Nivargi: Thank you, Mike. Good morning. It’s great to be here today.

Mike Vizard: So what exactly is the challenge we’re looking at? Could copilots essentially wind up being, well, I don’t know, too much of a good thing these days and we need to figure out how to manage this process and what does that look like?

Vaibhav Nivargi: That’s a great question to start. I want to say that these copilots are definitely compelling. I think ever since ChatGPT came out, there has been a flurry of innovation in this space, and there are different options in front of large and small companies out there, whether it is to serve their internal constituents, their employees, or whether it is to serve their external audience, their customers. And with that come choices and options, whether they should build these in-house using open source large language models, where they can do some fine-tuning on some of these if they have the data, if they have the infrastructure, or whether they should partner with a third-party vendor such as GitHub for a coding copilot or such as Salesforce for a marketing copilot or Moveworks for an enterprise-wide employee copilot.
So given the breadth and spectrum of these choices available to customers, there are ways in which they can think about this investment. There are ways in which they can think about this decision such that it is not something that is done from a short-term optimization perspective, but something that sustains long-term innovation for them.

Mike Vizard: So ultimately, most organizations are going to wind up using a mix of copilots. Do the copilots need to be aware of each other? Will they communicate and somehow or other update each other or share information as we build out processes that span multiple tools and frameworks? So how do we need to think about that?

Vaibhav Nivargi: That’s a great point. I think in some of these cases, you might have power users that are using specific tools, specific workflows, where there might be these copilots that enhance those operations. I mentioned GitHub already. There might be other software, other workflows, other tools that they’re using where these copilots could be siloed, where the productivity gains for power users are tremendous from using these. Using language, using the right sort of automation infrastructure within the confines of that application can be an amazing outcome. But there are use cases, for example, what we do at Moveworks is where we connect to different applications and different workflows across different departments, and there you can imagine an ecosystem of copilots, if you will, where there are handoffs and delegations and interconnections between these copilots that together enhance the experience of the users who are using them. So I imagine both of these would coexist. Both of these would be pretty common as you have more and more copilots that are introduced into a day-to-day workflow of an employee.

Mike Vizard: Will organizations build their own copilots for custom applications as well, or would they just be better off leveraging the copilots that every ISV seems to be building out for every application that gets delivered? It seems to me making a commitment to build and maintain an LLM is a significant undertaking, shall we say.

Vaibhav Nivargi: It is. It is a significant endeavor because the process for training, monitoring, updating, deploying these LLM-based copilots is non-trivial. So there needs to be a strong case made for long-term leverage and long-term value for organizations to invest in this. Now, the open source ecosystem around LLMs is extremely vibrant. If you look at Hugging Face and their portfolio of LLMs, there’s some 18,000 LLMs and models with different modalities, some deal with images, some deal with audio. Of course, many of them deal with language and there are supported licenses. There are models that are compatible with fine-tuning if you have something that is domain specific.
So imagine that you are in an industry where you have some private data that is not available to an off-the-shelf LLM, like an OpenAI model or an Anthropic model. Then your only option in some ways is to fine-tune the LLM on this data so that you have something that is optimized for your domain. Now, there are options where you can also do something called retrieval augmentation, where you’re using an external database that gives the large language model this information at runtime. All of these require some investment, some infrastructure that you need to build and maintain and deploy, which assumes that you would have the right machine learning engineers, the right infrastructure engineers, the right data annotation experts who understand data ontologies and taxonomies.
So there is an investment here, but the value can be commensurate as well if the domain is sophisticated enough and rich enough, and there is long-term value in doing this. I think for the most part, there are vendors and partners who are sophisticated enough who have invested in this. So in some cases, partnering makes sense where you don’t want to reinvent the wheel. Now in our world, in the Moveworks world, we got started in resolving employee support requests in IT. So if you have problems with employees dealing with their workstations, with their laptops, with software they’re using, with the infrastructure they’re using, with printers and conference rooms, now that is a problem that is pretty universal across most major companies.
And dare I say, there is not that much unique appeal for a company to solve it on their own. So in that case, partnering with a vendor like us makes sense, because we’ve been doing it for seven years now. We have a sophisticated suite of large language models that we can deploy in an environment for a given customer, and they can see value and ROI right away. So there is a trade-off here, and it depends on the specific use case and that’s what business leaders, IT leaders, technology leaders need to identify.
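Illustrative sketch (not part of the interview): the retrieval augmentation Vaibhav mentions, where an external store supplies private information to the model at runtime, might be outlined roughly as follows. The document list, the word-overlap scoring, and the function names `retrieve` and `build_prompt` are all hypothetical stand-ins. A production system would more likely use embeddings and a vector database, and would send the resulting prompt to an actual LLM.

```python
import re

# Hypothetical private documents that an off-the-shelf model has never seen.
DOCUMENTS = [
    "VPN access requires the CorpConnect client and a hardware token.",
    "Conference room displays are reset every night at 2 AM.",
    "Printer drivers are installed from the internal software portal.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (a toy stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Place the retrieved passages in the prompt so the model can use them at runtime."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How do I install printer drivers?", DOCUMENTS)
print(prompt)
```

The model itself is never retrained here. Only the prompt changes, which is why this route can cost less than fine-tuning while still requiring infrastructure to build and maintain.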

Mike Vizard: As we look at all these copilots, do you think that there’s going to be a major jump in productivity as a result of all of this? And I ask the question because there are a lot of folks who are still dubious about the value of investing in IT when labor productivity rates for the last decade or so haven’t really moved all that much.

Vaibhav Nivargi: Yeah, I think that’s a great point, and I would imagine on both ends of the spectrum… so if you are a power user in an application, if you’re writing code or if you’re using Tableau or Box or Microsoft or any of these tools, I would imagine there are measurable, tangible ways in which your productivity can increase if you’re using a copilot that accelerates certain common workflows for you, that helps you with doing more complex things in easier ways. In some cases it might be hard to measure, but in our world, again, moving the conversation to Moveworks, I think it’s extremely measurable. If you have companies that can track productivity of employees when they have requests or problems, what is the mean time to resolution? What is the first contact resolution of a service desk? How much time is lost by employees if they’re not getting the help at work?
And with a product like this, you can measure it. In fact, we offer these analytics so our customers can make an informed decision on how are things improving day over day, week over week, month over month. So I think, in fact I would argue, that companies who are making this investment in these copilots have a clear value framework, understand where exactly the needle gets moved, whether it’s in terms of top line, whether it’s in terms of efficiency, employee productivity, there are ways to connect it using not so contrived and complex ways so you can understand the value of the investment you’re making.
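Illustrative sketch (not part of the interview): the two service desk metrics named above, mean time to resolution and first contact resolution, could be computed from ticket records along these lines. The `Ticket` fields and the sample data are hypothetical, and this is not a description of the analytics Moveworks offers.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Ticket:
    opened: datetime
    resolved: datetime
    contacts: int  # number of interactions needed before the issue was resolved

def mean_time_to_resolution_hours(tickets):
    """Average hours between a ticket being opened and resolved (assumes a non-empty list)."""
    total_seconds = sum((t.resolved - t.opened).total_seconds() for t in tickets)
    return total_seconds / len(tickets) / 3600

def first_contact_resolution_rate(tickets):
    """Fraction of tickets resolved in a single interaction (assumes a non-empty list)."""
    return sum(1 for t in tickets if t.contacts == 1) / len(tickets)

tickets = [
    Ticket(datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 10, 0), contacts=1),
    Ticket(datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 14, 0), contacts=3),
    Ticket(datetime(2024, 1, 9, 13, 0), datetime(2024, 1, 9, 16, 0), contacts=1),
    Ticket(datetime(2024, 1, 9, 8, 0), datetime(2024, 1, 9, 11, 0), contacts=2),
]

print(mean_time_to_resolution_hours(tickets))  # 3.0
print(first_contact_resolution_rate(tickets))  # 0.5
```

Tracking the same two numbers before and after a copilot is deployed gives the day-over-day or month-over-month comparison described in the answer.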

Mike Vizard: Where are we going to be in the future when we’re going to have all these copilots and I need some way to understand what they’re all doing, I need to make sure that they’re all operating within a certain amount of guardrails and that they’re secured?

Vaibhav Nivargi: Yes.

Mike Vizard: So will my copilots need copilots to manage the copilots?

Vaibhav Nivargi: That’s a great point. I think there was actually a paper in the late ’80s which talks about the perils of automation, and the heart of the paper is exactly what you said. At some point, when you introduce enough sophisticated automation, who monitors that automation? And if you remove humans from the mix, then at some point you tend to lose the specificity, the skills that put pilots there in the first place, once you give them copilots. So in some ways, you can’t rely in every case on something that is in a similar domain of complexity to monitor the performance of something equally complex. So you need some simpler rules, you need some simpler guardrails, you need simpler dashboards, analytics, and monitoring, and simpler constraints in some ways. So I think that applies in terms of business process. That can apply in terms of permissions and access control.
That can apply in terms of evaluation, which is a big open research area, because these models, especially as you get into billions, hundreds of billions, trillions of parameters, are extremely sophisticated. And I think even OpenAI’s GPT-4 paper talked about the risk of hallucinations, and they said that as these models get more sophisticated, the hallucinations are harder to track because we start relying on these more and more. So I think our philosophy at Moveworks is we employ a vast array of guardrails and constraints starting with something deterministic saying, “What are these copilots supposed to have access to? What are they not supposed to have access to?” We tune and monitor the performance and accuracy of each of these models such that we can trade off how often they need to be right versus how much confidence they have in the answers. So we don’t want these copilots to speculate.
We don’t want them to engage in guesswork, especially in a domain where it is really important to maintain correctness. So we can say, “Hey, I’m not that confident in the answer, so I’m going to let the IT agent resolve the problem for you.” So that’s a natural fallback option that is available for our copilot that may not be the case in every scenario. So I think the long-winded way of saying this is, you are going to need a lot of supporting infrastructure around this. The more deterministic and predictable you can make it, the more you can monitor the performance of these models.
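Illustrative sketch (not part of the interview): the layered guardrails described above, a deterministic access check followed by a confidence check with a human fallback, might look something like this. The allowlist, the threshold value, and the `respond` function are hypothetical, and the confidence score is assumed to be supplied by the underlying model. This is not Moveworks' implementation.

```python
# Hypothetical allowlist: the systems this copilot is permitted to touch.
ALLOWED_SYSTEMS = {"password_reset", "software_catalog"}

# Hypothetical cutoff; raising it trades coverage for fewer wrong answers.
CONFIDENCE_THRESHOLD = 0.8

def respond(system, answer, confidence):
    """Apply a deterministic access check, then a confidence check, before answering."""
    if system not in ALLOWED_SYSTEMS:
        return "denied: this copilot has no access to that system"
    if confidence < CONFIDENCE_THRESHOLD:
        return "escalated: routing the request to an IT agent"
    return f"answered: {answer}"

print(respond("password_reset", "Use the self-service portal.", 0.93))
print(respond("password_reset", "Maybe reinstall the OS?", 0.41))
print(respond("payroll", "Your salary is ...", 0.99))
```

The access check runs first and does not depend on the model at all, so a highly confident answer about a system outside the allowlist is still refused.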

Mike Vizard: Let’s say I have a finite resource, such as your conference room example, and I want to use a copilot to manage my access to that, but so does Bob and Susie down the hall. So won’t our copilots start to battle with each other for this limited resource, and I’m going to wind up with a scenario where my copilot wants to beat up your copilot? How does that work?

Vaibhav Nivargi: Yeah, I think there needs to be some consolidation, right? So I don’t think it’ll occur at the granularity of an individual copilot dealing with a single resource like that, although that is possible. And in some cases, you might then have the calendar that acts as the ultimate arbitration saying, “Who has this slot for 9:00 AM on a Tuesday?” But I would imagine that if it is something with a conference room, there is a single copilot that everyone can go and talk to saying, “Hey, I’m trying to book a meeting with Mike and it needs to have Zoom. It needs to have nice soundproofing capabilities or something else like that.” And you can have the copilot be aware, you can have those semantics, it can have that metadata to know, “Hey, I’m in the Mountain View office, or I’m in the Chicago office, and these are the three conference rooms that have these capabilities.”
But I can look at the calendar and say two of them are booked. So I have the third option. So I think these copilots have the reasoning capabilities now, these LLMs have the reasoning capabilities to understand this at some basic level, and you can augment them with more business context, with more permissions. Let’s say some people are allowed to book the conference rooms, some aren’t. Full-time employees versus contractors. So I think that is additional metadata that can embellish them to help them make the right decision in more cases than not. It’s not going to be foolproof, and there are going to be some cases where there will be some gray area, but I think that is the case where we can rely on escalating it saying, “Hey, I ran into an issue where I can’t resolve it, so I’m going to file a ticket, or I’m going to engage a live agent, or I’m going to let you figure it out yourself,” in some cases. But that ends up becoming the retraining data that makes the copilot smarter tomorrow versus today.
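Illustrative sketch (not part of the interview): the conference room scenario above, with a single booking copilot that knows room metadata, defers to the calendar, applies a permission rule, and escalates when it cannot resolve a request, could be reduced to deterministic logic like the following. The room names, the `Room` fields, the full-time-only rule, and the `book_room` function are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Room:
    name: str
    office: str
    features: set
    booked_slots: set = field(default_factory=set)

ROOMS = [
    Room("Redwood", "Mountain View", {"zoom", "soundproof"}, {"Tue 09:00"}),
    Room("Sequoia", "Mountain View", {"zoom", "soundproof"}, {"Tue 09:00"}),
    Room("Cypress", "Mountain View", {"zoom", "soundproof"}),
    Room("Lakeview", "Chicago", {"zoom"}),
]

def book_room(office, needed, slot, employee_type, rooms=ROOMS):
    """Return the booked room name, or an escalation message if no booking is possible."""
    # Deterministic permission rule (hypothetical): only full-time employees may book.
    if employee_type != "full_time":
        return "escalate: filing a ticket, requester is not allowed to book rooms"
    for room in rooms:
        matches = room.office == office and needed <= room.features
        # The calendar is the final arbiter of who holds a slot.
        if matches and slot not in room.booked_slots:
            room.booked_slots.add(slot)
            return room.name
    return "escalate: no matching room is free, engaging a live agent"

# Two of the three Mountain View rooms are taken, so the third is booked.
print(book_room("Mountain View", {"zoom", "soundproof"}, "Tue 09:00", "full_time"))  # Cypress
# A second identical request finds nothing free and escalates.
print(book_room("Mountain View", {"zoom", "soundproof"}, "Tue 09:00", "full_time"))
# A contractor is stopped by the permission rule before any room is checked.
print(book_room("Chicago", {"zoom"}, "Tue 09:00", "contractor"))
```

Because every request goes through one function and one calendar, two people asking for the same slot never compete through separate copilots. Whoever asks first gets the room, and the second request escalates.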

Mike Vizard: Will that copilot know the difference between somebody who booked the rooms so they can discuss their fantasy football results versus a customer meeting that the CEO wants to have?

Vaibhav Nivargi: Yeah, I think it might need some more annotation and training, right? I think some of the intent behind why somebody’s booking that might be missing. You might have it modeled after VIP personalities. You might have it modeled based on something that is periodic. So there are ways in which you can encode additional information and logic, but it’s not going to be foolproof because these models are based on certain probability distributions and certain word characteristics and word models that they have based this on. So I think if a case like this occurs where something wrong has happened, that ends up becoming a retraining artifact for the underlying model saying, “Hey, you did this wrong. Here is how you should do this correctly.” Now, in some cases, if you’re unable to retrain the model, that can be embedded in the context window, saying, “Prioritize this use case above that use case,” and that is something that these models are reasonably good at remembering and applying at runtime.
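Illustrative sketch (not part of the interview): embedding a prioritization rule in the context window, as opposed to retraining the model, amounts to placing the rule in the prompt ahead of the request. The rules, the request text, and the `build_context` function below are hypothetical, and no particular LLM API is assumed.

```python
# Hypothetical priority rules; in practice these would come from business policy.
PRIORITY_RULES = [
    "Customer meetings requested by executives take priority over internal social events.",
    "Recurring team meetings keep their usual room unless a higher-priority request conflicts.",
]

def build_context(rules, request):
    """Assemble a prompt that states the rules before the user's request."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "Apply these booking priorities when requests conflict:\n"
        f"{numbered}\n\n"
        f"Request: {request}"
    )

context = build_context(
    PRIORITY_RULES,
    "Book Cypress on Tuesday at 9 AM for a fantasy football recap.",
)
print(context)
```

The rules can be edited and redeployed without touching model weights, which is the appeal of this route when retraining is not an option.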

Mike Vizard: What’s your best advice to people who are going to be living in a world where each employee has multiple copilots and they need to be managed and they’re essentially junior employees sitting alongside their regular employees? How do I approach that?

Vaibhav Nivargi: Yeah, I think this is an active conversation that we are having with one of our customers as well, because they are being approached left, right, east, west, everywhere from different vendors, their internal constituents, who want to build something, who want to deploy something, who want to buy something. And our advice to them is to understand what is the business problem that they’re trying to solve and how do you measure the success of that particular solution? So that will at least give them a framework in which they can say, “Does this fit as part of the problem you’re trying to solve? Or is it sort of a science project in some way that we can defer to later?” Once you have that, then they have a junction in the road. Is there value in them building it themselves, or is there value in them buying and partnering with someone else?
Now, there might be domain advantages, there might be business criticality, there might be limited data. There might be various artifacts where building it makes sense, assuming they have the talent and they have the infrastructure to do it themselves. If not, it makes sense to partner with a vendor. And I think there are increasingly robust frameworks in which they can evaluate the performance. They can POC something, they can try something out. And then are there ways in which they can monitor the effectiveness of it once something like this is deployed? So that is where many companies are thinking like the copilot is the end-all, be-all, but that is a necessary but not sufficient component of this overall strategy. We are calling it the AI transformation strategy. So you need to figure out, “Hey, where is the data going to go for this copilot?” In many ways, it’s garbage in, garbage out.
If copilots have… maybe some of them are only doing one thing. They are rewriting your emails. That’s a very well-bounded use case, right? It only affects a small section of employees. It’s easier to monitor, easier to deploy, it’s very constrained. Whereas if something is doing something like Moveworks does, then the evaluation can be pretty broad because we’re talking to many systems. In more ways than one, what our customers are trying to do is centralize the decision making by understanding the business value and then empowering their teams to make this decision given this framework. Does that make sense? Does that help?

Mike Vizard: It surely does. I guess, folks, the way to think about this going forward is for every problem that AI eliminates or solves, it creates all kinds of new and interesting twists and turns that have got to be managed as well. And you’ve got to start thinking this through today. Hey Vaibhav, thanks for being on the show.