Synopsis: In this Leadership Insights video interview, Amanda Razani speaks with David Colwell, VP of AI and ML at Tricentis, about the recent White House Executive Order on AI, and the impact it will have on organizations.
Amanda Razani: Hello and welcome to the Techstrong AI Leadership Insights interview series. I’m Amanda Razani, and I’m excited to be here today with David Colwell. He is vice president of AI and Machine Learning at Tricentis. How are you doing today?
David Colwell: I’m doing well, Amanda. Thanks for asking.
Amanda Razani: Good. Can you share a little bit about Tricentis and the services you provide?
David Colwell: Sure. So at Tricentis, we build quality assurance software. Anything to do with making your products work after you’ve built them, that’s where we’re here to help. We build everything from software to make sure that you’re testing the right things when you’re building the software, to making sure that it operates correctly once it’s been constructed, to making sure that it performs well and is secure when it’s in production.
So we offer a range of products across that spectrum, from the Tosca, Testim and qTest suites through to performance testing in NeoLoad and tools like that. We’ve been embedding artificial intelligence into this technology since as far back as 2019, making sure that it gives our customers the ability to do their jobs in, well, we like to think, a more joyful way, taking away some of the day-to-day bits-and-pieces work. And that’s really been my role at Tricentis for the last four or five years: trying to find ways to take this emerging technology of artificial intelligence and embed it into these products so that our customers are able to take more joy in their work and to do it faster and more productively with our tooling.
Amanda Razani: Awesome. Well, that’s a perfect segue then into our first topic of discussion today, which is the White House AI executive order that was recently introduced and the wide-ranging impact that it’s going to have on enterprises and business leaders. So what are your thoughts there?
David Colwell: Yeah, when I first read the executive order, I was going through it with our legal team, and we were doing what most organizations were doing, asking, “Okay, how does this impact us? Where do we need to change? Where do we need to adjust?” But we started to see that a lot of our customers were asking exactly the same questions. Some of them were asking from the point of view of, “Look, it’s just an executive order at this point. It’s only really scoped to impact the wheels of federal government, so we don’t need to worry too much about that.” But the truth was there was a bit of a cascading impact coming out of it, because it was not just the federal government; it was the things the federal government used, the services it engaged with and the procurement processes it followed. And that starts to get a little murkier as to where the actual boundary is.
And at first glance, we looked at it and said, “Well, this is only really targeting people that are building what they would call the foundation models, things like the GPT models that OpenAI and Azure construct or the Gemini models that Google’s releasing.” But when you dig into it, there’s a lot of language there about the quality of the product that you build with artificial intelligence: making sure that it is a high-quality product, that it actually, proverbially, does what it says on the tin, and that you have an AI framework for managing the risk of it going wrong. Because the key message of the executive order is, “This technology is very useful and powerful and you should use it and you should experiment with it, but you need to be responsible.”
We kind of likened it to cave diving. It’s a sort of, “Please make sure that you do this safely. We’re not banning you, but make sure you do it safely.” And what we internally had to adjust to was that you can do things very quickly with generative AI, but you can do them partially, if that makes sense. You can ship something that summarizes legal documents, for example. It reads the document, comes up with a summary and puts it out there. That would take you mere days now, which is phenomenal; generative AI is very powerful, whereas before you would potentially spend years building this type of capability. But in those few days you don’t have time to properly test it, to make sure that all the content of that legal document is represented in the summary and that you’re not misrepresenting the document in some way. And that’s a very difficult problem to solve.
So we call this doing the 80%. You get 80% of the way there, and then you’ve got that remaining 20%, where you probably need to spend almost three or four times the length of the initial implementation doing that testing. It involves things that teams aren’t used to, such as having what we call a red team. This is a group of internal people who have full access to the code and information, and their job is to try to make the system misbehave using almost any means necessary. They need to be familiar with artificial intelligence, with techniques for prompt hacking and all these other things, to be able to push the system, push the AI, into an error state. For example, if we’re building a chatbot to talk to our customers, you want to try to force that chatbot to say things that it shouldn’t, or to make commitments on behalf of the company, things like that.
And so there is a lot of focus in this executive order around that risk management framework, what your red teams should look like, how you should validate these systems, as well as a general requirement for a bit of an overhaul on education about artificial intelligence and the concepts of training. Like when you train, how does that data get stored, where does it go, what happens to it? And once it’s in that big artificial brain, it’s kind of hard to extract and remove from it, so you have all these privacy concerns that flow on from that.
And this education isn’t really that common out there in the DevOps or engineering teams that are building products. So one of the main focuses of the order was making sure that the people using this technology, which is easy to use and lets you do things quite quickly, are using it in a safe and effective way, not trying to stop them from using it.
Amanda Razani: So that’s a lot for business leaders to digest and a lot of different things they have to consider. From your experience, when business leaders are trying to do these AI initiatives and implement the new technology, where are the hangups and what advice do you have to help them through?
David Colwell: I think one of the first hangups is the magical demo. People come in with a problem, that problem’s almost immediately solved, and then business leaders go, “Okay, this looks quite easy. We’re ready to go.” That’s kind of the first failing. And we’ve seen a lot of customers… I don’t know if you’re familiar with the technology hype cycle: you start out and climb this really rapid curve of expectations, you reach the pinnacle and then you rapidly fall off of it. In AI, unfortunately, we’ve seen a lot of customers fall all the way off that curve and end up saying, “Oh, this technology doesn’t actually have productive use for us.” So they become very skeptical of it. And the reality is that it’s just not as easy as it initially looked. There’s a lot of effort that needs to be put into making that initial demo work in a continuous way, and so that’s the first thing to look out for.
There are a lot of little bits and pieces that you need to consider when you’re using this technology. A major one is your training data and information. If you’re using a generative AI out of the box, as it were, one that someone else has built, then the minute you train it, be aware that the training data is all effectively stored in that model. That’s not exactly how it works, but it’s a good approximation. Meaning that if you train a model on information from multiple different clients (again, let’s use the legal analysis example: you get the AI to read multiple legal documents and it learns how to summarize them) and then allow multiple different clients to use that model, it’s foreseeable that they can extract information that was used in training, so they might be able to get someone else’s legal document.
And there are examples of this. A mobile device manufacturer had some of their source code leaked because it was being input into these foundation models and it was used elsewhere. So those are some of the big things that people need to be aware of.
I think when you get to, “Okay, we’ve decided we’re going to go down the path of using this technology. We’ve got our things together. We know what we’re doing with training,” it breaks down into a set of what I call AI syndromes. People mistake the black box on the other end for a logical entity because it sounds a lot like one. When you ask it questions, it responds. It passes the Turing test with flying colors and passes a lot of other tests, which is a little scary at times. But then you ask it, “Give me five examples of this,” and it gives you seven. And you say, “That wasn’t five,” and it says, “No, it was eight.” And you go, “Hang on, what’s going on here?” Simple things like counting, it doesn’t do, because it’s not a logical being. It’s not thinking through your question and coming up with an answer. It’s really trying to guess what a human would say in that place.
And so it has these kinds of problems. It sometimes just responds to what you say rather than to what’s right. So if you say, “That’s wrong, try again,” it’ll go, “I’m sorry, that was wrong,” even if it was correct. It can be negligent in some cases, where it just goes, “I think I’ve answered the question. That’s fine.”
There are also people who mistake it for an information repository. That’s the second AI syndrome: “Oh, I can ask it any question and it gives me the output.” It generally has access to things that it can grab, so the internet or documentation you’ve given it access to, plus the things that it’s learned, and it doesn’t have a good way of differentiating between the two. So it can just hallucinate things. It can come up with facts that aren’t real. It can invent things. And it doesn’t know it’s doing it, so you can’t stop it.
A lot of people see it and they go, “This thing is able to write code, and code is a precise thing, so I’m going to get it to do things like calculating the value of a mortgage application.” It’s terrible at that. It’s very poor as a precision instrument.
And the last misunderstanding is that, broadly, people misunderstand how these models learn. They’re not trained to answer logical questions by breaking them down, the way we were taught in school to solve a math problem: “Okay, this is a math problem, I’m going to decompose it and come up with an answer.” Or, in a literacy course, where you have to take a statement by the author and provide logical reasoning as to why that statement is true or false. AI isn’t trained to do that. AI is trained on, “Here’s a sentence, this part of it is hidden from you. Guess what goes here.” Then you multiply that out over a very large scale of data, and this is what you get. So what’s the impact of that syndrome? Well, sometimes you think, “Okay, this thing is not actually understanding. It’s not logical. It’s not reasoning. It’s not giving me answers that make sense.” Correct, because that’s not what it was designed to do.
So these are some of the big things to look out for.
Amanda Razani: Yeah. So it sounds like education is key here and understanding exactly what the technology can and cannot do, and then better communication when it comes to the implementation. So how do we solve that issue? Should there be a special department for training and education, or what solution is there?
David Colwell: There are some freely available courses pitched at teams at different levels. There’s a really good introductory one from Google that I would advise any leader to go out and watch; it’s only about four hours of educational content at most. There are more detailed courses, again some by Google, some on Udemy, and they’re generally pretty good for the implementation teams, the people who have to go in and actually use the technology. Because the more you’re aware of how it really works, the more you can use it for its intended purpose.
And for that purpose, it’s very, very good at translating a human’s request and intent into a set of activities that need to be done: “Hey, you asked me to create a test case in Tosca. Okay, I know how to drive the app. I know how to do that. I know how to take your story about what you want and turn it into logical components.” So for that kind of understanding and component breakdown, it’s a really phenomenal tool. And if you understand that and you’ve got that basic education, then you can start down the journey.
A lot of larger organizations are starting to have an AI office at the top level, so CAIOs, which I think is the first time we’ve had a C-suite title with four letters in the acronym. But that is becoming more and more common: a person whose responsibility is not developing all of the AI technologies, but making sure they’re used well, used properly, and used in a way that is not going to cause harm or loss to the business.
So it’s sort of an education, regulatory and innovation function that’s being embedded into a lot of our customers. Actually, I’m not sure whether it’s the executive order or the EU regulation that requires this in some departments, but I think it’s the executive order that says you need to have one of these offices: somebody who is accountable for making sure that that education is propagated and that that function is centralized and continually improved on.
Amanda Razani: So moving forward, I know you spoke about a lot of the things AI can’t do at this time, but this technology seems to be advancing really rapidly. So what do you foresee in the future as far as this technology and how business leaders will utilize it?
David Colwell: I find it very difficult to put a limit on where this technology could go, because every limit that has been proposed up until now has been broken in a matter of years. We used to say, “Oh, it will never be able to hold a reasonable conversation with people for any length of time.” Well, that was accidentally proven wrong now that people can’t tell if they’re talking to a bot anymore. In fact, I was on a support call with an organization the other day and I had to confirm that I was talking to a human, because I thought, I really don’t know if I am. The only way I could do that was to say, “Can you please confirm that you’re human?” because these AIs have been trained not to say that they’re human, and so I thought maybe that was a good way of checking.
So I don’t see at the moment a fundamental limit. And the reason that I don’t see that is because the restrictions that I’m talking about, like, okay, it can’t logically reason, it can’t count, it can’t… Well it can kind of count. But these are small limitations and they’re being overcome by adding more specialized brains into the mixture that the sort of general understanding brain can delegate off to.
A great example of this: if you asked the initial release of GPT-3.5 a math question, and it was a common math question that was online, it would say, “Hey, here you go. Here’s the answer,” and it might be correct. If it was a unique question, it was more than likely to give you no sensible answer. But if you ask GPT-4 the same question now, it gives you an answer with a full mathematical notation breakdown, logical reasoning, et cetera, because there’s a kind of specialized sub-brain focused on that task.
So I can definitely foresee these micro-optimizations of different areas growing and growing in the future. And that’s something that we are actively investing in at Tricentis: okay, there are gaps in the current generative AI. Can we plug those gaps with areas of specialization to do with testing? For example, understanding user interfaces or test data or whatever else it may be.
And so I don’t see a structural limitation right now on how this will go. I see a societal limitation, that we are unlikely to allow the AI to be autonomous for quite some time. And a good analogy for this is probably self-driving cars. They’re already safer than most human drivers. We still don’t allow them out there for the fundamental reason of who’s responsible if it crashes. And so in the same way, if you’ve got an artificial intelligence that’s out there doing some job on your behalf, we will probably have it subject to the same supervision we would put a person under that’s doing the same job. So if it’s approving mortgage applications, then someone’s going to be reviewing those approvals of mortgage applications because that’s what we do with people. We don’t generally put people into a role and say, “You are an island, go do it. There’s no review on your work.”
So I see a societal limitation on where AI is going to be able to be used. But at that point, it’s already very, very well advanced. My projection is we’re probably getting into that territory within three to five years of it being fairly effective at doing most tasks independent of feedback or assistance from a human, but needing decisions to be delegated out to somebody that can actually take responsibility for those decisions.
Amanda Razani: Mm-hmm. Absolutely. So all that being said, is there one key takeaway you’d like to leave our audience with today?
David Colwell: I would like the takeaway to be, if you aren’t trying and experimenting with this technology yet, you should be. And if you have fallen off the bottom of that curve because you’ve tried something and you go, “Ah, it didn’t work the way I thought it would, it’s too unreliable or something like that,” then find somebody that specializes in generative AI to come and help you lift that up. Because the technology is powerful and it’s only getting more powerful, but getting onto that journey is going to take time. And the earlier you start, the better you will be at progressing along it.
Amanda Razani: All right. Well thank you for sharing your insights with us today, and I look forward to speaking with you again soon.
David Colwell: Thank you, Amanda.