Synopsis: In this AI Leadership Insights video interview, Mike Vizard speaks with Paul Sparta, CEO of Vbrick, about how AI will be applied to video.

Mike Vizard: Hello and welcome to the latest edition of the Techstrong AI video series. I’m your host, Mike Vizard. Today we’re with Paul Sparta, who’s CEO for Vbrick, and we’re talking about how AI will be applied to video because, well, it’s all been text-based so far, and maybe the world’s going to change with other types of content. Paul, welcome to the show.

Paul Sparta: Thank you for having me. Great to be here.

Mike Vizard: What should people expect in 2024 in terms of how we’re going to see AI applied to video? I mean, we talk a lot about deepfakes and these other things, but there are a lot of nuances here. So walk us through what to expect in terms of how we’re going to create this content maybe easier and simpler for all involved.

Paul Sparta: Well, I think everybody’s used to these ideas. Image recognition has been around for a bit. It’s an expensive process, and it’s very difficult to apply at large scale. And we’ve seen some really great use cases with video for things like license plate recognition, facial recognition, which is a feature we’ve had for quite some time. But when you look at the problems that are facing companies today in terms of video and AI, the biggest single issue is the sheer volume of video being created today.
And the largest source of that is exactly what we’re doing right now, which is video conferences that often get recorded automatically and then, poof, they go away. And almost everything that happens in a video conference related to an organization probably has some value or even some potential risk in it. And whether we like it or not, it’s getting documented by the provider, whether it be a Zoom or a Microsoft. And we’re just scratching the surface, based on even our internal statistics of what enterprises are doing in terms of the speed of the content generation. So the biggest problem is simply where can I put it all, how can I determine what’s on it, and how can I use AI to curate, analyze, and summarize this stuff, because it’s being generated at a level too quick for people to manage.

Mike Vizard: Do you think people are getting sensitive to the cost of all that video as well? It seems like it’s one thing if I’m Paramount Pictures or CBS or somebody like that, where I build that into my business model. But I think a lot of companies are now looking at this going, holy moly, how do we store all this stuff?

Paul Sparta: Well, I think they’re really not expecting it. And the funny thing is, media companies’ quantity of video, even if you count Netflix, pales in comparison to what organizations actually generate. Generally speaking, media companies, which is what we’re experiencing in the consumer world, don’t put things out there that are not professionally produced media. This is much more like YouTube or TikTok, only in an enterprise scenario where huge amounts are getting generated. We have large enterprise financial services customers, large enterprise manufacturing customers. These are global corporations, in excess of a hundred terabytes of video, and the amount of video that’s getting ingested is doubling every year. And it’s not as simple as just saying, “Hey, don’t save all this stuff or get rid of it.” You’ve got client conversations, regulatory issues, HIPAA compliance. So the good news is, hey, we’ve got a lot of value trapped on video.
The bad news is, you’re going to have to deal with it one way or the other. So the generation rate of video in the enterprise is so high now, and now you’re talking about things like wearable glasses with two 4K cameras on them. You can’t even wear that into a hospital if you’re an employee; you get an instant HIPAA violation. I mean, as soon as you see a patient’s name, if you record that, that’s PII, I mean, that’s private healthcare information. So it’s crazy.
And I think what we’re seeing from the companies that are the early adopters is if we don’t get policies, technology, and a strategy in place to manage, distribute, federate video, control it, create policies for proper destruction, all this sort of stuff, they’re going to have very big difficulties. Not to mention, how much value is trapped in all those videos, all those conference calls about major corporate issues, and customer support issues, or R&D issues, all of that value, all the stuff that’s recorded there, unless it’s very carefully conserved, just goes away. So we’re using AI to capture all that and make it relevant to whatever business process there is. So it’s going to be a bit of a revolution, I think. It’s just a delayed revolution compared to what we’re used to with consumer.

Mike Vizard: So exactly how will AI get applied to this? What can I expect to see? Is it going to help me find certain video segments? Is it going to flag certain things? Give me a use case.

Paul Sparta: Well, let me give you a perfect example. Let’s say I’m within an organization, I’m just going to hypothetically say it’s a large organization, has an R&D department, has a pipeline of products coming up. Could be an auto manufacturer, could be a pharmaceutical manufacturer, could be a financial services company. There’s going to be hundreds of people that are going to have meetings and discussions on what those items are. Those items are not public information. They don’t want to be disseminated. So our AI, for example, would take all the video ingested from all those meetings, all of the professional video that would’ve been done for training or education purposes internally. All those are going to be transcribed, all those are going to be translated. And then we’re going to run LLM models against them to actually summarize all that information and pinpoint it right down to the minute.
So if I have somebody who I could build a business rule, for example, that says, let’s say our new product is called the XYZ special. Well, if anybody says XYZ special, I want that immediately flagged. I want that to be sent as a notification to somebody who curates and say, how many times was this thing mentioned? This is our new secret compound here that’s going to cure disease X or that we’re getting ready for trial. We don’t want anybody to know about it. So it could be something like that. Or it could be something that has to be protected because it’s privileged client information. That’s one example. But when we flag those things, immediately you’ve got a list of all the video that contains it, exactly where in the content that video is, the ability to immediately click it, go right to that moment, see what it is, understand it, look at the literal transcript versus the AI generated summary, what was it discussed? And the AI generated summaries are very important. You have a two-hour call, and the AI generation summary will tell you, Hey, this was what was discussed. This is who talked about this, this is who talked about that. And it’s going to give you a ton of information to get access to it without having a human being, having to manually do all of that stuff.
So the result, if you didn’t do it, is that value just kind of goes poof. Nobody’s going to look at those recordings anymore. They’re gone. So there are a huge number of use cases around that. But to your point, one of the keys is, and it’s very similar to the use of AI that you see in healthcare and other areas, is the objective is to quickly get video so that a knowledgeable human can get right to it and just make a decision. Can we get rid of this? Is this of value? Do I want other people to see it? But what the humans can’t do is scrub the hours and hours and hours and hours of stuff that’s being generated on a daily basis.
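
The business rule described above (flag every mention of a product name, list where it occurs, and count the mentions) can be sketched in a few lines. This is a minimal illustration only, not Vbrick's implementation: it assumes the video has already been transcribed into timestamped segments, and the names Segment, Flag, flag_mentions, and mention_counts, along with the sample data, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped piece of a transcript (hypothetical structure)."""
    video_id: str
    start_seconds: float
    text: str


@dataclass
class Flag:
    """A single flagged mention, pointing back to the moment in the video."""
    video_id: str
    start_seconds: float
    phrase: str
    text: str


def flag_mentions(segments, phrases):
    """Return one Flag per segment/phrase pair where the phrase occurs (case-insensitive)."""
    flags = []
    for seg in segments:
        lowered_text = seg.text.lower()
        for phrase in phrases:
            if phrase.lower() in lowered_text:
                flags.append(Flag(seg.video_id, seg.start_seconds, phrase, seg.text))
    return flags


def mention_counts(flags):
    """Count how many flagged segments mention each phrase."""
    counts = {}
    for flag in flags:
        counts[flag.phrase] = counts.get(flag.phrase, 0) + 1
    return counts


# Made-up sample data for illustration.
segments = [
    Segment("rd-sync-01", 0.0, "Welcome everyone, let's start with the roadmap."),
    Segment("rd-sync-01", 754.0, "The XYZ Special is on track for trial."),
    Segment("support-call-07", 120.5, "Customer asked about the xyz special launch date."),
]
flags = flag_mentions(segments, ["XYZ special"])
for flag in flags:
    print(f"{flag.video_id} @ {flag.start_seconds:.0f}s: {flag.text}")
print(mention_counts(flags))
```

Each flag carries the video identifier and the start time, which is what would let a curator click straight to the moment in question. Plain substring matching is the simplest possible rule; a production system would presumably need something more tolerant of transcription errors and paraphrase.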

Mike Vizard: Will it flag sensitive data conversations, because it seems like a lot of video has conversations about intellectual property, or to your earlier example, compliance issues. So how smart can it get?

Paul Sparta: It can get pretty smart and pretty custom in its definitions. We have a large financial services company that’s a household name that records all of the private wealth management conversations. So every client conversation, every analyst briefing, they’re all getting recorded. And the first thing they’re looking for is to make sure, is there any PII on there? Was there any personal financial information that was discussed? Those sorts of things. And it’s smart enough that it can actually do those sorts of things. And some of that, you can transcribe out and federate off to another AI application if you want, but at some point you’ve got to take source video, consolidate it, strip out the text, and do whatever magic you’re going to do upon it.
Now we use most of the AWS models and we’re very concerned about our own enterprise security that the models never cross. There’s no cross-training of models, there’s no training of models other than for that one specific client. We stick to the generic, because again, security, privacy, and risk are huge concerns.
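
As a rough illustration of the "is there any PII on there?" check, the sketch below scans timestamped transcript lines for two simple patterns. It is an assumption-laden toy, not the approach described in the interview: the regular expressions (a US Social Security number format and a basic email shape), the PII_PATTERNS table, and the scan_transcript function are all hypothetical, and real PII detection generally relies on trained models or dedicated services rather than a couple of regexes.

```python
import re

# Illustrative patterns only; real deployments need far broader coverage.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_transcript(lines):
    """Scan (start_seconds, text) pairs; return (start_seconds, kind, matched_text) hits."""
    hits = []
    for start_seconds, text in lines:
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((start_seconds, kind, match.group(0)))
    return hits


# Made-up transcript lines for illustration.
lines = [
    (12.0, "My social is 123-45-6789."),
    (40.0, "Let's review the portfolio."),
    (95.5, "Send it to jane.doe@example.com please."),
]
hits = scan_transcript(lines)
for start_seconds, kind, matched in hits:
    print(f"{start_seconds:>6.1f}s  {kind}: {matched}")
```

Keeping the timestamp with every hit means a reviewer can jump to the exact moment in the recording and decide whether it needs to be redacted, restricted, or destroyed.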

Mike Vizard: We’ve seen people start using tools that are AI based to create video, and a lot of it is still early days and it’s usually only a couple of minutes long. But what do you expect long term in terms of the ability to produce videos?

Paul Sparta: Well, I think it’s already getting easier and easier and easier. And I think we’re going to continue to see that bar lower. Right now there’s some great AI videos to do animations and imagery, and those sorts of things where you can give people simple scripts and it’ll actually create a cartoon for you. Those sorts of things, which are great for instructional video and certain sorts of scenarios. But we’re also going to see AI editing advance very quickly of live video. So people recording videos or various snippets and saying, “Hey, take all my video and make this story out of it and cut out the extraneous stuff.” So we’re going to have hybrid extemporaneous video that’s going to be able to have AI used against it in order to create it into a more polished consumable form. And these tools are only going to get better and better and better.
And at Vbrick, we’re not focused on that side of it. What we’re concerned with is the derivative problem. How do you manage it, where does it go, and what are large companies going to do to get their hands around this stuff? I kind of think of the classic analogy: we want to be the rail lines, bus lines, roads, and infrastructure, the clearing depot for video for a large organization, but we’re not there to say, what kind of cars do we want to drive, and how do you build the trains, and what color do you paint them? So content creation is a completely different space, but something that we’re intimately tied with in order to do what we need to do, which is provide a way to manage it all.

Mike Vizard: As you look down the road, and we’re in the early stages of 2024, what do you think the big issues are going to be as it relates to video and AI going forward? What are we not thinking enough about?

Paul Sparta: I think what we’re not thinking enough about is how AI is being used in many, many spaces as a value extractor. We have more and more complexity in the world. We’ve got more and more problems that are difficult to solve.
One of the things that we’re doing with ServiceNow is really, really cool. What a lot of people in the enterprise software world know about ServiceNow is that they got their start managing customer support. So if you just take that one thing in ServiceNow, extracting value and mining value out of that video is important, because on that conference call, or in all that extemporaneous video that could happen, might be the solution to somebody’s problem.
You’ve got a customer on the line, for example, and they’re asking about, “Hey, I got this error code and I don’t understand where it’s coming from.” And yeah, you want to search articles and that sort of stuff. We’ve been doing that for years, for 20 years since the internet. But smart searching and actually going through, hey, this topic was talked about in a conference call in engineering overseas two hours ago. Let me see if this is it. Aha, somebody just pointed this out. This is a brand new problem, but I have the solution, because I can go in and find that exact thing in video, look at the source information, even take the clip and send it to the person and say, we can release this to you. This is what was talked about. I hope it solves your problem.
And that same scenario applies to whether it’s fixing an industrial pump, repairing a crane, fixing an airplane, designing the next circuit board. I mean, it’s a universally valuable capability, but it doesn’t work if you don’t have all the video in a place where you manage it and can understand what’s there and also understand how to get that value extracted out of it with AI.
So I think that’s the thing that’s vastly underestimated, because speed to resolution and speed to relevant information will go up massively. And I think once the light bulb goes on, this thing kind of explodes.

Mike Vizard: Who’s in charge of managing all the videos then? Is there a job function in here or where is this person?

Paul Sparta: Well, this is kind of an odd situation. That is actually a super insightful question. Most enterprises, large businesses, have legacy people managing sort of boutique media and the occasional live stream. So they’re stuck in there in the UC part of the enterprise. But they’re there to make sure the networks work. And yeah, so we can have conferencing systems turned on. And our Microsoft Teams works with everything and they’re in that organization. And yes, now they’re being asked, can you give us this video enterprise wide?
But when you get to the business equation, right? The real value, business process equation, there’s not anybody. It’s not really part of IT infrastructure. There’s a component of it that’s significant in IT infrastructure because moving video is non-trivial. The reason this hasn’t happened before is not just because of the lack of AI, it’s because of the difficulty with corporate networks of managing video versus text-based or asynchronous smaller forms. And so, a lot of organizations are trying to figure this out as we speak. And what’s happening is they’re realizing this isn’t just a UC problem, this is a business process problem. It’s more akin to document management and business process workflow functions, only with the very difficult added complexity of ginormous files, difficulty in transport, the difficulties that the physical format, or the digital format, of video presents, and the network challenges that it has. It’s really much closer to that space than it is to meeting software or something like that.

Mike Vizard: All right, folks, you heard it here. Managing video is probably going to be a lot harder than the creation of the video, because we’ve already reached a point where it’s pretty simple. The next challenge will be, what AI tools can we use to maybe save ourselves from ourselves? Hey Paul, thanks for being on the show.

Paul Sparta: And Michael, thank you very much for having me. It’s been great. And look forward to talking again soon.

Mike Vizard: All right, and thank you all for watching the latest episode of the Techstrong AI video series. You can find this episode and others on our website. We invite you to check them all out. Until then, we’ll see you next time.