Synopsis: In this AI Leadership Insights video interview, Mike Vizard talks to Fabio Soares, engineering manager for data platforms at Andela, about what's really required to operationalize artificial intelligence (AI).

Mike Vizard: Hello and welcome to the latest edition of the Techstrong AI video series. I’m your host, Mike Vizard. Today we’re with Fabio Soares, who is Engineering Manager for Data Platforms at Andela, and we’re talking about how to operationalize AI because, well, I think we all know what it is these days, but making use of it is a whole other ballgame. Fabio, welcome to the show.

Fabio Soares: Thanks for having me, Mike.

Mike Vizard: I think if you look at it, just about everybody’s figured out how to write a better email using generative AI, but now we’re into the next level. We’re trying to figure out how to take the data within our organizations, safely expose it to some sort of AI model, whether it’s one that we built or somebody else built remains to be determined, and then execute something that we have faith in that will be consistent and reliable. What is your sense of where we are on this journey at the moment? What are our challenges, and just how pervasive will AI get?

Fabio Soares: Well, my sense is that we are at the beginning of the journey. At Andela, we are a tech talent marketplace, and we have a product called Andela Cloud, the platform where we match talent with job opportunities. To do so, we have talent profiles and many other pieces of information like their country, their time zone, their skills, and so on. And on the job side, we have a mirror of this, the same type of information, and we try to match them as best we can.
And one of the problems we have in the company is creating pitches for our clients, which are basically short summaries, like a cover letter presenting the talent to the client. It would be wonderful if we could automate this process as much as we can. In the past, matchers used to write these from scratch, and now they’re trying to use OpenAI to do part of this job. But we still need a lot of human intervention, because OpenAI tends to exaggerate or fabricate a lot of false information inside the cover letter.
For example, I brought an example for you: a talent who had never worked in the healthcare sector was applying for a machine learning position in healthcare, and one of the requirements was to know HIPAA and some other technologies. The generated paragraph claimed he knew Cohere and HIPAA, had worked with voice data, and had experience with healthcare applications and [inaudible 00:02:57]. All of that was hallucination.
So managers still need to review the cover letters before sending them to clients. According to our internal statistics, around 40% of the generated cover letters hallucinated considerably and needed a thorough review, 30% had some exaggerations, and only around 30% were accurate enough that we could approve them without considerable intervention.

Mike Vizard: So how will organizations get past this hallucination hurdle? Because I am not sure that we can all just use a general purpose AI model for everything. I think there’ll be times when we’ll want to customize it using our own data. Other times we may want to build our own AI model. So do we need to figure out the when and the where to use what? And how challenging is that going to be?

Fabio Soares: There is no silver bullet in AI. I don’t have the answer for this question specifically. We’ve been testing, but the core idea is that we need to give feedback to the OpenAI model somehow, and there are several strategies to do so.
Our engineers are studying ways to give the model feedback to improve the accuracy of these outputs. So far, we are not prioritizing this type of service internally. Instead, we are focusing on more straightforward solutions based on more traditional techniques like time series prediction and forecasting to solve smaller, more focused problems. One is match fitness: estimating how well a talent fits a job, so we can recommend talent to managers, or jobs to talent, vice versa. Another is estimating whether a talent will respond to a potential contact, because we want to recommend not only talent who are a good fit, but also talent who are engaged and will actually reply and talk with people.
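A minimal sketch of the kind of match-fitness scoring Soares describes. Every function name, weight and skill set here is hypothetical, an illustration of a "traditional, focused" score rather than Andela's actual model:

```python
# Hypothetical match-fitness score: weighted overlap between a talent's
# skills and a job's requirements. Mandatory skills dominate the score,
# optional skills only nudge it. The 0.8 / 0.2 weights are made up.

def match_fitness(talent_skills, required_skills, optional_skills):
    """Return a 0..1 fitness score for a talent against one job."""
    talent = {s.lower() for s in talent_skills}
    required = {s.lower() for s in required_skills}
    optional = {s.lower() for s in optional_skills}

    req_hit = len(talent & required) / len(required) if required else 1.0
    opt_hit = len(talent & optional) / len(optional) if optional else 0.0

    return 0.8 * req_hit + 0.2 * opt_hit

score = match_fitness(
    ["Python", "SQL", "Airflow"],      # talent profile
    ["python", "sql"],                 # mandatory job requirements
    ["airflow", "spark"],              # nice-to-have requirements
)
print(round(score, 2))  # 0.8 * 1.0 + 0.2 * 0.5 = 0.9
```

A real system would layer many more features on top (country, time zone, seniority), but the appeal of this style over an LLM is exactly what Soares points out: the output is deterministic and auditable.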

Mike Vizard: As you look at the AI landscape, we’re all excited about all things generative AI these days, but there are other AI models. So do we need to weave these all together so we make use of generative and predictive in the right ways or will one supersede the other?

Fabio Soares: No, no, no. I don’t think either one wins or will supersede the other. They solve different problems; each has its own, let’s say, business domain. However, LLMs are quite useful when we create text from less ambiguous pieces of information. For example, when we want to create a talent profile based on their skills, experience, et cetera, the summary is relatively good. But when we constrain them with boundary conditions, which are job requirements and restrictions, and we pile up a bunch of restrictions, that reconciliation doesn’t happen well. That’s why we need to identify the problems LLMs are capable of solving, like creating job descriptions, talent descriptions and so on; avoid, for now, the ones that are still not very mature and in which we need to give considerable feedback to make them right, like the first one I told you about; and focus on traditional systems for things like predicting [inaudible 00:06:54] and response, recommendation and so on.

Mike Vizard: Everything we’ve seen so far seems to have required a specialist with a lot of knowledge in data science. Do you think we’re going to see a democratization of AI, where developers will just be able to build apps and the AI model will just be one more artifact that makes up those apps? Where are we on this journey?

Fabio Soares: In 2015, when I really started to develop seriously in this area, I was already hearing about automated machine learning. Sometimes those tools are useful. We can definitely reuse pre-trained models to solve very repetitive problems, like predicting that a given product will be out of stock the next day. We have some pre-trained models that solve this relatively well.
Facebook has developed some libraries too… what’s their name? I forgot it now. There are many libraries for time series prediction that we can reuse to expedite our solutions. Prophet, that’s the name of Facebook’s library. But when it comes to very specialized problems like match fitness, which takes into consideration a completely new business domain, which is very vertical, things change.
We need to develop in-house systems that create features from every piece of information: the talent profile, the overlap between the talent’s skills and the job requirements, some of which are mandatory and some optional. We must apply filters to avoid recommending people from the wrong countries, because some jobs specify that the person needs to work from the US or Canada, for example. And sometimes people willingly accept working unusual hours, from midday to midnight, for example, so we need to consider all of that too. It’s quite specialized. I don’t see how to democratize this in the short term, but for repetitive tasks, yes, for sure.
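The hard filters Soares mentions, country restrictions and working-hours overlap, could look something like the sketch below. Field names, the UTC-hour representation and the 4-hour threshold are all assumptions for illustration; a real system would also handle time zone conversion and windows that wrap past midnight:

```python
# Hypothetical hard filters applied before any fitness scoring: drop a
# candidate if their country is not allowed for the job, or if their
# acceptable working window overlaps the job's window by too few hours.
# Hours are simplified to integers on a single UTC day (no wrap-around).

def passes_filters(talent, job):
    if job["allowed_countries"] and talent["country"] not in job["allowed_countries"]:
        return False
    overlap_start = max(talent["available_from"], job["work_from"])
    overlap_end = min(talent["available_to"], job["work_to"])
    return (overlap_end - overlap_start) >= job["min_overlap_hours"]

talent = {"country": "BR", "available_from": 12, "available_to": 24}  # midday to midnight
job = {
    "allowed_countries": {"US", "CA", "BR"},
    "work_from": 14,
    "work_to": 22,
    "min_overlap_hours": 4,
}
print(passes_filters(talent, job))  # True: 8 hours of overlap
```

Running cheap, deterministic filters like these first also shrinks the candidate set before any expensive scoring model runs.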

Mike Vizard: What are you guys doing to operationalize AI yourselves? And I’m asking the question almost from a cultural perspective because there’s clearly data scientists, there’s developers, there’s cybersecurity people, there’s data engineers. How do you bring that all together to kind of create something that is meaningful?

Fabio Soares: It’s worth mentioning that it has been quite an interesting journey over the last two years, because we didn’t know how to answer that question at the very beginning. We tested different setups and hired teams with different kinds of expertise until we finally stabilized on the organization we have now.
Our team is small. We have just five people: machine learning engineers, data scientists and managers, including myself. We have a DevOps team responsible for the CI/CD framework. They offer automations that we can use, so if I want to create an application today, I just need to copy a template and set some Kubernetes configurations, like the CPU, the memory, the reverse proxy and so on. Then I can deploy our systems through a Jenkins-managed pipeline.
Then, in my hands, the only responsibility left is to solve the business problem. I focus mostly on talking with product managers and directors to avoid ambiguous requirements and to simplify them, because by doing so I can remove moving parts and dependencies, making our software simpler and consequently more stable. Inside my team, we have two machine learning engineers and two data scientists. The data scientists focus on exploratory data analysis, testing new techniques and improving model KPIs, whereas the machine learning engineers focus on deployment. Yeah, I think that’s a decent overview.

Mike Vizard: How do you set reasonable expectations for the business? Because I think a lot of business executives are reading all things AI these days and thinking magical things will happen overnight. So how do you have that conversation with those folks in a way that doesn’t make it sound like you’re against AI per se, but rather that you’re looking for some sort of measured, reasonable response?

Fabio Soares: I’m never against AI. I’m only in favor of realistic outcomes. There was a Fortune survey recently that concluded that around 70% of C-level executives expected AI to revolutionize their business, whereas in the middle layer, engineering managers like me, only 40-something percent believed that.
And that happens because the C-level is not very attached to the day-to-day reality. That’s why I mentioned that I need to talk with them very often: to translate their requirements into real requirements, sometimes prune them and bring them back to reality, and sometimes even change their mindset. There is in fact a very difficult problem internally here, which is how to measure liquidity.
We have different perspectives on how to do so. The C-level just wants to cluster similar talent and create statistics based on that, but when we do so without paying much attention to how homogeneous the clusters are, we may, for example, mingle people from a country that has a surplus of a given talent with another country that has a deficit. Then, on average, everything seems to be okay.
Then we focused on developing a bottom-up approach to partially solve the problem, by identifying which jobs do and don’t suffer from liquidity problems, and delegating to stakeholders how they want to build the clusters: by country, by language, by area and so on. But this second problem is still very tough, and they are still asking for our help to develop a taxonomy that makes sense. As I mentioned, there is no silver bullet. We don’t know which taxonomy best suits this problem, and we are still researching it.

Mike Vizard: What do you know now that you wish you knew when you first started out, as you look back in time and go, hey, my life would’ve been a lot easier if I had thought about this earlier?

Fabio Soares: When I was young, especially in my consultancy days, I accepted my clients’ requirements and tried to develop them. Today, I don’t just accept; I question and try to simplify the requirements as much as we can, because the more dependencies a piece of software has, the more unstable it is. Simplicity is the ultimate sophistication, and simplicity starts from the top. We cannot eliminate complexity, but we can select which complexity we want to deal with. I prefer to deal with human-level complexity, eliminating ambiguity at the very beginning, rather than pushing it into the software and creating a bunch of kludges and unstable software that won’t ever work. I think this is the major lesson of my life.

Mike Vizard: What is it that you hope that the vendor community and the people who provide the tools and the platforms that you guys use would think about or appreciate more than they do today? I mean, it seems like there’s a lot of heavy lifting still involved. What’s on your wishlist for 2024?

Fabio Soares: I think my wishlist won’t be granted. Let’s pick an area: pipeline management for data engineering and data science tasks. We have a lot of tools in this space: Kedro, MLflow, Airflow and many, many others, Brief [inaudible 00:16:07] and so on. It takes around three weeks to evaluate each of them, and when we do, we usually notice the marketing oversells. I wish I lived inside their marketing material, because they want us to believe that they are going to solve much more than they really solve.
For example, we evaluated Kedro to organize our pipelines so that we could see the dependency graph and trigger only some parts of a pipeline. It was incredible. But when we tried to deploy it into production with real-time requirements, we faced many problems. We needed a connection pool with our databases, because we need to serve real-time recommendations, but Kedro needed to create a context, which means establishing the database connections, and this takes a lot of time, and those connections kept being recreated every time. It is frustrating when a tool solves one problem but brings a new one that basically makes it unusable.
So, as I mentioned, there is no silver bullet. Our life in this field is difficult: we need to assess every tool individually for a couple of weeks, create MVPs with them, analyze which scenarios they are useful for and select the ones that fit our purpose.
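The connection problem Soares describes, a framework re-establishing database connections per run instead of reusing them, is the classic motivation for a connection pool. A bare-bones sketch of the pattern, with a fake `connect()` standing in for any real driver so the example runs without a database:

```python
# Sketch of the pattern needed for real-time serving: pay the (expensive)
# connection cost once at startup, then reuse connections per request,
# instead of rebuilding a framework context and its connections each time.

import queue

class ConnectionPool:
    def __init__(self, connect, size=4):
        self._pool = queue.Queue()
        for _ in range(size):            # connections created once, up front
            self._pool.put(connect())

    def acquire(self):
        return self._pool.get()          # blocks if all connections are busy

    def release(self, conn):
        self._pool.put(conn)

# Demo with a fake connect() that just records how often it was called.
made = []
def connect():
    made.append(object())
    return made[-1]

pool = ConnectionPool(connect, size=2)
for _ in range(100):                     # simulate 100 incoming requests
    conn = pool.acquire()
    pool.release(conn)                   # ... query would happen in between

print(len(made))  # 2: connections were reused, never recreated per request
```

Production drivers usually ship this built in (for example, pool classes in common database client libraries), which is why a pipeline framework that forces a fresh context per run fights against real-time workloads.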

Mike Vizard: All right. Folks, you heard it here: there are no shortcuts. The more magical something appears, the harder it was to build. Somebody actually went to all the time and effort to hide that complexity on your behalf, and that’s the challenge of the day. Fabio, thanks for being on the show.

Fabio Soares: Thank you, Michael.

Mike Vizard: And thank you all for watching the latest episode of the Techstrong.AI video series. You can find this episode and others on our website. We invite you to check them all out. Until then, we’ll see you next time.