
When generative artificial intelligence (AI) entered the scene towards the end of 2022, it won over the crowd almost instantly. Now, more than two years after the breakthrough, we have a ton of lessons learned, a medley of corporate AI-based solutions in production, and an adoption rate that continues to grow.
This article looks at the opportunities and challenges of enterprise GenAI from the perspective of top technology decision-makers, and aims to provide a CXO-level view on the subject.
Corporate GPTs
Enterprises worldwide have embraced AI chatbots as sidekicks for their employees and collaborators. They are quick and easy to deploy: just add the corporate credit card information and you can start using them immediately. The generally accepted cost of these bots sits at around $20 per employee today. Thanks to that, at least nobody got left behind in the AI race.
However, what’s less publicly talked about is that heavy private use of chatbots has led to unregulated use of corporate documents for model training, resulting in intellectual property leaks and privacy violations.
Many companies have opted to build wrappers around public OpenAI APIs, with special contracts that forbid technology vendors from using the data to train their models, plus advanced prompt tuning and output checking to minimize hallucinations and unwanted behavior. Adding integrations with corporate document repositories (uploading, linking) to this has brought enterprises past the first major milestone in generative AI adoption.
The alternative is to use open-weight LLMs (like Llama 3.x) and serve them on internal GPU-powered infrastructure (cloud or on-prem). The chat performance may be slightly inferior to GPT, Claude or Gemini, but privacy can be preserved and data leaks prevented, as the data never leaves the local infrastructure and therefore will not be used to train the next iteration.
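The routing logic behind such a hybrid setup can be sketched in a few lines. The endpoint URLs and model names below are illustrative placeholders, not real services; the point is only that confidential prompts never leave the intranet:

```python
# Minimal sketch of privacy-aware routing between a self-hosted
# open-weight model and a contracted public API. All names here
# are hypothetical examples, not actual endpoints.

LOCAL_ENDPOINT = "http://llm.internal:8000/v1"   # e.g., an internal inference server
PUBLIC_ENDPOINT = "https://api.example.com/v1"   # contracted public API

def build_chat_request(prompt: str, confidential: bool) -> dict:
    """Return a request spec; confidential prompts are routed to the
    local open-weight model, everything else to the public API."""
    return {
        "endpoint": LOCAL_ENDPOINT if confidential else PUBLIC_ENDPOINT,
        "model": "llama-3.1-8b-instruct" if confidential else "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }
```

In practice the returned spec would be sent with an HTTP client; the separation of routing from transport is what keeps the policy testable.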
The Dream of Corporate Wisdom Access
With the decades-old dream of being able to talk to some form of intelligence based on petabytes of corporate and market data, and get relevant bytes of information, now closer to realization, we are witnessing the meteoric rise of retrieval-augmented generation (RAG).
Generically trained chatbots suffer not only from knowledge cut-offs but also from hallucinations, which lead them to respond with generic or even irrelevant information instead of grounding answers in evidence-based facts from documentation and corporate databases.
Advanced language models’ intelligence is combined with the precision of factual information retrieval in the right context. Although they may deliver great value, RAGs are universally considered hard to do properly. They require complex pipelines, careful selection and tuning of retrieval and generation algorithms, not to mention constant and costly maintenance. The projects are often delayed and exceed their estimated costs, including monthly operational costs, and yet output quality remains an area of constant improvement.
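As a toy illustration of the two core RAG stages, retrieval followed by grounded prompt assembly, here is a minimal sketch. Real systems use vector embeddings and rerankers; the keyword-overlap scoring below is purely illustrative:

```python
# Toy sketch of a RAG pipeline's two stages: retrieve relevant
# passages, then ground the generation prompt in them.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query
    (a stand-in for real embedding-based search)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a grounded prompt so the model answers from evidence,
    not from its parametric memory."""
    context = "\n---\n".join(passages)
    return (
        "Answer using ONLY the context below. If the answer is not "
        f"in the context, say so.\n\nContext:\n{context}\n\nQuestion: {query}"
    )
```

The assembled prompt would then be passed to whichever LLM the enterprise has chosen; the hard parts the article describes (chunking, reranking, evaluation, maintenance) sit around these two functions.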
Users are still cautioned about the outputs and are advised to double-check sources, follow the reasoning path of an AI agent, and verify it. These precautions reduce the theoretical efficiency gains, yet the overall user feedback is largely positive.
There’s a ton of experience gathered already on how to build RAGs effectively. More and more best practices and patterns are being added on top of that. However, they don’t translate as easily as enterprises would like. The dream of Auto-RAG is still miles away.
Skills required to deliver robust and secure RAG-based solutions include data operations, IT operations, cloud, software development, testing, data engineering and data science. These multidisciplinary projects are very ambitious, and many bring great results, yet they are not guaranteed to deliver a return on investment.
And the pesky, everlasting problem of corporate data quality resurfaces again, this time in an even trickier way. With classical data processing technologies, missing or garbage output was much easier to spot and detect. Nowadays, almost everything looks valid and convincing, despite lackluster data quality in many real-world applications.
Technology, Business and People
Expectations of productivity gains increasingly seem to overlook human adaptability, as though people would not respond to the introduction of a new technology.
Direct productivity gains are kept inflated by turning a blind eye to output quality. We get responses ten, a hundred, a thousand times faster, but usually we have to iterate multiple times until we get the desired output, which is often then edited manually. Still, there’s a gain, even if not as high as aimed for.
Moreover, everyone is expected to be a prompt engineer to use large language models (LLMs) efficiently. This requires constant training, and has even spawned prompt generators that sit on top of specified goals and users’ questions. These came as a surprise to many, because LLMs were supposed to serve out-of-the-box, especially after fine-tuning for the business domain and company-specific prompt tuning. Yet here we are, with prompt generators layered on top of the free text users provide.
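A prompt generator of the kind described above can be as simple as a template that wraps the user’s free text with a role, constraints, and an output format. The fields below are assumptions for illustration, not any vendor’s actual scheme:

```python
# Hedged sketch of a "prompt generator": turning a user's free-text
# request into a structured prompt. The template structure is a
# hypothetical example of the pattern, not a real product's format.

def generate_prompt(free_text: str, domain: str = "general") -> str:
    """Wrap free text with role, constraints and output format."""
    return (
        f"You are an expert assistant in the {domain} domain.\n"
        "Constraints: cite sources where possible; say 'unknown' rather than guess.\n"
        "Output: a short answer followed by a bullet list of key points.\n\n"
        f"User request: {free_text.strip()}"
    )
```

Real prompt generators are often themselves LLM-powered, but the principle is the same: the user types freely, and structure is added before the model sees the request.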
There’s a dilemma related to prompt engineers as a separate function. The opinions are divided; I tend to agree that business domain experts should become domain-specific prompt engineers and coach their teams.
The First One Loses?
As with technologies before GenAI, early adoption might result in business losses. For example, companies that invested heavily in consumer-facing chatbots based on GPT-3 often had to shut them down because of hallucinations, cost and loss of reputation.
Of course, others claim it was worth it, because the experience they gained helped them deliver significantly improved experiences with newer technologies and models. In my view, the “wait and see” approach is not chosen too often, as nobody wants to miss opportunities. Businesses are keener to at least give it a shot by delivering focused proofs of concept (PoCs) to validate solutions in their specific context.
Enterprise Software Will Never Be The Same
The impact of GenAI on enterprise software development must be recognized. A commonly shared observation is that senior or lead developers, when assisted by various AI tools, can achieve a much faster time-to-market than by working with additional regular and junior developers. A developer who knows exactly what is needed can control the output and iterate efficiently, becoming up to 10 times more productive. Keeping the team small also avoids the communication cost of managing additional members, so expanding the team is no longer necessary.
Improvements are also visible in security code audits, where finding vulnerabilities becomes fast and highly accurate compared to manual reviews. Data generation is another great example of how GenAI speeds up a key component of every data project, including but not limited to test data generation. Automated test generation is better than ever before: much more contextual and deeper than the simple algorithms of the past.
Troubleshooting, when the model is fed log output plus source code, often leads to a massive reduction in the time needed to fix a bug or overcome technical issues (e.g., broken dependencies).
Of course, opinions about how large the improvements are vary, but the most commonly cited range is between 10% and 40% of workload saved, which translates into significant cost savings and faster time to market without jeopardizing the security and quality of software products.
There’s also an ongoing debate about how deeply AI has affected the labor market for IT professionals. In many cases, employees were not laid off; instead, hiring was frozen due to AI-driven efficiency improvements.
Dilemmas and Challenges
Cognitive Overload, AI Fatigue
CIOs and CTOs are constantly bombarded with new model announcements, demos, benchmarks, irrelevant examples, conferences, AI-based product ads and API wrappers. This leads to cognitive overload while provoking a constant fear of missing out on whatever is latest and probably greatest in the space.
This requires a cool-headed approach to filter out the noise and analyze the news stream by asking the right questions, including “What’s in it for my organization?” and “Will this product be available in six months?” given how many AI startups are out there.
AI vendors usually present demos that are totally disconnected from enterprise needs, focusing instead on consumer applications such as shopping, ordering food, etc. These demonstrations are almost entirely end-customer-focused rather than corporate-oriented.
Big Tech’s narratives are targeted primarily at investors rather than actual users and enterprise adopters. These companies seek additional funding to buy tens of thousands more NVIDIA GPUs, and maybe even to invest in large-scale projects like nuclear plants. That’s why CIOs need to read these messages in their actual context and translate them to address their organizations’ current and near-future needs.
Lack of Transparency
A lack of detailed model change logs and information about training data is, unfortunately, the norm in AI.
Sudden drops in performance between versions of the same model may occur because vendors improved (their internal) model cost-efficiency at the expense of performance. Moreover, models are getting more and more censored with each release, which may mean that prompts that worked a week ago stop working with a new release, with the model refusing to generate answers.
CIOs/CTOs and LLM engineers working with external APIs, without direct access to a model, have to bet on black boxes; they do so to a lesser extent when they choose open-weight models (often improperly called open-source models).
A whole new area of testing model performance has emerged, as companies cannot rely on generic benchmarks based on university exams and the like. Their data and goals are different, so different models will perform differently in their environments.
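Such in-house evaluation can start very small: run each candidate model over company-specific question/answer pairs and score the results. A minimal sketch, assuming the model is any callable and using a deliberately naive containment metric:

```python
# Minimal sketch of an in-house evaluation harness: score a candidate
# model against company-specific Q&A pairs instead of trusting generic
# public benchmarks. The containment check is intentionally simple;
# real harnesses use semantic similarity or LLM-based grading.

def evaluate(model, dataset: list[tuple[str, str]]) -> float:
    """Return the fraction of answers that contain the expected fact."""
    hits = sum(1 for question, expected in dataset
               if expected.lower() in model(question).lower())
    return hits / len(dataset)
```

Swapping in a different model is then just a matter of passing a different callable, which makes comparing vendors and versions on the company’s own data routine.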
Small Language Models, Local LLMs, and Public API Dilemmas
The dilemma of local language models vs. public APIs is another theme of heated debates in the industry. Usually decision makers tend to go with a hybrid approach, for example, picking small language models (SLMs) with specific tuning for on-device deployments, local large models for RAGs and public APIs for corporate chats.
Choosing the appropriate model, tuning parameters and prompting strategies for each purpose involves a set of tough decisions and experiments. While testing and experimenting can lead to improved results, they also mean longer development times and higher costs. Additionally, even after selecting the best solution three months ago, there’s almost always a newer, more tempting version that promises better performance or a smaller model size (for instance, the latest 7B models may perform comparably to older 70B models; advanced quantization techniques can reduce model size; and domain-specific tuning, for instance medical Mistral models, can enhance performance in specialized fields).
2024 showed how fast open-weight models closed the gap to the market leaders hidden behind APIs (the black-box approach).
Data privacy, operational costs, lack of upfront investment, better models vs. lesser models, local infrastructure, and local maintenance and development (including model tuning, prompt engineering and data preparation) present really hard choices. Experimentation always welcomes novelty and questions the old way; business requires more stability and predictability. A balancing act is the key.
AI as a Software Product
We tried to escape it, consumed by the dream of a brighter future. And here we are, with corporate AI projects being a new type of complex, data-driven software project.
Proper requirements analysis takes into account expected goals, technology capabilities, and economic constraints such as monthly operational cost. Also, we can’t skip solution architecture and its fit into the enterprise architecture. We need component design, quality and speedy implementation, and testing that is much different from that of predictable transactional systems. There’s no way around proper user experience, developer experience, deployment pipelines or data pipelines.
A corporate AI solution is another, even more sophisticated software product that joins an already crowded and complex set of corporate systems.
The primary challenge for CTOs is effectively communicating the complexities of AI tools to the business. While ChatGPT and Microsoft Copilot are fast and easy to use, their underlying complexities are often overlooked. The illusion of simplicity comes from the polished user experience we have gotten used to. What’s happening behind the scenes is much more complex and requires people, skills, and focus.
“It’s The Economy, Stupid”
How do you measure AI program and project performance? Vendors and companies invested in AI throw out bold claims, like 10-times productivity improvements, every day.
There’s an often-shared thought that at least one company benefits from all of this, and it’s called NVIDIA: the equivalent of a shovel factory in the nineteenth-century gold rush. The losses reported by AI startups raise the question of an AI bubble that refuses to burst for yet another year, or whether there’s any bubble at all. Not everyone will survive the race to the bottom; that’s for sure, and it always has been the case.
In the case of the lowest-hanging AI fruit, where GenAI does an undoubtedly great job, specifically code generation, the real productivity gains look enormous on the surface. But once a more cool-headed, scientific approach is applied, companies report productivity gains in the low tens of percent for their software development teams. Still, that is a great result and great value for the business, one that would not be possible without LLMs, yet far from the bold claims of the vendors.
The more advanced the use case based on local data from multiple sources, the lower the expected gains and the higher the associated risks. As IT budgets go through optimization every year, the question is what to reduce or cut to make room for another AI-driven initiative. There are even cases where companies significantly reduced quality assurance teams for critical business systems, and then suffered losses in business performance and reputation. No amount of “we do AI too” can help in such situations; the damage is already done.
There are also ideas that AI projects should not be part of the IT projects portfolio at all, but separated as something special and different. There are even voices saying that return on investment should not apply to AI projects. Others disagree with that approach, sticking to the laws of economics that have outlasted many technology hype cycles.
Rational management of experimental streams is recognized as the way to keep a hand on the pulse. In the local enterprise context, relevant, business-context-aware proofs of concept, delivered without burning too much money, are a great way to produce practical guidelines on what to invest in right now and where to expect the most gains. Nobody wants to miss anything from the AI train that could possibly be relevant to them. On the other hand, there are serious business constraints, as AI is not cheap for either technology vendors or their corporate users.
CTOs also have to keep in mind that other technology trends are on the horizon, such as the evolution of traditional machine learning technologies, federated learning and a growing number of security and privacy threats that need to be addressed properly.
Data Protection Issues
AI models tend to memorize more data more accurately than previously predicted. The data used for training can be leaked with sufficient precision to identify people or leak company secrets. This not only violates regulations but also compromises unique business advantages.
Copyright problems may even arise from using publicly available open-weight models, as we don’t know for sure which data they were trained on, and what can leak accidentally. Model runaway attacks, i.e., making models leak their proprietary information, cannot be fully prevented. Despite efforts to reduce their likelihood, there’s always a new exploit after the previous problem is fixed.
Organizations must constantly balance the ever-growing regulatory landscape across different countries, and de-globalization trends that lead to increased protectionism, against the business value of AI-based solutions.
Trends
Cases and Experience Sharing
There’s a consensus that learning from previous projects, even those that didn’t achieve the expected results from an output quality or economic perspective (e.g., operational costs higher than direct business efficiency gains), is key for the next generation of projects to improve business outcomes.
Companies try to borrow business-relevant ideas from each other; the usual area of interest at every conference and meeting is to learn from each other what works today and what is likely to work tomorrow. For instance, it is becoming less common for companies to train their own models. Instead, they find it more effective to start with an openly available (or commercially available) fine-tuned model and then do proper prompt engineering. Of course, this is not a general rule, but a trend.
New Competence Models
The GenAI revolution also leads to a redefinition of roles and positions within the organization. After the initial hype suggesting that only prompt engineers and a large number of GPUs would be necessary to deploy AI, we have now learned that all the roles are here to stay and will collaborate in new ways. Great examples include RAG projects, which are interdisciplinary and require multiple roles performing both traditional and new tasks and activities.
Uncertainty to be Continued
A common complaint is that public models are becoming more controlled and censored, getting ‘dumber’. Additionally, hallucinations are getting even harder to detect, as more capable models produce outputs that look ever more polished and convincing.
How do you convince employees to verify the outputs thoroughly? There are multiple answers to this question, but none of them is considered a good answer.
Not Just Chatbots but Agents to the Rescue
We got used to interactive conversations with chatbots, both public and private. We tell them the problems and after a few seconds, we get the answers. This trend is only going to improve.
It is worth noting that we are observing two opposite trends. Real-time duplex conversations allow users to interrupt the bot’s responses and add relevant information to the conversation. It’s a game changer from a user experience perspective, offering benefits such as millisecond-level responsiveness and the ability to skip waiting for the bot’s entire response.
On the opposite side, there’s growing faith in long-running agentic AI that is given an advanced task requiring a lengthier process of orchestrated or choreographed data retrieval and reasoning. Instead of seconds, agents come back with answers and outputs in minutes or hours, but with much greater accuracy and relevance, while allowing the user to move on to other activities immediately after defining the task.
Let AI Use Company Apps and Data
Another trend that can be a game changer is allowing AI agents to operate as computer users, interacting with applications and corporate data as efficiently as, or more efficiently than, human users.
It brings analogies to Robotic Process Automation (RPA), but in a new, much more advanced way: recognizing everything on screen, reasoning based on the goals of the task, and defining its own actions and workflows, including trial and error, to reach those goals.
The current performance is subpar, but many believe that the future belongs to combined teams of humans and AI bots using existing IT solutions more efficiently than ever before. The question is when, not if.
Efficiency Improvements
API costs per transaction are dropping, which is a great trend, as it is going to reduce the operational cost of AI-based solutions.
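The effect of falling prices is easy to quantify with a back-of-the-envelope per-request cost formula. The prices in the example below are illustrative placeholders; actual per-million-token rates vary by vendor and model and change frequently:

```python
# Sketch of per-request API cost accounting: most vendors price input
# and output tokens separately, per million tokens. The numbers used
# in any real calculation should come from the vendor's current rates.

def request_cost(in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for one request, given per-million-token prices."""
    return (in_tokens / 1e6 * in_price_per_m
            + out_tokens / 1e6 * out_price_per_m)
```

Multiplying such a per-request figure by the expected monthly request volume is the first step in the operational cost estimates the article mentions.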
Small language models of 2-3 billion parameters that can run on smartphones and laptops are rapidly becoming viable options for specific use cases due to constant improvements in their performance.
Memory requirements of LLMs are being reduced by various techniques, including quantization, allowing complex models to run on modest hardware without vast amounts of GPU memory, including today’s gaming-grade laptops and PCs.
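The arithmetic behind this is simple: weight memory scales linearly with bytes per parameter. A rough sketch, ignoring activations, KV cache and runtime overhead:

```python
# Rough sketch of why quantization matters for memory: the weights of
# a model with N billion parameters at B bits per parameter occupy
# roughly N * 1e9 * B / 8 bytes. Real deployments also need memory
# for activations and the KV cache, which this ignores.

def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight-only memory footprint in gigabytes."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# A 7B model: roughly 14 GB at 16-bit vs roughly 3.5 GB at 4-bit,
# the difference between a data-center GPU and a gaming laptop.
```

This is why 4-bit quantized 7B-class models fit comfortably on consumer GPUs, while their full-precision versions do not.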
Energy consumption is still a challenge, with more investments going into dedicated nuclear plants than into improving the energy efficiency of AI models themselves. Contrary to the claims of some AI enthusiasts, AI and green computing are moving in opposite directions. While the advice is to use AI less to save the planet, that approach is not exactly what we’re looking for.
AI-Based Products as Software Products
As mentioned above, it is now clear that successfully delivering and maintaining enterprise-grade solutions requires proper processes, skillsets and technology sets. There is no way around these essential components.
The trend here is more about improvements in all areas (technology, people, process) and a universal understanding of them. This is the key, and there’s no universal general AI that is going to do it for us.
Survival of the Fittest
The race to the bottom is a natural phenomenon at this stage of technological adoption. For many startups, money is not as cheap as it used to be; still, billions, even trillions, of dollars are pumped into the AI space.
How long can it last? Who is going to survive?
Many CIOs want to make sure they reduce vendor lock-in, others focus on the immediate value and defer safety measures for the blurred future. There are pros and cons to both strategies.
Future
There are few universal truths for enterprise AI adoption; it is very company-specific and has to be carefully thought through for each particular business case, even within the same industry. Data ecosystems differ, corporate app ecosystems differ, and processes are specific to each organization.
There’s a long way to go before the automatic RAGs and automated AIs of the future arrive (if ever). So let’s end with this advice: don’t spend time listening to the overblown announcements of AI startup or Big Tech CEOs; instead, work with other technology and business leaders on what works today and what will work tomorrow.
Keeping an open mind in the experimentation and PoC stream, while avoiding the starvation of other emerging technologies, is critical. Again, we’ve been here before, with the PC, internet, mobile and machine learning revolutions. The same is going to happen with GenAI; all the experience and knowledge gathered today won’t be in vain. The key question is how to balance the experimentation that helps avoid missing opportunities with a focus on delivering value in a short timeframe.
A fresh perspective may come from the outside, from working with independent technology partners who are not biased by the need to sell their own AI product, but instead focus on fitting the existing technology ecosystems to your specific needs of today and tomorrow.