Operationalizing AI

On the third day of Operationalizing AI in Boca Raton, leaders discussed large language models (LLMs) and how people might want to experiment with them. The exercises below let you try out some LLMs from the comfort of your own computer.

LLM Practice From Home

First, you’ll need to spin up a server on Amazon Web Services, and connect to it to try out these examples.

Then you’ll use a Python library called Sentence Transformers. Sentence Transformers provides access to several different LLMs, which you can use to search documents for sentences that are similar to a query sentence.

The library does the heavy lifting for you, using an LLM to calculate embeddings for a list of sentences you provide. Then you can provide a query sentence, and the library can locate the sentence that’s closest to the one you provided.

But what does “closest” mean? Consider these two sentences:

A cheetah is running behind its prey.

I saw a big cat following an antelope.

A human can clearly see that these two sentences are similar; however, it would be difficult for a computer to determine that without access to a large language model. That’s where Sentence Transformers can help.

Before trying out Sentence Transformers, get a server running on Amazon Web Services that has an NVIDIA GPU card attached to it. Yes, that’s correct; instances on AWS are available that have access to NVIDIA video cards. Why is that helpful? Because NVIDIA cards have massive numbers of cores that are excellent at doing floating point math and vector calculations. Such vector calculations are vital to generating gaming images, but they’re also useful in querying large language models. As LLMs learn, the words get assigned floating point vectors that can be used for calculating whether words are similar. GPUs are ideal for this.

Spinning up an AWS GPU Instance

Head over to the AWS console, go into the EC2 service, and launch an instance. Use the g4dn.xlarge instance type with the latest Ubuntu Server image. This instance type provides an NVIDIA Tesla T4 GPU with 2560 CUDA cores, which the Sentence Transformers library will use to do its calculations with the help of a few other libraries, including PyTorch.

IMPORTANT TIP: Set an alarm on your phone to remind you later in the day to shut down the instance. This instance will cost you around $13 for every 24 hours it runs!
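
To put that figure in perspective, here is a quick back-of-the-envelope calculation. It assumes the roughly $13-per-day figure quoted above and simple hourly billing; actual AWS pricing varies by region and changes over time, so treat the numbers as illustrative only. The helper name estimated_cost is made up for this sketch.

```python
# Rough cost estimate, assuming about $13 per 24 hours (the figure quoted above).
DAILY_COST_USD = 13.0  # assumption: approximate; varies by region and over time


def estimated_cost(hours: float, daily_cost: float = DAILY_COST_USD) -> float:
    """Return the approximate cost in USD of running the instance for `hours`."""
    return round(daily_cost / 24 * hours, 2)


print(estimated_cost(3))    # an afternoon of experimenting
print(estimated_cost(168))  # forgetting about it for a week
```

The second number is the reason for the alarm.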

After you launch the server, SSH into it. You’ll need to manually install the NVIDIA drivers. To do so, follow the instructions shown on this page for Ubuntu LTS.  (Alternatively you can find some images on the AWS Marketplace with the drivers already installed; however, this step is pretty quick.)

Now install Python and its pip tool. Copy and paste these in:

sudo apt install python3

sudo apt install python3-pip

sudo apt install python-is-python3

(That last one lets you just type “python” to launch python3.)

And now install torch:

pip install torch

The next step isn’t required, but it’s fun to see what GPU model is available. Start up Python by simply typing:

python

Then type the following:

import torch

torch.cuda.is_available()

You should see True appear. CUDA is the platform that runs on the NVIDIA card, and that means Python is finding the GPU. Let’s see exactly which card is available. Enter:

torch.cuda.get_device_name(0)

You should see ‘Tesla T4’ appear. (Go ahead and google the specs, and even the price. It’s pretty impressive… and expensive. But you’re able to use it for just a few dollars for a few hours.)
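
If you’d like your own scripts to keep working on a machine without a GPU, a small helper along these lines can choose the device at run time. This is only a sketch: pick_device is a hypothetical helper name, not part of PyTorch, and it falls back to the CPU when torch isn’t installed or no CUDA device is visible.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    # If torch is not installed at all, there is nothing to query.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)
```

The resulting string can be passed along when you load a model later, for example through the device argument of SentenceTransformer.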

Finally, we’ll install the Sentence Transformers Python library:

pip install sentence-transformers

Sentence Transformers First Practice

Open up your favorite command-line editor, such as vim or nano. For this first practice, we’ll just grab the code right off the front of the Sentence Transformers site, found here.

Save the code to a file such as test1.py, and run it:

python test1.py

This first example isn’t particularly useful, but it’s interesting. The code provides three different sentences:

“This framework generates embeddings for each input sentence”

“Sentences are passed as a list of string.”

“The quick brown fox jumps over the lazy dog.”

The library then calculates embeddings for each of these. Embeddings are in the form of vectors. This app simply prints out all the vectors for these three sentences. If you like, you can also see the size of each embedding by printing out its length:

print("Embedding length:", len(embedding))

This shows it’s coming up with 384 numbers for each.

Again, this isn’t particularly helpful as is, but it shows you’re heading in the right direction. Now you can do an actual sentence search.

A Real Sentence Search

These vectors can be used to locate sentences that are similar. To do that, you need to calculate the cosine similarity between vectors, which is the cosine of the angle between them. Fortunately, you don’t have to do that yourself. In fact, you don’t have to worry about math or trig at all! Instead, you can let the Sentence Transformers library do it for you.

To try this out, create an example similar to the example found here.

The idea is that you first load up a list of sentences called a corpus. Then you provide a query sentence. Sentence Transformers will then search the corpus for the sentence that’s closest to the one you entered. For the query, use the example you saw at the start of this article.

Here’s the entire Python code:
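
For the curious, the calculation the library performs can be sketched in a few lines of plain Python. The three-number vectors below are made up for illustration (real embeddings from this model have 384 numbers), and cosine_similarity is a hypothetical helper written for this sketch, not the library’s own function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings", not real model output
cheetah = [0.9, 0.8, 0.1]
big_cat = [0.8, 0.9, 0.2]
violin = [0.1, 0.2, 0.9]

print(cosine_similarity(cheetah, big_cat))  # close to 1
print(cosine_similarity(cheetah, violin))   # noticeably lower
```

Vectors that point in nearly the same direction score close to 1, which is what “closest” means in a sentence search.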

from sentence_transformers import SentenceTransformer, util
import torch

embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Here are the sentences you'll search within

corpus = ['A man is eating food.',
          'A man is eating a piece of bread.',
          'The girl is carrying a baby.',
          'A man is riding a horse.',
          'A woman is playing violin.',
          'Two men pushed carts through the woods.',
          'A man is riding a white horse on an enclosed ground.',
          'A monkey is playing drums.',
          'A cheetah is running behind its prey.']

corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)

# Here's the sentence you're searching for:

query = 'I saw a big cat following an antelope.'

# Now use Sentence transformers to find a similar sentence

query_embedding = embedder.encode(query, convert_to_tensor=True)
cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
top_result = torch.topk(cos_scores, k=1)

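print('Query sentence:')
print(query)
print('Closest match:')
# top_result.indices holds the corpus position of the best-scoring sentence
print(corpus[top_result.indices[0].item()])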

When run, here’s the output:

Query sentence:
I saw a big cat following an antelope.
Closest match:
A cheetah is running behind its prey.

You can modify the query sentence. Try changing it to a sentence similar to one of the other sentences in the corpus. For example, consider the query sentence:

I saw somebody playing a musical instrument.

When placed in the code, the app will find this sentence, which is definitely similar:

A woman is playing violin.
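
The script keeps only the single best match (k=1). If you want the top few matches instead, the ranking step looks roughly like the plain-Python sketch below. The scores are invented for illustration and top_k is a hypothetical helper; in the real script you would pass a larger k to torch.topk.

```python
import heapq


def top_k(scores, sentences, k=3):
    """Return the k (score, sentence) pairs with the highest scores, best first."""
    return heapq.nlargest(k, zip(scores, sentences), key=lambda pair: pair[0])


# Invented similarity scores, one per sentence (not real model output)
sentences = ['A woman is playing violin.',
             'A monkey is playing drums.',
             'A man is riding a horse.',
             'A man is eating food.']
scores = [0.61, 0.48, 0.07, 0.02]

for score, sentence in top_k(scores, sentences, k=2):
    print(f"{score:.2f}  {sentence}")
```

Printing the scores alongside the sentences is also a handy way to judge how confident a match really is.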

About the Model

The examples above make use of an LLM called all-MiniLM-L6-v2. This is a relatively small model that works well for everyday use. There are several more models at your disposal, however, and some might give better results but come with tradeoffs such as taking more time to process. You can explore them here: https://www.sbert.net/docs/pretrained_models.html

Applications for Sentence Transformers

One use for this is an app that loads in text from many documents, such as those behind an employee self-serve portal. There might be dozens or even hundreds of documents. Employees could then type in a search phrase and locate the sentences within the documents that are a close match.
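
As a rough sketch of the first step in such an app, the snippet below splits each document into sentences and remembers which document each sentence came from, so a match can be traced back to its source. The document names and text are made up, build_corpus is a hypothetical helper, and the regular-expression sentence splitter is deliberately naive.

```python
import re


def build_corpus(documents):
    """Split each document into sentences, keeping track of the source document.

    `documents` maps a document name to its full text. Returns a list of
    (document_name, sentence) pairs.
    """
    corpus = []
    for name, text in documents.items():
        # Naive split: break after ., ! or ? when followed by whitespace.
        for sentence in re.split(r'(?<=[.!?])\s+', text.strip()):
            if sentence:
                corpus.append((name, sentence))
    return corpus


# Made-up example documents
documents = {
    'vacation_policy.txt': 'Employees accrue leave monthly. Requests go to your manager.',
    'expenses.txt': 'Submit receipts within 30 days.',
}

corpus = build_corpus(documents)
sentences = [sentence for _, sentence in corpus]  # the list you would embed
print(corpus)
```

The sentences list plays the role of the corpus in the search script, and the paired document names tell you where each match came from.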

Of course, coding that up would be a bit of work. But there are other libraries that can help with that part. A good fit would be the Chroma library. You can find out more here.

Using the right tools, you can compute sentence embeddings, which can serve as the basis of a full document search system.

Indeed, at Operationalizing AI, participants spent time building some amazing apps that made use of LLMs, including document searching. We’ll provide some more information on what the teams have created soon.