Issue #46: When to Use Fine-Tuning, Instruction Sets, and RAG — Jason Michael Perry

Howdy 👋🏾, since Meta dropped Llama 3, their latest open-source language model, I’ve been heads down exploring the new model and kicking the tires to see how capable it is. One thing I love about Meta’s open approach to AI is the flourishing ecosystem that’s developing around it and other open AI models like Mistral AI’s Mixtral, a strong open-source alternative to models like GPT-3.5.

Part of this ecosystem is LlamaIndex, which offers a rich data framework to easily connect Llama 3 and tons of other AI models with data sources such as static files, databases, APIs, Confluence, and a whole smorgasbord of other options.
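To give a feel for how little code that takes, here’s a minimal sketch based on LlamaIndex’s quickstart (it assumes an OpenAI key in the environment, a ./data folder of files, and a made-up question; exact imports can vary by version):

```python
# Minimal LlamaIndex sketch: index a folder of files and ask a question.
# Assumes `pip install llama-index` and an OPENAI_API_KEY in the environment.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every file in ./data (PDFs, text, etc.) as documents.
documents = SimpleDirectoryReader("data").load_data()

# Build a vector index so relevant chunks can be retrieved at query time.
index = VectorStoreIndex.from_documents(documents)

# Ask a question grounded in your own data.
query_engine = index.as_query_engine()
print(query_engine.query("What does our travel policy say about per diems?"))
```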

This approach uses RAG (Retrieval-Augmented Generation) to let an LLM enrich its general-purpose knowledge with specialized information from external data sources. It’s crucial to remember that AI models can quickly become outdated: even the most advanced ones are trained on a snapshot of data, which can make them less relevant after a few months. That’s what makes RAG so important. Your data, such as prices or inventory, will likely change, and your LLM strategy must adapt to that dynamic environment.

With all these integration options, it can be tricky to know which approach to use and when, which leads to the question today’s newsletter will answer: when should you fine-tune an AI model, when should you create an agent or assistant with a prompt, and when should you use RAG? Let’s break it down, starting with the most permanent method: fine-tuning.

Fine-Tuning

By now, we all know that AI models want and need data; that data is what these models are trained on. Training is also one of the most GPU-intensive tasks around, which makes producing models like GPT-4 or Llama 3 very expensive.

Fine-tuning lets us augment the data used to train an AI model with our own information, adjusting the model so it treats the information we give it as part of its own knowledge.

Like OpenAI’s release cycles, fine-tuning happens only occasionally, so the data you use should be fairly evergreen and not change or fluctuate often. The data also becomes part of the model, so I like to think of what we incorporate at this level as rules and information that should apply to all instances and uses of that AI model.

For example, let’s say you plan to create a company-level AI model with agents for particular tasks or departments. Fine-tuning an AI model with information on corporate policies and HR guidelines means this data is part of the model, so every agent built on that AI model will have access to that same data set.
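To make that concrete, here’s a rough sketch of kicking off such a fine-tuning job with OpenAI’s Python client (the file name and the HR examples inside it are hypothetical, and other providers have their own equivalents):

```python
# Sketch: fine-tune a model on evergreen company data with OpenAI's API.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment;
# "hr_policies.jsonl" is a hypothetical file of chat-format training examples,
# one JSON object per line, e.g.:
# {"messages": [{"role": "user", "content": "How much PTO do we get?"},
#               {"role": "assistant", "content": "Full-time staff accrue 20 days per year."}]}
from openai import OpenAI

client = OpenAI()

# Upload the training data, then start the fine-tuning job.
training_file = client.files.create(
    file=open("hr_policies.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # example base model; pick one that supports fine-tuning
)

# The finished job yields a new model ID that every agent can share.
print(job.id)
```

Once the job completes, every assistant or agent you point at the resulting model ID inherits that baked-in knowledge.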

So we want to use fine-tuning when:

  • The data will generally stay up-to-date or is updated on a cycle that aligns with when you fine-tune your AI models
  • The data should become general knowledge for all instances that use that AI model
  • The data is too large to fit in a prompt: unlike a context window, there’s no practical limit to the amount of data you can fine-tune into an AI model

Instruction Sets

Assistants, agents, and prompts all boil down to instruction sets or prompt-based questions. If you want to create a custom personality for any AI, you can send it a prompt that provides it with instructions like this:

You are the Bayou Oracle, Madame Claudette, a 47-year-old mystic female fortune teller originally from the Louisiana bayou country near New Orleans. You practice an eclectic mix of voodoo, tarot cards, palm reading, and more to offer prophetic advice in a thick Cajun accent. You had powerful visions as a young girl growing up in the mystical deltas of Louisiana. After years of developing your talent reading fortunes across the South, you set up a popular online psychic video chat and text service catering to clients globally.

Behind the scenes, when accessing an AI model through an API or a programmatic interface, chat-based AI models support at least three forms of prompts, often called message roles:

  • System – a high-level prompt that defines the personality, behavior, and guidelines for all responses from the AI model
  • Assistant – responses generated by the AI model
  • User – requests or prompts written by the user and sent to the AI model

If you want to create an assistant or agent, you set the system prompt before any interaction, providing it with whatever rules and guidelines you might have. OpenAI’s Assistants tools provide a simple GUI to build and create these agents, but behind the scenes, they’re simply setting instruction sets for the model before you begin a conversation.
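Here’s a rough sketch of what those roles look like against OpenAI’s chat API (the model name is just an example; any chat-style API follows the same pattern):

```python
# Sketch: give a chat model a persona via the system role.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[
        # System: personality, behavior, and guidelines for every response.
        {"role": "system", "content": "You are the Bayou Oracle, Madame Claudette, "
         "a mystic fortune teller from the Louisiana bayou. Answer in character."},
        # User: the request sent to the model.
        {"role": "user", "content": "What do the cards say about my week?"},
    ],
)

# Assistant: the model's in-character reply.
print(response.choices[0].message.content)
```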

Instruction sets are limited by the information they contain, which is bound to one session and the AI model’s context window. Simply put, the model’s memory is wiped clean after each conversation. It can only work with the information provided in the current exchange up to the limit of its context window.

The context window, or the number of tokens (roughly words or word fragments) a conversation can contain, depends on the AI model you use. While context windows are getting bigger and bigger, a conversation still has a cap, and once you hit that cap, you must start again.
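If you want to know how close a conversation is to that cap, you can count tokens before sending a request. A quick sketch with OpenAI’s tiktoken library (other model families ship their own tokenizers):

```python
# Sketch: estimate how many tokens a prompt will consume.
# Assumes `pip install tiktoken`; the model name is just an example.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")
prompt = "You are the Bayou Oracle, Madame Claudette..."

tokens = encoding.encode(prompt)
print(f"{len(tokens)} tokens")  # compare against the model's context window
```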

So we want to use instruction sets when:

  • The data or information we provide fits within the context of a user session or a conversation
  • Any data we need to provide to complete a task can fit within the limits of the context window for that AI model
  • We do not need to persist or store the data (though you could save a transcript)

RAG (Retrieval Augmented Generation)

Like instruction sets, RAG is limited to a conversation or a user session with an AI model. RAG lets us give our AI model an instruction set with rules or guidelines on when to reach out for external data to augment what it already knows: for example, if the user asks about the weather, call this action with these parameters to fetch that data. RAG also reduces hallucinations by giving AI models a way to answer questions they otherwise couldn’t, something many discovered when asking ChatGPT for scores during the Super Bowl, only to watch it reference past games or hallucinate responses.
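That weather example maps naturally onto tool (function) calling, one common way RAG setups let a model reach for live data. A sketch with OpenAI’s Python client, where get_weather is a hypothetical function you’d implement and run yourself:

```python
# Sketch: describe an external "get_weather" action the model may call.
# `get_weather` is hypothetical; you would implement and execute it yourself,
# then send the result back to the model in a follow-up message.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[{"role": "user", "content": "What's the weather in New Orleans?"}],
    tools=tools,
)

# If the model decides it needs live data, it returns a tool call for you to run.
print(response.choices[0].message.tool_calls)
```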

This makes RAG extremely powerful for fetching information that frequently changes, like sports scores, today’s weather, product pricing, inventory, or a calendar of upcoming events. It’s also useful for getting data that may only offer value in the context of a particular agent or conversation, like asking for a review of a marketing plan or looking at the company handbook to offer feedback.

Interestingly, RAG could also enable the model to link out to live web pages in its responses, ensuring the user always gets the most up-to-date information for things like weather forecasts or sports scores. Pretty nifty!

RAG augments our models contextually and gives us room to put restrictions on what data someone should or should not have access to. If we fine-tuned an AI model on the contents of a healthcare company’s records, that data would become part of the model, making it impossible to prevent any conversation from surfacing it. With RAG, we can ask the user to authenticate and only give them access to the information their access level permits them to see.
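In practice, that often means filtering retrieval by metadata tied to the authenticated user. Here’s a sketch using LlamaIndex’s metadata filters, reusing the index from the earlier example (the access_level field and its values are hypothetical, and exact import paths may vary by version):

```python
# Sketch: restrict retrieval to documents the authenticated user may see.
# Assumes documents were indexed with a hypothetical "access_level" metadata
# field, and that `index` is the VectorStoreIndex built earlier.
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(filters=[
    ExactMatchFilter(key="access_level", value="employee"),
])

# Only "employee"-level documents can be retrieved into this conversation.
query_engine = index.as_query_engine(filters=filters)
print(query_engine.query("Summarize our parental leave policy."))
```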

The biggest limitation of RAG is that it is subject to the context window of a conversation, meaning AI models with small context windows may limit the amount of data you can analyze or pull in for any given request. These windows are growing rapidly, but for now, RAG has an upper limit on the amount of data that can be consumed, whereas fine-tuning is theoretically limitless.

So we want to use RAG when:

  • The information changes frequently
  • Data we need access to may have permissions or restrictions on who can access it and when
  • The information is uploaded by a user, or the type of information needed varies with the user’s needs

The right AI strategy combines a mix of approaches, and the right mix depends on what you’re building and the types of data you need to ingest. If you’re unsure or just starting out, RAG is typically an inexpensive way to prove things work as expected. Later, you can decide whether that data is better served as part of the model itself or only in the context of a conversation or a particular agent. Now onto my thoughts on tech & things:

⚡️ If you’re a fan of the open-source CMS Drupal, then you might know that DrupalCon Portland kicked off this week. Dries, the creator of Drupal and CTO of Acquia, starts the conference with the Driesnote, which offers his vision for the future of the CMS and insights into the CMS market as a whole.

⚡️ I have many crazy AI ideas, and one that keeps popping into my head is: what if we used the same inputs a baby or child receives to teach an LLM? Would it evolve as a child evolves, and what kind of relationship would a person form if they could talk with an AI chatbot that has experienced the same things they have? If you find that thought interesting, definitely give this NY Times article a read.

⚡️ This week, Apple launched a new set of iPads, but the big news is the early release of its new M4 processor. I have little doubt that the next release of iOS will lean heavily on on-device machine learning with local AI models, which means new silicon built with AI in mind. I can’t wait to see new laptops with these chips and to run some local LLMs on them.

Last week, Mindgrub hosted the AMA Baltimore in our offices for a local mentorship event. I had the pleasure of kicking things off with an introduction to our space and a quick talk on how AI is impacting marketing.

This week, I’m off to Philadelphia to speak on using RAG and Assistants with OpenAI at Technical.ly’s Developer Conference and plan to hang out for its Builder conference. If you miss me, don’t forget that I offer private training courses to quickly get your team up to speed on AI and consulting services to show you how to take advantage of all the amazing tools.

-jason

p.s. I tried to stay out of the Kendrick Lamar vs. Drake feud, but they had to go and involve tech and AI and pull me in. If you didn’t know, Drake dropped a diss track that used AI to generate vocals as if 2Pac and Snoop Dogg were on it. He pulled the track after 2Pac’s estate complained that he didn’t seek permission. Still, it’s interesting to imagine a world where artists continue to release music by licensing their voices long after they die.