I’m excited to open up a little corner of the web I’ve been tinkering with: an AI sandbox for comparing and playing with various conversational assistants and generative AI models. The web app, located at labs.jasonmperry.com, wraps API calls to different systems in a simple interface that keeps experimentation tidy in one place.
Meet the AI Assistants
Last year, OpenAI released its Assistants API, which lets you set up bots that can access files through Retrieval-Augmented Generation (RAG) and call functions. To test these capabilities, I created personalities to check how well the features work for customer service or business needs.
Each of these assistants works at the fictional firm Acme Consulting, and I uploaded a company primer to each bot as a reference, detailing the firm’s history, leadership, services, values, and so on. The bots include:
- IT Manager Zack “Debugger” Simmons is here to help with helpdesk inquiries, troubleshoot issues, explain configurations, or suggest best practices.
- HR Coordinator Tina “Sunbeam” Phillips is armed with general HR knowledge and a fictional employee handbook whose policy details she can cite. Ask her about the holiday schedule, the core schedule, or benefits.
- Support Coordinator Samantha “Smiles” Miles is part of the Managed Services team and helps maintain support tickets in Jira Service Desk for all of our corporate clients. Through function calling, she can fetch updates on support tickets when you ask things like “Tell me what tickets I have open for Microsoft” or “Get me the status of ticket MS-1234”; these requests call mock endpoints.
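To give a feel for how ticket lookups like Samantha’s might be wired up, here is a minimal Python sketch of routing an assistant’s function call to a mock endpoint. Everything in it is an assumption for illustration: the function names (`get_ticket_status`, `list_open_tickets`), the ticket data, and the dispatch helper are hypothetical and are not the sandbox’s actual code. It only assumes the model hands back a function name plus JSON-encoded arguments.

```python
import json

# Hypothetical mock data standing in for a Jira Service Desk backend.
MOCK_TICKETS = {
    "MS-1234": {"client": "Microsoft", "status": "In Progress"},
    "MS-1250": {"client": "Microsoft", "status": "Closed"},
    "MS-1301": {"client": "Microsoft", "status": "Open"},
    "IN-0042": {"client": "Initech", "status": "Open"},
}


def get_ticket_status(ticket_id: str) -> dict:
    """Mock endpoint: look up a single ticket by its ID."""
    ticket = MOCK_TICKETS.get(ticket_id)
    if ticket is None:
        return {"error": f"Ticket {ticket_id} not found"}
    return {"ticket_id": ticket_id, **ticket}


def list_open_tickets(client: str) -> dict:
    """Mock endpoint: list every ticket for a client that is not closed."""
    open_ids = [
        ticket_id
        for ticket_id, ticket in MOCK_TICKETS.items()
        if ticket["client"].lower() == client.lower()
        and ticket["status"] != "Closed"
    ]
    return {"client": client, "open_tickets": open_ids}


# Registry mapping the function names the assistant may request to handlers.
TOOLS = {
    "get_ticket_status": get_ticket_status,
    "list_open_tickets": list_open_tickets,
}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested function and return a JSON string for the model."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        arguments = json.loads(arguments_json)
        return json.dumps(handler(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or unexpected argument names from the model.
        return json.dumps({"error": f"Bad arguments: {exc}"})


if __name__ == "__main__":
    print(dispatch_tool_call("get_ticket_status", '{"ticket_id": "MS-1234"}'))
    print(dispatch_tool_call("list_open_tickets", '{"client": "Microsoft"}'))
```

The JSON string returned by the dispatcher is what would be sent back to the assistant, which then phrases the answer in its own voice.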
In addition to the Acme workers, I wanted to experiment with how an assistant powering something like Humane’s upcoming AI Pin might function; after all, we know that the product makes heavy use of OpenAI’s models.
- The witty assistant Mavis “Ace” Jarvis is configured with a helpful instruction set and some function calls that let her get the weather or check stock prices. She can also show locations on a map based on a query. Try asking her, “Will the weather in Las Vegas be warm enough for me to swim outside?” or “Nvidia is on a tear, how’s the stock doing today?”
Finally, I used Anthropic’s Claude to write backstories for three fictional US political commentators. You can get political insight, debate, or hear views on current issues from conservative Darren, progressive Tyler, and moderate Wesley. In the wake of a push to create AI that bends to different philosophies, I figured these assistants could offer a view into how three distinct personalities might respond to similar prompts while all being trained on the same core data.
You can also compare multiple models’ outputs side by side; the sandbox currently supports Cohere, Jurassic, Claude, and ChatGPT. Specify max length, temperature, top-p sampling, and other settings to tailor the responses. I plan to keep adding the latest models as they become available, so you can test how phrasing, accuracy, creativity, and so on differ when the same prompt goes to each.
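As a rough illustration of the side-by-side idea, the Python sketch below sends one prompt with one set of sampling settings to several models and collects the replies by name. It is a sketch under stated assumptions, not the sandbox’s implementation: `SamplingParams`, `compare`, and the three stand-in model functions are hypothetical, and a real version would replace the stubs with calls to each provider’s SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SamplingParams:
    """Settings shared across models so the comparison stays like for like."""
    max_tokens: int = 256
    temperature: float = 0.7
    top_p: float = 1.0


# A model is any callable that takes a prompt and settings and returns text.
ModelFn = Callable[[str, SamplingParams], str]


def compare(prompt: str, params: SamplingParams,
            models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same prompt and settings to every model; collect the replies."""
    results: Dict[str, str] = {}
    for name, call_model in models.items():
        try:
            results[name] = call_model(prompt, params)
        except Exception as exc:
            # One failing provider should not hide the other columns.
            results[name] = f"[error: {exc}]"
    return results


# Hypothetical stand-ins for real provider calls. They truncate by
# characters, not tokens, purely to keep the stub simple.
def echo_model(prompt: str, params: SamplingParams) -> str:
    return prompt[: params.max_tokens]


def shouting_model(prompt: str, params: SamplingParams) -> str:
    return prompt.upper()[: params.max_tokens]


def broken_model(prompt: str, params: SamplingParams) -> str:
    raise RuntimeError("service unavailable")


if __name__ == "__main__":
    replies = compare(
        "hello world",
        SamplingParams(max_tokens=5),
        {"echo": echo_model, "shout": shouting_model, "broken": broken_model},
    )
    for name, text in replies.items():
        print(f"{name}: {text}")
```

Keeping the settings in one immutable object means every model receives exactly the same max length, temperature, and top-p, so differences in the output come from the models rather than the configuration.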
Similarly, you can visually compare image results from DALL-E and Stable Diffusion by entering identical prompts. The variance in interpretation, which stems from the artists and datasets used to train each model, is intriguing.
Of course, this is a playground and lab, so I’m continually adding features and experiments, and I plan to add video generation, summarizers, voice cloning, and more. Check back for the latest, or suggest additions.