Issue #65: OpenAI DevDay Brings New AI Tools
Howdy👋🏾. Did you miss OpenAI’s annual DevDay on October 1st? This invite-only event took place in San Francisco, London, and Singapore. Unlike last year, the event was not live-streamed, and so far, no videos have been released, but OpenAI did post summaries of the announcements on its website.
This year’s event was more low-key than last year’s and didn’t feature a keynote from CEO Sam Altman. It also followed the massive $6.6 billion funding round, reportedly the largest VC round ever, which closed shortly after Mira Murati, OpenAI’s CTO, resigned.
Realtime API
The Realtime API was the showstopper, allowing developers to create advanced speech-to-speech systems for the first time (at least, the first I’m aware of). This API also integrates with OpenAI’s existing chat completion API, enabling developers to build applications where users can speak to an AI system and receive verbal responses.
In simple terms, this means anyone can now create real-time conversational AI agents. For example, you could build an AI assistant that takes spoken input, processes the request, and responds verbally—essentially creating a conversational agent with a voice. This opens the door to more immersive, voice-driven applications and AI-powered customer service tools.
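To make that concrete, here’s a minimal sketch of the JSON events a Realtime API client sends over its WebSocket connection (wss://api.openai.com/v1/realtime). The event names follow OpenAI’s published beta event types, but treat the exact fields as illustrative rather than a complete client:

```python
import json

def session_update(voice: str, instructions: str) -> str:
    """Configure the live session: modalities, voice, and system instructions."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "instructions": instructions,
        },
    })

def response_create() -> str:
    """Ask the model to generate a (spoken) response to the audio sent so far."""
    return json.dumps({"type": "response.create"})

# A real client would send these strings over the WebSocket, then stream
# microphone audio via input_audio_buffer events and play back the reply.
event = json.loads(session_update("alloy", "You are a concise phone agent."))
print(event["type"])  # session.update
```

The key design point is that everything is event-driven: you push audio and configuration events up the socket and receive transcript and audio events back, rather than making one-shot HTTP calls.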
Vision Fine-Tuning API
The Vision Fine-Tuning API is another exciting update. It allows developers to provide images with custom labels and fine-tune OpenAI’s vision models to recognize and understand unique image sets. This is ideal for businesses working with proprietary images that OpenAI’s general models don’t recognize.
For instance, if you run a hardware repair business with thousands of obscure parts, you could upload your catalog and fine-tune the Vision API. Your customers could then simply upload an image of a part, and OpenAI would be able to identify it—potentially streamlining the entire repair process.
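A training example for this kind of fine-tune is just a chat-format JSONL line that pairs an image with the answer you want the model to learn. Here’s a sketch under that assumption; the part name and image URL are hypothetical placeholders for your own catalog data:

```python
import json

def make_example(image_url: str, part_name: str) -> str:
    """Build one JSONL training line: an image question plus the correct label."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": "Identify the repair part shown."},
            {"role": "user", "content": [
                {"type": "text", "text": "Which part is this?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
            {"role": "assistant", "content": part_name},
        ]
    })

# One line per catalog item; the resulting .jsonl file is what you upload
# as the training file for the fine-tuning job.
line = make_example("https://example.com/parts/valve-207.jpg", "Model 207 intake valve")
print(json.loads(line)["messages"][-1]["content"])
```

Repeating this over a few hundred labeled catalog images is typically enough to teach the model vocabulary its general training never saw.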
During the event, OpenAI demonstrated how developers can use this to enhance automation by providing screenshots of internal applications, helping AI navigate complex systems with visual context.
Prompt Caching API
The Prompt Caching API is designed to boost efficiency. It lets OpenAI reuse the processing already done on recently seen prompt prefixes, so the repeated portion of a prompt is served faster and at a discounted token price instead of incurring the full inference cost on every request.
This is especially helpful for businesses with large-scale API use, as it reduces cost and improves response times for frequently asked questions or tasks.
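Because caching matches on the prompt *prefix*, the practical trick is to put the long, static content first and the per-user part last. A minimal sketch (the manual text is a placeholder for your real context):

```python
# Static context goes first so every request shares an identical,
# cacheable prefix; only the trailing user question varies.
STATIC_CONTEXT = (
    "You are a support agent for Acme widgets. Product manual:\n"
    + "Placeholder manual text. " * 200
)

def build_messages(question: str) -> list[dict]:
    """Assemble a chat request that keeps the cacheable prefix stable."""
    return [
        {"role": "system", "content": STATIC_CONTEXT},  # identical across calls
        {"role": "user", "content": question},          # varies per request
    ]

# Two different questions still share the same (cached) system prefix.
m1 = build_messages("How do I reset the widget?")
m2 = build_messages("What does the warranty cover?")
print(m1[0]["content"] == m2[0]["content"])  # True
```

If you instead interleave user-specific data near the top of the prompt, the shared prefix shrinks and the cache stops helping.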
Model Distillation
Model Distillation is a game-changer for developers working with large models like GPT-4, which contain billions of parameters. While all that capability is useful, many applications don’t need a model that massive.
Distillation allows developers to capture a large model’s outputs and use them to fine-tune a smaller, more efficient model tailored to specific use cases. This “distilled” model can focus on answering core questions more effectively and can be deployed more cost-efficiently, especially in environments where speed and specialized knowledge are critical.
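The workflow OpenAI described has two halves: log the big “teacher” model’s completions, then fine-tune a smaller “student” model on them. The sketch below shows the request payloads only, with no API call made; the metadata tag and training file ID are hypothetical:

```python
# Step 1: log the teacher model's traffic. Setting store=True persists the
# input/output pair so it can later be exported as distillation training data;
# metadata tags let you filter which completions to export.
teacher_request = {
    "model": "gpt-4o",
    "store": True,
    "metadata": {"task": "part-lookup"},  # hypothetical tag for filtering
    "messages": [{"role": "user", "content": "Identify this intake valve."}],
}

# Step 2: fine-tune the smaller student model on the exported completions
# (the training file ID here is a placeholder, not a real file).
student_job = {
    "model": "gpt-4o-mini",
    "training_file": "file-XYZ",
}
```

The payoff is a small model that answers your narrow question set nearly as well as the teacher, at a fraction of the latency and per-token cost.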
These new developer tools are exciting, and I’m looking forward to experimenting with them. Features like Prompt Caching could be a huge cost-saver for high-volume API users. At the same time, the Realtime API brings us closer to creating real-time conversational agents—possibly even creating personalized versions of ourselves in the near future.
Now, my thoughts on tech & things:
🤖 OpenAI has outlined 5 steps to AGI, with ChatGPT-4o currently at level 2, capable of reasoning and problem-solving, aiming for autonomous AI agents and innovation in future releases. Read more.
🔧 If you’re following the WordPress vs WP Engine saga, don’t miss the response from Matt Mullenweg, Automattic’s CEO and WordPress’s co-founder! Read more.
So, let’s talk about the Realtime API. This is a pretty big announcement that could make conversational AI agents much easier to build and deploy. I’ve spent the last week experimenting with the API, and while it’s incredibly easy to use, it’s worth noting that audio tokens cost significantly more than text tokens.
One of the most exciting and lesser-known features is that OpenAI’s Realtime API allows access to external tools. This opens up the possibility for the model to interact with external services, enabling Retrieval-Augmented Generation (RAG) operations like grabbing real-time weather data or checking the status of a Jira ticket.
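Tools are registered on the session itself, so the model can decide mid-conversation to call out for live data. Here’s a sketch of that configuration; the weather function is hypothetical, and the schema follows the JSON Schema style OpenAI uses for function calling:

```python
import json

# Hypothetical external tool the model may call during the conversation.
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Fetch the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Registering the tool on the Realtime session; when the model emits a
# function call, your code runs it and sends the result back as a new event.
session_event = json.dumps({
    "type": "session.update",
    "session": {"tools": [weather_tool], "tool_choice": "auto"},
})
print(json.loads(session_event)["session"]["tools"][0]["name"])  # get_weather
```

Your application stays in the loop: the API only *requests* the call, and you decide how to execute it, which is exactly the hook that makes RAG-style lookups possible in a live voice conversation.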
The system is currently limited to six voices, with some features still in beta or labeled “coming soon.” But I can’t wait to start using this in client projects. If you’re interested in a fun proof of concept, contact your friendly fractional CAIO!
In case you missed it, I had the pleasure of speaking at DC Startup & Tech Week on The Business of AI. The event was a huge success, with tons of eager technologists, startups, and business leaders from all over the DMV.
On November 7th, I’m moderating WTCI’s rebooted AGILE panel on AI at the Crossroads in Baltimore, then heading to New York for NYC’s AI Summit at the Javits Center on December 11th and 12th. I’d love to see you at any of these events, so grab a ticket and drop in!
Also, don’t forget to pre-order my book, The AI Evolution. The pre-order price expires on November 30th, when the book goes to its regular price of $24.
-jason
P.S. AI is becoming an essential part of our lives and deeply embedded in all the devices we use, so it is only a matter of time before AI summaries break difficult news to you, like that your girlfriend is leaving you and you have just a few days to pack her stuff. Thanks, AI!