LLMs like OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Bard are undeniably impressive. But as the name implies, these large language models demand massive computing power, and the more parameters and training data they take on, the larger they balloon.
However, most use cases don’t actually need all that sophistication. Their hefty size means LLMs live in the cloud, requiring constant internet connectivity. That raises privacy concerns about how user data is stored and whether it is used to further refine the models.
In contrast, smaller models like Microsoft’s 2.7-billion-parameter Phi-2 deliver extensive capabilities while fitting on embedded devices or phones. Microsoft claims Phi-2 matches or exceeds models 5x its size on certain benchmarks. Running locally enables lightning-fast responses, which will only improve as chipmakers optimize their silicon for AI. Local processing also keeps your data private, and even lets you personalize an SLM with your own data.
Both LLMs and more compact SLMs like Phi-2 have key roles to play. But model selection matters enormously, shaping both project feasibility and cost. For many companies, a blended approach, with specialized models handling discrete tasks, may be the best solution.
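One way to picture such a blend is a simple router that keeps easy prompts on an on-device SLM and escalates only the hard ones to a cloud LLM. The sketch below is purely illustrative: the `route` function, the keyword list, and the length heuristic are all assumptions standing in for a real system, where a small learned classifier would decide which model handles each request.

```python
# Hypothetical hybrid router: prefer the on-device SLM, and escalate to the
# cloud only when a request looks too complex for the local model. The
# heuristic here (prompt length plus a keyword check) is a crude stand-in
# for a learned routing classifier.

COMPLEX_HINTS = ("prove", "translate", "summarize this document", "write code")

def is_complex(prompt: str) -> bool:
    """Crude stand-in for a learned complexity classifier."""
    lowered = prompt.lower()
    return len(prompt.split()) > 50 or any(hint in lowered for hint in COMPLEX_HINTS)

def route(prompt: str) -> str:
    """Return which model tier should handle this prompt."""
    if is_complex(prompt):
        return "cloud"   # hand off to the large hosted LLM
    return "local"       # handle on-device; the data never leaves the phone

# A short factual question stays on-device; a code-generation
# request is escalated to the cloud model.
print(route("What's the capital of France?"))           # local
print(route("Write code to parse a CSV file in Rust"))  # cloud
```

The design choice worth noting is that the router fails *open* toward the local model: anything not flagged as complex stays on-device, which is the privacy-preserving default the hybrid approach is after.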
P.S. I suspect this hybrid route may be what Apple wants to pursue: storing a small language model directly on devices while still enabling cloud connectivity when needed. That would let Apple incorporate AI while upholding its privacy commitments, even if it means less training data than the largest LLMs leverage. Apple’s new open-source tool for running models locally hints at this focus. With a multi-model approach that prioritizes on-device AI, it could query the cloud only for complex requests beyond a local model’s scope.