Stanford pulls Alpaca chatbot citing "hallucinations," costs, and safety concerns
It seems like we have tried this before... Tay anyone?

By Cal Jeffrey
In context: Large language models are dominating the news cycle with no signs of slowing. Everybody wants in on the ground floor of the technology, so there is currently a gold rush to release the next great AI chatbot. Unfortunately, models like ChatGPT are prohibitively expensive to build and train. Smaller models are much cheaper but seem more inclined to devolve into a mess akin to Microsoft's Tay from 2016.
Last week, Stanford University researchers released "Alpaca," a chatbot built on Meta's LLaMA model, but quickly took it offline after it started having "hallucinations." Some in the large language model (LLM) industry have adopted "hallucination" as a euphemism for an AI spouting false information as if it were factual. The university added that rising hosting costs and safety concerns also factored into the removal.
"The original goal of releasing a demo was to disseminate our research in an accessible way," a Stanford University's Human-Centered Artificial Intelligence Institute spokesperson told The Register. "We feel that we have mostly achieved this goal, and given the hosting costs and the inadequacies of our content filters, we decided to bring down the demo."
"Given ... the inadequacies of our content filters..." is code for "the internet ruined our model," which is no surprise since the internet ruins everything.
Of course, LLMs are prone to fanciful musings, presenting them in a completely believable way. Researchers have pointed out this weakness in virtually every recent chatbot released into the wild. There are numerous examples of ChatGPT and others presenting false information as fact and then doubling down on the story when challenged.
Furthermore, Stanford knew Alpaca generated inappropriate responses when it launched the interactive demo.
"Alpaca also exhibits several common deficiencies of language models, including hallucination, toxicity, and stereotypes," the researchers said in their press release last week. "Hallucination, in particular, seems to be a common failure mode for Alpaca, even compared to text-davinci-003 (OpenAI's GPT-3.5). Deploying an interactive demo for Alpaca also poses potential risks, such as more widely disseminating harmful content and lowering the barrier for spam, fraud, or disinformation."
Although the webpage hosting the Alpaca demo is down, users can still retrieve the model from its GitHub repo for private experimentation, which Stanford encourages. When it initially posted the model, the university asked users to "flag" any failures not already listed in its press release.
One of the problems with Alpaca is that it is a relatively small model as LLMs go, but that is by design. Meta intentionally created LLaMA as an accessible language model that would not take an expensive supercomputer to train. Stanford used it to develop a seven-billion-parameter model for about $600. Compare that to the $3 billion (or more) Microsoft has invested in its ChatGPT-based model, which has hundreds of billions of parameters.
In this light, it is no surprise that Alpaca failed so quickly when released to the public. Even ChatGPT and Bing Chat had many mishaps, faux pas, and controversies when they debuted, and that was after reasonably lengthy closed betas.
However, that does not mean Alpaca will never be suitable for public consumption. The GitHub code has only been out for a week, and folks have already implemented it on a Raspberry Pi and Pixel phones. These feats are only possible because of Alpaca's lightweight size. The main hurdle will be getting such a small model to behave, something Stanford has been studying for over three years (video above). It's an area where even massive LLMs like ChatGPT need work.
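The "lightweight" claim can be sanity-checked with back-of-the-envelope arithmetic. The byte-per-parameter figures below are common storage conventions (16-bit floating-point weights, and the 4-bit quantization used by community ports), not numbers from Stanford's release:

```python
def model_size_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate in-memory size of a dense model's weights, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# Alpaca's 7 billion parameters stored as 16-bit floats (2 bytes each):
print(f"fp16:  {model_size_gib(7, 2):.1f} GiB")    # ~13 GiB
# 4-bit quantization (0.5 bytes per weight), the trick community ports
# use to squeeze the model onto phones and single-board computers:
print(f"4-bit: {model_size_gib(7, 0.5):.1f} GiB")  # ~3.3 GiB
```

At roughly 13 GiB in half precision, and a few GiB once quantized, the weights fit on consumer hardware rather than a datacenter GPU cluster, which is what makes both the ~$600 fine-tuning run and the hobbyist ports plausible.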