AI Search Is a Disaster

Microsoft and Google believe chatbots will change search forever. So far, there’s no reason to believe the hype.

Animation of asking a personal question to a search engine
Joanne Imperio / The Atlantic

Last week, both Microsoft and Google announced that they would incorporate AI programs similar to ChatGPT into their search engines—bids to transform how we find information online into a conversation with an omniscient chatbot. One problem: These language models are notorious mythomaniacs.

In a promotional video, Google’s Bard chatbot made a glaring error about astronomy, misstating by well over a decade when the first photo of a planet outside our solar system was captured, and the blunder caused its parent company’s stock to slide as much as 9 percent. The live demo of the new Bing, which incorporates a more advanced version of ChatGPT, was riddled with embarrassing inaccuracies too. Although the past few months might have led many to believe that artificial intelligence is finally living up to its name, fundamental limits to this technology suggest that this month’s announcements might land somewhere between the Google Glass meltdown and an iPhone update: at worst science-fictional hype, at best an incremental improvement accompanied by a maelstrom of bugs.

The trouble arises when we treat chatbots not just as search bots but as having something like a brain: when companies and users trust programs like ChatGPT to analyze their finances, plan travel and meals, or provide even basic information. Instead of forcing users to read other internet pages, Microsoft and Google have proposed a future where search engines act like silicon oracles, using AI to synthesize information and package it into basic prose. But fully realizing that vision might be a distant goal, and the road to it is winding and clouded: The programs currently driving this change, known as “large language models,” are decent at generating simple sentences but pretty awful at everything else.

These models work by identifying and regurgitating patterns in language, like a super-powerful autocorrect. Software like ChatGPT first analyzes huge amounts of text—books, Wikipedia pages, newspapers, social-media posts—and then uses those data to predict what words and phrases are most likely to go together. These programs model existing language, which means they can’t come up with “new” ideas. And their reliance on statistical regularities means they have a tendency to produce cheapened, degraded versions of the original information—something like a flawed Xerox copy, in the writer Ted Chiang’s imagining.
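To make the “predict what words are most likely to go together” idea concrete, here is a deliberately crude sketch: a bigram counter that records which word follows which in a tiny, made-up training text and then guesses the most frequent follower. This is an illustrative assumption, not how ChatGPT is built (real models use neural networks trained on vastly more text), and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the word most often seen after `word`, or None if it was never seen."""
    counts = follows.get(word.lower())  # .get() does not create new entries
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A tiny, hypothetical training text, for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)

print(predict_next(model, "the"))    # cat  (follows "the" twice, more than any other word)
print(predict_next(model, "sat"))    # on
print(predict_next(model, "zebra"))  # None (the model can only echo patterns it has seen)
```

Even this toy shows the limitation described above: the model can only recombine statistical patterns from its training text, and it has nothing to say about a word it never encountered.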

And even if ChatGPT and its cousins had learned to predict words perfectly, they would still lack other basic skills. For instance, they don’t understand the physical world, can’t reason logically, are terrible at math, and, most germane to searching the internet, can’t fact-check themselves. Just yesterday, ChatGPT told me there are six letters in its name. (There are seven.)

These language programs do write some “new” things—they’re called “hallucinations,” but they could also be described as lies. Similar to how autocorrect is ducking terrible at getting single letters right, these models mess up entire sentences and paragraphs. The new Bing reportedly said that 2022 comes after 2023, and then stated that the current year is 2022, all while gaslighting users when they argued with it; ChatGPT is known for conjuring statistics from fabricated sources. Bing made up personality traits about the political scientist Rumman Chowdhury and engaged in plenty of creepy, gendered speculation about her personal life. The journalist Mark Hachman, trying to show his son how the new Bing has antibias filters, instead induced the AI to teach his youngest child a vile host of ethnic slurs (Microsoft said it took “immediate action … to address this issue”).

Asked about these problems, a Microsoft spokesperson wrote in an email that, “given this is an early preview, [the new Bing] can sometimes show unexpected or inaccurate answers,” and that “we are adjusting its responses to create coherent, relevant and positive answers.” And a Google spokesperson told me over email, “Testing and feedback, from Googlers and external trusted testers, are important aspects of improving Bard to ensure it’s ready for our users.”

In other words, the creators know that the new Bing and Bard are not ready for the world, despite the product announcements and ensuing hype cycle. The chatbot-style search tools do offer footnotes, a vague gesture toward accountability—but if AI’s main buffer against misinformation is a centuries-old citational practice, then this “revolution” is not meaningfully different from a Wikipedia entry.

If the glitches—and outright hostility—aren’t enough to give you pause, consider that training an AI takes tremendous amounts of data and time. ChatGPT, for instance, hasn’t trained on (and thus has no knowledge of) anything after 2021, and updating any model with every minute’s news would be impractical, if not impossible. To provide more recent information—about breaking news, say, or upcoming sporting events—the new Bing reportedly runs a user’s query through the traditional Bing search engine and uses those results, in conjunction with the AI, to write an answer. It sounds something like a Russian doll, or maybe a gilded statue: Beneath the outer, glittering layer of AI is the same tarnished Bing we all know and never use.

The caveat to all of this skepticism is that Microsoft and Google haven’t said very much about how these AI-powered search tools really work. Perhaps they are incorporating some other software to improve the chatbots’ reliability, or perhaps the next iteration of OpenAI’s language model, GPT-4, will magically resolve these concerns, if (incredible) rumors prove true. But current evidence suggests otherwise, and in reference to the notion that GPT-4 might approach something like human intelligence, OpenAI’s CEO, Sam Altman, has said, “People are begging to be disappointed and they will be.”

Indeed, two of the biggest companies in the world are basically asking the public to have faith—to trust them as if they were gods and chatbots their medium, like Apollo speaking through a priestess at Delphi. These AI search bots will soon be available for anyone to use, but we shouldn’t be so quick to trust glorified autocorrects to run our lives. Less than a decade ago, the world realized that Facebook was less a fun social network and more a democracy-eroding machine. If we’re still rushing to trust the tech giants’ Next Big Thing, then perhaps hallucination, with or without chatbots, has already supplanted searching for information and thinking about it.

Matteo Wong is an associate editor at The Atlantic.