The takeaway: The findings provide an early empirical snapshot of how generative AI is transforming the architecture of search. While traditional engines remain anchored in popularity and ranking, AI systems are shaping a model based on synthesis – one that blurs the line between retrieval and interpretation.

A new academic study comparing traditional web search results with those generated by AI-driven systems has found that generative AI tools frequently rely on less popular or unconventional sources. The findings underscore a growing divide between how conventional search engines and large language model-based systems gather and present online information.

Researchers from Ruhr University Bochum and the Max Planck Institute for Software Systems conducted the analysis, published as a preprint titled "Characterizing Web Search in the Age of Generative AI." The study measured differences across a range of AI-based search systems, including Google's AI Overviews, Gemini-2.5 Flash, and two variants of OpenAI's GPT-4o: its built-in web search mode and the GPT-4o Search Tool, which accesses the web only when the model determines that outside data is needed.

For decades, search engines have operated by indexing and ranking pages, returning lists of links ordered primarily by relevance and authority. Generative AI systems, by contrast, synthesize information from multiple sources into concise, summarized responses. The researchers aimed to quantify how this shift affects the types of websites that inform those answers.

To measure this, they drew thousands of sample queries from several public datasets. These included questions collected from ChatGPT interactions in the WildChat dataset, general social and political topics cataloged by the media-bias monitoring site AllSides, and the 100 most searched items on Amazon's product ranking list. Trending topics from Google's search trend data were also included for comparative testing.

Each query was submitted to both traditional Google Search and the AI-based systems. Researchers then compared the domains cited in the AI-generated responses with those appearing in the first 10 and first 100 links of a standard Google results page.
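The comparison described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the helper names `domain_of` and `share_missing`, the sample URLs, and the choice to match sources by hostname are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def share_missing(cited_urls, organic_urls, k):
    """Fraction of cited domains that are absent from the first k organic results."""
    cited = {domain_of(u) for u in cited_urls}
    top_k = {domain_of(u) for u in organic_urls[:k]}
    if not cited:
        return 0.0  # nothing was cited, so nothing can be missing
    return len(cited - top_k) / len(cited)


# Hypothetical data: sources cited in one AI answer and the organic results
# returned for the same query, in ranked order.
cited = [
    "https://www.example.org/a",
    "https://blog.example.net/post",
    "https://example.com/x",
]
organic = [
    "https://example.com/home",
    "https://www.example.org/b",
    "https://example.edu/c",
]

print(share_missing(cited, organic, k=1))
print(share_missing(cited, organic, k=3))
```

In a real analysis, `k` would be set to 10 and 100 to mirror the two cutoffs the study reports, and the result would be averaged over all queries.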

The differences were striking. Using Tranco, an independent tracker that ranks web domains by popularity, the study found that AI-generated search results consistently drew from websites outside the most popular domains. In Google's own AI Overviews, more than half of all cited sources did not appear in the top 10 organic Google results for the same query, and 40 percent were absent even from the top 100 links.

Gemini search results showed a similar pattern, frequently citing domains outside Tranco's top 1,000. The researchers noted that the median source cited by Gemini, meaning the one in the middle when sources are ordered by popularity, fell below the threshold of widely visited websites. GPT-4o and its web-enabled counterpart also drew from less prominent sources, though they tended to reference institutional domains such as company pages and encyclopedias rather than social media or discussion forums.
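To make the popularity measurement concrete, here is a short Python sketch of how a median rank and a "share outside the top 1,000" figure could be computed. The rank values, the `UNRANKED` placeholder, and the function names are hypothetical; real ranks would come from the published Tranco list, and the study's own handling of unranked domains is not described in this article.

```python
from statistics import median

# Hypothetical ranks for illustration only (lower number = more popular).
TRANCO_RANKS = {
    "example.com": 120,
    "example.org": 850,
    "example.net": 40_000,
}

# Assumed placeholder rank for domains that do not appear in the list at all.
UNRANKED = 1_000_001


def median_rank(domains, ranks=TRANCO_RANKS, unranked=UNRANKED):
    """Median popularity rank of the cited domains (raises if the list is empty)."""
    return median(ranks.get(d, unranked) for d in domains)


def share_outside_top(domains, n=1000, ranks=TRANCO_RANKS, unranked=UNRANKED):
    """Fraction of cited domains whose rank is worse than n."""
    domains = list(domains)
    if not domains:
        return 0.0
    outside = sum(1 for d in domains if ranks.get(d, unranked) > n)
    return outside / len(domains)


# Domains cited by one hypothetical AI answer.
cited_domains = ["example.com", "example.net", "unknown.example"]

print(median_rank(cited_domains))        # 40000
print(share_outside_top(cited_domains))  # two of the three rank outside the top 1,000
```

Treating unranked domains as less popular than every ranked one is a design choice of this sketch. Dropping them instead would make the cited sources look more popular than they are.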

The study did not conclude that AI search results were inferior, but rather that they reflect a different approach to information retrieval. Using LLOOM, an independent evaluation tool developed at Stanford University, the team found that AI search systems covered about as many distinct concepts as the top 10 links in a conventional search. This suggests that while AI responses summarized a similar breadth of information, they sometimes compressed it, downplaying nuances preserved in traditional search results.

This effect was particularly noticeable for ambiguous search terms, such as names shared by multiple individuals. Standard link-based search tended to provide broader contextual coverage, while AI responses often consolidated these cases into single interpretations, omitting some alternative results.

Generative systems benefited from the pre-trained knowledge of large language models, giving them an advantage in synthesizing background context. GPT-4o with the Search Tool, for example, sometimes offered comprehensive summaries without citing any external data, relying entirely on the model's internal knowledge base. This behavior proved useful for well-established topics but was less reliable for recent events or breaking news.

When tested with trending Google queries from mid-September, GPT-4o's web-enabled version frequently failed to retrieve current information, generating placeholder replies such as requests for clarification or simple acknowledgments of uncertainty. This reflected the system's reluctance to access external sources unless it judged them necessary.

The authors emphasized that evaluating the accuracy or quality of generative AI search requires new benchmarks. They called for future research to use metrics beyond those designed for traditional search ranking, particularly frameworks that account for source diversity, the range of conceptual coverage, and the effectiveness of AI systems in synthesizing information into cohesive summaries.