Wikipedia servers are struggling under pressure from AI scraping bots

Alfonso Maruccia

Editor's take: AI bots have recently become the scourge of websites dealing with written content or other media types. From Wikipedia to the humble personal blog, no one is safe from the network sledgehammer wielded by OpenAI and other tech giants in search of fresh content to feed their AI models.

The Wikimedia Foundation, the nonprofit organization hosting Wikipedia and other widely popular websites, is raising concerns about AI scraper bots and their impact on the foundation's internet bandwidth. Demand for content hosted on Wikimedia servers has grown significantly since the beginning of 2024, with AI companies' crawlers consuming an overwhelming share of bandwidth to gather training data for their products.

Wikimedia projects, which include some of the largest collections of knowledge and freely accessible media on the internet, are used by billions of people worldwide. Wikimedia Commons alone hosts 144 million images, videos, and other files that are in the public domain or shared under free licenses, and it is suffering especially from the unregulated crawling activity of AI bots.

The Wikimedia Foundation has experienced a 50 percent increase in bandwidth used for multimedia downloads since January 2024, with traffic predominantly coming from bots. Automated programs are scraping the Wikimedia Commons image catalog to feed the content to AI models, the foundation states, and the infrastructure isn't built to endure this type of parasitic internet traffic.

Wikimedia's team had clear evidence of the effects of AI scraping in December 2024, when former US President Jimmy Carter passed away, and millions of viewers accessed his page on the English edition of Wikipedia. The 2.8 million people reading the president's bio and accomplishments were 'manageable,' the team said, but many users were also streaming the 1.5-hour-long video of Carter's 1980 debate with Ronald Reagan.

As a result of the doubling of normal network traffic, a small number of Wikipedia's connection routes to the internet were congested for around an hour. Wikimedia's Site Reliability team was able to reroute traffic and restore access, but the network hiccup shouldn't have happened in the first place.

By examining the bandwidth issue during a system migration, Wikimedia found that at least 65 percent of the most resource-intensive traffic came from bots, passing through the cache infrastructure and directly impacting Wikimedia's 'core' data center.

The organization is working to address this new kind of network challenge, which is now affecting the entire internet, as AI and tech companies are actively scraping every ounce of human-made content they can find. "Delivering trustworthy content also means supporting a 'knowledge as a service' model, where we acknowledge that the whole internet draws on Wikimedia content," the organization said.

Wikimedia is promoting a more responsible approach to infrastructure access through better coordination with AI developers. Dedicated APIs could ease the bandwidth burden and make it easier to identify and counter "bad actors" in the AI industry.


 
Except the ad content - and revenue from such - is based on the amount of traffic a website gets... Perhaps Wiki should start putting some ads on their site... TechSpot caved years ago...
 
Tell Jimmy Wales to beg up another couple million with a half-page obstruction and then put everything behind Cloudflare. It's gonna break a lot of apps depending on how the API functions, but it would solve the bot problem.
 

They don't need any more money at all.

Wikipedia/media begs for donations regularly, and they take in far, far more than what they spend on infrastructure - or anything else. They have about $250 million in assets - over and above their expenses. Most of the foundation's costs are for salaries and wages - and dozens of *****ic pet projects that have nothing to do with their stated purpose. Barely 2% of the foundation's expenses go to 'internet hosting'.

At this point, it's just one big grift.
 
Apparently this is also affecting people running their own small servers, with some bots visiting several times a day.

Didn't the GOP want ISPs to be able to whack the likes of Netflix, i.e. effectively double dipping? Users pay for bandwidth and then pay Netflix more because they use that bandwidth (obviously the second part is not one for one, since they may not use Netflix, etc.).
 
Wikipedia is a massive grift that professionally begs for cash. Much like the Linux Foundation (and most charities), over 90% of money "donated" to them goes to all sorts of political causes and NGO groups.
 
Here comes the "I don't pay an organization that asks for donations but provides a valuable service anything and now I'm going to ***** and moan about their financial model" crowd.

Good. They found a way to make money without forcing a subscription service or payment model or anything. Someone get me my smelling salts I have the vapours!
 
I remember when we faced a similar issue with crawlers from countless search engines. At one point, they made up 95% of all incoming requests, crawling our entire history and rendering simple caching strategies useless. The key difference back then was that search engines, at least in theory, drove users to our sites. With AI, it feels like the opposite - extracting value without sending any traffic back.
 
They don't need any more money at all.

Wikipedia/media begs for donations regularly, and they take in far, far more than what they spend on infrastructure - or anything else. They have about $250 million in assets - over and above their expenses. Most of the foundation's costs are for salaries and wages - and dozens of *****ic pet projects that have nothing to do with their stated purpose. Barely 2% of the foundation's expenses go to 'internet hosting'.

At this point, it's just one big grift.

I'm not quite sure what you're so unhappy about? That you receive a free service of exceptional quality and with no strings attached?

YouTube, Instagram, TikTok, etc. take in billions of dollars and don't provide a fraction of the societal and global benefit that Wikipedia provides. I am a regular donor and am more than happy to keep donating. I've tried contributing to some articles, and let me tell you: contributing to Wikipedia is not easy and is very time-consuming. The standard for submissions is university grade. I am extremely grateful to all the volunteers who spend endless hours providing content that's better than almost everything else on the internet.

Honestly you just need to take your butthurt attitude elsewhere.
 
I'm not quite sure what you're so unhappy about? That you receive a free service of exceptional quality and with no strings attached?

Honestly you just need to take your butthurt attitude elsewhere.

What doesn't benefit one, let alone match one's, and especially the Dumpers' and their ilks', ideology equals a scam/grift. The Thinkers will judge according to this. I will preside over the proceedings.
 
I'm not quite sure what you're so unhappy about? That you receive a free service of exceptional quality and with no strings attached?

YouTube, Instagram, TikTok, etc. take in billions of dollars and don't provide a fraction of the societal and global benefit that Wikipedia provides. I am a regular donor and am more than happy to keep donating. I've tried contributing to some articles, and let me tell you: contributing to Wikipedia is not easy and is very time-consuming. The standard for submissions is university grade. I am extremely grateful to all the volunteers who spend endless hours providing content that's better than almost everything else on the internet.

Honestly you just need to take your butthurt attitude elsewhere.

Your preconceptions betray you.

I donated to the WM foundation for years. After learning about the massive amounts they waste on projects unrelated to their purpose - as well as their hoarding of assets while constantly begging for money under false pretense, I stopped. If you spend some time looking into it, you'll see that the foundation has lost its way and misuses donations. I made a rational decision to stop donating. Again - they whimper about the load on their servers, while spending two percent of their revenue on maintaining it. Ridiculous.

I've been editing on WP continuously for nineteen years. Since you say you're extremely grateful for the volunteers who spend endless hours providing content (in my case, refining content, fixing grammar/spelling/syntax, verifying sources, adding sources, stopping vandals, deterring POV pushers and much more) I'll ignore your final sentence, and instead say 'you're welcome'.
 

I appreciate that you do that. Some of these topics are orthogonal to each other. I'm sure there is vigorous community debate regarding how WM manages its funds. I pay $17/month for Netflix to watch 1 or 2 shows, and I donate less than that to Wikipedia even though I use it multiple times a day. To me it's still worth it, and I'm happy to keep donating.

How do you feel about the fact that OpenAI, Google, Facebook, etc. are using your hard work to enrich their shareholders?
 
Since Wikipedia is full of lies, misinformation and propaganda, training an AI is idiocracy.
 
How do you feel about the fact that OpenAI, Google, Facebook, etc. are using your hard work to enrich their shareholders?

If I were being forced to put in the labor for free, I might care.

I'm not a slave, and I'm not being impoverished by someone else making a profit. Wikipedia chooses to make the information freely available. There's nothing wrong with profit, if it's consensual. Wikipedia could block all of those orgs if they chose to. Or they could alter their terms of use and explicitly require for-profit organizations to pay to play. They could also pay editors who make meaningful contributions, and open a hornet's nest of ethical issues and complications in doing so.

In perspective, whatever pennies those orgs make by re-using WP content pales in comparison to their primary revenue streams.

It's worth noting that OpenAI isn't a publicly traded company; there are no shareholders being enriched thanks to WP. They're entirely funded by venture capital. In other words, rich people are paying billions to keep OpenAI running. Their net revenue is negative. Eventually they hope to make a profit; that remains to be seen. I happen to find Anthropic's Claude vastly superior to ChatGPT, and they're similarly funded.
 