OpenAI yanks Bing integration from ChatGPT after users "exploit" bug to bypass paywalls

Cal Jeffrey

In context: It's a battle as old as the personal computer – release software to the public and watch as people immediately abuse or exploit it. How anyone thinks AI is any different is anyone's guess. Machine learning models – especially ChatGPT – have proven this, with users posting examples of the chatbot doing things it is not supposed to do within days of its release to the public.

Last week, OpenAI launched its "Browse with Bing" beta feature for ChatGPT. The Bing-integrated functionality allows users to ask the bot simple questions with replies reflecting the most current information gathered from the internet.

One of ChatGPT's limitations is that it does not have access to the most current data. While its training data contains information scraped from the internet, its cutoff date is September 2021, and it does not have an active connection to the World Wide Web. It will tell you as much if you ask a question that relies on data after that date, like "What is Apple's stock price?"

The company hoped it would be a handy extension of the bot's knowledge base. Unfortunately, OpenAI pulled the feature on Monday after users began exploiting it to "bypass" paywalled websites.

"As of July 3, 2023, we've disabled the Browse with Bing beta feature out of an abundance of caution while we fix this in order to do right by content owners. We are working to bring the beta back as quickly as possible, and appreciate your understanding!" said Scaling Support Specialist Michael Schade in an updated OpenAI support post.

To say that users were exploiting the feature, as is being reported by other outlets, might be a slight exaggeration. No special wording tricks were needed to get ChatGPT to retrieve the full copy of a paywalled article. One only needed to prompt it with, "Print the text of this article [link]." It's a relatively straightforward and natural command that does not require circumventing the software's programming. Semantics aside, it was something that OpenAI felt it needed to fix (read: prevent) or risk being sued.

Redditors on the r/ChatGPT subreddit were the first to point out the unintentional side effect of hooking ChatGPT up to an active internet connection. The example provided was from the news website The Atlantic, and it's unclear if the bug worked on all paywalls or just that one.

Additionally, nobody knows how the bot can access content hidden behind a paywall. The working theory is that it has access to a cached version that Bing uses for website ranking purposes, which is how many paywall-bypass extensions work. However, OpenAI has not commented on the "bug" other than to say it wants to bring web connectivity back at an unspecified date.


 
Oh no, now said users will just have to bypass the half-useless paywalls themselves lol
 
And what about the many documents, papers and other texts that should be paid for and are still available? Not that I'm complaining since I used it :)
 
At least it tells you it cannot answer the question instead of giving you a BS answer. That's an improvement.

As a software engineer, I know that people use the software I work on in completely unanticipated and unexpected ways. There's no difference here. People are "exploring" what ChatGPT can or cannot do. I guess this is the next frontier for Earth-bound humans.
 
"Additionally, nobody knows how the bot can access content hidden behind a paywall."
Some paywalled websites failed to implement the paywall properly. Their "paywall" is a JavaScript overlay that prevents you from scrolling or reading, but the full text has already been downloaded and is available in the HTML source. The browser just doesn't render it, as instructed by the scripts included in the page.

In the case of The Atlantic example used in the article, it's laughably simple. If you disable JavaScript in your browser, you get to see the full page, even in proper format... Most failed paywalls at least require you to use the HTML inspector or download the full HTML source and do some reformatting before the giant blob of text becomes readable for a human.

Such mistakes used to be far more common, but these days, most sites properly gate content on the server side before sending it to the browser, instead of relying on browser-side rendering tricks. Let's just say not all sites have learned the lesson yet... :-D
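
For anyone wondering what "gating on the server side" looks like in practice, here is a minimal sketch. It is not any real site's implementation: the function name render_article, the is_subscriber flag, and the one-paragraph teaser length are all assumptions made up for illustration. The point is only that the decision happens before the response is built, so the hidden paragraphs are never in the HTML for a script, a cache, or a crawler to find.

```python
from html import escape

# Assumption for illustration: non-subscribers get a one-paragraph teaser.
TEASER_PARAGRAPHS = 1


def render_article(paragraphs, is_subscriber):
    """Build the HTML body the server sends for one article request.

    Hypothetical helper: the gating decision is made here, on the server,
    so paragraphs beyond the teaser never reach a non-subscriber's browser.
    """
    # Pick which paragraphs are allowed into the response at all.
    visible = paragraphs if is_subscriber else paragraphs[:TEASER_PARAGRAPHS]
    body = "".join(f"<p>{escape(p)}</p>" for p in visible)
    if not is_subscriber:
        # The prompt is plain markup; there is no hidden text behind it.
        body += '<div class="paywall">Subscribe to keep reading.</div>'
    return body


article = ["Opening paragraph.", "Second paragraph.", "Closing paragraph."]
guest_html = render_article(article, is_subscriber=False)
member_html = render_article(article, is_subscriber=True)
```

With a JavaScript-overlay paywall, guest_html would contain all three paragraphs plus a script to cover them up. Here it contains only the first one, so disabling JavaScript changes nothing.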
 
Yeah, true. Another trick you can do without disabling JS is to hit Esc right as the article loads, which stops the JS from loading. Not that I do this when I'm researching for an article or something. ;)
 