OpenAI partners with Reddit to put users' posts in ChatGPT

midian182

What just happened? OpenAI has entered into a partnership with Reddit that will allow it to surface user discussions from the site within ChatGPT and other products. In exchange, Reddit will start offering OpenAI-powered AI features to users and mods, and use the company's LLMs to build apps. OpenAI will also become an advertising partner on Reddit.

In a joint statement announcing the partnership, the companies say that the internet needs to be kept open, and part of that means Reddit content needs to be "accessible to those fostering human learning and researching ways to build community, belonging, and empowerment online."

OpenAI will gain access to Reddit's Data API, which provides real-time content from the platform. Reddit says the partnership does not change its Data API Terms or Developer Terms, which state that content accessed through the API cannot be used for commercial purposes without Reddit's approval. The announcement made no mention of financial terms.

Earlier this year, Reddit signed a similar contract with Google, reportedly worth around $60 million, allowing the tech giant to train AI models such as Gemini on Reddit's massive trove of user-generated content. As with OpenAI, the deal allows Google access to Reddit's Data API.

"Reddit has become one of the internet's largest open archives of authentic, relevant, and always up to date human conversations about anything and everything," said CEO Steve Huffman. "Including it in ChatGPT upholds our belief in a connected internet, helps people find more of what they're looking for, and helps new audiences find community on Reddit."

News of the OpenAI partnership pushed Reddit's shares up 11% in extended trading yesterday, which should be welcome news for Sam Altman. The OpenAI CEO was once a Reddit board member and remains a major shareholder with a stake that is now worth around $750 million. There is a disclosure in the partnership announcement that states the deal was led by OpenAI's COO and approved by its independent Board of Directors.

News of the partnership is unlikely to be welcomed by Reddit users, based on how they reacted to news of the Google partnership. There was also the reaction to the API pricing changes last year that saw more than 8,500 subreddits go dark (private/restricted) in protest.

OpenAI had another interaction with Reddit last week when the company filed a complaint against a ChatGPT-focused subreddit for copyright infringement. It eventually backed down, likely due to the hypocrisy of a company that has been sued multiple times for stealing work accusing someone else of doing the same thing.

 
AI isn't my field, but I'm wondering if there are specific techniques used for low signal-to-noise cases like Reddit (or maybe OpenAI just doesn't care, as long as it gives it something to say). A particular challenge is nonsense but highly upvoted posts, such as ones that are intentionally sarcastic, meme-y, mocking something else, populist on an unrelated issue, etc. How is AI going to recognize the difference?
 
It doesn't seem legal or ethical. But who am I to say?

Assuming it is scraping just public posts (vs. DMs or private areas if there are any), I'm not sure I see the legal problem. Reddit users already knew and intended for their posts to be publicly visible on the internet. Search engines have been indexing these posts for many years. I'm not sure what claim someone could have against this, or why public policy would want to impede it.
 
AI continues to provide zero benefits to normal people, just taking their jobs, making it harder to talk to real people when they need support, stealing their data and conversations, and making what you read and watch on the internet even more untrustworthy and fake. It seems only the extremely wealthy are getting anything out of this hype and nonsense.
 
It sounds like AI will eventually end up in a situation where it's garbage in, garbage out, since they are training it based on random inputs.
 
Those who train the AI have many ways to choose what subset of data is to be used and how to tell the AI what is more important to learn. I'm not sure anyone will look at the upvotes, and if they are looked at, it might be in a specific context, such as a help subreddit, where upvoted comments might be more indicative of helpful replies.
 
All these companies want to use Reddit, but here I am remembering how wild the old Reddit days were... It's going to be Microsoft's AI all over again. That thing will come out messed up.
 