Nightshade tool can "poison" images to thwart AI training and help protect artists

midian182

Why it matters: One of the many concerns about generative AIs is their ability to generate pictures using images scraped from across the internet without the original creators' permission. But a new tool could help solve this problem by "poisoning" the data used to train those models.

MIT Technology Review highlights the new tool, called Nightshade, created by researchers at the University of Chicago. It works by making very small changes to an image's pixels, invisible to the naked eye, before the image is uploaded. This poisons the training data used by the likes of DALL-E, Stable Diffusion, and Midjourney, causing the models to break in unpredictable ways.
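At a high level, the trick is to keep every pixel change within a tight intensity budget, so the edit is invisible to a viewer while still shifting what a model's feature extractor sees. The snippet below is a minimal conceptual sketch of that budget idea only; it is not Nightshade's actual algorithm, which carefully optimizes the perturbation rather than drawing it at random, and the function name is our own.

```python
# Conceptual sketch only: Nightshade OPTIMIZES its perturbation against a
# feature extractor; the random noise here just illustrates the tight
# per-pixel budget that keeps the change invisible to the naked eye.
import numpy as np

def perturb_image(image: np.ndarray, budget: int = 4) -> np.ndarray:
    """Shift each channel of a uint8 (H, W, 3) image by at most `budget` levels."""
    delta = np.random.randint(-budget, budget + 1, size=image.shape)
    poisoned = np.clip(image.astype(int) + delta, 0, 255)
    return poisoned.astype(np.uint8)

# With budget=4, no pixel moves more than ~1.6% of the 0-255 intensity
# range, well below what the eye can distinguish.
```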

Some examples of how the generative AIs might incorrectly interpret images poisoned by Nightshade include turning dogs into cats, cars into cows, hats into cakes, and handbags into toasters. It also works when prompting for different styles of art: cubism becomes anime, cartoons become impressionism, and concept art becomes abstract.

Nightshade is described as a prompt-specific poisoning attack in the researchers' paper, recently published on arXiv. Rather than needing to poison millions of images, Nightshade can disrupt a Stable Diffusion prompt with around 50 samples, according to the researchers' results.
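In rough terms, a prompt-specific poison set is a small batch of image-caption pairs whose captions anchor the targeted prompt while the perturbed pixels steer the model toward a different concept. Here is a hedged sketch of what such a batch might look like; the paths, names, and structure are illustrative and not taken from the paper's code.

```python
# Hypothetical illustration of a prompt-specific poison set: the captions
# say "dog" while each image has been perturbed so its features resemble
# a different concept (e.g. "cat"). All paths and names are made up.
from dataclasses import dataclass

@dataclass
class PoisonSample:
    image_path: str  # perturbed image whose features point elsewhere
    caption: str     # text tied to the prompt under attack

def build_poison_set(anchor: str, n_samples: int = 50) -> list[PoisonSample]:
    # Around 50 such samples were enough to disrupt one Stable Diffusion
    # prompt in the paper's experiments.
    return [
        PoisonSample(image_path=f"poisoned/{anchor}_{i:03d}.png",
                     caption=f"a photo of a {anchor}")
        for i in range(n_samples)
    ]

poison_set = build_poison_set("dog")
```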

Not only can the tool poison specific prompt terms like "dog," but it can also "bleed through" to associated concepts such as "puppy," "hound," and "husky," the researchers write. It even affects indirectly related images; poisoning "fantasy art," for example, turns prompts for "a dragon," "a castle in the Lord of the Rings," and "a painting by Michael Whelan" into something different.

Ben Zhao, a professor at the University of Chicago who led the team that created Nightshade, says he hopes the tool will act as a deterrent against AI companies disrespecting artists' copyright and intellectual property. He admits that there is the potential for malicious use, but inflicting real damage on larger, more powerful models would require attackers to poison thousands of images as these systems are trained on billions of data samples.

There are also defenses against this practice that generative AI model trainers could use, such as filtering high-loss data, frequency analysis, and other detection/removal methods, but Zhao said they aren't very robust.
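To give a sense of the first of those defenses, filtering high-loss data means scoring every training pair against the current model and discarding the worst-fitting fraction, on the theory that poisoned pairs fit the model unusually badly. A minimal sketch, assuming a hypothetical model_loss scoring function:

```python
# Minimal sketch of high-loss filtering. `model_loss` is a hypothetical
# callable that scores how poorly an (image, caption) pair fits the model;
# any real trainer would supply its own. As noted above, this defense is
# not very robust, since an attacker can craft poison that keeps losses low.

def filter_high_loss(dataset, model_loss, drop_fraction=0.05):
    """Drop the `drop_fraction` of samples with the highest loss."""
    scored = sorted(dataset, key=lambda pair: model_loss(*pair))
    keep = int(len(scored) * (1.0 - drop_fraction))
    return scored[:keep]
```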

Some large AI companies give artists the option to opt out of their work being used in AI-training datasets, but it can be an arduous process that doesn't address any work that might have already been scraped. Many believe artists should have the option to opt in rather than having to opt out.


 
Okay, until they train an AI to combat the anti-AI technique.

I don't care if people are for or against this AI stuff; it's here to stay, and it's only getting better.
 
The beauty of this is, when you think about it, that things like this are also going to become the norm, and investment in this technology will only grow alongside the growth of AI. The reason? Because eventually companies will see they have a financial incentive to do it: either they're going to own large legal repositories of images they want to lease to AI developers and prevent them from getting free rides, or they're going to be AI developers themselves wanting to prevent competitors from easily using the image data they have locked up in their own repositories.
 
Okay, until they train an AI to combat the anti-AI technique.

I don't care if people are for or against this AI stuff; it's here to stay, and it's only getting better.
Except for the examples where it has actually gotten worse. The reality is that AI will only ever be as good as the data it is trained on. It cannot know absolutes; it also doesn't have intuition. And the more you have to code it to certain perspectives, the more biased it will become. AI cannot change its mind based on intuition, feelings, emotions, etc., which are sometimes more accurate than what is being presented as 'facts'. I say this because we all know that 'facts' can be manipulated, skewed, partially withheld, or just downright falsehoods. If AI is programmed to prefer certain sources of information, then guess what, those sources can be manipulated. AI will only ever be as good as what it is trained on and being trained to accept as reality.
 
Except for the examples where it has actually gotten worse. The reality is that AI will only ever be as good as the data it is trained on. It cannot know absolutes; it also doesn't have intuition. And the more you have to code it to certain perspectives, the more biased it will become. AI cannot change its mind based on intuition, feelings, emotions, etc., which are sometimes more accurate than what is being presented as 'facts'. I say this because we all know that 'facts' can be manipulated, skewed, partially withheld, or just downright falsehoods. If AI is programmed to prefer certain sources of information, then guess what, those sources can be manipulated. AI will only ever be as good as what it is trained on and being trained to accept as reality.
That's aside from the fact that we can train AI to sort through data to look for manipulation. We are also getting very close to AIs that can train themselves. Many people are referring to this as the "singularity" moment for AI, and it is coming. There are people in the AI industry who are predicting that the singularity could happen before 2030.
 
That's aside from the fact that we can train AI to sort through data to look for manipulation. We are also getting very close to AIs that can train themselves. Many people are referring to this as the "singularity" moment for AI, and it is coming. There are people in the AI industry who are predicting that the singularity could happen before 2030.
There are people in industries who have made grand predictions before that have not come to pass. AI can do some really cool things. However, it is not a mind. As I said, it does not have intuition and emotions. That might seem like a strength from a purely logical standpoint, but those things can inform us about reality in ways that raw information cannot. The world we live in is not just information; a mind can process that, a computer cannot.
 
AI is not magic; it is simply networks of algorithms providing the results they think the user expects based on the data they have been given. This solution seems pretty temporary, though, considering all they have to do is tell the model with new data, "by the way, this can happen, do this with these images, and this is the expected result." So it's very cat and mouse.
 
Let's just cut to the chase here.

What's the ETA on the world's largest class-action lawsuit pitting creators against AI companies?
 
There are people in industries who have made grand predictions before that have not come to pass. AI can do some really cool things. However, it is not a mind. As I said, it does not have intuition and emotions. That might seem like a strength from a purely logical standpoint, but those things can inform us about reality in ways that raw information cannot. The world we live in is not just information; a mind can process that, a computer cannot.
This is one of those arguments that annoys me to no end. This is because it implies that somehow, humans are "special" and we're just not. We're selfish and ignorant, leading to us destroying the world around us. Somehow we think that consciousness makes us superior to other living things, which it doesn't, and that anything that isn't biological is not capable of consciousness. Considering we can't define consciousness, we can't say if non-biological entities are conscious or not.

I don't think AI has reached the point that it has emotions or desires, but I also don't think it's impossible. However, I also have a different perspective on this whole thing than most people, because I was born with an incurable neurological disease. This gives me a different perspective on what consciousness is, for a few reasons. The biggest is that in order to keep myself alive, I have had to learn about the brain since I was about 7 years old. No, I'm not a doctor, but I've been studying neuroscience longer at this point than nearly all of my neurologists. I had to learn about this because I had to be aware of when parts of my brain were malfunctioning. I had to be able to quickly and accurately recognize why a certain part of my body wasn't functioning correctly, diagnose what part of my brain that was happening in, and then administer the appropriate treatment myself.

Your brain is a meat machine, and all it does is process information through sensory organs. I don't know if you believe in a "spirit" or whatever, and if that somehow makes organic beings superior to inorganic ones, but that doesn't change the fact that it's just a program running on carbon instead of silicon. Your brain is a computer that has been programmed by a series of sensory experiences over the course of your life. Interestingly enough, your mind changes over time. It's how we build up coordination; it's how we "memorize" how to spell words or recognize images.

As someone with a very rare type of epilepsy, I have learned to directly interact with these parts of my brain over time. I have learned how to recognize when one part of my brain isn't acting the way it should. Sometimes my spelling will be atrocious, and I'll know that my superior parietal lobule has decided it didn't want to work right. One test I use for my olfactory cortex is to see if I like the taste of Pepsi or not. If that part of my brain isn't working correctly, then Pepsi will taste like bleach. Normally it starts with me smelling something unusual. I literally call it the Pepsi test.

So I have to recognize subtle ways that my brain is essentially causing errors. This has always made me very interested in AI, and I've been very big into AI sci-fi ever since I was a teenager.

So let's start with something basic. I see videos on YouTube of people using 4090s or other GPUs to train AI to do things like play games. That's all well and good, but AIs that generate images from a line of text operate VASTLY differently than teaching something on your computer to play Minecraft using a 4090. What would have taken a whole rack weeks to do just 5 years ago, a 4090 can do in a few hours. When I'm working with AI, I can see the errors that it is making, and I can draw very close parallels: "it's creating this error; that would happen in this section of my brain and show these symptoms." It's a very humbling perspective to have.

The thing is, your brain is just a computer that you programmed through your life experiences. We don't know the algorithms that your brain uses to remember your first kiss. We are no longer working with chatbots. See, we can't marginalize AI, because we don't know enough about how "real" brains work. There is a mindset that AI can never be conscious, that it will never be equal to us because we are alive and it is not. What bothers me about that is that without electricity, your brain cannot process information. Your brain requires the same thing to run that AI does.

So this might all sound like some philosophical mumbo-jumbo, and perhaps in some way it is, but to marginalize AI and place ourselves above it is a very dangerous game. But you don't have to take my word for it; maybe you should ask Ilya Sutskever what he carries around in that backpack of his. People who know more about this than either you or me are afraid, and that should make us afraid. If you aren't concerned about it, then you are dumber than you accuse the AI of being.

But in typical dumb human fashion, we're more worried about AI taking our money than we are about it taking our lives. Then we downplay its significance so we can continue living in the delusions we create for ourselves in our heads.
 
This is because it implies that somehow, humans are "special" and we're just not.
Yeah, TLDR, you lost me at this point. You are going to advocate for AI, which humans created, by starting with that statement. It's lunacy.
 
Yeah, TLDR, you lost me at this point. You are going to advocate for AI, which humans created, by starting with that statement. It's lunacy.
Right there with you, EdmondRC. Who in the hell says, "Humans are not special"? I guess a person with very little, if any, self-confidence or worth.
 
I don't care if people are for or against this AI stuff; it's here to stay, and it's only getting better.

It's here to stay till the investment bubble bursts, just like the .com bubble, after which one or two companies will survive, and only if they are able to provide any real value with their AI tools.

I am a doctor and tried using GPT, Bing AI, OpenEvidence, and Elicit for about 3-4 months. But trust me, in their present state, they are all sh*t. GPT has outdated data from websites you cannot really trust for medical info. Bing AI is up to date but also quotes untrusted and quack websites, and don't forget about the hallucinations, like misquoting sources, which has happened to me ample times. Elicit is good as long as you use it just to surf some papers and don't rely on its generated summary. OpenEvidence is the best of them all, but it still quotes rat studies and gives them equal importance to randomized controlled human trials, does not care about the sample sizes used in studies, and frequently misquotes studies, leading to exactly opposite conclusions.

It's high time people using these services realized that every GenAI is an LLM that is able to blurt out grammatically correct sentences. That doesn't ensure the output sentence is factually correct.

The only thing it is good for is creating stories and art, which are subjective rather than objective or factual, and it has achieved this by gobbling up large amounts of data that was illegally obtained and is now being illegally sold as a subscription service by these GenAI companies.
 
It's here to stay till the investment bubble bursts, just like the .com bubble, after which one or two companies will survive, and only if they are able to provide any real value with their AI tools.

I am a doctor and tried using GPT, Bing AI, OpenEvidence, and Elicit for about 3-4 months. But trust me, in their present state, they are all sh*t. GPT has outdated data from websites you cannot really trust for medical info. Bing AI is up to date but also quotes untrusted and quack websites, and don't forget about the hallucinations, like misquoting sources, which has happened to me ample times. Elicit is good as long as you use it just to surf some papers and don't rely on its generated summary. OpenEvidence is the best of them all, but it still quotes rat studies and gives them equal importance to randomized controlled human trials, does not care about the sample sizes used in studies, and frequently misquotes studies, leading to exactly opposite conclusions.

It's high time people using these services realized that every GenAI is an LLM that is able to blurt out grammatically correct sentences. That doesn't ensure the output sentence is factually correct.

The only thing it is good for is creating stories and art, which are subjective rather than objective or factual, and it has achieved this by gobbling up large amounts of data that was illegally obtained and is now being illegally sold as a subscription service by these GenAI companies.
Well, if a PERSON said something they obtained from a bad source, we'd just call them wrong. If an AI says something because it obtained it from a bad source, we call it stealing. But considering how things are becoming more dependent on AI, is it ethical to only allow AI to harvest information from incorrect sources?

We're going back to the early Wikipedia days. And considering how "special" humans are... excuse me, I mean lazy, I'm certain we're going to be more inclined to ask an AI for answers than actually research them ourselves.
 
And considering how "special" humans are... excuse me, I mean lazy, I'm certain we're going to be more inclined to ask an AI for answers than actually research them ourselves.
That is exactly my issue with the whole GenAI thing: people using it for all the unintended purposes it was never designed for, which will be a disaster if AI accuracy does not improve.

I am impressed with its content-generation abilities. It's great for writing stories and art (even though it's trained on illegally sourced data without compensating the original creators), but as a doctor, I don't want generated facts, no thanks. And I won't hesitate to sh*t on someone quoting an answer from Bing AI, which is the worst one and the most public-facing one, given how aggressive Microsoft has gotten with including it in Windows as Copilot.
 
Well, if a PERSON said something they obtained from a bad source, we'd just call them wrong. If an AI says something because it obtained it from a bad source, we call it stealing.
That PERSON is not selling subscription services for giving out information.
OpenAI does charge $20 for GPT-4. It also charges companies building similar solutions using its API and resources. 🙂
 