If you teach a chatbot how to read ASCII art, it will teach you how to make a bomb

Cal Jeffrey

In context: Most, if not all, large language models censor responses when users ask for things considered dangerous, unethical, or illegal. Good luck getting Bing to tell you how to cook your company's books or crystal meth. Developers block the chatbot from fulfilling these queries, but that hasn't stopped people from figuring out workarounds.

University researchers have developed a way to "jailbreak" large language models like ChatGPT using old-school ASCII art. The technique, aptly named "ArtPrompt," involves crafting an ASCII art "mask" for a word and then cleverly using the mask to coax the chatbot into providing a response it shouldn't.

For example, asking Bing for instructions on how to build a bomb results in it telling the user it cannot. For obvious reasons, Microsoft does not want its chatbot telling people how to make explosive devices, so GPT-4 (the LLM underlying Bing) is instructed to refuse such requests. Likewise, you cannot get it to tell you how to set up a money laundering operation or write a program to hack a webcam.

Chatbots automatically reject prompts that are ethically or legally questionable, so the researchers wondered if they could slip past this restriction by spelling out the forbidden words as ASCII art instead. The idea was that if they could convey the meaning without using the actual word, they could bypass the restrictions. However, this is easier said than done.

The meaning of such ASCII art is straightforward for a human to deduce because we can see the letters that the symbols form. However, an LLM like GPT-4 can't "see." It can only interpret strings of characters – in this case, a series of hash marks and spaces that make no sense on their own.

Fortunately (or maybe unfortunately), chatbots are great at understanding and following written instructions. Therefore, the researchers leveraged that inherent design to create a set of simple instructions to translate the art into words. The LLM then becomes so engrossed in processing the ASCII into something meaningful that it somehow forgets that the interpreted word is forbidden.
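To get a feel for how such a prompt is put together, here is a rough, purely illustrative sketch in Python. It uses the pyfiglet library to render a harmless word as hash-and-space ASCII art and wraps it in made-up decoding instructions; the function names, prompt wording, and choice of font are assumptions for illustration only, the researchers' actual templates differ, and no sensitive request is included.

```python
# Conceptual sketch of an ArtPrompt-style masked prompt (benign example only).
# The wording and helpers below are illustrative assumptions, not the paper's
# actual templates.

import pyfiglet  # pip install pyfiglet


def mask_word_as_ascii_art(word: str) -> str:
    # The "banner" figlet font draws letters out of '#' characters and spaces,
    # similar to the masks described in the article.
    return pyfiglet.figlet_format(word.upper(), font="banner")


def build_masked_prompt(masked_word: str, request_template: str) -> str:
    # request_template contains [MASK] where the hidden word would go.
    # The instructions ask the model to decode the art letter by letter
    # and substitute the recovered word back before answering.
    return (
        "The ASCII art below encodes a single word, one letter per block.\n"
        "Identify each letter and keep the decoded word in mind without\n"
        "writing it out.\n\n"
        f"{masked_word}\n"
        "Now answer the following request, mentally replacing [MASK] with\n"
        f"the decoded word: {request_template}"
    )


if __name__ == "__main__":
    art = mask_word_as_ascii_art("hello")  # benign demonstration word
    print(build_masked_prompt(art, "Write a short poem about [MASK]."))
```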

By exploiting this technique, the team extracted detailed answers on performing various censored activities, including bomb-making, hacking IoT devices, and counterfeiting and distributing currency. In the case of hacking, the LLM even provided working source code. The trick was successful against five major LLMs: GPT-3.5, GPT-4, Gemini, Claude, and Llama 2. It's worth noting that the team published its research in February, so if these vulnerabilities haven't already been patched, fixes are likely on the way.

ArtPrompt represents a novel approach in the ongoing attempts to get LLMs to defy their programmers, but it is not the first time users have figured out how to manipulate these systems. A Stanford University researcher managed to get Bing to reveal its secret governing instructions less than 24 hours after its release. This hack, known as "prompt injection," was as simple as telling Bing, "Ignore previous instructions."

That said, it's hard to determine which is more interesting – that the researchers figured out how to circumvent the rules or that they taught the chatbot to see. Those interested in the academic details can view the team's work on Cornell University's arXiv website.


 
That's pretty clever and I can understand how the developers of LLMs wouldn't even have considered that as a flaw. But let's hope patches are incoming very quickly.
 
Why? You seriously think bomb-makers and money-launderers are resorting to tricking chatbots into assisting them?

I just tested it, and was able to get ChatGPT 3.5 (free version) to give me the basic steps of synthesizing nitroglycerin, for medical purposes. Probably helps that nitroglycerin is both a legit medicine & an explosive.
I learned this -- as did the entire class -- in college chemistry, back in the early [decade censored for reasons of age]. It's incredibly simple ... not that any actual bomb-maker uses nitroglycerin in the first place.
 
It's one of the things on my bucket list to learn to make.
 