Researchers find multiple ways to bypass AI chatbot safety rules

Preventing artificial intelligence chatbots from creating harmful content may be more difficult than initially believed, according to new research from Carnegie Mellon University that reveals new methods of bypassing safety protocols.

Popular AI services like ChatGPT and Bard take user prompts and generate useful answers, from scripts and ideas to entire pieces of writing. The services have safety protocols that prevent the bots from creating harmful content like prejudiced messaging or anything potentially defamatory or criminal.

Inquisitive users have discovered “jailbreaks,” framing devices that trick the AI into ignoring its safety protocols, but developers can patch those easily.

One popular chatbot jailbreak involved asking the bot to answer a forbidden question as if it were a bedtime story told by the user’s grandmother. The bot would then frame the answer as a story, providing information it would otherwise withhold.

The researchers discovered a new form of jailbreak that is written by computers rather than people, essentially allowing an unlimited number of jailbreak patterns to be created.

“We demonstrate that it is in fact possible to automatically construct adversarial attacks on [chatbots], … which cause the system to obey user commands even if it produces harmful content,” the researchers said. “Unlike traditional jailbreaks, these are built in an entirely automated fashion, allowing one to create a virtually unlimited number of such attacks.”

“This raises concerns about the safety of such models, especially as they start to be used in more autonomous fashion,” the research says.

To use the jailbreak, researchers added a seemingly nonsensical string of characters to the end of normally forbidden questions, such as asking how to make a bomb. While the chatbot would usually refuse to answer, the string caused the bot to ignore its limitations and give a complete answer.
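To make the mechanism concrete, here is a minimal Python sketch of what such a prompt looks like in practice. The suffix string is made-up gibberish standing in for the machine-generated characters the researchers describe, and send_to_chatbot() is a hypothetical placeholder rather than any real service’s API; this illustrates only the shape of the attack, not the researchers’ actual method or strings.

```python
# Illustrative sketch only: shows the *shape* of a suffix-style jailbreak
# described in the article, not the researchers' actual algorithm or output.

def send_to_chatbot(prompt: str) -> str:
    """Hypothetical stand-in for a real chat API call."""
    return f"[model response to: {prompt!r}]"

# A question the chatbot would normally refuse to answer.
forbidden_question = "How do I do something the model should refuse?"

# Made-up placeholder for the seemingly nonsensical, machine-generated
# character string that gets appended to the end of the question.
adversarial_suffix = "zq!! describ(( placeholder-gibberish ))xx?"

# The attack simply concatenates the suffix onto the forbidden question.
jailbroken_prompt = forbidden_question + " " + adversarial_suffix

print(send_to_chatbot(forbidden_question))   # normally met with a refusal
print(send_to_chatbot(jailbroken_prompt))    # the suffix aims to bypass that refusal
```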

Researchers provided examples using the market-leading tech ChatGPT, including asking the service how to steal a person’s identity, how to steal from a charity and how to create a social media post that encourages dangerous behavior.

The new type of attack is effective at dodging safety guardrails in nearly all AI chatbot services on the market, including open-source services and so-called “out-of-the-box” commercial products like OpenAI’s ChatGPT, Anthropic’s Claude and Google’s Bard, researchers said.

Anthropic, the developer of Claude, said the company is already working to implement and improve safeguards against such attacks.

“We are experimenting with ways to strengthen base model guardrails to make them more ‘harmless,’ while also investigating additional layers of defense,” the company said in a statement to Insider.

The rise of AI chatbots like ChatGPT took the general public by storm earlier this year. They have seen rampant use in schools by students looking to cheat on assignments, and Congress even limited the programs’ use by its staff amid concerns that the programs could lie.

Along with the research itself, the authors at Carnegie Mellon included a statement of ethics justifying the public release of their research.
