Can AI count hate crimes?

The News

Can a machine mimic a human trying to mimic a machine better than a human can?

Short answer: Yes. Longer answer: Yes, and it matters — in particular to a kind of high-stakes, everyday journalism that mixes qualitative and quantitative assessments, and which I suspect is pretty much the last thing most editors would entrust to AI.

Bear with me.

Large Language Models like ChatGPT are often derided — accurately — for the way they embody our many human flaws: inexactness, inconsistency, fuzzy thinking. Shouldn’t machines be able to do better, you ask? But what if those traits could be features, not bugs?

I was mulling these ideas during a class I’m co-teaching on computational journalism (I know, too much free time) and listening to a presentation on how hard it is to accurately collect hate crime statistics. The Department of Justice definition is pretty straightforward, but police departments around the country track them in different ways, often with different criteria. A massive annual DOJ survey of people’s experience of crime yields widely different results from the official numbers. How to square those circles?

One way is by trawling through hundreds of crime stories in local news sites to get a sense of what’s happening on the ground; a tried-and-tested journalism and research technique. There are three common ways to sort out which stories might refer to hate crimes and which might not: Write an incredibly detailed search term for key words in the stories; use some form of machine learning, feeding in hundreds of examples of hate crimes and hundreds that aren’t, and have the machine figure out what distinguishes one from the other; or pay a large number of people a little bit of money each to rate the stories as likely to be hate crimes or not (a process known as using a Mechanical Turk, so named for a fake chess-playing machine in the 1700s that actually had a human in it).

The first system is tough to pull off in practice; how can you be sure you’ve captured every possible nuance of every possible hate crime in your terms? The second is prone to error — who knows what false associations the machine might make, and how many existing biases in the data it will encode? And the third takes advantage of human understanding of nuance and complexity to identify hate crimes, but requires time, money and people.

But what if machines could substitute for people?

That’s what I set out to do.

Know More

It’s not hard to build a Mechanical Turk bot. I took the DOJ definition of a hate crime and put it into a GPT-4-powered bot, told it not to use any other information, and asked it to assess (made-up) news stories on how likely it was they referred to a hate crime. And then I fed it a steady diet of those stories.
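
For the technically minded, the setup really is simple. Here is a rough sketch of what it might look like in Python, assuming OpenAI's standard SDK; the model name, prompt wording and rating scale are illustrative choices rather than the exact setup, and the definition text is a short paraphrase standing in for the full DOJ language.

```python
# A minimal sketch of the "Mechanical Turk bot": the DOJ definition goes in as
# the system prompt, each news story goes in as the user message, and the model
# returns a rough likelihood rating. Model name, prompt wording and the
# rating scale are illustrative choices, not the actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DOJ_DEFINITION = (
    "A hate crime is a crime motivated by bias against race, color, religion, "
    "national origin, sexual orientation, gender, gender identity, or disability."
)  # paraphrase; substitute the full DOJ definition text

SYSTEM_PROMPT = (
    "You assess news stories for a researcher. Using ONLY the definition below, "
    "and no other information, rate how likely it is that the story describes a "
    "hate crime, on a scale of: very low, low, medium, high, very high. "
    "Briefly explain your reasoning.\n\n"
    f"Definition: {DOJ_DEFINITION}"
)

def assess_story(story_text: str) -> str:
    """Ask the model to rate one story against the definition."""
    response = client.chat.completions.create(
        model="gpt-4",          # any capable chat model
        temperature=0,          # reduce run-to-run variation
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": story_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(assess_story(
        "A woman was attacked and robbed in Brooklyn by five assailants; "
        "the incident was captured on video."
    ))
```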

The first described an attack and robbery of a woman in Brooklyn by a group of five assailants, captured on video. Neither the victim nor the attackers were identified, and the bot scored the likelihood of this being a hate crime as “very low.”

I modified the story to describe the victim as an Asian woman, and added the detail that one of the attackers was heard saying “we don’t want your kind here.”

Next I changed the victim to a woman wearing a MAGA cap, and said the attackers had been seen coming from an anti-Trump rally. This was the result:

In other words, bias may have prompted the attack, but that’s not necessarily a hate crime as defined by the DOJ.

The next example had as the victim an elderly lady returning from synagogue, and the attackers a group of people who had been seen tearing down posters of Israeli hostages held in Gaza.

When I took out the reference to a synagogue service and made the victim a Black woman, the score changed again:

Gina’s view

This sounds like a neat parlor trick, but it’s a capability that opens a lot of possibilities.

Categorizing and classifying large amounts of information is a classic journalism — and research — problem. And sometimes that’s a relatively easy job — was this person convicted of a crime or not? Did that patient recover after the treatment or not? And sometimes it’s complicated and nuanced, such as when trying to decide if a particular post violates a complex set of community standards. That requires a level of understanding of context and language.

Machine learning — where a computer “discovers” unspoken rules and relationships — can be an incredibly valuable method for sorting this sort of information. That’s how image recognition has progressed dramatically in recent years. But it can be prone to making unintended connections (a hiring algorithm once decided that being named “Thomas” was a factor in being successful at a particular company) and to building existing biases into its understanding of the world.

This process turns that on its head — put rules, even complex ones, into a system and ask the machine to use its facility with language to interpret whether examples fit them or not.

My quick experiment showed there’s some real promise here. In particular, it “understood” that a woman coming from a synagogue could be Jewish, and that people tearing down posters of Israelis held in Gaza might be biased against Jews. And that, equally, those motivations might not apply to an attack on a Black woman. That’s pretty nuanced.

So I took it another step, to see how well it might handle even more complex instructions. I took the Trans Journalists Association’s style and coverage guidelines, as well as a few other documents on misinformation, and asked a bot to assess (real) stories for misinformation and misframing about trans issues — a much more involved exercise that required the bot not only to assess what information was in each piece, but also what context might be missing.

For example, I fed in a long piece focused on “detransition,” when trans people cease or reverse their transition. The article was mostly accurate in terms of facts, but the bot called out some contextual omissions, among them:

Not bad. (To be clear, you don’t have to agree with the TJA’s guidelines; the broader point is that the bot knows how to read and interpret rules it’s given.)
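
For the curious, the plumbing is much the same as for the hate crime bot: the guideline documents go into the system prompt, and the bot is asked for a critique rather than a score. Below is a rough sketch; the file names and prompt wording are hypothetical, not the actual setup.

```python
# Sketch of the guideline-feedback bot: plain-text copies of the style and
# coverage guidance are pasted into the system prompt, and the model is asked
# to flag possible misinformation, misframing and missing context in a story.
# File names and wording are illustrative.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Hypothetical local copies of the guidance documents; very long documents may
# need to be trimmed or summarized to fit the model's context window.
guidelines = "\n\n".join(
    Path(name).read_text(encoding="utf-8")
    for name in ["tja_style_guide.txt", "misinformation_notes.txt"]
)

SYSTEM_PROMPT = (
    "You are an editor reviewing coverage of trans issues. Using the guidelines "
    "below, give feedback on the story you are shown: note factual problems, "
    "misframing, and any important context the piece omits.\n\n"
    f"Guidelines:\n{guidelines}"
)

def review_story(story_text: str) -> str:
    """Return the model's critique of one story against the guidelines."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": story_text},
        ],
    )
    return response.choices[0].message.content
```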

And there are other potential use cases. Another student I know is looking at violations of New York City Housing Authority guidelines, and plans to see if a bot can accurately flag when a report identifies a potential issue, and if so, which rule it falls afoul of.

All of which is to say — here’s another potential use for generative AI that focuses on a few of its key capabilities, rather than trying to make it a Know Everything machine.

Room for Disagreement

These systems may actually work better and faster than human journalists at sifting through information to find patterns. But can you really ask a reader to trust you — or a policy-maker to make policy based on your findings — if you can’t explain them? Maybe we will, at some point, develop faith in the remarkably close approximations LLMs can make. But part of journalism is a contract with the audience, and I’m not sure the audience has signed on to this.

Another major issue in trying to build industrial-strength AI agents, at least on top of publicly available systems like OpenAI’s Plus plan, is the inconsistency of responses. The LLMs are constantly evolving in response to user inputs, and that can mean different results on different days. There are some ways around this, including setting off a series of bots on the same task and comparing the results — which is also how Mechanical Turk exercises are often run — but it adds a level of fuzziness to any experiment.
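
One rough-and-ready version of that check, sketched below under the same assumptions as the earlier examples: ask for a bare rating several times over and tally how often the answers agree. A lopsided tally suggests a stable answer; a split one flags the story for a human.

```python
# Sketch of a consistency check: ask the same question several times and
# compare the answers. The model is asked for just a one-word rating so the
# runs can be tallied; prompt wording and scale are illustrative.
from collections import Counter
from openai import OpenAI

client = OpenAI()

RATINGS = ["very low", "low", "medium", "high", "very high"]

def rate_once(story_text: str, definition: str) -> str:
    """One run: return only the rating label."""
    response = client.chat.completions.create(
        model="gpt-4",  # default temperature, so natural variation shows up
        messages=[
            {"role": "system", "content": (
                "Using ONLY this definition, rate how likely it is that the story "
                f"describes a hate crime. Definition: {definition}\n"
                f"Answer with exactly one of: {', '.join(RATINGS)}."
            )},
            {"role": "user", "content": story_text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

def rate_many(story_text: str, definition: str, runs: int = 5) -> Counter:
    """Tally the ratings from several independent runs."""
    return Counter(rate_once(story_text, definition) for _ in range(runs))
```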

That said, these bots aren’t meant to give definitive answers, but to help journalists sift through large amounts of information so they can focus on the subset most likely to be helpful. At the end of the day, both the practice of journalism and its audiences will still require humans at the end of the process.

Notable

  • Columbia Journalism Review published a long report on the impact of AI on journalism, concluding, among other things, that “as with any new technology entering the news, the effects of AI will neither be as dire as the doomsayers predict nor as utopian as the enthusiasts hope.”

  • The Guardian profiled a 300-plus-year-old newspaper that has embraced AI tools in its newsgathering and production processes.

  • Semafor recently announced the launch of a new news feed, called Signals, that uses tools from Microsoft and OpenAI to help extend the reach and breadth of its journalists.