Move Over, Mathematicians, Here Comes AlphaProof

AlphaGeometry 2 solved Problem 4 of the Olympiad in 19 seconds, after receiving the formalized statement of the problem in AlphaGeometry’s own language. (Google DeepMind via The New York Times)

At the headquarters of Google DeepMind, an artificial intelligence laboratory in London, researchers have a long-standing ritual for announcing momentous results: They bang a big ceremonial gong.

In 2016, the gong sounded for AlphaGo, an AI system that excelled at the game Go. In 2017, the gong reverberated when AlphaZero conquered chess. On each occasion, the algorithm had beaten human world champions.

Last week the DeepMind researchers got out the gong again to celebrate what Alex Davies, a lead of Google DeepMind’s mathematics initiative, described as a “massive breakthrough” in mathematical reasoning by an AI system. A pair of Google DeepMind models tried their luck with the problem set of the 2024 International Mathematical Olympiad, or IMO, held from July 11 to July 22 about 100 miles west of London at the University of Bath. The event is billed as the premier math competition for the world’s “brightest mathletes,” according to a promotional post on social media.

The human problem-solvers — 609 high school students from 108 countries — won 58 gold medals, 123 silver and 145 bronze. The AI performed at the level of a silver medalist, solving four of the six problems for a total of 28 points. It was the first time that an AI had achieved a medal-worthy performance on an Olympiad’s problems.

“It’s not perfect, we didn’t solve everything,” Pushmeet Kohli, Google DeepMind’s vice president of research, said in an interview. “We want to be perfect.”

Nonetheless, Kohli described the result as a “phase transition” — a transformative change — “in the use of AI in mathematics and the ability of AI systems to do mathematics.”

The lab asked two independent experts to adjudicate the AI’s performance: Timothy Gowers, a mathematician at the University of Cambridge in England and a Fields medalist, who has been interested in the math-AI interplay for 25 years; and Joseph Myers, a software developer in Cambridge. Both won IMO gold in their day. Myers was chair of this year’s problem selection committee and at previous Olympiads served as a coordinator, judging human solutions. “I endeavored to assess the AI attempts consistently with how human attempts were judged this year,” he said.

Gowers added in an email: “I was definitely impressed.” The lab had discussed its Olympiad ambitions with him a couple of weeks beforehand, so “my expectations were quite high,” he said. “But the program met them, and in one or two instances significantly surpassed them.” The program found the “magic keys” that unlocked the problems, he said.

Hitting the gong

After months of rigorous training, the students sat for two exams, three problems per day — in algebra, combinatorics, geometry and number theory.

The AI counterpart beavered away roughly in tandem at the lab in London. (The students were not aware that Google DeepMind was competing, in part because the researchers did not want to steal the spotlight.) Researchers moved the gong into the room where they had gathered to watch the system work. “Every time the system solved a problem, we hit the gong to celebrate,” David Silver, a research scientist, said.

Haojia Shi, a student from China, ranked No. 1 and was the only competitor to earn a perfect score — 42 points for six problems; each problem is worth seven points for a full solution. The U.S. team won first place with 192 points; China placed second with 190.

The Google system earned its 28 points for fully solving four problems — two in algebra, one in geometry and one in number theory. (It flopped at two combinatorics problems.) The system was allowed unlimited time; for some problems, it took up to three days. The students were allotted only 4.5 hours per exam.

For the Google DeepMind team, speed is secondary to overall success, as it “is really just a matter of how much compute power you’re prepared to put into these things,” Silver said.

“The fact that we’ve reached this threshold, where it’s even possible to tackle these problems at all, is what represents a step-change in the history of mathematics,” he added. “And hopefully it’s not just a step-change in the IMO, but also represents the point at which we went from computers only being able to prove very, very simple things toward computers being able to prove things that humans can’t.”

Algorithmic ingredients

Applying AI to mathematics has been part of DeepMind’s mission for several years, often in collaboration with world-class research mathematicians.

“Mathematics requires this interesting combination of abstract, precise and creative reasoning,” Davies said. In part, he noted, this repertoire of abilities is what makes math a good litmus test for the ultimate goal: reaching so-called artificial general intelligence, or AGI, a system with capabilities ranging from emerging to competent to virtuoso to superhuman. Companies such as OpenAI, Meta AI and xAI are tracking similar goals.

Olympiad math problems have come to be considered a benchmark for measuring an AI system’s mathematical reasoning.

In January, a Google DeepMind system named AlphaGeometry solved a sampling of Olympiad geometry problems at nearly the level of a human gold medalist. “AlphaGeometry 2 has now surpassed the gold medalists in solving IMO problems,” Thang Luong, the principal investigator, said in an email.

Riding that momentum, Google DeepMind intensified its multidisciplinary Olympiad effort, with two teams: one led by Thomas Hubert, a research engineer in London, and another led by Luong and Quoc Le in Mountain View, each with some 20 researchers. For his “superhuman reasoning team,” Luong said he recruited a dozen IMO medalists — “by far the highest concentration of IMO medalists at Google!”

The lab’s effort at this year’s Olympiad deployed the improved version, AlphaGeometry 2. Not surprisingly, the model fared rather well on the geometry problem, polishing it off in 19 seconds.

Hubert’s team developed a new model that is comparable but more generalized. Named AlphaProof, it is designed to engage with a broad range of mathematical subjects. All told, AlphaGeometry and AlphaProof made use of a number of different AI technologies.

One approach was an informal reasoning system, expressed in natural language. This system leveraged Gemini, Google’s large language model, and was trained on an English-language corpus of published problems, proofs and the like.

The informal system excels at identifying patterns and suggesting what comes next; it is creative and talks about ideas in an understandable way. Of course, large language models are inclined to make things up — which may (or may not) fly for poetry but definitely not for math. In this context, though, the LLM seems to have displayed restraint; it wasn’t immune to hallucination, but the frequency was reduced.

Another approach was a formal reasoning system, based on logic and expressed in code. It used Lean, a theorem prover and proof assistant, which guarantees that if the system says a proof is correct, then it is indeed correct. “We can exactly check that the proof is correct or not,” Hubert said. “Every step is guaranteed to be logically sound.”
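
To give a sense of what that looks like in practice, here is a minimal, illustrative sketch of Lean 4 code with two toy statements (not Olympiad problems) and their machine-checked proofs; anything logically unsound would simply be rejected by the proof assistant.

    -- A toy, machine-checked proof in Lean 4 (an illustration, not an Olympiad solution).
    theorem add_comm_example (a b : Nat) : a + b = b + a := by
      exact Nat.add_comm a b   -- cite the library lemma Nat.add_comm

    -- The same guarantee applies to proofs built from scratch, here by induction.
    theorem zero_add_example (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl
      | succ n ih => rw [Nat.add_succ, ih]

Because a Lean proof either checks or it doesn’t, there is no room for the plausible-sounding but invalid arguments a language model might otherwise produce.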

Another crucial component was a reinforcement learning algorithm in the AlphaGo and AlphaZero lineage. This type of AI learns by itself and can scale indefinitely, said Silver, who is Google DeepMind’s vice president of reinforcement learning. Since the algorithm doesn’t require a human teacher, it can “learn and keep learning and keep learning until ultimately it can solve the hardest problems that humans can solve,” he said. “And then maybe even one day go beyond those.”

Hubert added, “The system can rediscover knowledge for itself.” That’s what happened with AlphaZero: It started with zero knowledge, Hubert said, “and by just playing games, and seeing who wins and who loses, it could rediscover all the knowledge of chess. It took us less than a day to rediscover all the knowledge of chess, and about a week to rediscover all the knowledge of Go. So, we thought, let’s apply this to mathematics.”
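
As a rough, hypothetical illustration of that self-play idea (a short sketch in Python, not DeepMind’s actual training code), the example below has an agent play a trivial take-away game against a copy of itself, starting from zero knowledge and improving its move values using only win-or-lose feedback.

    # Toy self-play learning in the spirit of AlphaZero (illustration only).
    # Game: players alternately remove 1-3 stones from a pile of 10;
    # whoever takes the last stone wins.
    import random
    from collections import defaultdict

    values = defaultdict(float)          # learned value of (pile, move) for the player to move
    EPSILON, LEARNING_RATE = 0.1, 0.2

    def choose(pile):
        """Pick a move: usually the best-known one, occasionally a random one."""
        moves = [m for m in (1, 2, 3) if m <= pile]
        if random.random() < EPSILON:
            return random.choice(moves)
        return max(moves, key=lambda m: values[(pile, m)])

    def self_play_game():
        """Play one game against a copy of itself; return each side's moves and the winner."""
        pile, player, history = 10, 0, {0: [], 1: []}
        while True:
            move = choose(pile)
            history[player].append((pile, move))
            pile -= move
            if pile == 0:
                return history, player   # the player who took the last stone wins
            player = 1 - player

    for _ in range(20000):
        history, winner = self_play_game()
        for side, moves in history.items():
            reward = 1.0 if side == winner else -1.0
            for state_action in moves:   # nudge each move's value toward the game outcome
                values[state_action] += LEARNING_RATE * (reward - values[state_action])

    # With enough games, the learned policy tends toward the known optimal strategy:
    # leave the opponent a multiple of 4 stones whenever possible.
    print({pile: max((1, 2, 3), key=lambda m: values[(pile, m)]) for pile in range(5, 11)})

The same basic loop, scaled up enormously and applied to constructing formal proofs rather than taking stones, is the flavor of learning Silver describes.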

Gowers doesn’t worry — too much — about the long-term consequences. “It is possible to imagine a state of affairs where mathematicians are basically left with nothing to do,” he said. “That would be the case if computers became better, and far faster, at everything that mathematicians currently do.”

“There still seems to be quite a long way to go before computers will be able to do research-level mathematics,” he added. “It’s a fairly safe bet that if Google DeepMind can solve at least some hard IMO problems, then a useful research tool can’t be all that far away.”

A really adept tool might make mathematics accessible to more people, speed up the research process and nudge mathematicians outside the box. Eventually it might even pose novel ideas that resonate.

c.2024 The New York Times Company