Newsletter Issue · May 15, 2026

Weekly AI Signal



Gul Jabeen

by techwithgul

Issue #13

Subject: Google's Gemini just hit 77% on ARC-AGI-2. Plus: Pakistan's 20K AI programmes, Claude Code goes autonomous


This week, I kept coming back to one number: 77.1%. That's the score Google's Gemini 3.1 Pro posted on ARC-AGI-2, the benchmark François Chollet designed specifically to resist memorisation. Every other model had hit a ceiling. Gemini didn't. And while the labs were shipping, Pakistan quietly dropped two of the most significant AI workforce announcements in the country's history. Taken together, both stories point to the same divide: countries and companies building AI infrastructure now versus those waiting to see how it plays out. That gap is closing fast, but only for the side that is already building.


The Big Story

Google's Gemini 3.1 Pro Just Scored 77% on the Benchmark Designed to Beat AI

May 6, 2026

Since its launch, ARC-AGI-2 has been the benchmark that humbled every frontier model. François Chollet built it specifically to test fluid reasoning: the ability to recognise entirely new patterns from scratch rather than retrieve answers from training data. GPT-4o managed around 4%. Claude 3.5 Sonnet topped out near 21%. The frontier felt stuck.
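For readers who have never looked inside an ARC task, here is a rough sketch of what "recognise a new pattern from scratch" means in practice. It assumes the public ARC task layout (a JSON-style object with a few `train` demonstration pairs and one or more `test` inputs, each grid a list of rows of colour codes 0 to 9) and exact-match scoring with up to two attempts per test input. The toy task and the `mirror_rows`, `fits_demonstrations` and `score_task` helpers are hypothetical illustrations, not part of the official benchmark tooling.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

# Hypothetical toy task in the ARC layout: demonstration pairs plus a test input.
# The hidden rule (invented for illustration) is "mirror each row left to right".
toy_task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 4], [5, 0]], "output": [[4, 0], [0, 5]]},
    ],
}


def mirror_rows(grid: Grid) -> Grid:
    """One candidate program a solver might propose: reverse every row."""
    return [list(reversed(row)) for row in grid]


def fits_demonstrations(program: Callable[[Grid], Grid], task) -> bool:
    """The rule must be inferred from the few train pairs alone."""
    return all(program(pair["input"]) == pair["output"] for pair in task["train"])


def score_task(attempts_per_test: List[List[Grid]], task) -> float:
    """Assumed exact-match scoring: a test item counts only if one of its
    first two attempts matches the expected grid cell for cell."""
    solved = 0
    for attempts, pair in zip(attempts_per_test, task["test"]):
        if any(attempt == pair["output"] for attempt in attempts[:2]):
            solved += 1
    return solved / len(task["test"])


if __name__ == "__main__":
    # Check the candidate rule against the demonstrations, then apply it to the test input.
    assert fits_demonstrations(mirror_rows, toy_task)
    attempts = [[mirror_rows(pair["input"])] for pair in toy_task["test"]]
    print(score_task(attempts, toy_task))  # 1.0
```

Real ARC-AGI-2 tasks use larger grids and rules that are far harder to guess, and each task's rule is new, which is why recalling answers from training data does not help: there is no partial credit for a grid that is almost right.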

Then Google released Gemini 3.1 Pro, and it scored 77.1%.

This is not an incremental gain; it is a different category of result. The model gets there through genuinely improved reasoning architecture, not a larger context window or more parameters, and it arrives alongside two features that matter enormously for builders.

...
