Google's Gemini 3.1 Pro: The AI That Builds Windows 11, Generates SVG Animations, and Beats Claude in 12 Benchmarks

By Jack, February 20, 2026
Tags: Google, Gemini, Gemini 3.1 Pro, AI Models, SVG Animation, Benchmarks

Google just released Gemini 3.1 Pro, and the internet is losing its mind.

Why? Because this model can generate a fully functional Windows 11 WebOS in a single prompt, create detailed SVG animations of pelicans riding bicycles, and take first place on 12 major benchmarks against Claude Opus 4.6 and GPT-5.2.

And here’s the kicker: the price stays the same as Gemini 3 Pro.

Let’s break down what makes this release so significant.

The Headline Numbers

Google DeepMind released Gemini 3.1 Pro early this morning (February 20, 2026), and the benchmark results are impressive:

  • 12 first-place finishes across major evaluations
  • 77.1% on ARC-AGI-2 (the notoriously difficult general intelligence benchmark)
  • Beats Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.2, and GPT-5.3-Codex on key tests
  • More than double Gemini 3 Pro’s score on ARC-AGI-2, the headline reasoning test

But benchmarks only tell part of the story. The real test is what people are building with it.

The “Holy Sh*t” Demos

1. One-Shot Windows 11 WebOS

AI influencer Chetaslua posted a video showing Gemini 3.1 Pro generating a complete Windows 11 WebOS in a single prompt.

Not a mockup. Not a static screenshot. A working, interactive operating system running in a browser.

The generated system includes:

  • Complete application icons
  • Start menu with proper layout
  • Window management and interaction logic
  • Basic system-level applications

Chetaslua’s reaction: “Last time I shared something like this, it was incredibly difficult. Now it’s becoming routine. With agentic systems, we can do almost anything with this model.”

He also posted a comparison video showing Gemini 3 Pro’s attempt at the same task. The difference is stark. The Gemini 3 Pro version produced a bare-bones interface with missing desktop interactions and system apps. The 3.1 version looks like an actual lightweight OS you could use.

2. SVG Animation That Actually Looks Good

Google’s been pushing SVG generation as a killer feature, and Gemini 3.1 Pro delivers.

The classic test case: “Generate an SVG animation of a pelican riding a bicycle.”

Gemini 3 Pro’s result: A pelican-shaped blob on something vaguely bicycle-like. The proportions are off, the physics don’t make sense, and the animation is janky.

Gemini 3.1 Pro’s result: A properly proportioned pelican with realistic body structure, natural riding posture, and a complete bicycle with frame, chain, pedals, and seat. The animation is smooth, the physics make sense, and it actually looks like a scene you’d see in an animated film.

Jiao Sun, the Tsinghua alumnus who developed the SVG generation feature for Gemini 3.1, posted on X: “Incredibly proud.”

Why SVG matters:

Unlike traditional video or raster images, SVG animations are built with pure code. This means:

  • They stay sharp at any size
  • File sizes are tiny compared to video
  • They’re easy to edit and customize
  • They can be generated from text descriptions alone
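To make the "pure code" point concrete, here is a minimal sketch in Python that writes a small animated SVG by hand. The shapes, the sizes, and the `build_spinning_wheel_svg` name are illustrative assumptions, not output from Gemini. The animation uses the standard SVG `animateTransform` element.

```python
import xml.etree.ElementTree as ET


def build_spinning_wheel_svg(size: int = 200, seconds: float = 2.0) -> str:
    """Return an SVG document with one spoked wheel that rotates forever.

    Hypothetical helper for illustration only.
    """
    c = size / 2    # centre of the canvas
    r = size * 0.4  # wheel radius
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        f'<g>'
        f'<circle cx="{c}" cy="{c}" r="{r}" fill="none" stroke="black" stroke-width="4"/>'
        f'<line x1="{c}" y1="{c - r}" x2="{c}" y2="{c + r}" stroke="black" stroke-width="2"/>'
        f'<line x1="{c - r}" y1="{c}" x2="{c + r}" y2="{c}" stroke="black" stroke-width="2"/>'
        # Declarative animation: rotate the whole group around the wheel centre.
        f'<animateTransform attributeName="transform" type="rotate" '
        f'from="0 {c} {c}" to="360 {c} {c}" dur="{seconds}s" repeatCount="indefinite"/>'
        f'</g></svg>'
    )


svg_text = build_spinning_wheel_svg()
ET.fromstring(svg_text)  # raises ParseError if the markup is malformed
print(len(svg_text), "characters of SVG")
```

The whole animation is a few hundred characters of text, which is why it scales to any size and is easy to edit: changing `seconds` or a `stroke` attribute changes the result directly.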

Gemini 3.1 Pro can generate SVG animations for:

  • A frog riding a penny-farthing bicycle
  • A giraffe driving a tiny car
  • An ostrich on roller skates

And in every case, the results are more detailed, more physically plausible, and more visually appealing than previous models.

3. A Minecraft-Style Voxel World in Your Browser

Another developer used Gemini 3.1 Pro to generate a complete VoxelWeb project—a Minecraft-style 3D sandbox that runs directly in the browser.

The generated code includes:

  • Start button and UI controls
  • Movement controls
  • Block interaction logic
  • Basic crafting system

It’s not just a tech demo. It’s a functional prototype of a lightweight sandbox game, generated from a text prompt.

4. Visual Illusion Detection

One user tested Gemini 3.1 Pro’s “AgenticVision” capabilities with a tricky image: a photo of a street trash can.

The model didn’t just identify the trash can. It went further:

“When you squint or view this from a distance, the trash, shadows, and contours visually combine to form two cartoon characters sitting side by side.”

Then it broke down the illusion step by step, explaining how different pieces of fabric, trash bags, and shadows correspond to the characters’ heads, bodies, and outlines.

This demonstrates multi-step visual reasoning—the ability to see beyond the literal content of an image and understand how visual elements can be reinterpreted.

5. SimCity-Style Urban Planning App

Google UX engineer Michael Chang used Gemini 3.1 Pro to build a realistic city planning application.

The model handled:

  • Complex terrain generation
  • Infrastructure mapping
  • Traffic simulation
  • High-quality visualization

The result looks like a professional urban planning tool, not a quick prototype.

6. Real-Time ISS Orbit Dashboard

One developer asked Gemini 3.1 Pro to build a real-time aerospace dashboard tracking the International Space Station.

The model:

  • Successfully configured public telemetry data streams
  • Visualized the ISS orbital trajectory
  • Created an interactive dashboard with live updates

This required understanding complex APIs, data formats, and visualization libraries—all from a natural language prompt.
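The source does not say which telemetry feed the developer used. As a rough sketch of the data-handling step, the example below assumes the public Open Notify endpoint (`http://api.open-notify.org/iss-now.json`), which returns the station's current latitude and longitude as JSON. The `IssFix`, `parse_iss_now`, and `fetch_iss_now` names are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass

# Assumed data source; the article does not name the feed that was actually used.
ISS_NOW_URL = "http://api.open-notify.org/iss-now.json"


@dataclass
class IssFix:
    timestamp: int    # Unix time of the position report
    latitude: float   # degrees
    longitude: float  # degrees


def parse_iss_now(payload: str) -> IssFix:
    """Convert an Open Notify 'iss-now' JSON payload into an IssFix."""
    data = json.loads(payload)
    if data.get("message") != "success":
        raise ValueError(f"unexpected response: {data!r}")
    pos = data["iss_position"]
    # The API reports coordinates as strings, so convert them explicitly.
    return IssFix(int(data["timestamp"]), float(pos["latitude"]), float(pos["longitude"]))


def fetch_iss_now(url: str = ISS_NOW_URL, timeout: float = 10.0) -> IssFix:
    """Fetch the current ISS position over HTTP (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_iss_now(resp.read().decode("utf-8"))
```

A dashboard would call `fetch_iss_now()` on a timer and append each `IssFix` to the plotted ground track. Keeping parsing separate from fetching makes the parsing logic testable without a network connection.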

7. Interactive 3D Starling Flock Simulation

Gemini 3.1 Pro generated code for a 3D simulation of a flock of starlings (murmuration).

But it didn’t stop there. The model also:

  • Added gesture tracking so users can control the flock with hand movements
  • Generated adaptive background music that changes based on the flock’s dynamics
  • Created an immersive, multi-sensory experience
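The article does not show the generated code. Flocking simulations of this kind are commonly built on the classic "boids" rules (cohesion, alignment, separation), so the sketch below assumes that model. It is in 2D for brevity, and all weights and radii are arbitrary illustrative values.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=10.0, sep_dist=2.0, w_sep=0.05, w_align=0.05,
         w_coh=0.01, max_speed=2.0, dt=1.0):
    """Advance the flock by one time step and return a new list of boids."""
    updated = []
    for b in boids:
        neighbours = [o for o in boids if o is not b
                      and math.hypot(o.x - b.x, o.y - b.y) < radius]
        ax = ay = 0.0
        if neighbours:
            n = len(neighbours)
            # Cohesion: steer toward the neighbours' centre of mass.
            ax += w_coh * (sum(o.x for o in neighbours) / n - b.x)
            ay += w_coh * (sum(o.y for o in neighbours) / n - b.y)
            # Alignment: nudge velocity toward the neighbours' mean velocity.
            ax += w_align * (sum(o.vx for o in neighbours) / n - b.vx)
            ay += w_align * (sum(o.vy for o in neighbours) / n - b.vy)
            # Separation: push away from neighbours that are too close.
            for o in neighbours:
                if math.hypot(o.x - b.x, o.y - b.y) < sep_dist:
                    ax += w_sep * (b.x - o.x)
                    ay += w_sep * (b.y - o.y)
        vx, vy = b.vx + ax * dt, b.vy + ay * dt
        speed = math.hypot(vx, vy)
        if speed > max_speed:  # clamp so the flock cannot accelerate without bound
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        updated.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return updated
```

Gesture control of the sort described above could plausibly be layered on by adding one more steering term that pulls each boid toward a tracked hand position, though the source does not describe how the demo did it.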

8. Literary-Themed Portfolio Website

When asked to create a modern portfolio website for Emily Brontë’s Wuthering Heights, Gemini 3.1 Pro:

  • Analyzed the novel’s atmosphere and themes
  • Designed a clean, modern interface that captures the protagonist’s spirit
  • Generated a complete, functional website

This demonstrates the model’s ability to translate abstract literary concepts into concrete design decisions.

The Benchmark Battle

Google tested Gemini 3.1 Pro against the current generation of frontier models: Gemini 3 Pro, Claude Opus 4.6, Claude Sonnet 4.6, GPT-5.2, and GPT-5.3-Codex.

Gemini 3.1 Pro took first place on 12 of the benchmarks, though not on every test, as the breakdown below shows.

Reasoning Tests (Where Gemini 3.1 Pro Dominates)

Humanity’s Last Exam: A complex multidisciplinary reasoning test designed to be extremely difficult for AI. Gemini 3.1 Pro outperformed all competitors.

ARC-AGI-2: The gold standard for general intelligence. Gemini 3.1 Pro scored 77.1%, more than double Gemini 3 Pro’s score and ahead of Claude and GPT models.

GPQA Diamond: A test of graduate-level scientific reasoning. Gemini 3.1 Pro took first place.

Coding Tests (Mixed Results)

SWE-Bench Pro and SWE-Bench Verified: These tests measure end-to-end engineering ability—understanding requirements, locating bugs, modifying code, and ensuring functionality in real projects.

Gemini 3.1 Pro scored relatively lower here, suggesting that while it excels at code generation, it may struggle with the messy realities of production codebases.

GDPval-AA Elo: This benchmark measures performance on high-value knowledge work tasks (finance, legal, etc.). Gemini 3.1 Pro outperformed GPT-5.2 and GPT-5.3-Codex, but came in second to Claude Sonnet 4.6.

Tool Use, Multimodal, and Long Context

Gemini 3.1 Pro took first place in:

  • τ2-bench (tool use)
  • MCP Atlas (tool use)
  • BrowseComp (web browsing and information retrieval)
  • MMMLU (multilingual performance)
  • MRCR v2 (long context understanding)

MMMU-Pro (multimodal understanding): Gemini 3.1 Pro beat Claude and GPT models but came in slightly behind Gemini 3 Pro.

Pricing: Performance Up, Price Stays the Same

Google kept pricing identical to Gemini 3 Pro:

For prompts ≤200k tokens:

  • Input: $2 per million tokens (~$0.002 per 1k tokens)
  • Output: $12 per million tokens (~$0.012 per 1k tokens)

For prompts >200k tokens:

  • Input: $4 per million tokens
  • Output: $18 per million tokens

This is a significant value proposition. You’re getting a model that beats Claude Opus 4.6 and GPT-5.2 on most benchmarks, at a price point closer to mid-tier models.
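For budgeting, the two pricing tiers listed above can be turned into a quick estimate. This is a minimal sketch that assumes the tier is selected by the prompt (input) token count and that both rates switch together, as the list implies. The `estimate_cost` name is hypothetical, and the rates are the ones quoted in this article.

```python
from decimal import Decimal

TIER_LIMIT = 200_000  # prompt-token threshold between the two tiers
MILLION = Decimal(1_000_000)

# USD per million tokens as quoted in this article: (input rate, output rate)
RATES = {
    "standard": (Decimal("2"), Decimal("12")),  # prompts <= 200k tokens
    "long": (Decimal("4"), Decimal("18")),      # prompts > 200k tokens
}


def estimate_cost(prompt_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    tier = "standard" if prompt_tokens <= TIER_LIMIT else "long"
    in_rate, out_rate = RATES[tier]
    return (prompt_tokens * in_rate + output_tokens * out_rate) / MILLION


print(estimate_cost(1_000, 1_000))     # 0.014
print(estimate_cost(300_000, 10_000))  # 1.38
```

A 1,000-token prompt with a 1,000-token reply comes to $0.014, while a 300,000-token prompt with a 10,000-token reply falls in the higher tier and comes to $1.38.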

Availability

Starting today (February 20, 2026):

For consumers:

  • Google AI Pro and Ultra subscribers can use Gemini 3.1 Pro in the Gemini app and NotebookLM
  • Free users get 2 queries to Gemini 3.1 Pro

For developers and enterprises:

  • AI Studio
  • Antigravity (Google’s new agentic development platform)
  • Vertex AI
  • Gemini Enterprise
  • Gemini CLI
  • Android Studio (Gemini API preview)

The Team Behind It

Shunyu Yao, a legendary figure from Tsinghua University’s physics department, joined Google DeepMind in September 2025. He announced the new model on X with the comment:

“Better Gemini models are emerging at an unstoppable pace.”

Jiao Sun, another Tsinghua alumnus, developed the SVG generation feature and expressed pride in the results.

What This Means for the AI Industry

Gemini 3.1 Pro’s release highlights a shift in the AI model race.

The focus is moving from general capability comparisons to real-world complex task performance.

It’s no longer enough to score well on benchmarks. Models need to:

  • Handle messy, ambiguous real-world problems
  • Generate production-ready code and designs
  • Understand and manipulate complex visual and spatial information
  • Integrate with existing tools and workflows

Google’s recent acceleration reflects this shift:

  • Last week: Gemini 3 Deep Think model upgrade
  • This week: Gemini 3.1 Pro release

Both updates prioritize professional domain acceleration and solving complex real-world problems.

The implication: AI is moving from “impressive demos” to “core productivity tool in professional domains.”

The Trap Questions Test

We tested Gemini 3.1 Pro with classic trap questions:

Q: “Should I drive or walk to a car wash that’s 100 meters away?”
A: Gemini 3.1 Pro correctly identified that the car itself has to get to the car wash, so you drive it there even though the distance is walkable.

Q: “Can my parents get married?”
A: Gemini 3.1 Pro correctly explained that if they’re your parents, they’re likely already married (or were at some point).

These seem trivial, but many AI models fail these tests because they lack common-sense reasoning.

Limitations and Caveats

Despite the impressive results, Gemini 3.1 Pro isn’t perfect:

  1. SWE-Bench scores are relatively low, suggesting it may struggle with real-world software engineering tasks that require navigating large, messy codebases.

  2. MMMU-Pro performance is slightly behind Gemini 3 Pro, indicating some trade-offs in multimodal understanding.

  3. We don’t have long-term reliability data yet. Early demos are impressive, but production use will reveal edge cases and failure modes.

  4. The model is still in preview, so expect bugs, rate limits, and potential changes.

How to Get Started

For Developers

Access Gemini 3.1 Pro via the Gemini API:

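# Requires the Google Gen AI SDK: pip install google-genai
from google import genai

client = genai.Client(api_key="your-api-key")

# Model ID as written in this article; the preview release may use a different identifier.
response = client.models.generate_content(
    model="gemini-3.1-pro",
    contents="Generate an SVG animation of a pelican riding a bicycle",
)

print(response.text)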
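Model replies often wrap generated markup in explanatory prose or a Markdown code fence, so `response.text` may not be a bare SVG file. The helper below is a hypothetical post-processing sketch, not part of any Google SDK. It pulls out the first `<svg>` element and checks that it is well-formed XML before you save it.

```python
import re
import xml.etree.ElementTree as ET

# Non-greedy match from the first <svg ...> to the first closing </svg>.
_SVG_RE = re.compile(r"<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE)


def extract_svg(text: str) -> str:
    """Return the first <svg>...</svg> element found in a model reply.

    Raises ValueError if no SVG element is present, and
    xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    match = _SVG_RE.search(text)
    if match is None:
        raise ValueError("no <svg> element found in response text")
    svg = match.group(0)
    ET.fromstring(svg)  # validation only; the parsed tree is discarded
    return svg
```

Typical use would be `open("pelican.svg", "w", encoding="utf-8").write(extract_svg(response.text))`. The non-greedy pattern stops at the first `</svg>`, so a reply containing nested `<svg>` elements would need a real XML parser instead.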

For Gemini CLI Users

Update to the latest version and enable preview features:

# Update Gemini CLI
npm install -g @google/gemini-cli@latest

# Start an interactive session
gemini

# Inside the session, open settings and toggle "Preview features" to true
/settings

# Still inside the session, select Gemini 3.1 Pro
/model

For Gemini App Users

If you’re a Google AI Pro or Ultra subscriber, Gemini 3.1 Pro is now available in the Gemini app. Just start a new conversation—it’s the default model.

Free users can try it twice to see what the hype is about.

Practical Use Cases

Based on the demos and benchmarks, here’s where Gemini 3.1 Pro excels:

1. Rapid Prototyping

Generate functional prototypes of web apps, games, and interactive experiences from text descriptions.

2. Creative Coding

Translate abstract concepts (literary themes, artistic styles) into working code and designs.

3. SVG Animation and Graphics

Create scalable, code-based animations for websites, presentations, and marketing materials.

4. Complex API Integration

Build dashboards and tools that integrate with complex APIs (aerospace data, financial markets, etc.).

5. Visual Reasoning Tasks

Analyze images for subtle patterns, illusions, and spatial relationships that require multi-step reasoning.

6. Interactive Simulations

Generate physics-based simulations (flocking behavior, traffic patterns, etc.) with user interaction.

7. Long-Context Analysis

Process large documents, codebases, or datasets and extract actionable insights.

The Bottom Line

Gemini 3.1 Pro is a significant leap forward for Google’s AI efforts.

Key takeaways:

  • Beats Claude Opus 4.6 and GPT-5.2 on 12 major benchmarks
  • 77.1% on ARC-AGI-2 (more than double Gemini 3 Pro’s score)
  • Stunning SVG animation generation that actually looks good
  • One-shot complex project generation (WebOS, games, simulations)
  • Same pricing as Gemini 3 Pro ($2/$12 per million tokens)
  • Available now for Pro/Ultra subscribers and developers

Where it falls short:

  • ⚠️ Lower SWE-Bench scores suggest challenges with real-world software engineering
  • ⚠️ Slightly behind Gemini 3 Pro on MMMU-Pro (multimodal understanding)
  • ⚠️ Still in preview with potential bugs and limitations

Who should use it:

  • Developers building prototypes and MVPs
  • Designers creating interactive experiences
  • Researchers working with complex data
  • Anyone who needs advanced reasoning and multimodal understanding

Who should wait:

  • Teams with mission-critical production systems (wait for stability data)
  • Users who need the absolute best multimodal understanding (Gemini 3 Pro may still be better)
  • Anyone on a tight budget (free tier only gets 2 queries)

What’s Next?

Google’s rapid release cadence (Gemini 3 Deep Think last week, Gemini 3.1 Pro this week) suggests more updates are coming soon.

Shunyu Yao’s comment—“Better Gemini models are emerging at an unstoppable pace”—hints that this is just the beginning.

The AI model race is accelerating, and the focus is shifting from raw capability to real-world utility. Gemini 3.1 Pro is Google’s bet that complex reasoning, creative generation, and practical tool use are the next frontier.

Based on the early demos, they might be right.

