March 5, 2026 ✦ Featured · openai · gpt-5 · computer-use

GPT-5.4 Sets New Records for Computer Use — AI That Operates Your PC

Released March 5, 2026, GPT-5.4 achieved record scores on every major computer-use benchmark. It can browse the web, fill forms, and operate software autonomously. Here's what that actually means in practice.

◎ What happened?

On March 5, 2026, OpenAI shipped GPT-5.4, a focused update to GPT-5 that lifted computer-use performance above that of every previously released model.

Key benchmark results at launch:

OSWorld-Verified: Record score — surpassing all prior models including Claude Sonnet 4.5
WebArena Verified: Record score on automated web navigation tasks
GDPval: 83% on OpenAI's benchmark of real-world knowledge-work tasks, the first time any model crossed 80%
Context window: Expanded to 272K tokens, up from GPT-5's original 128K

GPT-5.4 can operate a computer the way a human does — clicking buttons, navigating browsers, reading screens, filling out forms, and executing multi-step workflows across applications. It doesn't just generate code; it can run the code, see the result on screen, and fix errors in a loop.
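The "see the result, fix errors, repeat" behavior is an observe-act loop. The sketch below is a minimal, hypothetical illustration of that loop, not OpenAI's API: `Action`, `Environment`, `run_agent`, `FakeForm`, and `scripted_policy` are invented names, the screen is a string rather than pixels, and the `propose` callback stands in for whatever model call a real agent would make.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical schema)."""
    kind: str         # "type", "click", or "done"
    target: str = ""  # element label to click, or text to type


class Environment(Protocol):
    """Anything that can show the screen and carry out an action."""
    def screenshot(self) -> str: ...
    def perform(self, action: Action) -> None: ...


def run_agent(
    propose: Callable[[str, List[Action]], Action],
    env: Environment,
    max_steps: int = 20,
) -> List[Action]:
    """Observe, decide, act; stop when the policy says 'done' or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = env.screenshot()          # read the screen
        action = propose(observation, history)  # the model picks the next step
        history.append(action)
        if action.kind == "done":
            break
        env.perform(action)                     # click or type in the UI
    return history


class FakeForm:
    """Toy stand-in for a real desktop: one text field and a Submit button."""
    def __init__(self) -> None:
        self.name = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would receive pixels; a string keeps the sketch testable.
        return f"name={self.name!r} submitted={self.submitted}"

    def perform(self, action: Action) -> None:
        if action.kind == "type":
            self.name = action.target
        elif action.kind == "click" and action.target == "Submit":
            self.submitted = True


def scripted_policy(observation: str, history: List[Action]) -> Action:
    """Hypothetical stand-in for a model call: fill the field, submit, stop."""
    if "name=''" in observation:
        return Action("type", "Ada")
    if "submitted=False" in observation:
        return Action("click", "Submit")
    return Action("done")


form = FakeForm()
steps = run_agent(scripted_policy, form)
print([a.kind for a in steps])  # ['type', 'click', 'done']
```

Because the screen is re-read before every step, a click that fails shows up in the next observation and the policy can try again, which is the error-fixing loop in miniature. `max_steps` is there so a confused agent cannot loop forever.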

OpenAI also shipped GPT-5.5 Instant alongside it: a reasoning-capable model with a 400K-token context window, priced for high-volume API workloads.

◈ Why does it matter?

Computer use is the threshold where AI transitions from answering questions to getting things done on your behalf.

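The 83% GDPval score is the headline number. GDPval tests whether a model can complete real knowledge-work tasks: researching topics, synthesizing documents, drafting reports, filling spreadsheets, and producing structured outputs for other systems. The score is not a simple pass rate. Expert graders compare the model's deliverable with one produced by a human professional, and 83% means the model's work was judged as good as or better than the expert's in 83% of those comparisons. At that level, GPT-5.4 can complete many office tasks to a professional standard with little human intervention.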
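To make the percentage concrete: assuming GDPval is scored as the share of blind comparisons in which the model's deliverable is rated better than or as good as the expert's, the headline figure is just a win-or-tie rate. The grades below are made-up toy data, not GDPval results, and `win_or_tie_rate` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def win_or_tie_rate(grades: List[str]) -> float:
    """Fraction of comparisons where graders rated the model 'win' or 'tie'."""
    if not grades:
        raise ValueError("no graded comparisons")
    counts = Counter(grades)  # missing labels count as 0
    return (counts["win"] + counts["tie"]) / len(grades)


# Toy data for 100 hypothetical graded tasks (not real GDPval results).
toy_grades = ["win"] * 50 + ["tie"] * 33 + ["loss"] * 17
print(f"{win_or_tie_rate(toy_grades):.0%}")  # 83%
```

The same 83% can hide very different splits between wins and ties, and it says nothing about how bad the losses were, so the remaining cases are worth reviewing before a workflow runs unattended.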

This is the beginning of a genuine agentic economy:

Customer support workflows that don't require a human in the loop
Data entry and form processing that runs overnight autonomously
Software QA agents that can actually use the UI, not just call APIs
Personal productivity agents that manage calendars, email, and documents

The model has limitations — it still makes errors on complex visual tasks and struggles with CAPTCHAs by design — but the trajectory is clear.
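One way to live with these limitations is to wrap the agent in a guard that pauses and hands control to a person when the screen shows a CAPTCHA or the next step looks uncertain. This is a minimal sketch, assuming the surrounding framework exposes the visible screen text and some per-step confidence score; `Step`, `needs_human`, the marker phrases, and the 0.8 threshold are hypothetical choices, not part of any OpenAI API.

```python
from dataclasses import dataclass
from typing import Optional

# Phrases that suggest a human-verification challenge (illustrative, not exhaustive).
CAPTCHA_MARKERS = ("captcha", "i'm not a robot", "verify you are human")


@dataclass
class Step:
    """A proposed agent step (hypothetical schema)."""
    description: str
    confidence: float  # assumed self-reported score in [0, 1]
    screen_text: str   # text visible on screen when the step was proposed


def needs_human(step: Step, min_confidence: float = 0.8) -> Optional[str]:
    """Return a reason to hand off to a person, or None if the step may proceed."""
    screen = step.screen_text.lower()
    if any(marker in screen for marker in CAPTCHA_MARKERS):
        return "captcha"
    if step.confidence < min_confidence:
        return "low confidence"
    return None


steps = [
    Step("Fill in shipping address", 0.95, "Checkout form"),
    Step("Click 'Place order'", 0.60, "Review your order"),
    Step("Submit login", 0.99, "Please verify you are human"),
]
for s in steps:
    print(s.description, "->", needs_human(s) or "proceed")
```

Checking for CAPTCHAs before confidence means a verification page always goes to a person, however sure the model claims to be.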

◇ Should you switch?

For agentic and automation tasks: yes, GPT-5.4 is now the benchmark.

Claude Sonnet 4.5 held the computer-use crown after its September 2025 launch (61.4% on OSWorld). GPT-5.4 has surpassed it on that specific benchmark. For teams building automation pipelines that interact with UIs, this matters.

For pure coding and long-context reasoning: Claude Fable 5 / Sonnet 4.6 still lead. This isn't a "GPT wins everything" moment — the models trade leadership on different dimensions.

Practical advice:

Building a web scraping / form automation agent → evaluate GPT-5.4 first
Building a code review or refactoring agent → evaluate Claude Fable 5 first
High-volume API apps → GPT-5.5 Instant's pricing is designed for this
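"Evaluate first" is easiest to act on with a small harness that runs each candidate on your own tasks and compares pass rates by category. A minimal sketch, assuming each model is wrapped as a plain function from prompt to output; `evaluate`, `pick_per_category`, and the toy agents `model_a` and `model_b` are hypothetical, and no real API is called.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (category, prompt, checker that decides whether the output passes)
Task = Tuple[str, str, Callable[[str], bool]]
Agent = Callable[[str], str]


def evaluate(agents: Dict[str, Agent], tasks: List[Task]) -> Dict[str, Dict[str, float]]:
    """Pass rate for every agent on every task category."""
    passed: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    totals: Dict[str, int] = defaultdict(int)
    for category, prompt, check in tasks:
        totals[category] += 1
        for name, agent in agents.items():
            if check(agent(prompt)):
                passed[category][name] += 1
    return {
        category: {name: passed[category][name] / totals[category] for name in agents}
        for category in totals
    }


def pick_per_category(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Best-scoring agent per category (the first one listed wins a tie)."""
    return {category: max(by_agent, key=by_agent.__getitem__) for category, by_agent in scores.items()}


# Toy agents standing in for real model calls (hypothetical, no API involved).
agents: Dict[str, Agent] = {
    "model_a": lambda prompt: prompt.upper(),
    "model_b": lambda prompt: prompt[::-1],
}
tasks: List[Task] = [
    ("forms", "submit", lambda out: out == "SUBMIT"),
    ("forms", "click", lambda out: out == "CLICK"),
    ("code", "abc", lambda out: out == "cba"),
]

scores = evaluate(agents, tasks)
print(scores)  # {'forms': {'model_a': 1.0, 'model_b': 0.0}, 'code': {'model_a': 0.0, 'model_b': 1.0}}
print(pick_per_category(scores))  # {'forms': 'model_a', 'code': 'model_b'}
```

Scoring per category rather than overall mirrors the point above: the models trade leadership by task type, so a single aggregate number can pick the wrong model for a given pipeline.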

✦ Who should care?

✅ Developers

✅ Businesses

✅ AI agent builders

✅ Automation engineers

✅ Enterprises

Source ↗ https://openai.com/index/gpt-5/
