March 5, 2026 · Featured · openai, gpt-5, computer-use

GPT-5.4 Sets New Records for Computer Use — AI That Operates Your PC

Released March 5, 2026, GPT-5.4 posted record scores on the major computer-use benchmarks, including OSWorld-Verified and WebArena Verified. It can browse the web, fill forms, and operate software autonomously. Here's what that actually means in practice.

What happened?

On March 5, 2026, OpenAI shipped GPT-5.4 — a focused update to GPT-5 that pushed computer-use capabilities far beyond any previous model.

Key benchmark results at launch:

  • OSWorld-Verified: Record score — surpassing all prior models including Claude Sonnet 4.5
  • WebArena Verified: Record score on automated web navigation tasks
  • GDPval: 83% on OpenAI's internal test for knowledge work tasks — the first time any model crossed 80%
  • Context window: Expanded to 272,000 tokens, up from GPT-5's original 128,000

GPT-5.4 can operate a computer the way a human does — clicking buttons, navigating browsers, reading screens, filling out forms, and executing multi-step workflows across applications. It doesn't just generate code; it can run the code, see the result on screen, and fix errors in a loop.
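The observe, act, and retry cycle described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: `FakeForm`, `Action`, `scripted_policy`, and `run_agent` are all hypothetical names, the "screen" is a plain-text stand-in for a screenshot, and the policy is a hard-coded placeholder where a real agent would call the model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # field or button the action applies to
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeForm:
    """Hypothetical stand-in for a real screen: a two-field form with a submit button."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self) -> str:
        # A real agent would receive a screenshot; here the screen is summarized as text.
        if self.submitted:
            return "submitted"
        empty = [name for name, value in self.fields.items() if not value]
        return ("empty:" + ",".join(empty)) if empty else "ready"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            # The form only submits once every field has a value.
            if all(self.fields.values()):
                self.submitted = True


def scripted_policy(observation: str, data: dict) -> Action:
    """Placeholder for the model call: maps what is on screen to the next action."""
    if observation == "submitted":
        return Action("done")
    if observation.startswith("empty:"):
        first_empty = observation.split(":", 1)[1].split(",")[0]
        return Action("type", target=first_empty, text=data[first_empty])
    return Action("click", target="submit")


def run_agent(env: FakeForm, policy, data: dict, max_steps: int = 10) -> list:
    """Observe the screen, pick an action, apply it, and repeat until done."""
    history = []
    for _ in range(max_steps):
        observation = env.observe()
        action = policy(observation, data)
        history.append(action)
        if action.kind == "done":
            return history
        env.apply(action)
    raise RuntimeError("step budget exhausted before the task finished")


if __name__ == "__main__":
    form = FakeForm()
    steps = run_agent(form, scripted_policy, {"name": "Ada", "email": "ada@example.com"})
    print([step.kind for step in steps], form.submitted)
```

The step budget (`max_steps`) is a design choice worth keeping in any real loop: an agent that misreads the screen can otherwise repeat the same action indefinitely.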

OpenAI also shipped GPT-5.5 Instant alongside it — a 400K token context window model with reasoning capabilities priced for high-volume API workloads.
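To make the context window figures quoted above concrete, here is a rough sketch for checking which model could hold a given prompt. The window sizes come from this article; everything else is an assumption: the dictionary keys are labels rather than API model IDs, the four-characters-per-token ratio is only a crude estimate for English text, and the 8,000-token output reserve is an arbitrary illustrative default.

```python
# Context window sizes as reported in this article (in tokens).
# Keys are descriptive labels, not real API model identifiers.
CONTEXT_WINDOWS = {
    "GPT-5 (original)": 128_000,
    "GPT-5.4": 272_000,
    "GPT-5.5 Instant": 400_000,
}

# Assumption: roughly 4 characters per token for English text.
# A real tokenizer would give a different, more accurate count.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def models_that_fit(prompt_tokens: int, reserved_for_output: int = 8_000) -> list[str]:
    """Return the labels of models whose window holds the prompt plus an output reserve."""
    needed = prompt_tokens + reserved_for_output
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


if __name__ == "__main__":
    document = "word " * 200_000  # about 1,000,000 characters of filler text
    tokens = estimate_tokens(document)
    print(tokens, models_that_fit(tokens))
```

For anything close to a limit, count tokens with the provider's own tokenizer rather than a character heuristic.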

Why does it matter?

Computer use is the threshold where AI transitions from answering questions to getting things done on your behalf.

An 83% GDPval score is significant. GDPval tests whether a model can complete real knowledge work tasks: researching topics, synthesizing documents, drafting reports, filling spreadsheets, and sending structured outputs to systems. A score of 83% suggests GPT-5.4 can complete most routine office tasks of this kind without human intervention, though not all of them.

This is the beginning of a genuine agentic economy:

  • Customer support workflows that don't require a human in the loop
  • Data entry and form processing that runs overnight autonomously
  • Software QA agents that can actually use the UI, not just call APIs
  • Personal productivity agents that manage calendars, email, and documents

The model has limitations: it still makes errors on complex visual tasks and, by design, struggles with CAPTCHAs. Even so, the trajectory is clear.

Should you switch?

For agentic and automation tasks: yes, GPT-5.4 is now the benchmark.

Claude Sonnet 4.5 held the computer-use crown in late 2025 (61.4% on OSWorld). GPT-5.4 has surpassed it on that specific benchmark. For teams building automation pipelines that interact with UIs, this matters.

For pure coding and long-context reasoning: Claude Fable 5 / Sonnet 4.6 still lead. This isn't a "GPT wins everything" moment — the models trade leadership on different dimensions.

Practical advice:

  • Building a web scraping / form automation agent → evaluate GPT-5.4 first
  • Building a code review or refactoring agent → evaluate Claude Fable 5 first
  • High-volume API apps → GPT-5.5 Instant's pricing is designed for this

Who should care?

Developers
Businesses
AI agent builders
Automation engineers
Enterprises
