Benchmarks

openai

GPT-5.5

5/5 at $1.21. Worth it when getting long-tail edge cases right matters more than median latency or cost.

5/5 benchmarks passed
$1.21 spent in total
19 tool calls

What it is.

OpenAI's GPT-5.5 — the November 2026 thinking-heavy successor to GPT-5. The reasoning-tier OpenAI model most US enterprise agent stacks default to today.

What it does well.

  • Five for five with the most variance-free run profile in the matrix. Two tool calls on Coding, two on Web Scraping, 3-7 on the others. Doesn't get distracted.
  • Strongest performance on the Marketing rubric: ranked all five tools with mention counts and a one-line reason per item, no preamble.
  • Per-cell cost is high but never wild. The most expensive single call was $0.435 (Marketing) — Sonnet's Marketing call was $1.82.

Where to be careful.

  • It's a thinking model and it bills like one. ~3x the per-task cost of GPT-5 Mini for the same outcomes on Coding and Web Scraping.
  • We didn't observe schema drift, but GPT-5.5's instinct is to reach for prose explanations when the task is ambiguous — the strict-JSON wrapper is doing real work here.
  • Same short-horizon, no-adversarial caveats as the rest of the matrix.
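
The strict-JSON wrapper mentioned above is not reproduced in this write-up. As a rough illustration of the idea, here is a minimal sketch that rejects any output that is not a bare JSON object with exactly the expected fields. The `FundingAnswer` shape, its field names, and `parseStrict` are hypothetical, loosely modeled on the Sales task, and are not taken from the benchmark harness.

```typescript
// Hypothetical answer shape for the Sales task; the field names are assumptions.
interface FundingAnswer {
  totalFunding: string;
  lastRoundType: string;
  description: string;
}

const REQUIRED_KEYS = ["totalFunding", "lastRoundType", "description"] as const;

// Accepts only a bare JSON object with exactly the required string fields.
function parseStrict(raw: string): FundingAnswer {
  let value: unknown;
  try {
    // JSON.parse throws if the model wraps the JSON in a prose preamble or trailing notes.
    value = JSON.parse(raw);
  } catch {
    throw new Error("output is not valid JSON");
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("output is not a JSON object");
  }
  const record = value as Record<string, unknown>;
  for (const key of REQUIRED_KEYS) {
    if (typeof record[key] !== "string") {
      throw new Error(`missing or non-string field: ${key}`);
    }
  }
  const allowed: readonly string[] = REQUIRED_KEYS;
  const extra = Object.keys(record).filter((k) => !allowed.includes(k));
  if (extra.length > 0) {
    throw new Error(`unexpected fields: ${extra.join(", ")}`);
  }
  return {
    totalFunding: record.totalFunding as string,
    lastRoundType: record.lastRoundType as string,
    description: record.description as string,
  };
}
```

A harness built this way would typically treat a thrown error as a failed cell or retry the call, so a model's tendency to answer in prose shows up as a hard failure rather than as silently malformed data.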

Results by agent

Five real jobs.

Sales: Sales Research Analyst

Get financials on a company

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

Seven tool calls, $0.105. Used Google to verify Crunchbase's slug, paged through to confirm the Series E, returned cleanly.

$0.104 · 7 tool calls

Marketing: Marketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

$0.435 — high but not wild. Three tool calls only; the cost is reasoning, not retrieval.

$0.435 · 3 tool calls

Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

Five tool calls, $0.348. Pulled reviews, grouped by recurrence, picked verbatim snippets.

$0.348 · 5 tool calls

Coding: Senior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

Two tool calls, $0.266. Read Stripe's webhook docs, wrote the function with proper HMAC and timing-safe compare. Schema-clean.

$0.266 · 2 tool calls
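
The function GPT-5.5 produced is not included in this write-up. For readers unfamiliar with the task, the following is a minimal sketch of manual Stripe webhook verification using Node's built-in `crypto` module, assuming the documented `Stripe-Signature` header format (`t=<timestamp>,v1=<hex signature>`) and a signed payload of `<timestamp>.<raw body>`. The name `verifyStripeSignature` and the injectable `nowSeconds` parameter are illustrative choices, not part of Stripe's API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative helper; not the function the model wrote and not part of Stripe's SDK.
function verifyStripeSignature(
  rawBody: string,
  signatureHeader: string,
  secret: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  // The header looks like "t=1700000000,v1=<hex>" and may carry several v1 entries.
  let timestamp = "";
  const candidates: string[] = [];
  for (const part of signatureHeader.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") candidates.push(value);
  }
  if (!/^\d+$/.test(timestamp) || candidates.length === 0) return false;

  // Reject timestamps outside the tolerance window to limit replay attacks.
  if (Math.abs(nowSeconds - Number(timestamp)) > toleranceSeconds) return false;

  // HMAC-SHA256 over "<timestamp>.<raw body>", keyed with the endpoint secret.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  // timingSafeEqual throws on length mismatch, so check lengths first.
  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The raw request body must be passed in unparsed, since re-serialized JSON will generally not match the signed bytes. In production code, the official Stripe library's `stripe.webhooks.constructEvent` is usually the safer choice than a hand-rolled check.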

Web Scraping: Sales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

Two tool calls, $0.054. Apollo tiers extracted, JSON returned.

$0.053 · 2 tool calls
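
As a quick cross-check of the headline figures, the per-job numbers can be summed. The sketch below uses the cost and tool-call figures quoted in the written summaries above; the `JobResult` type, the short job labels, and `summarize` are illustrative names, not part of the benchmark harness.

```typescript
// Illustrative record type; the values are copied from the per-job summaries above.
interface JobResult {
  job: string;
  costUsd: number;
  toolCalls: number;
}

const results: JobResult[] = [
  { job: "Crunchbase financials", costUsd: 0.105, toolCalls: 7 },
  { job: "Reddit recommendations", costUsd: 0.435, toolCalls: 3 },
  { job: "Google Shopping reviews", costUsd: 0.348, toolCalls: 5 },
  { job: "Stripe webhook code", costUsd: 0.266, toolCalls: 2 },
  { job: "Apollo.io pricing", costUsd: 0.054, toolCalls: 2 },
];

function summarize(rows: JobResult[]) {
  // Sum in thousandths of a dollar to avoid floating-point drift.
  const totalMilliUsd = rows.reduce((sum, r) => sum + Math.round(r.costUsd * 1000), 0);
  const totalToolCalls = rows.reduce((sum, r) => sum + r.toolCalls, 0);
  // Assumes at least one row.
  const mostExpensive = rows.reduce((a, b) => (b.costUsd > a.costUsd ? b : a));
  return { totalCostUsd: totalMilliUsd / 1000, totalToolCalls, mostExpensive };
}

const summary = summarize(results);
console.log(summary.totalCostUsd.toFixed(2), summary.totalToolCalls, summary.mostExpensive.job);
```

With these inputs the totals come to $1.21 (after rounding) and 19 tool calls, matching the headline, with the Reddit job as the most expensive single cell.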