openai

GPT-5.5

5/5 at $1.21. Worth it when the latency-vs-cost trade-off on long-tail edge cases matters more than the median case.

5/5 benchmarks passed

$1.21 spent in total

19 tool calls

What it is.

OpenAI's GPT-5.5 — the November 2026 thinking-heavy successor to GPT-5. The reasoning-tier OpenAI model most US enterprise agent stacks default to today.

What it does well.

Five for five, with the lowest-variance run profile in the matrix: two tool calls on Coding, two on Web Scraping, and three to seven on the others. It doesn't get distracted.
Strongest performance on the Marketing rubric: ranked all five tools with mention counts and a one-line reason per item, no preamble.
Per-cell cost is high but never wild. The most expensive single cell was $0.435 (Marketing); Sonnet's Marketing cell cost $1.82.

Where to be careful.

It's a thinking model and it bills like one. ~3x the per-task cost of GPT-5 Mini for the same outcomes on Coding and Web Scraping.
We didn't observe schema drift, but GPT-5.5's instinct is to reach for prose explanations when the task is ambiguous — the strict-JSON wrapper is doing real work here.
The same caveats apply as for the rest of the matrix: every task is short-horizon and none is adversarial.
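
The review does not show the strict-JSON wrapper itself, so the following is only a minimal sketch of what such a guard might look like, assuming the model's raw output arrives as a string. The `FundingResult` shape, its field names, and the `parseStrict` helper are hypothetical and chosen for illustration; the point is that a prose preamble or an extra key causes a hard rejection rather than a silent pass.

```typescript
// Hypothetical answer shape for the Sales job; the field names are assumptions.
interface FundingResult {
  totalFunding: string;
  lastRoundType: string;
  description: string;
}

type ParseOutcome =
  | { ok: true; value: FundingResult }
  | { ok: false; error: string };

const REQUIRED_KEYS = ["totalFunding", "lastRoundType", "description"] as const;

// Accepts only a bare JSON object with exactly the required string fields.
// Prose preambles, markdown fences, arrays, and extra keys are all rejected.
function parseStrict(raw: string): ParseOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "output is not a JSON object" };
  }
  const record = parsed as Record<string, unknown>;
  for (const key of REQUIRED_KEYS) {
    if (typeof record[key] !== "string") {
      return { ok: false, error: `missing or non-string field: ${key}` };
    }
  }
  const extra = Object.keys(record).filter(
    (k) => !(REQUIRED_KEYS as readonly string[]).includes(k),
  );
  if (extra.length > 0) {
    return { ok: false, error: `unexpected fields: ${extra.join(", ")}` };
  }
  return {
    ok: true,
    value: {
      totalFunding: record.totalFunding as string,
      lastRoundType: record.lastRoundType as string,
      description: record.description as string,
    },
  };
}
```

A caller would typically retry or fail the run when `ok` is false, which is one plausible reason no schema drift shows up in the results even though the model leans toward prose.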

Results by agent

Five real jobs.

Sales: Sales Research Analyst

Get financials on a company

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

Seven tool calls, $0.105. Used Google to verify Crunchbase's slug, paged through to confirm the Series E, returned cleanly.

Marketing: Marketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

$0.435 — high but not wild. Three tool calls only; the cost is reasoning, not retrieval.
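
The ranking step in this job can be sketched as a simple tally. This is an illustration, not the model's actual output or method: it assumes each comment has already been reduced to the list of tools it recommends, counts each tool at most once per comment, and breaks ties alphabetically. The `rankByMentions` name and the input shape are hypothetical.

```typescript
interface RankedTool {
  name: string;
  mentions: number;
}

// Counts how many comments recommend each tool and returns the top N.
// A tool named twice in the same comment still counts as one recommendation.
function rankByMentions(
  recommendationsPerComment: string[][],
  topN = 5,
): RankedTool[] {
  const counts = new Map<string, number>();
  recommendationsPerComment.forEach((tools) => {
    Array.from(new Set(tools)).forEach((name) => {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    });
  });
  return Array.from(counts.entries())
    .map(([name, mentions]) => ({ name, mentions }))
    .sort((a, b) => b.mentions - a.mentions || (a.name < b.name ? -1 : 1))
    .slice(0, topN);
}
```

The tally itself is cheap; the hard part of the job is deciding which comments count as a recommendation, which fits the observation that the cost went to reasoning rather than retrieval.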

CX: Customer Insights Analyst

Find what customers are complaining about

Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

Five tool calls, $0.348. Pulled reviews, grouped by recurrence, picked verbatim snippets.

Coding: Senior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

Two tool calls, $0.266. Read Stripe's webhook docs, wrote the function with proper HMAC and timing-safe compare. Schema-clean.
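
The function the model wrote is not reproduced in this review. For readers unfamiliar with the pattern, here is a minimal sketch of HMAC verification with a timing-safe compare, following Stripe's documented scheme: the `Stripe-Signature` header carries a `t=` timestamp and one or more `v1=` signatures, and the signed payload is the timestamp, a dot, and the raw request body. The function name, parameter names, and the 300-second default tolerance are assumptions made for this example.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if the header contains a fresh timestamp and at least one
// v1 signature matching HMAC-SHA256(secret, `${t}.${rawBody}`).
function verifyStripeSignature(
  rawBody: string,
  signatureHeader: string,
  secret: string,
  toleranceSeconds = 300, // assumed default; tune for your replay window
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  let timestamp = "";
  const candidates: string[] = [];
  for (const part of signatureHeader.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") candidates.push(value);
  }
  if (!/^\d+$/.test(timestamp) || candidates.length === 0) return false;

  // Reject stale or far-future timestamps to limit replay attacks.
  if (Math.abs(nowSeconds - Number(timestamp)) > toleranceSeconds) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    // timingSafeEqual throws on length mismatch, so check the length first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The body must be the raw, unparsed request payload; verifying against re-serialized JSON will fail. In production code, the official `stripe` library's `stripe.webhooks.constructEvent` performs this check for you.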

Web Scraping: Sales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

Two tool calls, $0.054. Apollo tiers extracted, JSON returned.
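
As a cross-check on the headline figures, the per-job numbers reported above can be tallied as follows. This is a small sketch using only the costs and tool-call counts stated on this page; the `JobRun` type and variable names are illustrative. Costs are summed in tenths of a cent to avoid floating-point drift.

```typescript
interface JobRun {
  job: string;
  toolCalls: number;
  costUsd: number;
}

// Figures as reported for each of the five jobs.
const runs: JobRun[] = [
  { job: "Sales", toolCalls: 7, costUsd: 0.105 },
  { job: "Marketing", toolCalls: 3, costUsd: 0.435 },
  { job: "CX", toolCalls: 5, costUsd: 0.348 },
  { job: "Coding", toolCalls: 2, costUsd: 0.266 },
  { job: "Web Scraping", toolCalls: 2, costUsd: 0.054 },
];

// Sum in integer tenths of a cent, then convert back to dollars.
const totalMills = runs.reduce((sum, r) => sum + Math.round(r.costUsd * 1000), 0);
const totalToolCalls = runs.reduce((sum, r) => sum + r.toolCalls, 0);
const totalUsd = (totalMills / 1000).toFixed(2);

console.log(`total: $${totalUsd} across ${totalToolCalls} tool calls`);
// prints "total: $1.21 across 19 tool calls"
```

The five costs sum to $1.208, which rounds to the $1.21 headline, and Marketing alone accounts for roughly 36% of the spend.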
