Get financials on a company
Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.
3 tool calls, $0.048. One of the cheapest Sales passes in the matrix.
zhipu
Smart quotes in Stripe code. Empty final on Reddit. JSON encoding is GLM 5.1's nemesis.
Zhipu AI's GLM 5.1 — a Chinese reasoning model with open weights. We tested the production endpoint as of November 2026.
Results by agent
Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.
3 tool calls, $0.048. One of the cheapest Sales passes in the matrix.
Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.
✗ 7 tool calls, $0.000. Empty final message — the model read Reddit, then ran out of agent-loop budget without ever writing the JSON. No characters returned; no rubric to grade.
Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.
4 tool calls, $0.420. Solid CX pass with three verbatim complaints.
Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.
✗ 11 tool calls, $0.667. Visible attempt at the right answer, but the JSON had broken escaping — smart quotes inside the code string, unbalanced backtick template-literal fences. Schema parser rejected the envelope before grading the code.
Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.
2 tool calls, $0.053. Clean Apollo pass.