moonshot
Confident prose, missing code, and a token-repetition loop on the Reddit task. Read the actual outputs before you trust this one.
Kimi K2.6 is Moonshot AI's Chinese long-context model with reasoning capabilities. We tested the production endpoint as of November 2026.
Results by agent
Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.
5 tool calls, $0.104. Standard Sales pass with appropriate verification.
Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.
✗ 3 tool calls, $0.423. Token-repetition loop mid-JSON. The model never recovered; the run was killed after the repetition pattern was detected.
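The entry above does not say how the repetition was detected. One plausible heuristic is to check, after each streamed token, whether the tail of the output is a short n-gram repeated many times back to back. This is sketched here purely as an assumption: the function `hasRepetitionLoop` and its thresholds are hypothetical and are not the benchmark's actual check.

```typescript
// HYPOTHETICAL loop detector: the source does not describe the harness's method,
// so this is one plausible heuristic, not the benchmark's actual check.
// Returns true if the tail of `tokens` is some n-gram (1 <= n <= maxPeriod)
// repeated at least `minRepeats` times back to back.
function hasRepetitionLoop(tokens: string[], maxPeriod = 16, minRepeats = 8): boolean {
  for (let period = 1; period <= maxPeriod; period++) {
    const span = period * minRepeats;
    // Spans only grow with the period, so once one is too long, all later ones are too.
    if (span > tokens.length) break;
    const start = tokens.length - span;
    let repeating = true;
    // The tail is periodic iff every token equals the one `period` positions earlier.
    for (let i = start + period; i < tokens.length; i++) {
      if (tokens[i] !== tokens[i - period]) {
        repeating = false;
        break;
      }
    }
    if (repeating) return true;
  }
  return false;
}
```

A harness would call it on the growing token list and abort the run on the first `true`. `maxPeriod` and `minRepeats` trade detection speed against false positives on legitimately repetitive JSON, such as long arrays of similar values.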
Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.
3 tool calls, $0.333. Cleanest Kimi run: it read the reviews, grouped the complaints, and returned three verbatim quotes.
Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.
✗ 5 tool calls, $0.268. The dangerous one: confident, schema-valid output. It read the Stripe docs and wrote a 3-sentence explanation of HMAC-SHA256, then omitted the `createHmac` call from the code field entirely, describing the algorithm without implementing it.
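For reference, here is roughly what the missing step looks like. This is a minimal sketch, not the benchmark's reference answer, assuming Node.js and Stripe's documented manual-verification scheme: the `Stripe-Signature` header carries a timestamp `t` and one or more `v1` signatures, and each `v1` value is a hex HMAC-SHA256 of `<t>.<raw request body>` keyed with the endpoint's signing secret. The names `verifyStripeSignature`, `toleranceSeconds`, and `nowSeconds` are illustrative; the 300-second tolerance mirrors the default in Stripe's own libraries. In production, the official `stripe` package's `stripe.webhooks.constructEvent` performs this check for you.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Parse a header such as "t=1492774577,v1=5257a8...,v0=...".
// Keeps every v1 signature, since several may be present during secret rotation.
function parseSignatureHeader(header: string): { timestamp: number; signatures: string[] } {
  let timestamp = NaN;
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = Number(value);
    else if (key === "v1") signatures.push(value);
  }
  return { timestamp, signatures };
}

function verifyStripeSignature(
  rawBody: string,
  header: string,
  endpointSecret: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  const { timestamp, signatures } = parseSignatureHeader(header);
  if (!Number.isFinite(timestamp) || signatures.length === 0) return false;
  // Reject events outside the tolerance window to limit replay attacks.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  // The step the model skipped: HMAC-SHA256 over "<timestamp>.<raw body>".
  const expected = createHmac("sha256", endpointSecret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest("hex");
  const expectedBuf = Buffer.from(expected, "hex");

  return signatures.some((sig) => {
    const sigBuf = Buffer.from(sig, "hex");
    // timingSafeEqual throws on a length mismatch, so compare lengths first.
    return sigBuf.length === expectedBuf.length && timingSafeEqual(sigBuf, expectedBuf);
  });
}
```

`rawBody` must be the exact bytes Stripe sent. Verifying a re-serialized `JSON.parse` result will fail, because whitespace and key order can change.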
Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.
3 tool calls, $0.145. Apollo tiers extracted; passable but the most expensive Apollo pass among the open-weights models in the matrix.