🧠 how the machine makes up its mind

The Heuristic

Every rating runs an assembly line of small AI agents — each with one job — and ends in a transparent points system. No black-box vibes; here's exactly how it works.

The assembly line 🛠️

The Scout (GPT-5 mini)

Given any URL, it hunts down the real pricing page — following navigation links, the sitemap, and a site's own machine-readable index. If the page just links out to the rates (a docs or detail page), it digs one level deeper. It will never settle for a blog post that merely mentions pricing.

↓

The Reader (GPT-5)

Reads the page as untrusted data and pulls out only the literal pricing structure: plans, prices, metered dimensions, add-ons, and in-plan options. It ignores any text trying to instruct or flatter it, so a page can't talk its way to a better score. If the quick read looks suspiciously thin, it does a full browser render and tries again.
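The quick-read-then-render fallback could look roughly like the sketch below. Everything named here is an assumption for illustration: the `fetch_quick` and `fetch_rendered` callables, the `looks_thin` heuristic, and the 200-character threshold are hypothetical, since the page only says the read must look "suspiciously thin".

```python
from typing import Callable, Tuple

# Hypothetical threshold; the real cut-off is not published.
MIN_USEFUL_CHARS = 200


def looks_thin(text: str) -> bool:
    """Assumed heuristic: very little text, or no digits at all (so no prices)."""
    stripped = text.strip()
    return len(stripped) < MIN_USEFUL_CHARS or not any(ch.isdigit() for ch in stripped)


def read_page(url: str,
              fetch_quick: Callable[[str], str],
              fetch_rendered: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (page_text, used_browser_render)."""
    text = fetch_quick(url)
    if looks_thin(text):
        # The quick read came back thin: do a full browser render and try again.
        return fetch_rendered(url), True
    return text, False
```

Passing the two fetchers in as plain callables keeps the sketch free of any real HTTP or browser library, so it can be tried with stub functions.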

↓

The Librarian (GPT-5)

Looks at the whole page to name the company and file it on one of five shelves: DevTools, PaaS, AI Lab, SaaS, or Education. The category sets the expectations the scorer judges against.

↓

The Analyst (GPT-5)

Sees only the clean, structured data — never the raw page — and works out the billing model and a plain-language summary. Because it can't read the page text, injected prose can't sway its judgement.

↓

The Scorer (pure math · no model)

A transparent points system. It doesn't ask a model for a number — it counts the structure (plans, meters, add-ons, sales walls) and adds up named deductions. Every point on a result page is one of these lines.

↓

The Judge (GPT-5 mini)

A final QA pass. It compares the finished result against the page it read and flags anything that went wrong — wrong page, wrong company, missing details, an implausible score — so regressions get caught before they're published.

The data narrows as it flows: a messy page becomes clean structured facts, and only those facts — never the raw page — reach the parts that judge and score. That's also the trick that makes it prompt-injection-proof.
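One way to picture that narrowing is as a type boundary. In the sketch below, `PricingFacts` and its field names are illustrative assumptions, and `analyst_summary` is a deterministic stand-in for the real Analyst (which is a model). The point is only that the downstream step has no parameter through which raw page text could arrive.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PricingFacts:
    """Structured output of the Reader. Field names are hypothetical."""
    plans: int
    metered_dimensions: int
    add_ons: int
    in_plan_options: int
    has_self_serve_price: bool
    billing_models: Tuple[str, ...] = ()


def analyst_summary(facts: PricingFacts) -> str:
    """Stand-in for the Analyst: it receives PricingFacts, never the page text."""
    models = " + ".join(facts.billing_models) or "unknown"
    return (f"{facts.plans} plans, {facts.metered_dimensions} metered dimensions, "
            f"{facts.add_ons} add-ons; billing model: {models}")


facts = PricingFacts(plans=3, metered_dimensions=2, add_ons=1, in_plan_options=0,
                     has_self_serve_price=True, billing_models=("flat", "usage"))
print(analyst_summary(facts))
```

Keeping the facts mostly numeric and boolean is a design choice in this sketch: the fewer free-text fields cross the boundary, the less room injected prose has to travel with them.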

How the score is built 🧮

Two scores, each starting at 100. A clear, normal pricing page keeps almost all of it — points only come off for genuine friction.

🧾

Pricing clarity

Can a buyer quickly understand what they'll pay and predict the bill?

Free baseline — no penalty

✓ up to 3 plans
✓ up to 4 in-plan options
✓ up to 2 add-ons
✓ up to 3 metered dimensions

Then points come off for friction

Each plan beyond 3 to compare (a free tier counts for less): −6, up to −24
Mixing pricing models (flat + usage + add-ons all at once): −9 per model
Each metered dimension beyond 3 to track: −3, up to −28
Each add-on beyond 2: −4, up to −16
Each in-plan config choice beyond 4 (machine sizes, regions…): −2, up to −20
No self-serve price anywhere, so you must call sales: −45 (the big one)
A contact-sales Enterprise tier sitting on top of real pricing: −4 (barely)
Pricing page too sprawling to read in full: −15

💡 Usage-based pricing is not punished for existing — it's fair, pay-for-what-you-use. Only the complexity of many meters counts, and barely so for dev tools, platforms, clouds & AI labs (it's expected there). Education is the exception — pricing should be flat.

🧱 Scope-aware: a platform that sells 40 things will naturally have more plans. That breadth is largely forgiven — but per-product complexity (lots of meters) is not.
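The clarity deductions above can be sketched as a small function that returns both the score and the named lines that produced it. This is a minimal sketch, not the real scorer. It assumes "−9 per model" means each pricing model beyond the first, and it floors the score at 0. The free-tier discount, the category softening for meters, and the scope forgiveness are left out because no numbers are given for them. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

Deduction = Tuple[str, int]


def capped(extra: int, per_item: int, cap: int) -> int:
    """Points lost for `extra` items over the free baseline, limited to `cap`."""
    return min(max(extra, 0) * per_item, cap)


def clarity_score(plans: int, pricing_models: int, meters: int, add_ons: int,
                  options: int, self_serve_price: bool, enterprise_tier: bool,
                  too_sprawling: bool) -> Tuple[int, List[Deduction]]:
    candidates = [
        ("plans beyond 3", capped(plans - 3, 6, 24)),
        # Assumption: -9 for each pricing model beyond the first.
        ("mixed pricing models", max(pricing_models - 1, 0) * 9),
        ("metered dimensions beyond 3", capped(meters - 3, 3, 28)),
        ("add-ons beyond 2", capped(add_ons - 2, 4, 16)),
        ("in-plan options beyond 4", capped(options - 4, 2, 20)),
        ("no self-serve price", 0 if self_serve_price else 45),
        # Only counts when there is real pricing underneath the Enterprise tier.
        ("contact-sales Enterprise tier",
         4 if enterprise_tier and self_serve_price else 0),
        ("page too sprawling", 15 if too_sprawling else 0),
    ]
    deductions = [(name, pts) for name, pts in candidates if pts > 0]
    # Assumed floor: the score never drops below 0.
    score = max(100 - sum(pts for _, pts in deductions), 0)
    return score, deductions


# 5 plans (-12), flat + usage (-9), 4 meters (-3), Enterprise tier (-4)
score, lines = clarity_score(plans=5, pricing_models=2, meters=4, add_ons=2,
                             options=4, self_serve_price=True,
                             enterprise_tier=True, too_sprawling=False)
print(score)  # 72
```

Returning the list of deductions alongside the total mirrors the idea that every point on a result page maps to one named line.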

🤖

Agent easiness

Could an AI agent actually read and understand the page on its own?

A clean, self-contained pricing page = 100

Points come off when reading is hard

Couldn't load the page at all: −100
No clean machine-readable version, so it had to parse raw HTML: −15
Couldn't find any real prices to read: −40 to −55
Page too large to read in one go: −20
Real prices weren't on the pricing page (scattered across sub-pages): −20
Prices gated behind a login or sales call: −20

🔎 This is separate from clarity on purpose: the pricing might be perfectly clear to a human, but if an agent has to render JavaScript or dig through sub-pages to find the rates, that's a worse experience — and it shows here.
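The agent-easiness deductions can be sketched the same way. This is an illustration under assumptions: the parameter names are hypothetical, the score is floored at 0, and because the "no real prices" penalty is published only as a range, the sketch takes it as an input between 40 and 55 rather than guessing how it is chosen.

```python
def agent_easiness(loaded: bool = True,
                   machine_readable: bool = True,
                   no_prices_penalty: int = 0,
                   too_large: bool = False,
                   prices_on_sub_pages: bool = False,
                   gated: bool = False) -> int:
    """Sketch of the agent-easiness score; starts at 100 and subtracts."""
    if not loaded:
        return 0  # "Couldn't load the page at all" costs the full 100
    if no_prices_penalty != 0 and not 40 <= no_prices_penalty <= 55:
        raise ValueError("no_prices_penalty must be 0 or between 40 and 55")

    lost = 0
    if not machine_readable:
        lost += 15  # had to parse raw HTML
    lost += no_prices_penalty  # 0 when real prices were found
    if too_large:
        lost += 20
    if prices_on_sub_pages:
        lost += 20
    if gated:
        lost += 20
    return max(100 - lost, 0)  # assumed floor at 0


# Raw HTML only (-15) and prices scattered across sub-pages (-20)
print(agent_easiness(machine_readable=False, prices_on_sub_pages=True))  # 65
```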
