About · Ormas

The efficient frontier of model capability — applied to every turn.

The simple story

In portfolio theory, the efficient frontier is the set of portfolios that give you the best return for a given level of risk. The idea translates directly to AI inference: for any given task, there is a cheapest model that still produces the quality you need. Anything more capable is waste; anything less capable is risk.
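In code, the frontier idea is a constrained minimum: among the models whose expected quality clears the bar for a task, take the cheapest. The sketch below only illustrates that idea and is not Ormas's router. The `Candidate` type, the model names, the prices and the quality scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                # hypothetical model name
    price_per_mtok: float    # hypothetical price, USD per million tokens
    expected_quality: float  # hypothetical quality estimate in [0, 1]


def frontier_pick(candidates: Sequence[Candidate], required_quality: float) -> Optional[Candidate]:
    """Return the cheapest candidate that meets the quality bar, or None if none does."""
    eligible = [c for c in candidates if c.expected_quality >= required_quality]
    if not eligible:
        return None  # nothing cheaper is good enough: use the declared model
    return min(eligible, key=lambda c: c.price_per_mtok)


models = [
    Candidate("small-model", 0.5, 0.80),
    Candidate("mid-model", 3.0, 0.92),
    Candidate("large-model", 15.0, 0.97),
]
print(frontier_pick(models, 0.90).name)  # mid-model
print(frontier_pick(models, 0.99))       # None
```

Anything above `mid-model` in this toy list would be waste for a 0.90 bar. Anything below it would be risk.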

Ormas finds that frontier turn by turn. It sits between your coding tool and the model API, classifies each task, and routes it to the cheapest model that can do the work. An independent judge then checks whether the answer holds up. If it doesn't, the turn falls back to your declared model and you pay nothing extra. You never see a degraded response.

The savings go mostly to you. We charge a small fee computed on the price of the model you declared. It is not based on the routed cost, and it is not a share of savings. The math is predictable because it's a percentage of a public price.

Why riding the frontier carries no risk

A lot of routing tools claim savings and deliver degraded answers. Ormas charges only on turns that passed an independent quality judgment. A worse answer therefore doesn't just disappoint you; it removes our fee. That alignment is built into how we get paid, not promised in a policy.

Beyond the judge gate, there are two more layers:

BYOK (bring your own key). Your provider key pays for inference; we never hold your model credits or key material. Because we never pay for the routed call, we have nothing to mark up. Our only charge is the fee on the declared baseline.
Fallback on every turn. If the router has no confident cheaper option, or the quality judge rejects the result, the turn falls back to your declared model. The fallback costs you nothing beyond what you'd have paid anyway.
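The per-turn flow (route, judge, fall back) can be sketched as follows. This is a hypothetical illustration. It assumes the router, the model call and the judge are plain functions passed in. `run_turn`, `pick_cheaper`, `call_model` and `judge_accepts` are illustrative names, not an Ormas API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TurnResult:
    model: str      # model whose answer the user sees
    answer: str
    routed: bool    # True if a cheaper routed answer was accepted
    billable: bool  # fee applies only when a routed answer passed the judge


def run_turn(
    prompt: str,
    declared_model: str,
    pick_cheaper: Callable[[str], Optional[str]],  # hypothetical router: model name or None
    call_model: Callable[[str, str], str],         # hypothetical: (model, prompt) -> answer, on the user's key
    judge_accepts: Callable[[str, str], bool],     # hypothetical judge: (prompt, answer) -> verdict
) -> TurnResult:
    cheaper = pick_cheaper(prompt)
    if cheaper is not None and cheaper != declared_model:
        answer = call_model(cheaper, prompt)
        if judge_accepts(prompt, answer):
            return TurnResult(cheaper, answer, routed=True, billable=True)
    # No confident cheaper option, or the judge rejected it: use the declared model, no fee.
    answer = call_model(declared_model, prompt)
    return TurnResult(declared_model, answer, routed=False, billable=False)


result = run_turn(
    "rename this variable",
    declared_model="large-model",
    pick_cheaper=lambda prompt: "small-model",
    call_model=lambda model, prompt: f"[{model}] done",
    judge_accepts=lambda prompt, answer: False,  # judge rejects the cheap answer
)
print(result.model, result.billable)  # large-model False
```

The only path that sets `billable` is the one where a routed answer passed the judge, mirroring the rule that the fee runs only on proven savings.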

The frontier improves with your traffic

The frontier isn't static. Every turn that passes the quality judge teaches the router which task archetypes your declared model is genuinely over-serving. Code edits, tool calls, short completions — each archetype accumulates its own per-model quality evidence.

As the router sees more of your traffic, the frontier sharpens. Savings on day 30 are higher than savings on day 1 — not because the models changed, but because the routing intelligence knows more about where your work actually sits on the capability curve.
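One way to model per-archetype quality evidence is to tally judge verdicts for each (archetype, model) pair. A cheaper model is trusted only once a conservative lower bound on its accept rate clears a bar. The sketch assumes a Wilson score lower bound and a 0.9 threshold. That statistic, the threshold and the class name are illustrative assumptions, not a description of how Ormas's router decides. The example shows why more traffic helps: the same perfect record earns trust at 100 verdicts but not at 10.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def wilson_lower_bound(accepts: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for an accept rate (0.0 with no data)."""
    if total == 0:
        return 0.0
    p = accepts / total
    denom = 1 + z * z / total
    center = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (center - margin) / denom


class ArchetypeEvidence:
    """Hypothetical tally of judge verdicts per (archetype, model) pair."""

    def __init__(self) -> None:
        self.accepts: Dict[Tuple[str, str], int] = defaultdict(int)
        self.totals: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, archetype: str, model: str, accepted: bool) -> None:
        key = (archetype, model)
        self.totals[key] += 1
        if accepted:
            self.accepts[key] += 1

    def trusted(self, archetype: str, model: str, bar: float = 0.9) -> bool:
        """True once the conservative accept-rate estimate clears the bar."""
        key = (archetype, model)
        return wilson_lower_bound(self.accepts.get(key, 0), self.totals.get(key, 0)) >= bar


ev = ArchetypeEvidence()
for _ in range(10):
    ev.record("code_edit", "small-model", True)
print(ev.trusted("code_edit", "small-model"))  # False: lower bound is about 0.72
for _ in range(90):
    ev.record("code_edit", "small-model", True)
print(ev.trusted("code_edit", "small-model"))  # True: lower bound is about 0.96
```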

You can watch this in the savings console: the routing ladder shows which rungs are filling with verdicts, and the quality accept-rate tracks whether the trust is earned.

Honest accounting

We distinguish two numbers: savings (an estimate — the counterfactual of what you would have paid) and fee (a real number — computed exactly per turn, reconcilable to the cent, and never larger than the quality-adjusted savings we delivered).

The fee is transparent: it's a percentage of the declared model's rack rate, which is a public number, so you can verify it yourself. We do hide the routed model from the console display, because that routing is our moat. The fee math doesn't depend on it, though, and is fully reproducible from public inputs.
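Because the fee depends only on token counts, the declared model's public rack rate and the fee percentage, you can recompute it yourself. The following is a minimal sketch under stated assumptions. It assumes input and output prices are quoted per million tokens. The $3 and $15 rates and the 10% fee rate are hypothetical. It also assumes per-turn fees are kept exact and rounded once on the invoice total.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, Tuple

CENT = Decimal("0.01")
MTOK = Decimal(1_000_000)


def declared_cost(input_tokens: int, output_tokens: int,
                  input_rate: Decimal, output_rate: Decimal) -> Decimal:
    """Cost of the turn on the declared model at its public rack rate (USD per million tokens)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / MTOK


def turn_fee(cost_on_declared: Decimal, fee_rate: Decimal, judge_accepted: bool) -> Decimal:
    """Exact, unrounded fee for one turn: a percentage of the declared cost, zero on fallback."""
    return cost_on_declared * fee_rate if judge_accepted else Decimal(0)


def invoice_total(turns: Iterable[Tuple[int, int, bool]], input_rate: Decimal,
                  output_rate: Decimal, fee_rate: Decimal) -> Decimal:
    """Sum exact per-turn fees, then round once to the cent."""
    total = sum(
        (turn_fee(declared_cost(i, o, input_rate, output_rate), fee_rate, ok)
         for i, o, ok in turns),
        Decimal(0),
    )
    return total.quantize(CENT, rounding=ROUND_HALF_UP)


# (input tokens, output tokens, judge accepted?); the third turn fell back.
turns = [(20_000, 1_000, True), (50_000, 2_000, True), (10_000, 500, False)]
print(invoice_total(turns, Decimal("3"), Decimal("15"), Decimal("0.10")))  # 0.03
```

`Decimal` avoids binary floating-point drift. Rounding only the total matters because a single short turn's fee is often a fraction of a cent, and rounding each one separately would zero most of them out.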

Principles

Only charge on proven savings

The fee applies only to turns where the judge confirmed that quality held. A failed route falls back and costs nothing. We earn more only when you save more, on answers that actually worked.

Your key pays the inference

BYOK is the default external path. We route; we don't hold credits. The data moat is in the routing telemetry — not in controlling your keys.

Savings is an estimate; fee is a fact

We never bill on a counterfactual. The savings number is an honest estimate. The fee is a real line item, computed per turn from a public price and verifiable to the cent.
