Getting started
Mint an API key, point your coding tool at Ormas, and watch the savings console — under two minutes.
What Ormas does
Ormas is a drop-in API layer between your coding tool and the model API. It routes routine turns to verified cheaper models underneath, quality-checks each result with an independent judge, and falls back to your declared model if the answer doesn't hold up.
You keep your declared model — it's what you pay the tax on and what you'll fall back to. We handle the routing.
Supported clients today: Claude Code, Cursor, and any client that speaks the Anthropic Messages API format. OpenAI-compatible clients (Codex, GPT-based tools) are on the roadmap; for now the gateway accepts only Anthropic-format requests.
Step 1 — Sign in and mint a key
- Sign in at ormas.ai/signin (Google or GitHub).
- Go to Console → API Keys (/app/keys).
- Click New key, give it a name (e.g. laptop), and copy the key shown; it's only displayed once.
Your key looks like tb_live_a1b2…. Keep it secret; it authenticates you to the gateway.
Step 2 — Point your client at the gateway
Replace your existing ANTHROPIC_BASE_URL (or equivalent) with the Ormas gateway:
export ANTHROPIC_BASE_URL=https://api.ormas.ai
export ANTHROPIC_API_KEY="tb_live_<your-key>"
For Claude Code specifically, add the exports above to your shell profile, use the claude-ormas.sh wrapper, or set both variables inline for a single run:
ANTHROPIC_BASE_URL=https://api.ormas.ai ANTHROPIC_API_KEY="tb_live_<your-key>" claude
The gateway speaks the Anthropic Messages API verbatim — no format changes, no SDK changes.
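A quick preflight check can catch a mistyped URL or a pasted provider key before you launch a client. The sketch below assumes only the two variables from this step and the tb_live_ prefix shown in Step 1; `ormas_preflight` is a hypothetical helper name, not something Ormas ships.

```shell
# Sketch: sanity-check the Step 2 variables before launching a client.
# ormas_preflight is a hypothetical helper, not part of Ormas.
ormas_preflight() {
  rc=0
  base="${ANTHROPIC_BASE_URL:-}"
  base="${base%/}"   # tolerate a single trailing slash
  if [ "$base" != "https://api.ormas.ai" ]; then
    echo "ANTHROPIC_BASE_URL is '${ANTHROPIC_BASE_URL:-}', expected https://api.ormas.ai" >&2
    rc=1
  fi
  case "${ANTHROPIC_API_KEY:-}" in
    tb_live_?*) ;;   # prefix plus at least one more character
    *)
      echo "ANTHROPIC_API_KEY does not start with tb_live_" >&2
      rc=1
      ;;
  esac
  return "$rc"
}
```

Running `ormas_preflight && claude` would then start Claude Code only when both variables look right. The check is purely local and does not contact the gateway, so it cannot tell whether a key has been revoked.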
Step 3 — Pick your declared model
Use the model you actually want as your declared baseline — claude-opus-4-8, claude-sonnet-4-6, or any supported model. Ormas routes cheaper underneath; the declared model is what you fall back to on any turn the router can't improve.
BYOK: If you want your own Anthropic key to pay for inference directly, pass it in the X-Provider-Key header:
curl https://api.ormas.ai/v1/messages \
-H "x-api-key: tb_live_<your-key>" \
-H "X-Provider-Key: sk-ant-<your-real-key>" \
-d '{"model":"claude-opus-4-8", "messages":[...]}'
With BYOK, your key pays for inference on the routed model, and we still tax the declared baseline rate.
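The request above elides the message body, so here is a fuller sketch that sends one complete turn and adds the BYOK header only when a provider key is set. It assumes the gateway expects the same `content-type`, `anthropic-version`, and `max_tokens` fields as the Anthropic Messages API, which the "verbatim" wording suggests but this page does not state. `ormas_request`, `ORMAS_KEY`, and `PROVIDER_KEY` are hypothetical names chosen for illustration.

```shell
# Sketch: one Messages API turn through the gateway, with optional BYOK.
# ormas_request, ORMAS_KEY and PROVIDER_KEY are hypothetical names.
ormas_request() {
  model="$1"
  prompt="$2"   # plain text without double quotes or backslashes, so the JSON stays valid

  # Build the curl argument list in the positional parameters.
  set -- -sS "https://api.ormas.ai/v1/messages" \
    -H "x-api-key: ${ORMAS_KEY}" \
    -H "content-type: application/json" \
    -H "anthropic-version: 2023-06-01"

  # BYOK: forward your own provider key only when one is configured.
  if [ -n "${PROVIDER_KEY:-}" ]; then
    set -- "$@" -H "X-Provider-Key: ${PROVIDER_KEY}"
  fi

  curl "$@" -d "{\"model\":\"${model}\",\"max_tokens\":256,\"messages\":[{\"role\":\"user\",\"content\":\"${prompt}\"}]}"
}
```

With `ORMAS_KEY` exported, `ormas_request claude-opus-4-8 "Say hello"` sends a non-BYOK turn; exporting `PROVIDER_KEY` as well switches the same call to BYOK. Reading both keys from the environment keeps them out of your shell history.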
Step 4 — Check the savings console
After a few turns, open the Savings page in the console (/app/savings). You'll see:
- Savings headline — total saved vs declaring-and-paying your baseline on every turn.
- Routing ladder — which model rungs are serving which turns.
- Quality signal — the judge's accept rate. If it's low, the router is still accumulating evidence for your task profile.
The router gets better as it sees more of your traffic. Check back after a day of real usage.
Supported models
| Declared baseline | Routed cheaper to |
|---|---|
| claude-opus-4-8 | claude-sonnet-4-6, claude-haiku-4-5, grok-build-0.1 |
| claude-sonnet-4-6 | claude-haiku-4-5, grok-build-0.1 |
| claude-haiku-4-5 | (already floor, served as-is) |
Cross-provider routing (grok) requires the crossProviderEnabled flag on your API key — toggle it in the API Keys settings.
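To make the interaction between the table and the cross-provider flag concrete, the sketch below encodes the three rows as a lookup. It mirrors this page only and is not how the gateway is implemented; `routed_candidates` is a hypothetical helper, and its second argument stands in for the crossProviderEnabled flag on your key.

```shell
# Sketch: the "Supported models" table as a lookup.
# routed_candidates is a hypothetical helper, not a gateway API.
routed_candidates() {
  declared="$1"
  cross_provider="${2:-false}"   # stands in for the crossProviderEnabled key flag
  case "$declared" in
    claude-opus-4-8)   rungs="claude-sonnet-4-6 claude-haiku-4-5" ;;
    claude-sonnet-4-6) rungs="claude-haiku-4-5" ;;
    claude-haiku-4-5)  rungs="" ;;   # already the floor, served as-is
    *)
      echo "declared model not in the table: $declared" >&2
      return 1
      ;;
  esac
  # The grok rung is only eligible when cross-provider routing is enabled.
  if [ -n "$rungs" ] && [ "$cross_provider" = "true" ]; then
    rungs="$rungs grok-build-0.1"
  fi
  echo "$rungs"
}
```

For example, `routed_candidates claude-sonnet-4-6 true` prints `claude-haiku-4-5 grok-build-0.1`, while the same call without `true` prints only `claude-haiku-4-5`.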
Troubleshooting
Gateway returns 401 — your tb_live_ key is wrong or revoked. Regenerate in /app/keys.
All turns fell back — the router is still accumulating evidence for your task archetypes. Normal for the first day. The savings console shows n_fell_back vs n_turns.
Savings show $0 — check that baseline_model in the console matches your declared model. A mismatch means the gateway received a different model name than you intended.
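For the "All turns fell back" case, it can help to turn the two console counters into a single percentage and watch it over a few days. The sketch below assumes you copy n_fell_back and n_turns from the savings console by hand; `fallback_share` is a hypothetical helper, and nothing here reads from an Ormas API.

```shell
# Sketch: share of turns that fell back to the declared model.
# Inputs are the n_fell_back and n_turns counters from the savings console.
fallback_share() {
  n_fell_back="$1"
  n_turns="$2"
  if [ "$n_turns" -eq 0 ]; then
    echo "no turns recorded yet" >&2
    return 1
  fi
  awk -v f="$n_fell_back" -v t="$n_turns" 'BEGIN { printf "%.1f%%\n", 100 * f / t }'
}
```

`fallback_share 45 60` prints `75.0%`. A value near 100% on the first day matches the "still accumulating evidence" case described above.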