API reference

Ormas gateway endpoints, model IDs, BYOK header, and key management.

Base URL

https://api.ormas.ai

All endpoints speak the Anthropic Messages API format. Existing SDK clients work without changes — only the base URL and API key change.


Authentication

Pass your tb_live_ key as the x-api-key header (the Anthropic SDK's default auth header):

curl https://api.ormas.ai/v1/messages \
  -H "x-api-key: tb_live_<your-key>" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-opus-4-8","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

Keys are issued and managed at /app/keys. A key scoped to your account can only read your own tenant's data.
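
The curl call above can also be built in Python. The following is a minimal sketch using only the standard library; the helper name build_messages_request is hypothetical, and the headers and body simply mirror the curl example. It constructs the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.ormas.ai"  # base URL given in this reference


def build_messages_request(api_key: str, model: str, prompt: str,
                           max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/messages request.

    Hypothetical helper: it mirrors the headers and body of the curl example.
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "x-api-key": api_key,  # your tb_live_ key
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# To send it, pass the request to urllib.request.urlopen:
#   req = build_messages_request("tb_live_<your-key>", "claude-opus-4-8", "Hello")
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Separating construction from sending keeps the header logic easy to inspect and test without network access.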


BYOK — bring your own provider key

Add X-Provider-Key alongside x-api-key to pay for inference with your own Anthropic key (or another supported provider key). The gateway routes using your key, and the Ormas fee is calculated only on the declared baseline rate.

-H "X-Provider-Key: sk-ant-<your-anthropic-key>"

Multi-provider format (for cross-provider routing with Grok):

-H "X-Provider-Key: anthropic=sk-ant-<key>, xai=xai-<key>"

Without X-Provider-Key, Ormas uses managed keys and charges you accordingly.
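
The two header forms above can be produced by one small helper. This is a sketch, and the function names are hypothetical. It assumes the bare-key form applies only to a single Anthropic key, since that is the only bare form shown here; any other combination is rendered as provider=key pairs.

```python
def format_provider_key(keys: dict[str, str]) -> str:
    """Render an X-Provider-Key header value.

    Assumption: a lone Anthropic key is sent bare; everything else uses the
    "provider=key, provider=key" form shown in this section.
    """
    if not keys:
        raise ValueError("at least one provider key is required")
    if list(keys) == ["anthropic"]:
        return keys["anthropic"]
    return ", ".join(f"{provider}={key}" for provider, key in keys.items())


def byok_headers(api_key: str, provider_keys: dict[str, str]) -> dict[str, str]:
    """Combine the Ormas key with the BYOK header."""
    return {
        "x-api-key": api_key,
        "X-Provider-Key": format_provider_key(provider_keys),
    }
```

For example, byok_headers("tb_live_<your-key>", {"anthropic": "sk-ant-<key>", "xai": "xai-<key>"}) yields the multi-provider form.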


POST /v1/messages

Drop-in replacement for https://api.anthropic.com/v1/messages. All Anthropic request/response fields pass through unchanged.

Routing behavior:

  • Ormas classifies the turn (rung: haiku / sonnet / opus / fable).
  • If a cheaper model is available for this rung with sufficient quality evidence, it serves the turn.
  • An async judge (haiku-class) grades the response. If the judge rejects it, the next request for the same archetype is served by the baseline model.
  • Both streaming and non-streaming requests are supported.

The model field in the response always reflects the declared model you requested, not the routed model. The routing itself is proprietary, but the fee math is fully reproducible from public inputs without it.
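
The fallback rule in the routing list can be pictured with a toy model. This is purely illustrative and is not Ormas's routing logic: the class, the archetype label, and the choice to apply the fallback to exactly one subsequent request are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

FLOOR_RUNG = "haiku"  # the floor rung is served as-is, with no down-routing


@dataclass
class ToyRouter:
    """Illustrative only: a toy model of the documented fallback rule."""

    # Archetypes whose most recent routed response the judge rejected.
    rejected: set = field(default_factory=set)

    def choose(self, declared: str, rung: str, archetype: str,
               cheaper: Optional[str]) -> str:
        """Return the model that would serve this turn."""
        if rung == FLOOR_RUNG or cheaper is None:
            return declared  # nothing cheaper to route to
        if archetype in self.rejected:
            # Assumption: the baseline fallback covers one request only.
            self.rejected.discard(archetype)
            return declared
        return cheaper

    def record_verdict(self, archetype: str, accepted: bool) -> None:
        """Store the async judge's verdict for a routed response."""
        if not accepted:
            self.rejected.add(archetype)
```

In this sketch a rejection affects only later requests, which matches the judge being asynchronous: the graded response has already been returned by the time the verdict arrives.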


Supported model IDs

Use standard Anthropic model IDs as the model field:

Model ID                     Notes
claude-opus-4-8              Highest rung; most aggressive down-routing
claude-sonnet-4-6            Mid rung
claude-haiku-4-5-20251001    Floor; served as-is, no down-routing
claude-fable-5               Top rung

GET /v1/savings

Returns quality-verified savings data for the authenticated tenant. Used by the savings console.

Auth: x-api-key: tb_live_<your-key>

Response shape:

{
  "tenant": "your-tenant-id",
  "days": 30,
  "n_turns": 1247,
  "n_fell_back": 43,
  "total_cost_usd": 4.21,
  "total_baseline_usd": 12.88,
  "savings_usd": 8.67,
  "savings_pct": 0.673,
  "routing_ladder": [
    { "rung": "sonnet", "n_turns": 1100, "actual_usd": 3.80, "baseline_usd": 11.30, "savings_usd": 7.50 },
    { "rung": "haiku", "n_turns": 147, "actual_usd": 0.41, "baseline_usd": 1.58, "savings_usd": 1.17 }
  ],
  "quality": {
    "n_judged": 312,
    "n_accept": 289,
    "accept_rate": 0.926,
    "sample_coverage": 0.25
  }
}

POST /api/internal/verify-key

Internal handshake between the gateway and tb-web. Not for customer use. The gateway uses this endpoint, authenticated with a shared secret, to resolve tb_live_ keys to tenant IDs and feature flags.