Aperture Docs

Aperture is an OpenAI-compatible gateway routing requests through Singapore to every major AI provider. Bring your own client (Cursor, Claude Code, Python SDK, anything that speaks the OpenAI Chat Completions format) and point it at us.

One endpoint, 20+ frontier models. No SDK changes needed. Switch between GPT-5.2, Claude Sonnet, Gemini Pro, DeepSeek, Qwen and more by changing the model string. Your code stays the same.

30-second quickstart

Register for an instant key, no card needed. New keys can call any Aperture free model at $0 right away.

No credit yet? Free models (ids ending in :free) run in a rate-limited sandbox: 20 requests/min · 200 requests/day. Find them in your dashboard catalog. Subscribe or top up to unlock all models and remove limits. Credits never expire; failed requests are never charged.

1. Grab your API key

Register at aperture-ai.run/register — instant sg-... key, no card. Already have one? Log in to copy from your dashboard.

2. Point your client at Aperture

Shell:

    # Drop these into your shell profile or .env
    export OPENAI_API_KEY="sg-YOUR-KEY-HERE"
    export OPENAI_BASE_URL="https://aperture-ai.run/api/v1"

Python:

    from openai import OpenAI

    client = OpenAI(
        api_key="sg-YOUR-KEY-HERE",
        base_url="https://aperture-ai.run/api/v1",
    )
    resp = client.chat.completions.create(
        model="anthropic/claude-haiku-4.5",
        messages=[{"role": "user", "content": "Hello from Singapore"}],
    )
    print(resp.choices[0].message.content)

Node.js:

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "sg-YOUR-KEY-HERE",
      baseURL: "https://aperture-ai.run/api/v1",
    });
    const resp = await client.chat.completions.create({
      model: "openai/gpt-5-mini",
      messages: [{ role: "user", content: "Hello" }],
    });
    console.log(resp.choices[0].message.content);

cURL:

    curl https://aperture-ai.run/api/v1/chat/completions \
      -H "Authorization: Bearer sg-YOUR-KEY-HERE" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "google/gemini-3.5-flash",
        "messages": [{"role":"user","content":"Hello"}]
      }'

3. Done. Switch models freely

Swap model to any ID from the catalog. Same code, different brain. Your client doesn't know or care.

Authentication

Send your Aperture key as a Bearer token. Aperture swaps in the upstream provider key on the server, so provider credentials never reach your client.

Authorization: Bearer sg-YOUR-KEY-HERE

Keys live in your dashboard. Rotate by clicking Reset key. Old keys are invalidated immediately, so be ready to swap the new key into your env vars as soon as you rotate.

Never commit your key. Use .env files, secrets managers, or your shell profile. Aperture detects exposed keys via GitHub scanning and auto-rotates them within minutes.
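As a minimal sketch of the "never commit your key" advice, the helper below reads the key from the environment instead of source code. It assumes the key is stored in OPENAI_API_KEY (as in the quickstart) and that every Aperture key starts with sg-; the function names load_aperture_key and redact are illustrative, not part of any Aperture SDK.

```python
import os


def load_aperture_key(env=os.environ):
    """Return the Aperture key from the environment, or raise if missing or malformed."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it or add it to .env")
    if not key.startswith("sg-"):
        # Assumption: every Aperture key carries the sg- prefix shown in these docs.
        raise RuntimeError("OPENAI_API_KEY does not look like an Aperture key (expected sg-...)")
    return key


def redact(key):
    """Shorten a key for log output so the full secret never gets printed."""
    return key[:6] + "..." if len(key) > 6 else "..."
```

Pass the result to your client as api_key, and log redact(key) rather than the key itself when debugging.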

Billing

API usage is charged to your Aperture token balance at upstream cost × 1.2. Every major model works here — pricing follows each model's real upstream cost, so cheaper models cost fewer tokens.

How tokens work

Usage is billed per token against your balance. 1 token = $0.0001 of the request's real upstream cost, with a small Aperture margin on top. Cheaper models and shorter requests always cost less — there's no fixed per-message fee. Your dashboard shows the balance in dollars.
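To make the arithmetic concrete, here is a rough sketch of the conversion. It assumes the 1.2 margin is applied to the upstream dollar cost first and the result is then converted at $0.0001 per token; the exact rounding Aperture applies is not documented here, so treat the function as an estimate. Under that assumption, a request with an upstream cost of $0.01 is billed $0.012, which is 120 tokens.

```python
from decimal import Decimal

TOKEN_VALUE_USD = Decimal("0.0001")  # from the docs: 1 token = $0.0001
MARGIN = Decimal("1.2")              # from the docs: upstream cost x 1.2


def tokens_charged(upstream_cost_usd):
    """Estimate tokens deducted for one request.

    Assumption: margin is applied first, then dollars are converted to tokens.
    Accepts a string or number; Decimal avoids float rounding surprises.
    """
    billed_usd = Decimal(str(upstream_cost_usd)) * MARGIN
    return billed_usd / TOKEN_VALUE_USD


# Format with "f" to avoid scientific notation in the printed value.
print(f"{tokens_charged('0.01'):f} tokens")
```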

Cursor

Cursor's OpenAI-compatible setting accepts any base URL. Works for chat, Cmd+K, and inline edits.

  1. Open Settings → Models → OpenAI API Key
  2. Toggle Override OpenAI Base URL on
  3. Paste these values:

    # In Cursor settings
    API Key:  sg-YOUR-KEY-HERE
    Base URL: https://aperture-ai.run/api/v1

    # Add custom model names (one per line)
    openai/gpt-5.2
    anthropic/claude-sonnet-4.6
    google/gemini-3.5-flash
    deepseek/deepseek-chat

Click Verify — Cursor pings /api/v1/models and confirms. Now Cmd+K and chat use Aperture under the hood.

Claude Code (Pro feature)

Claude Code uses Anthropic's native /v1/messages format, not the OpenAI-compatible /v1/chat/completions shape. So you can't point it directly at /api/v1 — it would send the wrong wire format.

The cleanest fix today is claude-code-router, an open-source tool that runs locally and translates Anthropic API calls to any OpenAI-compatible endpoint. We use it ourselves.

Step 1 — Install the router

npm install -g @musistudio/claude-code-router

Step 2 — Create the config

Save this to ~/.claude-code-router/config.json:

    {
      "Providers": [
        {
          "name": "aperture",
          "api_base_url": "https://aperture-ai.run/api/v1/chat/completions",
          "api_key": "sg-YOUR-KEY-HERE",
          "models": [
            "anthropic/claude-sonnet-4.6",
            "anthropic/claude-haiku-4.5",
            "openai/gpt-5.2",
            "deepseek/deepseek-chat"
          ]
        }
      ],
      "Router": {
        "default": "aperture,anthropic/claude-sonnet-4.6",
        "background": "aperture,anthropic/claude-haiku-4.5",
        "think": "aperture,anthropic/claude-sonnet-4.6",
        "longContext": "aperture,anthropic/claude-sonnet-4.6"
      }
    }

Step 3 — Launch Claude Code through the router

    # Instead of `claude`, run:
    ccr code

    # That's it. Claude Code now routes through Aperture.
    # Latency drops by ~200ms in SEA, and you can switch models per-task.

Why bother? Claude Code on default routing hits api.anthropic.com directly (US-east). Through Aperture you save 100-300 ms per call from SEA, can fall back to GPT-5.2 or DeepSeek when Anthropic rate-limits, and get unified billing.

For a managed adapter (no local router), see the native Anthropic endpoint — Pro/Enterprise only, currently in beta.

OpenAI Codex CLI

Codex CLI reads OPENAI_BASE_URL from env. One-liner:

    export OPENAI_BASE_URL="https://aperture-ai.run/api/v1"
    export OPENAI_API_KEY="sg-YOUR-KEY-HERE"
    codex "refactor utils.py to use async/await"

To run Codex with a non-OpenAI model, pass --model:

codex --model deepseek/deepseek-chat "add tests for handlers/auth.go"

Continue.dev (VS Code · JetBrains)

Edit ~/.continue/config.json and add an Aperture-backed model:

    {
      "models": [
        {
          "title": "Aperture · GPT-5.2",
          "provider": "openai",
          "model": "openai/gpt-5.2",
          "apiBase": "https://aperture-ai.run/api/v1",
          "apiKey": "sg-YOUR-KEY-HERE"
        },
        {
          "title": "Aperture · Claude Sonnet",
          "provider": "openai",
          "model": "anthropic/claude-sonnet-4.6",
          "apiBase": "https://aperture-ai.run/api/v1",
          "apiKey": "sg-YOUR-KEY-HERE"
        }
      ]
    }

Reload Continue, pick the model from the dropdown, and you're done. Ship code at Singapore latency.

Aider

Aider reads OpenAI env vars by default. Run from any repo:

    export OPENAI_API_BASE="https://aperture-ai.run/api/v1"
    export OPENAI_API_KEY="sg-YOUR-KEY-HERE"
    aider --model openai/anthropic/claude-sonnet-4.6

Aider's openai/ prefix tells it to use the OpenAI-compatible client; everything after the first slash (anthropic/claude-sonnet-4.6) is the Aperture model ID sent to the gateway.

OpenClaw

OpenClaw (and any OpenAI-compatible agent runner) takes a base URL + key + model. Point it at Aperture:

    base_url: https://aperture-ai.run/api/v1
    api_key: sg-YOUR-KEY-HERE
    model: openai/gpt-5.2   # or any id from the catalog

No credit yet? Set model to any free model (id ending in :free) to run in the $0 sandbox first.

Python — full example

Drop-in replacement for direct OpenAI calls. The openai package handles streaming, retries, async — Aperture inherits all of it.

    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        api_key="sg-YOUR-KEY-HERE",
        base_url="https://aperture-ai.run/api/v1",
    )

    # Non-streaming
    resp = client.chat.completions.create(
        model="anthropic/claude-haiku-4.5",
        messages=[
            {"role": "system", "content": "You are a SEA travel guide."},
            {"role": "user", "content": "3 must-eat spots in Hanoi old quarter"},
        ],
        temperature=0.7,
        max_tokens=300,
    )
    print(resp.choices[0].message.content)
    # usage reports model token counts, not dollars
    print(f"Tokens: input={resp.usage.prompt_tokens}, output={resp.usage.completion_tokens}")

Node.js — full example

Both ESM and CommonJS work. Streaming uses async iterators.

    // npm install openai
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.APERTURE_KEY,
      baseURL: "https://aperture-ai.run/api/v1",
    });

    async function ask(prompt) {
      const resp = await client.chat.completions.create({
        model: "openai/gpt-5-mini",
        messages: [{ role: "user", content: prompt }],
      });
      return resp.choices[0].message.content;
    }

    console.log(await ask("Best Singapore hawker in 1 sentence"));

cURL recipes

Basic completion

    curl https://aperture-ai.run/api/v1/chat/completions \
      -H "Authorization: Bearer sg-YOUR-KEY-HERE" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek/deepseek-chat",
        "messages": [{"role":"user","content":"Translate to Vietnamese: hello world"}]
      }'

JSON mode

    curl https://aperture-ai.run/api/v1/chat/completions \
      -H "Authorization: Bearer sg-YOUR-KEY-HERE" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "openai/gpt-5-mini",
        "messages": [{"role":"user","content":"List 3 SEA capitals as JSON array"}],
        "response_format": {"type": "json_object"}
      }'

Streaming (Server-Sent Events)

Pass "stream": true in the request body. Aperture pipes the upstream SSE through unchanged — no buffering, no extra latency. Cache header on streamed responses is always X-Cache: MISS.

Python:

    from openai import OpenAI

    client = OpenAI(api_key="sg-...", base_url="https://aperture-ai.run/api/v1")

    stream = client.chat.completions.create(
        model="openai/gpt-5-mini",
        messages=[{"role": "user", "content": "Write a haiku about Singapore"}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

Node.js:

    // Reuses the client from the Node.js example
    const stream = await client.chat.completions.create({
      model: "anthropic/claude-haiku-4.5",
      messages: [{ role: "user", content: "Stream a poem" }],
      stream: true,
    });
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content || "");
    }

cURL:

    curl -N https://aperture-ai.run/api/v1/chat/completions \
      -H "Authorization: Bearer sg-..." \
      -H "Content-Type: application/json" \
      -d '{"model":"openai/gpt-5-mini","stream":true,"messages":[{"role":"user","content":"hi"}]}'

Each data: line carries one chunk. Watch for data: [DONE] as the terminator.
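If you are reading the stream without an SDK, the data: framing can be handled in a few lines. The sketch below assumes each data: payload is an OpenAI-style chat completion chunk with choices[].delta.content, and it works on any iterable of already-decoded text lines; iter_sse_text is an illustrative name, not an Aperture function.

```python
import json


def iter_sse_text(lines):
    """Yield content deltas from an iterable of SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator: nothing more to read
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the OpenAI chunk shape (not captured from Aperture).
sample = [
    'data: {"choices":[{"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))  # prints Hello
```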

Function calling (tools)

Tool/function calling works on every model that supports it — GPT-5.2, Claude Sonnet, Gemini Pro, Grok, Mistral Large. Use the standard OpenAI tools array.

    # Reuses the client from the Python examples
    resp = client.chat.completions.create(
        model="openai/gpt-5.2",
        messages=[{"role": "user", "content": "Weather in Jakarta?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    )
    # tool_calls is None when the model answers in plain text instead
    for tool_call in resp.choices[0].message.tool_calls or []:
        print(tool_call.function.name, tool_call.function.arguments)
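Getting a tool call back is only half the loop: you still have to run the function and return its result. The sketch below shows that step on plain dicts in the standard OpenAI message shape. The get_weather body is a hypothetical placeholder, and TOOLS and run_tool_calls are illustrative names.

```python
import json


def get_weather(city):
    """Hypothetical local tool; a real one would call a weather service."""
    return {"city": city, "forecast": "placeholder"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Run every tool call in an assistant message and build the tool-role replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies


# Hand-written assistant message in the OpenAI shape, for illustration only.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'},
    }],
}
print(run_tool_calls(message))
```

Append the assistant message and these replies to messages, then call chat.completions.create again so the model can write its final answer. With the openai package, resp.choices[0].message.model_dump() should give you a dict in this shape.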

Model catalog

Pass any of these IDs as the model field. New models drop weekly — check /api/v1/models for the live list.

Model ID | Provider | Best for
openai/gpt-5.2 | OpenAI | General reasoning · vision · tool use
openai/gpt-5-mini | OpenAI | Cheap fast tasks · summaries
openai/o1 | OpenAI | Hard reasoning · math · code
anthropic/claude-sonnet-4.6 | Anthropic | Code editing · long context
anthropic/claude-haiku-4.5 | Anthropic | Speed · classification · routing
google/gemini-3.5-flash | Google | Cheapest fast tier · 1M context
google/gemini-2.5-pro | Google | Long-doc analysis · vision
deepseek/deepseek-chat | DeepSeek | Strong cheap general · Chinese
deepseek/deepseek-r1 | DeepSeek | Reasoning at 1/30 the cost of o1
qwen/qwen3.7-plus | Alibaba | Chinese-first tasks · cheap
qwen/qwen3.7-max | Alibaba | Top Qwen tier · strong general
x-ai/grok-4.3 | xAI | Real-time data · search-like queries
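Since the live list changes weekly, it can help to filter it in code. This sketch assumes /api/v1/models returns the usual OpenAI list shape ({"object": "list", "data": [{"id": ...}]}), which these docs do not spell out. The sample payload and its :free id are made up for illustration.

```python
def free_model_ids(models_response):
    """Return the sorted ids of sandbox-eligible models (ids ending in :free)."""
    ids = (m.get("id", "") for m in models_response.get("data", []))
    return sorted(i for i in ids if i.endswith(":free"))


# Hypothetical payload; fetch the real one from /api/v1/models with your key.
sample_models = {
    "object": "list",
    "data": [
        {"id": "openai/gpt-5-mini"},
        {"id": "example/some-model:free"},  # made-up id
        {"id": "deepseek/deepseek-chat"},
    ],
}
print(free_model_ids(sample_models))  # prints ['example/some-model:free']
```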

For raw token prices see pricing → token rates.

Endpoint reference

Method | Path | Use
GET | /health | Liveness probe
GET | /targets | Gateway info (OpenAI-compatible base URL)
GET | /auth/me | Validate API key, return plan
ANY | /api/v1/* | Recommended path. OpenAI-compatible · 20+ frontier models

Response caching

GET requests with identical query strings are cached in Singapore Redis. Cached responses come back in <5ms with X-Cache: HIT. Streaming and POST requests bypass cache by design.

TTL by plan: Free 60s, Pro 5min, Enterprise custom.

    curl -I https://aperture-ai.run/api/v1/models \
      -H "Authorization: Bearer sg-..."

    HTTP/2 200
    x-cache: HIT
    cache-control: max-age=300
    x-aperture-region: sgp1

Error codes

Code | Meaning | Action
400 | Bad request | Check model ID and message structure
401 | Invalid API key | Verify sg-... is current
402 | insufficient_credits | Your token balance is empty. Top up or subscribe at aperture-ai.run/account.html.
404 | Unknown model or target | See catalog
429 | Rate limited | Backoff per Retry-After header
502 | Upstream error | Provider returned non-2xx; error body shows raw response
504 | Upstream timeout | Default 120s. Reduce max_tokens or retry

Rate limits

Tier | Models | Limit
No credit (sandbox) | Free models only (ids ending in :free) | 20 req/min · 200 req/day
Paid (any top-up or plan) | All 300+ models | No request cap; limited only by your token balance

Billing is per-token (upstream cost × 1.2), not per-request — there is no hard daily request limit once you have credit. Free models always run at $0; top up to unlock the full catalog and remove the sandbox limit.
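If you script against the sandbox, a small client-side guard can keep you under 20 req/min and 200 req/day before the server has to say no. This is a sketch that assumes sliding windows; how Aperture actually counts (sliding vs. fixed windows, reset times) is not documented here, so the server's 429 remains the source of truth. SandboxLimiter is an illustrative name.

```python
import time
from collections import deque

DAY = 86400.0


class SandboxLimiter:
    """Client-side guard for the sandbox caps: 20 requests/min and 200/day."""

    def __init__(self, per_minute=20, per_day=200, clock=time.monotonic):
        self.limits = [(60.0, per_minute), (DAY, per_day)]
        self.clock = clock
        self.sent = deque()  # timestamps of requests sent within the last day

    def wait_time(self):
        """Seconds to wait before the next request fits (0.0 if it fits now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= DAY:
            self.sent.popleft()  # forget requests older than a day
        wait = 0.0
        for window, cap in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= cap:
                # The cap-th most recent request must age out of the window.
                wait = max(wait, recent[-cap] + window - now)
        return wait

    def record(self):
        """Call once per request actually sent."""
        self.sent.append(self.clock())
```

Before each call, time.sleep(limiter.wait_time()) and then limiter.record(). The clock parameter exists so the logic can be tested without real waiting.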

Hitting the sandbox limit returns 429 with Retry-After seconds. Use exponential backoff. Failed requests are never counted against your limit and never charged.
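One way to implement that backoff, as a sketch: honor Retry-After when it is a number of seconds, and otherwise fall back to capped exponential backoff with jitter. The send callable and its (status, headers, body) return shape are assumptions made for this example, as are the base and cap defaults.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; fall through to backoff
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()  # full jitter: random point below the ceiling


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

Only 429 is retried here; whether to retry 502 or 504 as well is a judgment call for your workload.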

FAQ

Why do some models say they're a different brand?

Models like DeepSeek or Qwen sometimes claim to be GPT-4 or Claude when asked "who built you?" — that's a training-data leak, not a routing mistake. Aperture passes responses through unchanged. The only way to verify the real upstream is the response's x-aperture-upstream header, which lists the actual provider.
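A small helper for checking that header, as a sketch. It takes any mapping of response headers and looks names up case-insensitively, since HTTP clients differ in how they case them. The header values in the sample are invented; only the header names come from these docs.

```python
def aperture_headers(headers):
    """Pick out Aperture's diagnostic headers from a response header mapping."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "upstream": lowered.get("x-aperture-upstream"),
        "region": lowered.get("x-aperture-region"),
        "cache": lowered.get("x-cache"),
    }


# Invented values for illustration; real ones come from your response object.
sample_headers = {
    "X-Aperture-Upstream": "deepseek",
    "x-aperture-region": "sgp1",
    "X-Cache": "MISS",
}
print(aperture_headers(sample_headers)["upstream"])  # prints deepseek
```

With the openai package, client.chat.completions.with_raw_response.create(...) should expose the headers alongside the parsed body.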

Are my prompts logged?

No. We log metadata (model, token count, status, latency) for billing and analytics, but not prompt or response bodies. The cache stores hashed inputs, not raw text.

What happens when an upstream goes down?

You get a 502 with the upstream's actual error message. Pro users can configure auto-fallback to a similar model — set X-Aperture-Fallback: anthropic/claude-haiku-4.5 on your request and we retry automatically.
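For reference, here is a sketch of a request that carries the fallback header, built with the standard library only. It constructs the request without sending it. The X-Aperture-Fallback name comes from the answer above; build_chat_request is an illustrative helper, and the docs do not say whether the header accepts more than one model.

```python
import json
import urllib.request

BASE_URL = "https://aperture-ai.run/api/v1"


def build_chat_request(api_key, model, prompt, fallback=None):
    """Build (but do not send) a chat completion request, optionally with a fallback."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if fallback:
        headers["X-Aperture-Fallback"] = fallback  # Pro feature per the docs
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )


req = build_chat_request(
    "sg-YOUR-KEY-HERE",
    "anthropic/claude-sonnet-4.6",
    "Hello",
    fallback="anthropic/claude-haiku-4.5",
)
# To send it: urllib.request.urlopen(req)
```

With the openai package, the same header can be passed per call through the extra_headers argument of chat.completions.create.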

Can I bring my own upstream API key?

Enterprise plans support BYOK. Your key gets injected per-request, and you pay providers directly. Talk to us if you need this.

How do I rotate my Aperture key?

Log in, go to the dashboard, and click Reset key. The old key dies in <1 second across all edge nodes, so be ready to update your .env the moment you reset.