Aperture Docs
Aperture is an OpenAI-compatible gateway routing requests through Singapore to every major AI provider. Bring your own client (Cursor, Claude Code, Python SDK, anything that speaks the OpenAI Chat Completions format) and point it at us.
One endpoint, 20+ frontier models. No SDK changes needed. Switch between GPT-5.2, Claude Sonnet, Gemini Pro, DeepSeek, Qwen and more by changing the model string. Your code stays the same.
30-second quickstart
Register for an instant key — no card needed. Call any Aperture free model at $0 to start right away; subscribe or top up to unlock the full catalog and remove limits.
No credit yet? New keys can call any Aperture free model (IDs ending in :free) at $0 immediately, inside a rate-limited sandbox (20 requests/min · 200 requests/day). Find free models in your dashboard catalog. Credits never expire; failed requests are never charged.
1. Grab your API key
Register at aperture-ai.run/register — instant sg-... key, no card. Already have one? Log in to copy from your dashboard.
2. Point your client at Aperture
# Drop these into your shell profile or .env
export OPENAI_API_KEY="sg-YOUR-KEY-HERE"
export OPENAI_BASE_URL="https://aperture-ai.run/api/v1"

# Python
from openai import OpenAI
client = OpenAI(
    api_key="sg-YOUR-KEY-HERE",
    base_url="https://aperture-ai.run/api/v1",
)
resp = client.chat.completions.create(
    model="anthropic/claude-haiku-4.5",
    messages=[{"role": "user", "content": "Hello from Singapore"}],
)
print(resp.choices[0].message.content)

// Node.js
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "sg-YOUR-KEY-HERE",
  baseURL: "https://aperture-ai.run/api/v1",
});
const resp = await client.chat.completions.create({
  model: "openai/gpt-5-mini",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

# cURL
curl https://aperture-ai.run/api/v1/chat/completions \
  -H "Authorization: Bearer sg-YOUR-KEY-HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.5-flash",
    "messages": [{"role":"user","content":"Hello"}]
  }'

3. Done. Switch models freely
Swap model to any ID from the catalog. Same code, different brain. Your client doesn't know or care.
Authentication
Send your Aperture key as a Bearer token. Aperture swaps it for the upstream provider key on the server, so provider credentials never reach your client or travel over your connection.
Authorization: Bearer sg-YOUR-KEY-HERE
Keys live in your dashboard. Rotate by clicking Reset key. Old keys invalidate immediately; update your env vars before rotating.
Never commit your key. Use .env files, secrets managers, or your shell profile. Aperture detects exposed keys via GitHub scanning and auto-rotates them within minutes.
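As a minimal sketch of keeping the key out of source code, the helper below reads it from the environment and fails early if it is missing. The function name load_aperture_key is hypothetical, and checking APERTURE_KEY before OPENAI_API_KEY is an assumption based on the variable names used elsewhere on this page.

```python
import os


def load_aperture_key(env=os.environ):
    """Return the Aperture key from the environment, or raise if it is absent."""
    # Assumption: APERTURE_KEY takes priority, then OPENAI_API_KEY
    # (the variable the quickstart exports).
    key = env.get("APERTURE_KEY") or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set APERTURE_KEY or OPENAI_API_KEY before starting")
    if not key.startswith("sg-"):
        # Aperture keys on this page all begin with "sg-".
        raise RuntimeError("Expected an Aperture key beginning with 'sg-'")
    return key
```

Pass the result to OpenAI(api_key=...) instead of a string literal, so the key never lands in a commit.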
Billing
API usage is charged to your Aperture token balance at upstream cost × 1.2. Every major model works here — pricing follows each model's real upstream cost, so cheaper models cost fewer tokens.
How tokens work
Usage is billed per token against your balance. 1 token = $0.0001 of the request's real upstream cost, with a small Aperture margin on top. Cheaper models and shorter requests always cost less — there's no fixed per-message fee. Your dashboard shows the balance in dollars.
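To make the arithmetic concrete, here is a rough sketch of how a request's upstream cost might translate into tokens. It assumes the 1.2 multiplier is applied first and the result is then divided by $0.0001 per token; the function name tokens_charged is hypothetical, and any rounding Aperture applies is not covered here.

```python
from decimal import Decimal

MARGIN = Decimal("1.2")            # from Billing: upstream cost x 1.2
USD_PER_TOKEN = Decimal("0.0001")  # from "How tokens work"


def tokens_charged(upstream_cost_usd):
    """Estimate tokens deducted for a request with the given upstream cost in USD."""
    # Assumption: margin first, then conversion to tokens; no rounding applied.
    billed_usd = Decimal(str(upstream_cost_usd)) * MARGIN
    return billed_usd / USD_PER_TOKEN
```

Under these assumptions, a request whose upstream cost is $0.0025 is billed $0.003, which works out to 30 tokens.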
Cursor
Cursor's OpenAI-compatible setting accepts any base URL. Works for chat, Cmd+K, and inline edits.
1. Open Settings → Models → OpenAI API Key
2. Toggle Override OpenAI Base URL on
3. Paste these values:
# In Cursor settings
API Key: sg-YOUR-KEY-HERE
Base URL: https://aperture-ai.run/api/v1
# Add custom model names (one per line)
openai/gpt-5.2
anthropic/claude-sonnet-4.6
google/gemini-3.5-flash
deepseek/deepseek-chat
Click Verify. Cursor pings /api/v1/models and confirms the connection. Cmd+K and chat now use Aperture under the hood.
Claude Code (Pro feature)
Claude Code uses Anthropic's native /v1/messages format, not the OpenAI-compatible /v1/chat/completions shape. So you can't point it directly at /api/v1 — it would send the wrong wire format.
The cleanest fix today is claude-code-router, an open-source tool that runs locally and translates Anthropic API calls to any OpenAI-compatible endpoint. We use it ourselves.
Step 1 — Install the router
npm install -g @musistudio/claude-code-router
Step 2 — Create the config
Save this to ~/.claude-code-router/config.json:
{
  "Providers": [
    {
      "name": "aperture",
      "api_base_url": "https://aperture-ai.run/api/v1/chat/completions",
      "api_key": "sg-YOUR-KEY-HERE",
      "models": [
        "anthropic/claude-sonnet-4.6",
        "anthropic/claude-haiku-4.5",
        "openai/gpt-5.2",
        "deepseek/deepseek-chat"
      ]
    }
  ],
  "Router": {
    "default": "aperture,anthropic/claude-sonnet-4.6",
    "background": "aperture,anthropic/claude-haiku-4.5",
    "think": "aperture,anthropic/claude-sonnet-4.6",
    "longContext": "aperture,anthropic/claude-sonnet-4.6"
  }
}
Step 3 — Launch Claude Code through the router
# Instead of `claude`, run:
ccr code
# That's it. Claude Code now routes through Aperture.
# Latency drops by ~200ms in SEA, and you can switch models per-task.
Why bother? By default, Claude Code calls api.anthropic.com directly (US-east). Through Aperture you save 100-300ms per call from SEA, can fall back to GPT-5.2 or DeepSeek when Anthropic rate-limits, and get unified billing.
For a managed adapter (no local router), see the native Anthropic endpoint — Pro/Enterprise only, currently in beta.
OpenAI Codex CLI
Codex CLI reads OPENAI_BASE_URL from the environment. Set both variables, then run it as usual:
export OPENAI_BASE_URL="https://aperture-ai.run/api/v1"
export OPENAI_API_KEY="sg-YOUR-KEY-HERE"
codex "refactor utils.py to use async/await"
To run Codex with a non-OpenAI model, pass --model:
codex --model deepseek/deepseek-chat "add tests for handlers/auth.go"
Continue.dev (VS Code · JetBrains)
Edit ~/.continue/config.json and add an Aperture-backed model:
{
  "models": [
    {
      "title": "Aperture · GPT-5.2",
      "provider": "openai",
      "model": "openai/gpt-5.2",
      "apiBase": "https://aperture-ai.run/api/v1",
      "apiKey": "sg-YOUR-KEY-HERE"
    },
    {
      "title": "Aperture · Claude Sonnet",
      "provider": "openai",
      "model": "anthropic/claude-sonnet-4.6",
      "apiBase": "https://aperture-ai.run/api/v1",
      "apiKey": "sg-YOUR-KEY-HERE"
    }
  ]
}
Reload Continue and pick the model from the dropdown. Done: you now ship code at Singapore latency.
Aider
Aider reads OpenAI env vars by default. Run from any repo:
export OPENAI_API_BASE="https://aperture-ai.run/api/v1"
export OPENAI_API_KEY="sg-YOUR-KEY-HERE"
aider --model openai/anthropic/claude-sonnet-4.6
Aider's openai/ prefix tells it to use the OpenAI-compatible client; everything after that prefix (anthropic/claude-sonnet-4.6) is the Aperture model ID.
OpenClaw
OpenClaw (and any OpenAI-compatible agent runner) takes a base URL + key + model. Point it at Aperture:
base_url: https://aperture-ai.run/api/v1
api_key: sg-YOUR-KEY-HERE
model: openai/gpt-5.2 # or any id from the catalog
No credit yet? Set model to any free model (id ending in :free) to run in the $0 sandbox first.
Python — full example
Drop-in replacement for direct OpenAI calls. The openai package handles streaming, retries, async — Aperture inherits all of it.
# pip install openai
from openai import OpenAI
client = OpenAI(
    api_key="sg-YOUR-KEY-HERE",
    base_url="https://aperture-ai.run/api/v1",
)
# Non-streaming
resp = client.chat.completions.create(
    model="anthropic/claude-haiku-4.5",
    messages=[
        {"role": "system", "content": "You are a SEA travel guide."},
        {"role": "user", "content": "3 must-eat spots in Hanoi old quarter"},
    ],
    temperature=0.7,
    max_tokens=300,
)
print(resp.choices[0].message.content)
# usage reports model token counts, not dollars
print(f"Tokens: input={resp.usage.prompt_tokens}, output={resp.usage.completion_tokens}")
Node.js — full example
Both ESM and CommonJS work; the example below uses top-level await, so run it as an ES module (for example, a .mjs file). Streaming uses async iterators.
// npm install openai
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.APERTURE_KEY,
  baseURL: "https://aperture-ai.run/api/v1",
});
async function ask(prompt) {
  const resp = await client.chat.completions.create({
    model: "openai/gpt-5-mini",
    messages: [{ role: "user", content: prompt }],
  });
  return resp.choices[0].message.content;
}
console.log(await ask("Best Singapore hawker in 1 sentence"));
cURL recipes
Basic completion
curl https://aperture-ai.run/api/v1/chat/completions \
  -H "Authorization: Bearer sg-YOUR-KEY-HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-chat",
    "messages": [{"role":"user","content":"Translate to Vietnamese: hello world"}]
  }'
JSON mode
curl https://aperture-ai.run/api/v1/chat/completions \
  -H "Authorization: Bearer sg-YOUR-KEY-HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "messages": [{"role":"user","content":"List 3 SEA capitals as JSON array"}],
    "response_format": {"type": "json_object"}
  }'
Streaming (Server-Sent Events)
Pass "stream": true in the request body. Aperture pipes the upstream SSE through unchanged — no buffering, no extra latency. Cache header on streamed responses is always X-Cache: MISS.
# Python
from openai import OpenAI
client = OpenAI(api_key="sg-...", base_url="https://aperture-ai.run/api/v1")
stream = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": "Write a haiku about Singapore"}],
    stream=True,
)
for chunk in stream:
    # delta.content can be None on some chunks, so fall back to an empty string
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

// Node.js
const stream = await client.chat.completions.create({
  model: "anthropic/claude-haiku-4.5",
  messages: [{ role: "user", content: "Stream a poem" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content || "");
}

# cURL
curl -N https://aperture-ai.run/api/v1/chat/completions \
  -H "Authorization: Bearer sg-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-5-mini","stream":true,"messages":[{"role":"user","content":"hi"}]}'

Each data: line carries one chunk. Watch for data: [DONE] as the terminator.
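If you consume the raw stream yourself rather than through an SDK, the data: lines can be decoded along these lines. This is a sketch that assumes each chunk follows the OpenAI chat-completion chunk shape (choices[0].delta.content); the helper name iter_sse_content and the sample lines are illustrative, not captured Aperture output.

```python
import json


def iter_sse_content(lines):
    """Yield text deltas from an iterable of raw SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator: stop reading
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Illustrative input in the OpenAI chunk format
sample = [
    'data: {"choices":[{"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints "Hello"
```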
Function calling (tools)
Tool/function calling works on every model that supports it — GPT-5.2, Claude Sonnet, Gemini Pro, Grok, Mistral Large. Use the standard OpenAI tools array.
resp = client.chat.completions.create(
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Weather in Jakarta?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
tool_call = resp.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
Model catalog
Pass any of these IDs as the model field. New models drop weekly — check /api/v1/models for the live list.
| Model ID | Provider | Best for |
|---|---|---|
| openai/gpt-5.2 | OpenAI | General reasoning · vision · tool use |
| openai/gpt-5-mini | OpenAI | Cheap fast tasks · summaries |
| openai/o1 | OpenAI | Hard reasoning · math · code |
| anthropic/claude-sonnet-4.6 | Anthropic | Code editing · long context |
| anthropic/claude-haiku-4.5 | Anthropic | Speed · classification · routing |
| google/gemini-3.5-flash | Google | Cheapest fast tier · 1M context |
| google/gemini-2.5-pro | Google | Long-doc analysis · vision |
| deepseek/deepseek-chat | DeepSeek | Strong cheap general · Chinese |
| deepseek/deepseek-r1 | DeepSeek | Reasoning at 1/30 the cost of o1 |
| qwen/qwen3.7-plus | Alibaba | Chinese-first tasks · cheap |
| x-ai/grok-4.3 | xAI | Real-time data · search-like queries |
| qwen/qwen3.7-max | Alibaba | Top Qwen tier · strong general |
For raw token prices see pricing → token rates.
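Since the live list at /api/v1/models changes often, a small filter can pick out the sandbox-eligible entries. This sketch assumes the endpoint returns the OpenAI list-models shape ({"data": [{"id": ...}]}), which this page does not confirm; free_model_ids and the sample IDs are hypothetical.

```python
def free_model_ids(models_response):
    """Return the sorted model IDs ending in ':free' from a models listing."""
    # Assumption: the body looks like {"object": "list", "data": [{"id": "..."}]}
    return sorted(
        entry["id"]
        for entry in models_response.get("data", [])
        if entry.get("id", "").endswith(":free")
    )


# Hypothetical response body for illustration only
listing = {
    "object": "list",
    "data": [
        {"id": "example/model-b:free"},
        {"id": "openai/gpt-5.2"},
        {"id": "example/model-a:free"},
    ],
}
print(free_model_ids(listing))
```

With the openai package, client.models.list() returns the same information as objects with an id attribute.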
Endpoint reference
| Method | Path | Use |
|---|---|---|
| GET | /health | Liveness probe |
| GET | /targets | Gateway info (OpenAI-compatible base URL) |
| GET | /auth/me | Validate API key, return plan |
| ANY | /api/v1/* | Recommended path. OpenAI-compatible · 20+ frontier models |
Response caching
GET requests with identical query strings are cached in Singapore Redis. Cached responses come back in <5ms with X-Cache: HIT. Streaming and POST requests bypass cache by design.
TTL by plan: Free 60s, Pro 5min, Enterprise custom.
curl -I https://aperture-ai.run/api/v1/models \
-H "Authorization: Bearer sg-..."
HTTP/2 200
x-cache: HIT
cache-control: max-age=300
x-aperture-region: sgp1
Error codes
| Code | Meaning | Action |
|---|---|---|
| 400 | Bad request | Check model ID and message structure |
| 401 | Invalid API key | Verify sg-... is current |
| 402 | insufficient_credits | Your token balance is empty. Top up or subscribe at aperture-ai.run/account.html. |
| 404 | Unknown model or target | See catalog |
| 429 | Rate limited | Backoff per Retry-After header |
| 502 | Upstream error | Provider returned non-2xx — error body shows raw response |
| 504 | Upstream timeout | Default 120s. Reduce max_tokens or retry |
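One way to act on the table above in client code is a small status-to-action mapping. This is a sketch: the action labels are made up for illustration, and treating 502 as retryable is an assumption (the table only suggests retrying for 429 and 504).

```python
# Assumption: 502 is worth retrying alongside 429 and 504.
RETRYABLE = {429, 502, 504}


def next_step(status):
    """Map an HTTP status from Aperture to a coarse client-side action."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"        # back off first; honor Retry-After on 429
    if status == 401:
        return "check_key"    # verify the sg-... key is current
    if status == 402:
        return "top_up"       # token balance is empty
    return "fix_request"      # 400 / 404: check model ID and message structure
```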
Rate limits
| Tier | Models | Limit |
|---|---|---|
| No credit (sandbox) | Free models only (ids ending in :free) | 20 req/min · 200 req/day |
| Paid (any top-up or plan) | All 300+ models | No request cap — limited only by your token balance |
Billing is per-token (upstream cost × 1.2), not per-request — there is no hard daily request limit once you have credit. Free models always run at $0; top up to unlock the full catalog and remove the sandbox limit.
Hitting the sandbox limit returns 429 with Retry-After seconds. Use exponential backoff. Failed requests are never counted against your limit and never charged.
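The backoff advice can be sketched as a delay calculator that prefers the server's Retry-After value and otherwise doubles the wait on each attempt. The function name and the defaults (1 second base, 60 second cap, full jitter) are illustrative choices, not values Aperture prescribes.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=True):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The Retry-After header (in seconds) takes priority when present.
        return float(retry_after)
    delay = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... capped at `cap`
    if jitter:
        # Full jitter: pick a random wait up to the computed delay
        # so that many clients do not retry in lockstep.
        delay = random.uniform(0, delay)
    return delay
```

A retry loop would call time.sleep(backoff_delay(attempt, response.headers.get("Retry-After"))) after each 429 and stop after a fixed number of attempts.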
FAQ
Why do some models say they're a different brand?
Models like DeepSeek or Qwen sometimes claim to be GPT-4 or Claude when asked "who built you?" — that's a training-data leak, not a routing mistake. Aperture passes responses through unchanged. The only way to verify the real upstream is the response's x-aperture-upstream header, which lists the actual provider.
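A minimal sketch of reading that header: the helper below does a case-insensitive lookup on any header mapping. The name upstream_provider is hypothetical, and the sample header value is made up, since the exact format of x-aperture-upstream is not documented here.

```python
def upstream_provider(headers):
    """Return the x-aperture-upstream value from a header mapping, or None."""
    for name, value in headers.items():
        if name.lower() == "x-aperture-upstream":
            return value
    return None


# Illustrative headers; the real value format may differ.
print(upstream_provider({"X-Aperture-Upstream": "deepseek", "x-cache": "MISS"}))

# With the openai package, response headers are exposed via with_raw_response:
#   raw = client.chat.completions.with_raw_response.create(model=..., messages=...)
#   print(upstream_provider(raw.headers))
#   resp = raw.parse()  # the usual ChatCompletion object
```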
Are my prompts logged?
No. We log metadata (model, token count, status, latency) for billing and analytics, but not prompt or response bodies. The cache stores hashed inputs, not raw text.
What happens when an upstream goes down?
You get a 502 with the upstream's actual error message. Pro users can configure auto-fallback to a similar model — set X-Aperture-Fallback: anthropic/claude-haiku-4.5 on your request and we retry automatically.
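With the openai Python package, a per-request header like this can be attached through the extra_headers argument. The wrapper below is a sketch; with_fallback is a hypothetical helper that only builds the keyword arguments, so the request itself is left to your client.

```python
def with_fallback(fallback_model, **request):
    """Return request kwargs with the X-Aperture-Fallback header added."""
    # Preserve any extra_headers the caller already supplied.
    headers = dict(request.pop("extra_headers", None) or {})
    headers["X-Aperture-Fallback"] = fallback_model
    return {**request, "extra_headers": headers}


kwargs = with_fallback(
    "anthropic/claude-haiku-4.5",
    model="openai/gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
# resp = client.chat.completions.create(**kwargs)
```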
Can I bring my own upstream API key?
Enterprise plans support BYOK. Your key gets injected per-request, and you pay providers directly. Talk to us if you need this.
How do I rotate my Aperture key?
Log in, go to dashboard, click Reset key. Old key dies in <1 second across all edge nodes. Update your .env before resetting.