ProxyLLM is a drop-in proxy that sits between your app and AI providers. It adds semantic caching, smart model routing, and a live cost dashboard, cutting 20–40% off cacheable workloads.
// Before: direct to OpenAI
const openai = new OpenAI()
// After: through ProxyLLM
const openai = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})
No SDK to install. No code to refactor. Just change two lines.
Swap your baseURL and API key. The official OpenAI SDK and Anthropic SDK (and any compatible client) work as-is.
ProxyLLM caches semantically similar queries, routes simple requests to cheaper models, and logs every call.
Watch your costs drop in real time. Tag requests by feature to see exactly where your money goes.
One proxy between your app and AI providers. Full visibility, automatic savings, zero code changes beyond the initial setup.
Save 20-40% on repeated queries. We match semantically similar prompts — not just exact strings — so "What's the weather?" and "Tell me the weather" share one cached response.
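To make "semantically similar" concrete, here is a minimal sketch of a similarity-based cache lookup. It is not ProxyLLM's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the `SemanticCache` class and its 0.5 threshold are hypothetical names and values chosen for illustration.

```javascript
// Toy embedding (assumption): word counts instead of a learned embedding model.
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse word-count vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) ?? 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical cache: returns the best stored response at or above the threshold.
class SemanticCache {
  constructor(threshold = 0.5) {
    this.threshold = threshold; // illustrative cutoff, not a ProxyLLM setting
    this.entries = [];
  }

  set(prompt, response) {
    this.entries.push({ vector: embed(prompt), response });
  }

  get(prompt) {
    const query = embed(prompt);
    let best = null;
    for (const entry of this.entries) {
      const score = cosineSimilarity(query, entry.vector);
      if (score >= this.threshold && (best === null || score > best.score)) {
        best = { score, response: entry.response };
      }
    }
    return best ? best.response : undefined;
  }
}
```

With this toy embedding, "What's the weather?" and "Tell me the weather" share two of their words and score about 0.58, so the second prompt reuses the first prompt's cached response, while an unrelated prompt misses.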
Auto-route simple queries to GPT-4o Mini. Complex prompts go to your workspace's default model (must be on your plan's whitelist, or enable BYOK). Scale unlocks custom routing rules.
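The routing rules themselves are not documented here, so the following is only a sketch of what a "simple vs. complex" heuristic could look like. The `pickModel` function, the length and turn-count cutoffs, and the keyword list are all assumptions made for illustration, not ProxyLLM's actual logic.

```javascript
// Cheap model named on this page; the default model is whatever the workspace uses.
const CHEAP_MODEL = "gpt-4o-mini";

// Hypothetical heuristic: long prompts, long conversations, or "heavy" verbs
// are treated as complex and sent to the workspace default model.
function pickModel(messages, defaultModel) {
  const text = messages.map((m) => String(m.content)).join("\n");
  const looksComplex =
    text.length > 500 ||
    messages.length > 4 ||
    /\b(analyze|prove|refactor|step by step)\b/i.test(text);
  return looksComplex ? defaultModel : CHEAP_MODEL;
}
```

Under these assumed rules, a short greeting would go to the cheaper model and a request to refactor code step by step would go to the default model.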
Add an x-proxyllm-tag header to group costs by feature, team, or customer. See exactly which part of your app is burning through tokens.
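As a sketch of tagging a request, the helper below builds a chat completions request with the x-proxyllm-tag header set. The `buildTaggedRequest` name and the `/chat/completions` path under the base URL are assumptions based on the OpenAI-compatible format described on this page; nothing is sent until you pass the result to `fetch`.

```javascript
const PROXY_BASE_URL = "https://api.proxyllm.dev/v1";

// Hypothetical helper: returns the URL and fetch options for a tagged request.
function buildTaggedRequest(apiKey, tag, body) {
  return {
    url: `${PROXY_BASE_URL}/chat/completions`, // assumed OpenAI-compatible path
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
        "x-proxyllm-tag": tag, // groups this call's cost under the given tag
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (not executed here):
// const { url, init } = buildTaggedRequest("pl_your_api_key", "search", {
//   model: "gpt-4o",
//   messages: [{ role: "user", content: "Hello!" }],
// });
// const res = await fetch(url, init);
```

If you use the official OpenAI Node SDK instead of raw `fetch`, per-request headers can be passed as the second argument to `create`, for example `{ headers: { "x-proxyllm-tag": "search" } }`.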
Drop-in replacement for the OpenAI API. Works with the official SDKs for Python, Node, Go, and anything else that speaks the chat completions format.
Real-time monitoring of requests, tokens, costs, cache hit rates, and model distribution. Filterable by tag, model, and time range.
Built-in usage tiers with per-minute and per-day rate limits. Upgrade plans in one click. No surprise bills.
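Per-minute limits mean a client should expect the occasional rejected request. The sketch below retries with exponential backoff, assuming the proxy signals a rate limit with HTTP status 429 (a common convention, not confirmed on this page). The `withRetry` helper and its options are hypothetical names.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls send() until it returns a non-429 response or attempts run out.
// Delays double each time: baseDelayMs, 2x, 4x, ...
async function withRetry(send, { maxAttempts = 4, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}

// Usage (not executed here):
// const res = await withRetry(() => fetch(url, init));
```

Backoff helps with per-minute limits only; once a per-day limit is reached, retrying will not succeed until the window resets or the plan is upgraded.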
Works with the OpenAI SDK, the Anthropic SDK, LangChain, LlamaIndex, or any compatible HTTP client. Here's the Node.js example:
// Before: direct to OpenAI
import OpenAI from "openai"
const client = new OpenAI({
  apiKey: "sk-...your-key",
})
const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})
// After: through ProxyLLM
import OpenAI from "openai"
const client = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})
const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})
Using Claude? Point the Anthropic SDK at https://api.proxyllm.dev and call /v1/messages the same way: one workspace key, both SDKs, shared cache. See the Anthropic SDK docs.
Start free. Upgrade when you need more. No hidden fees and no per-token markup — managed plans include a monthly usage allowance (requests + a fair-use budget); BYOK runs uncapped on your own provider key. See plan limits.
For side projects and experimentation.
Managed cross-format cache + single bill — or BYOK any model at 0% markup.
For teams running AI at scale.
Pro's value is the cache. Bringing your own provider key? Estimate how much it saves on your provider bill versus the $29/mo fee.
At these numbers Pro pays for itself — about $34/mo net after the $29 fee, before counting the dashboard, routing, and cost analytics.
Estimate using provider list prices (verified June 2026); your negotiated rates and prompt mix will differ. ProxyLLM measures your real cache-hit rate per request, so you can check this against actual usage. Managed gpt-4o-mini is different: a flat $29/mo for 50,000 requests with cache + dashboard, with no per-token cost to you — the calculator above is for BYOK on your own models.
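The arithmetic behind this kind of estimate can be sketched as follows. This is a simplified model, not the page's calculator: it assumes every cache hit avoids the full provider cost of that request, and the $210 monthly spend and 30% hit rate in the usage note are hypothetical inputs picked to land on a figure like the $34 above.

```javascript
// Simplified BYOK estimate: savings = provider spend x cache-hit rate, minus the plan fee.
function estimateNetSavings(monthlyProviderSpendUsd, cacheHitRate, planFeeUsd = 29) {
  const roundCents = (x) => Math.round(x * 100) / 100;
  const gross = monthlyProviderSpendUsd * cacheHitRate;
  return {
    grossUsd: roundCents(gross),
    netUsd: roundCents(gross - planFeeUsd),
    // Monthly provider spend at which savings exactly cover the fee.
    breakEvenSpendUsd: roundCents(planFeeUsd / cacheHitRate),
  };
}
```

For example, `estimateNetSavings(210, 0.3)` gives a gross saving of $63 and a net of $34 after the $29 fee, with break-even at about $96.67 of monthly provider spend. Replace the inputs with your measured cache-hit rate and actual bill.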
| | ProxyLLM | Helicone | Portkey | Vercel AI Gateway |
|---|---|---|---|---|
| Entry price | $29/mo Pro | $79/mo Pro | $49/mo | Free + 0% markup |
| Semantic cache | Yes (cross-format) | Beta | Plugin | No |
| BYOK (zero markup) | Pro/Scale | Yes | Yes (BYOK-only) | Yes |
| Anthropic /v1/messages | Native | Native | Native | Native |
| Single bill (managed) | Yes | Token + Helicone | No | Token + Vercel |
| Active development | Yes | Maintenance only (acq. by Mintlify, Mar 2026) | Yes | Yes |
ProxyLLM's pitch: managed cross-format semantic cache, single bill, lowest entry price — with BYOK on Pro/Scale for zero-markup heavy users.