Cost-Control
Use Case
This scenario is for a team using one OpenAI account behind one internal gateway project.
The team wants a single OpenAI-compatible endpoint for its applications, but it does not want the project to have unlimited access to every model. The gateway accepts the team’s gateway credentials, forwards traffic to OpenAI with the operator-owned OpenAI API key, records cost by organization, project, client, and gateway key, and applies different project-level request and token limits for each allowed model.
The example exposes two OpenAI models:
| Model | Intended use | RPM | TPM |
|---|---|---|---|
gpt-5.4-mini | Default team work | 260 | 120,000 |
gpt-5.4 | Higher-quality work | 40 | 60,000 |
All three quotas are project-level constraints on the Team AI project: one monthly spend cap (scoped by cost unit) and per-model RPM and TPM limits on each model.
| Cost unit | Monthly cap |
|---|---|
| USD | 1,200 |
The rendered gateway config enforces RPM, TPM, model allowlist, and the project monthly spend budget. Spend enforcement is gateway-local, reset by the gateway’s local calendar month, and tracked in the configured cost unit.
Quota policies accept these enforcement modes:
enforcement_mode | Meaning | Use it when |
|---|---|---|
enforce | Apply the request and token limits at the gateway. Requests over the configured RPM/TPM policy are denied by the rate-limit module. | You want the gateway to actively protect the project budget and provider account. This scenario uses enforce. |
monitor | Store the quota policy for reporting and review without using it as a gateway deny policy. | You want to observe current usage before turning on active blocking. |
Replace the organization IDs, project/client names, gateway key IDs, rate card, and OPENAI_API_KEY value with your own values. The manifest stores gateway key IDs only; issue raw gateway credentials with the provisioning issuer so the secret part is generated and stored in the runtime issuer maps.
Keep deployment.gateway.upstreams[].ssl_name set to the provider hostname used for TLS SNI. For OpenAI, keep ssl_name: api.openai.com unless your account uses a different upstream hostname. The renderer turns this into proxy_ssl_name; removing or mismatching it can cause HTTPS handshake failures even when the upstream server value looks correct.
The renderer applies the manifest from deployment.gateway.upstreams, deployment.gateway.credentials, deployment.gateway.routes, organizations, projects, clients, and quota policies.
If your only provider is Anthropic instead of OpenAI, use the same structure: change the provider name, upstream host, credential environment variable, model names, endpoint dialect, and rate card to Anthropic values, then keep the same project/client credential and per-model quota pattern.
After applying this scenario and issuing credentials, internal clients should call the gateway at:
| Client | Gateway URL | Credential to issue | Header |
|---|---|---|---|
| Team AI Gateway Key | http://team-ai.gateway.internal/v1/chat/completions | da-sk-acme-team.<issued-secret> | Authorization: Bearer da-sk-acme-team.<issued-secret> |
The prefix before the dot is the key_id from the manifest. The secret suffix is printed once by packaging/provision/issue.ts --display-once; it is not stored in the manifest and should not be checked into source control.
Manifest
customer:
serial: "replace-with-customer-serial"
company: "Acme AI Team"
contact: "[email protected]"
product: enterprise
deployment:
environment_id: acme-prod
environment_name: "Acme Production"
deployment_id: "acme-prod-llm-gateway-001"
product_version: "1.0.6"
enabled_features:
- name: llm-auth
enabled: true
notes: gateway credentials are separated from the OpenAI upstream key
- name: llm-proxy
enabled: true
notes: one OpenAI endpoint with an explicit two-model catalog
- name: llm-ratelimit
enabled: true
notes: project quotas plus per-model request and token limits
- name: llm-cost
enabled: true
notes: per-request accounting using the OpenAI rate card below
config:
rate_card_version: openai-standard-2026-06-07
rate_units:
- { provider: openai, unit: usd }
rates:
- { provider: openai, model: gpt-5.4-mini, input: 0.75, output: 4.50 }
- { provider: openai, model: gpt-5.4, input: 2.50, output: 15.00 }
cached_rates:
- { provider: openai, model: gpt-5.4-mini, cache_read: 0.075 }
- { provider: openai, model: gpt-5.4, cache_read: 0.25 }
configured_providers:
- openai
gateway:
upstreams:
- name: openai_api
server: api.openai.com:443
ssl_name: api.openai.com
keepalive: 32
credentials:
- provider: openai
api_key: env:OPENAI_API_KEY
routes:
- location: /v1/chat/completions
provider: openai
dialect: openai
upstream: openai_api
auth_fail_closed: true
organizations:
- organization_id: "10000000-0000-0000-0000-000000000001"
organization_slug: acme
organization_name: "Acme AI Team"
environment_id: acme-prod
status: active
runtime: {}
quotas:
- scope_type: project
project_id: "10000000-0000-0000-0001-000000000001"
monthly_spend_limit: 1200
monthly_spend_unit: usd
enforcement_mode: enforce
notes: monthly project spend budget for the shared team project
- scope_type: project
project_id: "10000000-0000-0000-0001-000000000001"
rpm_limit: 260
tpm_limit: 120000
enforcement_mode: enforce
model_allowlist:
- gpt-5.4-mini
notes: default model budget for the shared team project
- scope_type: project
project_id: "10000000-0000-0000-0001-000000000001"
rpm_limit: 40
tpm_limit: 60000
enforcement_mode: enforce
model_allowlist:
- gpt-5.4
notes: high-capability model budget for the shared team project
projects:
- project_id: "10000000-0000-0000-0001-000000000001"
project_slug: team-ai
project_name: "Team AI"
status: active
clients:
- client_id: "10000000-0000-0001-0001-000000000001"
client_name: "Team AI Gateway Key"
client_type: gateway_key
status: active
api_keys:
- key_id: da-sk-acme-team
tier: basic
status: active
{
"customer": {
"serial": "replace-with-customer-serial",
"company": "Acme AI Team",
"contact": "[email protected]",
"product": "enterprise"
},
"deployment": {
"environment_id": "acme-prod",
"environment_name": "Acme Production",
"deployment_id": "acme-prod-llm-gateway-001",
"product_version": "1.0.6",
"enabled_features": [
{ "name": "llm-auth", "enabled": true, "notes": "gateway credentials are separated from the OpenAI upstream key" },
{ "name": "llm-proxy", "enabled": true, "notes": "one OpenAI endpoint with an explicit two-model catalog" },
{ "name": "llm-ratelimit", "enabled": true, "notes": "project quotas plus per-model request and token limits" },
{
"name": "llm-cost",
"enabled": true,
"notes": "per-request accounting using the OpenAI rate card below",
"config": {
"rate_card_version": "openai-standard-2026-06-07",
"rate_units": [{ "provider": "openai", "unit": "usd" }],
"rates": [
{ "provider": "openai", "model": "gpt-5.4-mini", "input": 0.75, "output": 4.5 },
{ "provider": "openai", "model": "gpt-5.4", "input": 2.5, "output": 15 }
],
"cached_rates": [
{ "provider": "openai", "model": "gpt-5.4-mini", "cache_read": 0.075 },
{ "provider": "openai", "model": "gpt-5.4", "cache_read": 0.25 }
]
}
}
],
"configured_providers": ["openai"],
"gateway": {
"upstreams": [
{ "name": "openai_api", "server": "api.openai.com:443", "ssl_name": "api.openai.com", "keepalive": 32 }
],
"credentials": [
{ "provider": "openai", "api_key": "env:OPENAI_API_KEY" }
],
"routes": [
{ "location": "/v1/chat/completions", "provider": "openai", "dialect": "openai", "upstream": "openai_api", "auth_fail_closed": true }
]
}
},
"organizations": [
{
"organization_id": "10000000-0000-0000-0000-000000000001",
"organization_slug": "acme",
"organization_name": "Acme AI Team",
"environment_id": "acme-prod",
"status": "active",
"runtime": {},
"quotas": [
{ "scope_type": "project", "project_id": "10000000-0000-0000-0001-000000000001", "monthly_spend_limit": 1200, "monthly_spend_unit": "usd", "enforcement_mode": "enforce", "notes": "monthly project spend budget for the shared team project" },
{ "scope_type": "project", "project_id": "10000000-0000-0000-0001-000000000001", "rpm_limit": 260, "tpm_limit": 120000, "enforcement_mode": "enforce", "model_allowlist": ["gpt-5.4-mini"], "notes": "default model budget for the shared team project" },
{ "scope_type": "project", "project_id": "10000000-0000-0000-0001-000000000001", "rpm_limit": 40, "tpm_limit": 60000, "enforcement_mode": "enforce", "model_allowlist": ["gpt-5.4"], "notes": "high-capability model budget for the shared team project" }
],
"projects": [
{
"project_id": "10000000-0000-0000-0001-000000000001",
"project_slug": "team-ai",
"project_name": "Team AI",
"status": "active",
"clients": [
{ "client_id": "10000000-0000-0001-0001-000000000001", "client_name": "Team AI Gateway Key", "client_type": "gateway_key", "status": "active", "api_keys": [{ "key_id": "da-sk-acme-team", "tier": "basic", "status": "active" }] }
]
}
]
}
]
}
Rendered nginx.conf
# generated by nginz provisioning
# manifest_id: acme-prod
# deployment_id: acme-prod-llm-gateway-001
# environment_id: acme-prod
# organizations: 1
# generated upstreams
upstream openai_api {
server api.openai.com:443;
keepalive 32;
}
# generated globals
llm_metrics_zone metrics 1m;
llm_cost_backend postgres;
llm_cost_dsn "host=nginz-db port=5432 dbname=darkanchor user=postgres password=changeme";
llm_cost_table llm_cost_events;
llm_cost_rate_card_version openai-standard-2026-06-07;
llm_cost_rate_unit openai usd;
llm_cost_rate openai gpt-5.4-mini 0.75 4.5;
llm_cost_rate openai gpt-5.4 2.5 15;
llm_cost_cached_rate openai gpt-5.4-mini 0.075;
llm_cost_cached_rate openai gpt-5.4 0.25;
# generated client auth base
map $http_authorization $da_gateway_bearer_token {
"~^Bearer[ \t]+(.+)$" $1;
default "";
}
map $da_gateway_bearer_token $da_gateway_credential {
default $da_gateway_bearer_token;
"" $http_x_api_key;
}
map $da_gateway_credential $da_client_auth_status {
default invalid;
include /runtime/generated/issuer/status.map;
}
map $da_gateway_credential $da_client_auth_key_id {
default "";
include /runtime/generated/issuer/key-id.map;
}
map $da_gateway_credential $da_client_auth_org_slug {
default "";
include /runtime/generated/issuer/org-slug.map;
}
map $da_gateway_credential $da_client_auth_project_slug {
default "";
include /runtime/generated/issuer/project-slug.map;
}
map $da_gateway_credential $da_client_auth_client_id {
default "";
include /runtime/generated/issuer/client-id.map;
}
map $da_gateway_credential $da_client_auth_tier {
default "";
include /runtime/generated/issuer/tier.map;
}
# generated manifest-driven gateway servers
log_format manifest_gateway_json escape=json
'{'
'"time":"$time_iso8601",'
'"org":"$org_id",'
'"project":"$project_id",'
'"client":"$client_id",'
'"gateway_key_id":"$gateway_key_id",'
'"provider":"$llm_effective_provider",'
'"model":"$llm_effective_model",'
'"rate_limit_reason":"$llm_ratelimit_deny_reason",'
'"request_remaining":"$llm_ratelimit_quota_remaining",'
'"token_remaining":"$llm_ratelimit_token_quota_remaining",'
'"input_cost":"$llm_cost_prompt",'
'"output_cost":"$llm_cost_completion",'
'"total_cost":"$llm_cost_total",'
'"cost_status":"$llm_cost_status",'
'"status":"$status",'
'"request_time":"$request_time"'
'}';
access_log /var/log/nginx/manifest-gateway.log manifest_gateway_json;
server {
listen 80;
server_name team-ai.gateway.internal;
location /v1/chat/completions {
if ($da_client_auth_status = invalid) {
return 401;
}
if ($da_client_auth_status = suspended) {
return 403;
}
if ($da_client_auth_status = revoked) {
return 403;
}
if ($da_client_auth_project_slug != "team-ai") {
return 403;
}
if ($da_client_auth_org_slug != "acme") {
return 403;
}
set $tenant_id $da_client_auth_project_slug;
set $project_id $da_client_auth_project_slug;
set $org_id $da_client_auth_org_slug;
set $client_id $da_client_auth_client_id;
set $gateway_key_id $da_client_auth_key_id;
llm_proxy;
llm_proxy_route openai openai_api;
llm_proxy_model_pattern gpt-5.4-mini openai;
llm_proxy_model_pattern gpt-5.4 openai;
llm_proxy_max_body_size 64k;
llm_proxy_inject_usage on;
llm_auth;
llm_auth_provider openai;
llm_auth_credential openai env:OPENAI_API_KEY;
llm_auth_org $org_id;
llm_auth_project $project_id;
llm_auth_fail_closed on;
llm_ratelimit;
llm_ratelimit_key $tenant_id;
llm_ratelimit_requests_per_minute 300;
llm_ratelimit_tokens_per_minute 180000;
llm_ratelimit_burst_requests 60;
llm_ratelimit_reserve_tokens 2000;
llm_ratelimit_model_rpm gpt-5.4-mini 260;
llm_ratelimit_model_tpm gpt-5.4-mini 120000;
llm_ratelimit_model_rpm gpt-5.4 40;
llm_ratelimit_model_tpm gpt-5.4 60000;
llm_ratelimit_spend_scope project usd $org_id $project_id 1200;
llm_ratelimit_model_basis effective;
llm_ratelimit_fail_open off;
llm_metrics;
llm_metrics_emit_usage on;
llm_metrics_label_model on;
llm_cost;
llm_cost_identity $gateway_key_id;
llm_cost_org $org_id;
llm_cost_project $project_id;
llm_cost_client $client_id;
llm_cost_team $project_id;
llm_cost_auth_fingerprint $llm_auth_key_fingerprint;
proxy_ssl_server_name on;
proxy_ssl_name api.openai.com;
proxy_pass https://openai_api;
proxy_set_header Host api.openai.com;
proxy_buffering off;
}
}