llm-metrics

Use this module when you need provider-level visibility into LLM traffic flowing through the gateway. It consumes normalized facts from llm-proxy and exposes per-provider, per-model, and per-tenant counters without reparsing request or response bodies.

When to use this module

You need to answer operational questions: which provider is carrying the load, which models are slow, and what error rate is provider-specific versus gateway-specific.
You want Prometheus-compatible metrics exported directly from nginx without a sidecar or log scraper.
You need to measure how often usage data is missing, which leaves cost data incomplete.
You want latency histograms broken down by provider.
You need per-tenant aggregation for chargeback or capacity planning.
You want to track translation and replacement rates across native and cross-dialect traffic.
You need auth-resolution health metrics (how often credential resolution fails).

nginx.conf synopsis

Basic metrics export with provider and model labels.

llm_metrics_zone metrics_zone 10m;

server {
    listen 18080;

    location /metrics {
        llm_metrics;
        llm_metrics_export prometheus;
    }

    location /v1 {
        llm_proxy;
        llm_proxy_route openai    openai_upstream;
        llm_proxy_route anthropic anthropic_upstream anthropic;
        llm_proxy_default_provider openai;

        llm_metrics;
        llm_metrics_label_model on;
        llm_metrics_emit_usage on;

        proxy_pass https://$llm_provider_upstream;
    }
}

Full production configuration with auth-status labels, tenant aggregation, and resolution-outcome tracking.

llm_metrics_zone metrics_zone 10m;

server {
    listen 18080;

    location /metrics {
        llm_metrics;
        llm_metrics_export prometheus;
    }

    location /v1 {
        llm_proxy;
        llm_proxy_route openai    openai_upstream;
        llm_proxy_route anthropic anthropic_upstream anthropic;
        llm_proxy_default_provider openai;

        llm_auth;
        llm_auth_credential openai    env:OPENAI_KEY;
        llm_auth_credential anthropic env:ANTHROPIC_KEY;
        llm_auth_tenant $http_x_tenant_id;
        llm_auth_fail_closed on;

        llm_metrics;
        llm_metrics_label_model on;
        llm_metrics_label_auth_status on;
        llm_metrics_label_resolution_outcome on;
        llm_metrics_label_tenant on;
        llm_metrics_tenant_source $http_x_tenant_id;
        llm_metrics_emit_usage on;

        proxy_pass https://$llm_provider_upstream;
    }
}

Directive reference

Core directives

Directive	Contexts	Default	Description
`llm_metrics`	`location`	—	Enable the module for this location.
`llm_metrics_zone`	`http`	`llm_metrics 10m`	Shared-memory backing for counters and histograms. Args: `<name> <size>`. Size accepts `k`/`K`/`m`/`M` suffixes. When unset, the module creates a default `llm_metrics` zone with the built-in default size.
`llm_metrics_export`	`location`	—	Select the export format. Currently only `prometheus` is supported. The content handler is registered on the configured location. The handler is safe to reach from subrequests: `HEAD` requests return headers without a body, and subrequests are rejected with 403.

Label directives

Directive	Contexts	Default	Description
`llm_metrics_label_model`	`location`	`off`	Emit bounded model-labeled counter families. Uses a fixed-capacity shared-memory table; unknown or oversized model keys go to the `_overflow` bucket.
`llm_metrics_label_auth_status`	`location`	`off`	Emit bounded auth-status counter families when `llm-auth` status is available.
`llm_metrics_label_resolution_outcome`	`location`	`off`	Emit resolution-outcome counter family with 6 fixed buckets: `as_requested`, `replaced_by_policy`, `fallback_after_failure`, `rejected_out_of_scope`, `rejected_unresolvable`, `other`.
`llm_metrics_label_tenant`	`location`	`off`	Emit bounded per-tenant request and error counters. Uses a fixed-capacity 32-entry table with `_overflow` bucket.
`llm_metrics_tenant_source`	`location`	—	nginx variable that provides the tenant identity string. Required when `llm_metrics_label_tenant on`.
`llm_metrics_emit_usage`	`location`	`off`	Emit token counters (`prompt_tokens`, `completion_tokens`, `total_tokens`) only when usage was successfully extracted.

Exported metrics

Base counters (always emitted)

Metric	Labels	Description
`llm_requests_total`	`provider`, `streaming`	Total requests by provider and streaming mode.
`llm_requests_parse_fallback_total`	—	Requests that could not be parsed and fell back to default routing.
`llm_requests_error_provider_total`	`provider`	Semantic provider errors (4xx).
`llm_requests_error_gateway_total`	—	Gateway-level errors.
`llm_requests_usage_missing_total`	`provider`	Responses where usage was not extracted.
`llm_requests_translation_total`	`provider`	Requests translated across dialects.
`llm_requests_replacement_total`	`provider`	Requests where the provider was replaced by policy.
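
The base counters can be combined after scraping to answer questions such as how often usage data is missing per provider. The following is a minimal Python sketch, not part of the module. It assumes the export uses the standard Prometheus text format with the metric and label names from the table above; the sample values are made up for illustration, and the helper names `sum_by_provider` and `usage_missing_ratio` are hypothetical. The parser is deliberately simplified and ignores optional timestamps.

```python
import re
from collections import defaultdict

# Made-up scrape output for illustration; real values come from /metrics.
SAMPLE = '''\
# Illustrative sample only.
llm_requests_total{provider="openai",streaming="true"} 120
llm_requests_total{provider="openai",streaming="false"} 80
llm_requests_total{provider="anthropic",streaming="false"} 50
llm_requests_usage_missing_total{provider="openai"} 10
llm_requests_usage_missing_total{provider="anthropic"} 5
'''

# metric name, optional {labels}, then the sample value
LINE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_provider(text, metric):
    """Sum all samples of one metric, grouped by the provider label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE/comment lines
        m = LINE.match(line)
        if not m or m.group(1) != metric:
            continue
        labels = dict(LABEL.findall(m.group(2) or ""))
        totals[labels.get("provider", "")] += float(m.group(3))
    return dict(totals)


def usage_missing_ratio(text):
    """Fraction of requests per provider whose usage was not extracted."""
    requests = sum_by_provider(text, "llm_requests_total")
    missing = sum_by_provider(text, "llm_requests_usage_missing_total")
    return {p: missing.get(p, 0.0) / n for p, n in requests.items() if n > 0}


if __name__ == "__main__":
    for provider, ratio in usage_missing_ratio(SAMPLE).items():
        print(provider, ratio)
```

In practice the same ratio is usually computed in the monitoring system itself; the sketch only shows which two counters are divided.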

Latency histogram

Metric	Labels	Description
`llm_request_duration_seconds`	`provider`	5-bucket histogram: `<100ms`, `<500ms`, `<2s`, `<10s`, `≥10s`.
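
To make the bucket boundaries concrete, here is a small Python sketch of how a request duration could map onto the five buckets listed above. It is an illustration only: it assumes the buckets are disjoint with exclusive upper bounds, exactly as the labels read, and says nothing about how the module encodes them in the exported text. The name `bucket_for` is hypothetical.

```python
from bisect import bisect_right

# Upper bounds in seconds for the first four buckets (assumed exclusive).
BOUNDS = [0.1, 0.5, 2.0, 10.0]
# ">=10s" is an ASCII spelling of the final bucket in the table.
LABELS = ["<100ms", "<500ms", "<2s", "<10s", ">=10s"]


def bucket_for(duration_seconds):
    """Return the label of the bucket a request duration falls into."""
    # bisect_right counts the bounds that are <= duration, which is the
    # index of the first bucket whose upper bound is strictly greater.
    return LABELS[bisect_right(BOUNDS, duration_seconds)]


if __name__ == "__main__":
    for d in (0.05, 0.1, 1.9, 10.0):
        print(d, bucket_for(d))
```

A duration of exactly 0.1 s falls into `<500ms` under this assumption, because it is not strictly below 100 ms.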

Token counters (when `llm_metrics_emit_usage on`)

Metric	Labels	Description
`llm_prompt_tokens_total`	`provider`	Total prompt tokens consumed.
`llm_completion_tokens_total`	`provider`	Total completion tokens consumed.

Opt-in label families

Metric	Labels	Description
`llm_requests_model_total`	`provider`, `model`	Requests by provider and model. Requires `llm_metrics_label_model on`.
`llm_requests_error_model_total`	`provider`, `model`	Errors by provider and model. Requires `llm_metrics_label_model on`.
`llm_requests_auth_status_total`	`provider`, `auth_status`	Requests by provider and auth resolution status. Requires `llm_metrics_label_auth_status on`.
`llm_requests_error_auth_status_total`	`provider`, `auth_status`	Errors by provider and auth status. Requires `llm_metrics_label_auth_status on`.
`llm_requests_resolution_outcome_total`	`outcome`	Requests by resolution outcome. Requires `llm_metrics_label_resolution_outcome on`.
`llm_requests_tenant_total`	`tenant`	Requests by tenant. Requires `llm_metrics_label_tenant on`.
`llm_requests_error_provider_tenant_total`	`tenant`	Provider errors by tenant. Requires `llm_metrics_label_tenant on`.
`llm_requests_error_gateway_tenant_total`	`tenant`	Gateway errors by tenant. Requires `llm_metrics_label_tenant on`.

Behavior notes

All counter increments happen in the LOG phase after the response is complete.
Provider labels are bounded to openai, anthropic, other, and total.
Model labels are lowercased and use a fixed-capacity table. Oversized keys (>63 chars) and table-overflow keys go to _overflow.
Tenant labels use a 32-entry fixed-capacity table. Oversized keys (>63 chars) and table-overflow keys go to _overflow.
Per-worker counter ownership means provider/auth/outcome increments are lock-free. Only model and tenant table inserts take the slab mutex, and only on the first encounter of a new label value.
Hot reload preserves counters only when the existing LlmMetricsStore layout matches the new binary.
The export handler detects buffer overflow: if the Prometheus text output exceeds the export buffer, it returns 503 rather than silently truncating.
Metric cardinality is bounded by configuration. Enable opt-in labels only when needed.
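
The bounded-label behavior described in the notes above can be modeled in a few lines of Python. This is a sketch of the idea, not the module's shared-memory implementation: the class name `BoundedLabelTable` is hypothetical, the capacity of 2 in the usage example is chosen only to make overflow visible, and it assumes the `_overflow` bucket does not count against the table capacity.

```python
class BoundedLabelTable:
    """Fixed-capacity label table with an overflow bucket (illustrative)."""

    OVERFLOW = "_overflow"

    def __init__(self, capacity, max_key_len=63, lowercase=False):
        self.capacity = capacity
        self.max_key_len = max_key_len
        self.lowercase = lowercase  # the notes say model labels are lowercased
        self.counts = {}

    def _slot(self, key):
        if self.lowercase:
            key = key.lower()
        if len(key) > self.max_key_len:
            return self.OVERFLOW  # oversized key
        if key in self.counts:
            return key  # label already has a slot
        # Assumption: the overflow bucket is not one of the named slots.
        named = len(self.counts) - (self.OVERFLOW in self.counts)
        if named >= self.capacity:
            return self.OVERFLOW  # table is full
        return key

    def increment(self, key):
        """Count one request and return the label it was recorded under."""
        slot = self._slot(key)
        self.counts[slot] = self.counts.get(slot, 0) + 1
        return slot


if __name__ == "__main__":
    table = BoundedLabelTable(capacity=2, lowercase=True)
    for name in ("Model-A", "model-b", "model-c", "MODEL-A", "x" * 64):
        print(table.increment(name))
    print(table.counts)
```

With a capacity of 2, the third distinct label and the 64-character key are both recorded under `_overflow`, while repeat hits on an existing label keep their own counter. This is why cardinality stays bounded no matter what label values clients send.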