llm-metrics

Use this module when you need provider-level visibility into LLM traffic flowing through the gateway. It consumes normalized facts from llm-proxy and exposes per-provider, per-model, and per-tenant counters without reparsing request or response bodies.

When to use this module

  • You need to answer operational questions: which provider is carrying the load, which models are slow, and what error rate is provider-specific versus gateway-specific.
  • You want Prometheus-compatible metrics exported directly from nginx without a sidecar or log scraper.
  • You need to measure how often usage data is missing (making cost data incomplete).
  • You want latency histograms broken down by provider.
  • You need per-tenant aggregation for chargeback or capacity planning.
  • You want to track translation and replacement rates across native and cross-dialect traffic.
  • You need auth-resolution health metrics (how often credential resolution fails).

nginx.conf synthesis

Basic metrics export with provider and model labels.

llm_metrics_zone metrics_zone 10m;

server {
    listen 18080;

    location /metrics {
        llm_metrics;
        llm_metrics_export prometheus;
    }

    location /v1 {
        llm_proxy;
        llm_proxy_route openai    openai_upstream;
        llm_proxy_route anthropic anthropic_upstream anthropic;
        llm_proxy_default_provider openai;

        llm_metrics;
        llm_metrics_label_model on;
        llm_metrics_emit_usage on;

        proxy_pass https://$llm_provider_upstream;
    }
}

Full production configuration with auth-status labels, tenant aggregation, and resolution-outcome tracking.

llm_metrics_zone metrics_zone 10m;

server {
    listen 18080;

    location /metrics {
        llm_metrics;
        llm_metrics_export prometheus;
    }

    location /v1 {
        llm_proxy;
        llm_proxy_route openai    openai_upstream;
        llm_proxy_route anthropic anthropic_upstream anthropic;
        llm_proxy_default_provider openai;

        llm_auth;
        llm_auth_credential openai    env:OPENAI_KEY;
        llm_auth_credential anthropic env:ANTHROPIC_KEY;
        llm_auth_tenant $http_x_tenant_id;
        llm_auth_fail_closed on;

        llm_metrics;
        llm_metrics_label_model on;
        llm_metrics_label_auth_status on;
        llm_metrics_label_resolution_outcome on;
        llm_metrics_label_tenant on;
        llm_metrics_tenant_source $http_x_tenant_id;
        llm_metrics_emit_usage on;

        proxy_pass https://$llm_provider_upstream;
    }
}

Directive reference

Core directives

| Directive | Contexts | Default | Description |
| --- | --- | --- | --- |
| llm_metrics | location | | Enable the module for this location. |
| llm_metrics_zone | http | llm_metrics 10m | Shared-memory backing for counters and histograms. Args: <name> <size>. Size accepts k/K/m/M suffixes. When unset, the module creates a default llm_metrics zone with the built-in default size. |
| llm_metrics_export | location | | Select export mode. Currently only prometheus is supported. The content handler is registered on the configured location. The handler suppresses the body for HEAD requests and returns 403 to subrequests. |

Label directives

| Directive | Contexts | Default | Description |
| --- | --- | --- | --- |
| llm_metrics_label_model | location | off | Emit bounded model-labeled counter families. Uses a fixed-capacity shared-memory table; unknown or oversized model keys go to the _overflow bucket. |
| llm_metrics_label_auth_status | location | off | Emit bounded auth-status counter families when llm-auth status is available. |
| llm_metrics_label_resolution_outcome | location | off | Emit resolution-outcome counter family with 6 fixed buckets: as_requested, replaced_by_policy, fallback_after_failure, rejected_out_of_scope, rejected_unresolvable, other. |
| llm_metrics_label_tenant | location | off | Emit bounded per-tenant request and error counters. Uses a fixed-capacity 32-entry table with _overflow bucket. |
| llm_metrics_tenant_source | location | | nginx variable that provides the tenant identity string. Required when llm_metrics_label_tenant on. |
| llm_metrics_emit_usage | location | off | Emit token counters (prompt_tokens, completion_tokens, total_tokens) only when usage was successfully extracted. |

Exported metrics

Base counters (always emitted)

| Metric | Labels | Description |
| --- | --- | --- |
| llm_requests_total | provider, streaming | Total requests by provider and streaming mode. |
| llm_requests_parse_fallback_total | | Requests that could not be parsed and fell back to default routing. |
| llm_requests_error_provider_total | provider | Semantic provider errors (4xx). |
| llm_requests_error_gateway_total | | Gateway-level errors. |
| llm_requests_usage_missing_total | provider | Responses where usage was not extracted. |
| llm_requests_translation_total | provider | Requests translated across dialects. |
| llm_requests_replacement_total | provider | Requests where the provider was replaced by policy. |
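
One way to consume these base counters is to scrape the export endpoint and derive ratios from it. The sketch below, written in Python for illustration, parses Prometheus text exposition and computes the share of requests with missing usage per provider. The helper names (`parse_samples`, `usage_missing_ratio`) and the sample payload, including the `streaming` label values, are assumptions made for this example and are not output captured from the module. The parser is deliberately minimal and ignores optional timestamps.

```python
import re
from collections import defaultdict

# A sample line looks like: metric_name{label="value",...} 123
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line in the text format."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a plain "name{labels} value" line
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def usage_missing_ratio(text):
    """Return {provider: missing / requests} for providers with traffic."""
    requests = defaultdict(float)
    missing = defaultdict(float)
    for name, labels, value in parse_samples(text):
        provider = labels.get("provider")
        if provider is None:
            continue
        if name == "llm_requests_total":
            requests[provider] += value  # sums the streaming variants
        elif name == "llm_requests_usage_missing_total":
            missing[provider] += value
    return {p: missing[p] / n for p, n in requests.items() if n > 0}


# Hypothetical scrape payload, for illustration only.
SAMPLE = """\
# TYPE llm_requests_total counter
llm_requests_total{provider="openai",streaming="true"} 90
llm_requests_total{provider="openai",streaming="false"} 10
llm_requests_total{provider="anthropic",streaming="false"} 50
llm_requests_usage_missing_total{provider="openai"} 5
"""
```

Calling `usage_missing_ratio(SAMPLE)` gives a per-provider ratio that indicates how much of the cost data is incomplete. The ratio is computed per provider label, so an aggregate label such as total is reported as its own entry rather than added to the others.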

Latency histogram

| Metric | Labels | Description |
| --- | --- | --- |
| llm_request_duration_seconds | provider | 5-bucket histogram: <100ms, <500ms, <2s, <10s, ≥10s. |
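
To make the bucket boundaries concrete, here is a minimal Python sketch that maps a request duration to one of the five buckets listed above. The function name `bucket_for` and the exclusive-upper-bound handling are assumptions read off the wording of the bucket list. Prometheus histograms are normally exposed as cumulative `le` (less-than-or-equal) series, so edge handling in the exported series may differ from this sketch.

```python
from bisect import bisect_right

# Upper bounds, in seconds, of the first four buckets; the fifth is open-ended.
# Assumption: bounds are exclusive, matching the "<100ms" style wording.
BOUNDS_SECONDS = [0.1, 0.5, 2.0, 10.0]
BUCKET_NAMES = ["<100ms", "<500ms", "<2s", "<10s", "≥10s"]


def bucket_for(duration_seconds: float) -> str:
    """Return the name of the bucket a request duration falls into."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # bisect_right counts the bounds that are <= duration, which is the
    # index of the first bucket whose exclusive upper bound exceeds it.
    return BUCKET_NAMES[bisect_right(BOUNDS_SECONDS, duration_seconds)]
```

With these bounds, a request that takes exactly 100 ms lands in the <500ms bucket, and anything at or above 10 s lands in the last one.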

Token counters (when llm_metrics_emit_usage on)

| Metric | Labels | Description |
| --- | --- | --- |
| llm_prompt_tokens_total | provider | Total prompt tokens consumed. |
| llm_completion_tokens_total | provider | Total completion tokens consumed. |

Opt-in label families

| Metric | Labels | Description |
| --- | --- | --- |
| llm_requests_model_total | provider, model | Requests by provider and model. Requires llm_metrics_label_model on. |
| llm_requests_error_model_total | provider, model | Errors by provider and model. Requires llm_metrics_label_model on. |
| llm_requests_auth_status_total | provider, auth_status | Requests by provider and auth resolution status. Requires llm_metrics_label_auth_status on. |
| llm_requests_error_auth_status_total | provider, auth_status | Errors by provider and auth status. Requires llm_metrics_label_auth_status on. |
| llm_requests_resolution_outcome_total | outcome | Requests by resolution outcome. Requires llm_metrics_label_resolution_outcome on. |
| llm_requests_tenant_total | tenant | Requests by tenant. Requires llm_metrics_label_tenant on. |
| llm_requests_error_provider_tenant_total | tenant | Provider errors by tenant. Requires llm_metrics_label_tenant on. |
| llm_requests_error_gateway_tenant_total | tenant | Gateway errors by tenant. Requires llm_metrics_label_tenant on. |

Behavior notes

  • All counter increments happen in the LOG phase after the response is complete.
  • Provider labels are bounded to openai, anthropic, other, and total.
  • Model labels are lowercased and use a fixed-capacity table. Oversized keys (>63 chars) and table-overflow keys go to _overflow.
  • Tenant labels use a 32-entry fixed-capacity table. Oversized keys (>63 chars) and keys that arrive after the table is full go to _overflow.
  • Counters are owned per worker, so provider, auth-status, and outcome increments are lock-free. Only model and tenant table inserts take the slab mutex, and only on the first encounter of a new label value.
  • Hot reload preserves counters only when the existing LlmMetricsStore layout matches the new binary.
  • The export handler detects buffer overflow: if the Prometheus text output exceeds the export buffer, it returns 503 rather than silently truncating.
  • Metric cardinality is bounded by configuration. Enable opt-in labels only when needed.
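
The bounded-table behavior described in the notes above (lowercasing, the 63-character limit, and the _overflow bucket) can be illustrated with a small Python model. This is a sketch under stated assumptions, not the module's implementation: the real table lives in shared memory and takes the slab mutex on insert, whereas this version uses a plain dict. The class name `BoundedLabelTable`, its parameters, the treatment of empty keys as overflow, and the model names in the usage note are all hypothetical.

```python
OVERFLOW = "_overflow"
MAX_KEY_CHARS = 63  # keys longer than this count as oversized


class BoundedLabelTable:
    """Fixed-capacity label table that routes excess keys to _overflow."""

    def __init__(self, capacity: int, lowercase: bool = False):
        self.capacity = capacity
        self.lowercase = lowercase  # the notes say model labels are lowercased
        self.counts = {}            # label -> count for admitted labels
        self.overflow_count = 0     # everything that was not admitted

    def resolve(self, key: str) -> str:
        """Return the label a key is counted under, admitting it if room remains."""
        if self.lowercase:
            key = key.lower()
        # Assumption: empty keys are treated like unknown keys.
        if not key or key == OVERFLOW or len(key) > MAX_KEY_CHARS:
            return OVERFLOW
        if key in self.counts:
            return key
        if len(self.counts) >= self.capacity:
            return OVERFLOW  # table is full, so new labels are not admitted
        self.counts[key] = 0
        return key

    def increment(self, key: str) -> str:
        """Count one request and return the label it was recorded under."""
        label = self.resolve(key)
        if label == OVERFLOW:
            self.overflow_count += 1
        else:
            self.counts[label] += 1
        return label

    def snapshot(self) -> dict:
        """Return all counters, including the overflow bucket."""
        out = dict(self.counts)
        out[OVERFLOW] = self.overflow_count
        return out
```

For example, with `BoundedLabelTable(capacity=2, lowercase=True)`, the first two distinct model names are admitted and every later new name is counted under _overflow, while names already in the table keep incrementing their own counters. This is why the number of exported series stays bounded no matter what label values clients send.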