llm-fallback

Use this module when the gateway must survive provider outages, transient 429s, or selectively retry transport failures without forcing every application to build its own provider-ordered retry tree.

When to use this module

  • You need ordered provider failover: when the primary provider fails, retry to a configured secondary.
  • You want fine-grained control over which failure classes are retryable: connect_error, transport_timeout, rate_limited, upstream_5xx.
  • You need to suppress fallback for streaming requests to avoid duplicating partial output to the client.
  • You want pre-send provider replacement (not just post-failure fallback) via llm_fallback_replace.
  • You need translation-aware replay policy: forbid or discourage fallback paths that would require cross-dialect translation.
  • You want per-route model overrides so a fallback to a different provider can also switch models.
  • You need observable fallback outcomes in response headers: X-Fallback-Attempted, X-Fallback-Primary, X-Fallback-Effective.

nginx.conf synthesis

Basic ordered failover with retryable failure classes and streaming suppression.

upstream openai_upstream   { server api.openai.com:443; }
upstream anthropic_upstream { server api.anthropic.com:443; }

location /v1 {
    llm_proxy;
    llm_proxy_route openai    openai_upstream;
    llm_proxy_route anthropic anthropic_upstream anthropic;
    llm_proxy_default_provider openai;

    llm_fallback;
    llm_fallback_mode basic;
    llm_fallback_route openai anthropic;
    llm_fallback_on connect_error transport_timeout upstream_5xx;
    llm_fallback_max_attempts 2;
    llm_fallback_allow_streaming off;

    # nginx must be configured to retry on matching failures
    proxy_next_upstream error timeout http_500 http_502 http_503;
    proxy_next_upstream_tries 2;

    proxy_pass https://$llm_provider_upstream;
}

Production configuration with pre-send replacement, model-override fallback, and translation-aware replay policy.

upstream openai_upstream    { server api.openai.com:443; }
upstream anthropic_upstream  { server api.anthropic.com:443; }
upstream azure_upstream      { server azure-openai.example.com:443; }
upstream bedrock_upstream    { server bedrock-runtime.amazonaws.com:443; }

location /v1 {
    llm_proxy;
    llm_proxy_route openai    openai_upstream;
    llm_proxy_route anthropic anthropic_upstream anthropic;
    llm_proxy_route azure     azure_upstream  openai;    # explicit dialect
    llm_proxy_route bedrock   bedrock_upstream anthropic; # explicit dialect
    llm_proxy_default_provider openai;

    llm_fallback;
    llm_fallback_mode basic;

    # Pre-send replacement: openai → azure before first upstream send
    llm_fallback_replace openai azure;

    # Post-failure fallback: azure → anthropic, azure → bedrock with model override
    llm_fallback_route azure anthropic;
    llm_fallback_route azure bedrock claude-sonnet-4;

    llm_fallback_on connect_error transport_timeout upstream_5xx;
    llm_fallback_max_attempts 3;
    llm_fallback_allow_streaming off;

    # Translation-aware replay: forbid cross-dialect fallback
    llm_fallback_translation_fallback forbid;

    proxy_next_upstream error timeout http_500 http_502 http_503;
    proxy_next_upstream_tries 3;

    proxy_pass https://$llm_provider_upstream;
}

Directive reference

Core directives

DirectiveContextsDefaultDescription
llm_fallbacklocationEnable fallback policy for this location.
llm_fallback_modelocationunsetEnforcement mode. Currently only basic is accepted; advanced is rejected by the parser.

Route directives

DirectiveContextsDefaultDescription
llm_fallback_routelocationOrdered failover edge from <primary> to <secondary>. Optional third argument <model> overrides the effective model after failover. Repeatable, max 16 routes. Duplicate primaries and cyclic graphs are rejected at startup.
llm_fallback_replacelocationPre-send replacement rule. When configured, llm-proxy replaces the primary provider with the replacement before the first upstream send — this is policy-driven substitution, not failure-driven retry.

Policy directives

DirectiveContextsDefaultDescription
llm_fallback_onlocationRetryable failure class bitmask. Accepts space-separated tokens: connect_error, transport_timeout, rate_limited, upstream_5xx.
llm_fallback_max_attemptslocationunsetUpper bound on total attempts, including the primary. When set, llm-proxy syncs nginx’s effective proxy_next_upstream_tries.
llm_fallback_allow_streaminglocationoffWhether fallback is allowed for streaming requests. When off, streaming fallback is suppressed and X-Fallback-Suppressed-Reason: streaming_not_allowed is set.
llm_fallback_translation_fallbacklocationallowCross-dialect replay policy. allow: cross-dialect retry permitted. discourage: marks the retry as policy_mismatch but does not suppress. forbid: marks the retry as policy_mismatch with X-Fallback-Suppressed-Reason: translation_forbidden.

Exported variables

VariableDescription
$llm_fallback_attempted0/1 — whether fallback was attempted.
$llm_fallback_suppressed0/1 — whether fallback was suppressed.
$llm_fallback_suppressed_reasonWhy fallback was suppressed (e.g., streaming_not_allowed, translation_forbidden).
$llm_fallback_reasonFailure class: none, connect_error, transport_timeout, rate_limited, upstream_5xx.
$llm_fallback_attempt_countNumber of upstream attempts, including the primary.
$llm_fallback_policy_allowed0/1 — whether the configured policy would allow the inferred first-hop failure reason.
$llm_fallback_policy_mismatch0/1 — whether nginx retried against the configured policy.
$llm_fallback_primary_providerProvider selected before retry analysis.
$llm_fallback_effective_providerProvider that actually served the final response.

Response headers

These headers are stamped by llm-proxy on responses from llm_fallback-enabled locations:

HeaderDescription
X-Fallback-Attempted1 when retry occurred.
X-Fallback-Suppressed1 when retry was suppressed.
X-Fallback-Attempt-CountNumber of upstream attempts nginx recorded.
X-Fallback-PrimaryFirst-hop provider name.
X-Fallback-EffectiveProvider that served the final response.
X-Fallback-Suppressed-ReasonWhy retry was suppressed.
X-Fallback-ReasonFailure classification string.
X-Fallback-Policy-AllowedWhether the configured fallback policy allowed the inferred reason.
X-Fallback-Policy-Mismatch1 when nginx retried against policy.

Behavior notes

  • llm-fallback is intentionally thin as a standalone runtime module. Fallback policy lives here; actual replay execution happens inside llm-proxy via nginx’s proxy_next_upstream path.
  • Pre-send replacement (llm_fallback_replace) fires in llm-proxy’s body handler before auth preparation, translation, and upstream send. It sets $llm_resolution_outcome = replaced_by_policy and replacement_happened = 1. It does NOT set fallback_attempted.
  • Post-failure fallback is detected in llm-proxy’s header filter via r->upstream_states.nelts > 1.
  • When a model override is configured on a fallback route, effective_model is updated to the override model after a successful retry.
  • The llm_fallback_translation_fallback forbid value marks a cross-dialect retry as policy_mismatch after the fact, but cannot prevent nginx’s proxy_next_upstream from firing. This is an architectural constraint of nginx’s phase model.
  • Replacement and fallback are distinct: replacement_happened=1, fallback_attempted=0 for pre-send substitution; fallback_attempted=1, replacement_happened=0 for failure-driven retry.
  • Route graphs are validated at startup: duplicate primary providers and cyclic graphs are rejected.
  • llm_fallback_max_attempts syncs nginx’s effective proxy_next_upstream_tries before upstream execution.
  • llm_fallback_on inheritance is replace-not-merge. Child locations must restate the full desired failure-class set when narrowing parent policy.
  • Child locations with any local llm_fallback_route entries do not inherit parent routes. This is replace-not-merge semantics.