llm-fallback
Use this module when the gateway must survive provider outages, transient 429s, or selectively retry transport failures without forcing every application to build its own provider-ordered retry tree.
When to use this module
- You need ordered provider failover: when the primary provider fails, retry to a configured secondary.
- You want fine-grained control over which failure classes are retryable:
connect_error, transport_timeout, rate_limited, upstream_5xx.
- You need to suppress fallback for streaming requests to avoid duplicating partial output to the client.
- You want pre-send provider replacement (not just post-failure fallback) via
llm_fallback_replace.
- You need translation-aware replay policy: forbid or discourage fallback paths that would require cross-dialect translation.
- You want per-route model overrides so a fallback to a different provider can also switch models.
- You need observable fallback outcomes in response headers:
X-Fallback-Attempted, X-Fallback-Primary, X-Fallback-Effective.
nginx.conf synthesis
Basic ordered failover with retryable failure classes and streaming suppression.
upstream openai_upstream { server api.openai.com:443; }
upstream anthropic_upstream { server api.anthropic.com:443; }
location /v1 {
llm_proxy;
llm_proxy_route openai openai_upstream;
llm_proxy_route anthropic anthropic_upstream anthropic;
llm_proxy_default_provider openai;
llm_fallback;
llm_fallback_mode basic;
llm_fallback_route openai anthropic;
llm_fallback_on connect_error transport_timeout upstream_5xx;
llm_fallback_max_attempts 2;
llm_fallback_allow_streaming off;
# nginx must be configured to retry on matching failures
proxy_next_upstream error timeout http_500 http_502 http_503;
proxy_next_upstream_tries 2;
proxy_pass https://$llm_provider_upstream;
}
Production configuration with pre-send replacement, model-override fallback, and translation-aware replay policy.
upstream openai_upstream { server api.openai.com:443; }
upstream anthropic_upstream { server api.anthropic.com:443; }
upstream azure_upstream { server azure-openai.example.com:443; }
upstream bedrock_upstream { server bedrock-runtime.amazonaws.com:443; }
location /v1 {
llm_proxy;
llm_proxy_route openai openai_upstream;
llm_proxy_route anthropic anthropic_upstream anthropic;
llm_proxy_route azure azure_upstream openai; # explicit dialect
llm_proxy_route bedrock bedrock_upstream anthropic; # explicit dialect
llm_proxy_default_provider openai;
llm_fallback;
llm_fallback_mode basic;
# Pre-send replacement: openai → azure before first upstream send
llm_fallback_replace openai azure;
# Post-failure fallback: azure → anthropic, azure → bedrock with model override
llm_fallback_route azure anthropic;
llm_fallback_route azure bedrock claude-sonnet-4;
llm_fallback_on connect_error transport_timeout upstream_5xx;
llm_fallback_max_attempts 3;
llm_fallback_allow_streaming off;
# Translation-aware replay: forbid cross-dialect fallback
llm_fallback_translation_fallback forbid;
proxy_next_upstream error timeout http_500 http_502 http_503;
proxy_next_upstream_tries 3;
proxy_pass https://$llm_provider_upstream;
}
Directive reference
Core directives
| Directive | Contexts | Default | Description |
|---|
llm_fallback | location | — | Enable fallback policy for this location. |
llm_fallback_mode | location | unset | Enforcement mode. Currently only basic is accepted; advanced is rejected by the parser. |
Route directives
| Directive | Contexts | Default | Description |
|---|
llm_fallback_route | location | — | Ordered failover edge from <primary> to <secondary>. Optional third argument <model> overrides the effective model after failover. Repeatable, max 16 routes. Duplicate primaries and cyclic graphs are rejected at startup. |
llm_fallback_replace | location | — | Pre-send replacement rule. When configured, llm-proxy replaces the primary provider with the replacement before the first upstream send — this is policy-driven substitution, not failure-driven retry. |
Policy directives
| Directive | Contexts | Default | Description |
|---|
llm_fallback_on | location | — | Retryable failure class bitmask. Accepts space-separated tokens: connect_error, transport_timeout, rate_limited, upstream_5xx. |
llm_fallback_max_attempts | location | unset | Upper bound on total attempts, including the primary. When set, llm-proxy syncs nginx’s effective proxy_next_upstream_tries. |
llm_fallback_allow_streaming | location | off | Whether fallback is allowed for streaming requests. When off, streaming fallback is suppressed and X-Fallback-Suppressed-Reason: streaming_not_allowed is set. |
llm_fallback_translation_fallback | location | allow | Cross-dialect replay policy. allow: cross-dialect retry permitted. discourage: marks the retry as policy_mismatch but does not suppress. forbid: marks the retry as policy_mismatch with X-Fallback-Suppressed-Reason: translation_forbidden. |
Exported variables
| Variable | Description |
|---|
$llm_fallback_attempted | 0/1 — whether fallback was attempted. |
$llm_fallback_suppressed | 0/1 — whether fallback was suppressed. |
$llm_fallback_suppressed_reason | Why fallback was suppressed (e.g., streaming_not_allowed, translation_forbidden). |
$llm_fallback_reason | Failure class: none, connect_error, transport_timeout, rate_limited, upstream_5xx. |
$llm_fallback_attempt_count | Number of upstream attempts, including the primary. |
$llm_fallback_policy_allowed | 0/1 — whether the configured policy would allow the inferred first-hop failure reason. |
$llm_fallback_policy_mismatch | 0/1 — whether nginx retried against the configured policy. |
$llm_fallback_primary_provider | Provider selected before retry analysis. |
$llm_fallback_effective_provider | Provider that actually served the final response. |
These headers are stamped by llm-proxy on responses from llm_fallback-enabled locations:
| Header | Description |
|---|
X-Fallback-Attempted | 1 when retry occurred. |
X-Fallback-Suppressed | 1 when retry was suppressed. |
X-Fallback-Attempt-Count | Number of upstream attempts nginx recorded. |
X-Fallback-Primary | First-hop provider name. |
X-Fallback-Effective | Provider that served the final response. |
X-Fallback-Suppressed-Reason | Why retry was suppressed. |
X-Fallback-Reason | Failure classification string. |
X-Fallback-Policy-Allowed | Whether the configured fallback policy allowed the inferred reason. |
X-Fallback-Policy-Mismatch | 1 when nginx retried against policy. |
Behavior notes
llm-fallback is intentionally thin as a standalone runtime module. Fallback policy lives here; actual replay execution happens inside llm-proxy via nginx’s proxy_next_upstream path.
- Pre-send replacement (
llm_fallback_replace) fires in llm-proxy’s body handler before auth preparation, translation, and upstream send. It sets $llm_resolution_outcome = replaced_by_policy and replacement_happened = 1. It does NOT set fallback_attempted.
- Post-failure fallback is detected in
llm-proxy’s header filter via r->upstream_states.nelts > 1.
- When a model override is configured on a fallback route,
effective_model is updated to the override model after a successful retry.
- The
llm_fallback_translation_fallback forbid value marks a cross-dialect retry as policy_mismatch after the fact, but cannot prevent nginx’s proxy_next_upstream from firing. This is an architectural constraint of nginx’s phase model.
- Replacement and fallback are distinct:
replacement_happened=1, fallback_attempted=0 for pre-send substitution; fallback_attempted=1, replacement_happened=0 for failure-driven retry.
- Route graphs are validated at startup: duplicate primary providers and cyclic graphs are rejected.
llm_fallback_max_attempts syncs nginx’s effective proxy_next_upstream_tries before upstream execution.
llm_fallback_on inheritance is replace-not-merge. Child locations must restate the full desired failure-class set when narrowing parent policy.
- Child locations with any local
llm_fallback_route entries do not inherit parent routes. This is replace-not-merge semantics.