Rollout Guides

Rollout guides show how to introduce new versions of your inference infrastructure incrementally, with minimal service disruption.

Overview

These guides cover rollout strategies for LLM inference deployments, helping you choose the right approach based on your requirements.

Rollout Strategies

Rolling Update

A Rolling Update is the default Kubernetes Deployment strategy: it replaces pods gradually within a single InferencePool. This approach works in both standalone and llm-d router gateway modes.

How it works:

Updates pods incrementally (e.g., 25% at a time)
Old pods continue serving traffic until new pods are healthy
Built into Kubernetes Deployments
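
As a rough illustration of the incremental step described above, the following sketch computes how many pods a Deployment may add or take down at once. It assumes the percentage is configured through the Deployment's maxSurge and maxUnavailable fields (Kubernetes rounds the former up and the latter down); the function name rolling_update_batch and the replica counts are hypothetical.

```python
def rolling_update_batch(replicas: int,
                         max_surge_pct: int = 25,
                         max_unavailable_pct: int = 25) -> tuple[int, int]:
    """Return (surge_pods, unavailable_pods) for one rolling-update step.

    Assumption: percentages are resolved the way Kubernetes resolves
    maxSurge (rounded up) and maxUnavailable (rounded down).
    """
    # Integer ceiling division: -(-a // b) == ceil(a / b) for positive b.
    surge_pods = -(-replicas * max_surge_pct // 100)
    # Integer floor division.
    unavailable_pods = replicas * max_unavailable_pct // 100
    return surge_pods, unavailable_pods


# Hypothetical pool of 10 model-server pods updated 25% at a time.
surge, unavailable = rolling_update_batch(10)
print(f"extra pods allowed: {surge}, pods allowed down: {unavailable}")
```

With 10 replicas this allows up to 3 extra pods and up to 2 unavailable pods at any moment, so at least 8 pods keep serving traffic during the update.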

Use Rolling Updates for:

General, non-critical updates where strict traffic percentages do not matter
Scenarios where you want to conserve compute resources
Development and staging environments

Learn more: Kubernetes Rolling Update Tutorial

Blue-Green Update (HTTPRoute Traffic Splitting)

A Blue-Green Update creates a second complete InferencePool and uses HTTPRoute to control traffic distribution between the old (blue) and new (green) versions. This strategy requires llm-d router gateway mode.

How it works:

Deploy a complete new InferencePool alongside the existing one
Use HTTPRoute to gradually shift traffic (e.g., 1% → 5% → 10% → 50% → 100%)
Roll back instantly by resetting the HTTPRoute weights
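
The traffic split above can be sketched as a small helper that builds an HTTPRoute manifest with weighted backendRefs. This is a minimal sketch, not the manifest from the Blue-Green guide: the route name llm-route, the Gateway name inference-gateway, the pool names vllm-blue and vllm-green, and the InferencePool API group are all assumptions that depend on your cluster.

```python
import json

# Assumption: API group of the installed InferencePool CRD; adjust to your cluster.
POOL_GROUP = "inference.networking.k8s.io"


def blue_green_route(blue_weight: int, green_weight: int) -> dict:
    """Build an HTTPRoute that splits traffic between two InferencePools."""
    if blue_weight < 0 or green_weight < 0:
        raise ValueError("weights must be non-negative")

    def pool_ref(name: str, weight: int) -> dict:
        return {"group": POOL_GROUP, "kind": "InferencePool",
                "name": name, "weight": weight}

    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": "llm-route"},  # hypothetical name
        "spec": {
            "parentRefs": [{"name": "inference-gateway"}],  # hypothetical name
            "rules": [{
                "backendRefs": [
                    pool_ref("vllm-blue", blue_weight),    # existing version
                    pool_ref("vllm-green", green_weight),  # new version
                ],
            }],
        },
    }


# First canary stage: 99% of requests stay on blue, 1% go to green.
print(json.dumps(blue_green_route(99, 1), indent=2))
```

Weights are relative proportions, so a rollback in this sketch is simply blue_green_route(100, 0). The printed JSON can be passed to kubectl apply, which accepts JSON as well as YAML.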

Use Blue-Green Updates for:

Critical, high-risk production deployments that require gradual canary rollouts
Scenarios requiring fast rollbacks
Header-based routing (e.g., routing beta users to new version)
Updates that need precise traffic control
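
For the header-based case, one possible shape is a pair of HTTPRoute rules: one that matches a header and targets the green pool, and a default rule that targets blue. The header name x-release-channel, its value beta, the pool names, and the InferencePool API group are hypothetical and only illustrate the idea.

```python
# Assumption: API group of the installed InferencePool CRD; adjust to your cluster.
POOL_GROUP = "inference.networking.k8s.io"


def beta_routing_rules(header_name: str = "x-release-channel",
                       header_value: str = "beta") -> list[dict]:
    """Return HTTPRoute rules: header-matched requests go to green, the rest to blue."""

    def pool_ref(name: str) -> dict:
        return {"group": POOL_GROUP, "kind": "InferencePool", "name": name}

    beta_rule = {
        # Gateway API header match: exact comparison on one request header.
        "matches": [{
            "headers": [{"type": "Exact", "name": header_name, "value": header_value}],
        }],
        "backendRefs": [pool_ref("vllm-green")],  # hypothetical new pool
    }
    # A rule without matches applies to every request the beta rule does not claim.
    default_rule = {"backendRefs": [pool_ref("vllm-blue")]}  # hypothetical old pool
    return [beta_rule, default_rule]


rules = beta_routing_rules()
print(rules[0]["matches"][0]["headers"][0])
```

These rules would go under spec.rules of the HTTPRoute. Because the beta rule carries more header matches, it takes precedence for requests that send the header.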

Guide: Blue-Green Update

Comparison:

Feature	Rolling Update	Blue-Green Update
Routing Control	Random/Even across all healthy pods	Precise Percentage (e.g., exactly 1% or 10%)
Blast Radius	High (any request may land on a new pod as the update progresses)	Low (limited to the configured traffic weight)
Rollback Speed	Slow (requires rolling pods back to the previous version)	Instant (set the green HTTPRoute weight back to 0)
Resource Costs	Low (Only temporary surge of pods)	High (Requires running two full environments)
Version Coexistence	Simultaneously active inside one Service	Strictly separated across two distinct Services
Deployment Mode	Standalone and Gateway	Gateway only

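Note: Available capacity can also decide between these strategies. A Blue-Green Update needs enough compute to run two full InferencePools at once, whereas a Rolling Update needs room only for the temporary surge pods.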
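
To make the capacity trade-off concrete, the sketch below compares the peak number of pods each strategy needs. It assumes the green pool is deployed at the same size as the blue pool and that the Rolling Update surge is a percentage rounded up, as Kubernetes does for maxSurge; the function name peak_pods and the numbers are hypothetical.

```python
def peak_pods(replicas: int, max_surge_pct: int = 25) -> dict[str, int]:
    """Estimate the peak pod count for each rollout strategy."""
    # Rolling Update: steady-state replicas plus the temporary surge (rounded up).
    surge = -(-replicas * max_surge_pct // 100)
    return {
        "rolling_update": replicas + surge,
        # Blue-Green: both pools run at full size until blue is removed.
        "blue_green": 2 * replicas,
    }


# Hypothetical pool of 8 model-server pods.
print(peak_pods(8))
```

For 8 replicas this yields a peak of 10 pods for a Rolling Update and 16 for a Blue-Green Update, which matters when each pod holds one or more accelerators.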

LoRA Adapter Rollout

LoRA (Low-Rank Adaptation) adapter rollouts allow you to update model customizations without changing the base model or infrastructure. This works in both standalone and llm-d router gateway modes.

How it works:

Use InferenceModelRewrite to map model names to specific adapter versions
Gradually shift traffic between adapter versions
No infrastructure changes required
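
Conceptually, the rewrite behaves like a weighted lookup from the model name a client requests to a concrete adapter version. The sketch below models only that idea in plain Python; it does not reproduce the InferenceModelRewrite schema, and the model name food-review and its adapter versions are hypothetical.

```python
import random

# Hypothetical rewrite table: requested model name -> {adapter version: weight}.
REWRITES = {
    "food-review": {"food-review-v1": 90, "food-review-v2": 10},
}


def pick_adapter(model_name: str, rewrites: dict, rng: random.Random) -> str:
    """Pick the adapter version that serves one request, by relative weight."""
    targets = rewrites[model_name]
    names = list(targets)
    weights = [targets[name] for name in names]
    # random.choices draws proportionally to the given weights.
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so repeated runs behave the same
sample = [pick_adapter("food-review", REWRITES, rng) for _ in range(10_000)]
share_v2 = sample.count("food-review-v2") / len(sample)
print(f"share of requests served by v2: {share_v2:.1%}")
```

Shifting traffic then amounts to changing the weights (for example 90/10, then 50/50, then 0/100) while clients keep requesting the same model name.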

Use LoRA Adapter Rollouts when:

You need to deploy new versions of LoRA adapters without disrupting service
You want to test adapter changes with a subset of traffic
You need to maintain multiple adapter versions simultaneously

Guide: LoRA Adapter Rollout

General Rollout Pattern

All rollout guides follow a similar pattern:

Deploy new infrastructure - Create the new version alongside the existing one
Configure traffic splitting - Gradually shift traffic to the new version (e.g., 10% → 50% → 100%)
Monitor and validate - Verify the new version performs correctly at each stage
Complete rollout - Direct 100% of traffic to the new version
Clean up - Remove the old version once the new version is stable
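
The pattern above can be sketched as a small control loop. This is an illustrative outline under stated assumptions, not tooling shipped with llm-d: set_new_version_weight and is_healthy are hypothetical callbacks that you would back with your own traffic-splitting and monitoring logic.

```python
from typing import Callable, Iterable


def run_rollout(stages: Iterable[int],
                set_new_version_weight: Callable[[int], None],
                is_healthy: Callable[[int], bool]) -> bool:
    """Shift traffic stage by stage; return True if the rollout completed."""
    for percent in stages:
        # Configure traffic splitting for this stage.
        set_new_version_weight(percent)
        # Monitor and validate before moving on.
        if not is_healthy(percent):
            # Roll back: send all traffic to the old version again.
            set_new_version_weight(0)
            return False
    # Rollout complete; the old version can now be cleaned up.
    return True


# Usage with stand-in callbacks that only record what would be applied.
applied: list[int] = []
ok = run_rollout([10, 50, 100], applied.append, lambda percent: True)
print(ok, applied)
```

Keeping the health check as a callback lets the same loop cover HTTPRoute weights and adapter weights alike; the check itself (error rate, latency, output quality) is left to your monitoring setup.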

Prerequisites

Before following these guides, ensure you have:

A working llm-d deployment (see getting started guide)
Access to kubectl and the Kubernetes cluster
Understanding of Kubernetes Gateway API concepts (for gateway mode)
Familiarity with your model serving infrastructure (vLLM, etc.)
