Gateway Guides

This directory contains guides for deploying a Kubernetes Gateway as a proxy for the llm-d Router.

Note: Before deploying a Gateway provider, install the required CRDs using the CRD installation guide.

Note: For a working end-to-end Gateway configuration, you must also deploy one of the well-lit paths.

Why do you need a Gateway?

The llm-d Router provides an extension to compatible Gateway providers that optimizes load balancing of LLM traffic across model server replicas.

The integration with a Gateway allows self-hosted models to be exposed in a wide variety of network topologies including:

Internet-facing services
Services internal to your cluster
Services reached through a service mesh

and take advantage of key Gateway features like:

Traffic splitting for incremental rollout of new models
TLS encryption of queries and responses
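As an illustration of traffic splitting, the sketch below builds a Gateway API HTTPRoute whose backendRefs carry weights, so that a small share of requests reaches a new model. It is a minimal example under stated assumptions, not a manifest taken from the llm-d guides: the route, Gateway, and pool names are hypothetical, and the InferencePool API group is assumed to match the CRD version installed on your cluster.

```python
"""Sketch: weighted traffic split between two InferencePools via an HTTPRoute."""
import json

# Assumption: API group of the installed InferencePool CRD; verify on your cluster.
INFERENCE_GROUP = "inference.networking.k8s.io"


def weighted_route(name, gateway, pool_weights):
    """Build an HTTPRoute manifest that splits traffic across InferencePools.

    pool_weights maps an InferencePool name to its Gateway API weight.
    """
    backend_refs = [
        {
            "group": INFERENCE_GROUP,
            "kind": "InferencePool",
            "name": pool,
            "weight": weight,
        }
        for pool, weight in pool_weights.items()
    ]
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            "parentRefs": [{"name": gateway}],
            "rules": [{"backendRefs": backend_refs}],
        },
    }


def traffic_shares(route):
    """Return each backend's share of requests: weight / sum of weights.

    Assumes at least one backend has a non-zero weight.
    """
    refs = route["spec"]["rules"][0]["backendRefs"]
    total = sum(ref["weight"] for ref in refs)
    return {ref["name"]: ref["weight"] / total for ref in refs}


# Hypothetical names: keep 90% of requests on the current model's pool.
route = weighted_route(
    "llm-route", "inference-gateway", {"model-v1": 90, "model-v2": 10}
)

if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. piped to `kubectl apply -f -`.
    print(json.dumps(route, indent=2))
    print(traffic_shares(route))
```

Shifting the rollout forward is then a matter of changing the two weights and re-applying the route.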

By integrating with a Gateway instead of developing an llm-d-specific proxy layer, llm-d can leverage the high performance of mature proxies and reuse existing operational tools for managing traffic to services. Compatible Gateway implementations may use proxies like Envoy or other high-performance data planes under the hood.

Overview

The key elements of llm-d's Gateway integration are:

The llm-d Endpoint Picker (EPP), an external processing service that a Kubernetes Gateway consults to decide which model server should receive a given request
The InferencePool Custom Resource, whose spec tells Kubernetes Gateway controllers how to provision an llm-d Endpoint Picker as an inference extension to a Kubernetes Gateway
The Gateway Custom Resources, which define the Kubernetes-native Gateway API and how traffic reaches an InferencePool
A compatible Gateway implementation (control plane) that provisions and configures load balancers and endpoint pickers in response to the Gateway API and InferencePool API
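The way these elements reference one another can be sketched as three manifests: a Gateway, an HTTPRoute attached to it, and an InferencePool that names its Endpoint Picker. This is an illustrative sketch only. Every resource name, the gatewayClassName, the port numbers, and the label selector are hypothetical placeholders, and the InferencePool field names (selector, targetPorts, endpointPickerRef) are an assumption based on the v1 Gateway API Inference Extension; check them against the CRD version installed on your cluster.

```python
"""Sketch: the resources that connect a Gateway to model servers."""
import json

gateway = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "Gateway",
    "metadata": {"name": "inference-gateway"},
    "spec": {
        "gatewayClassName": "example-class",  # placeholder: provider-specific
        "listeners": [{"name": "http", "protocol": "HTTP", "port": 80}],
    },
}

pool = {
    "apiVersion": "inference.networking.k8s.io/v1",  # assumption: v1 CRD installed
    "kind": "InferencePool",
    "metadata": {"name": "example-pool"},
    "spec": {
        # Assumed field names; selects the model server Pods and their port.
        "selector": {"matchLabels": {"app": "example-model-server"}},
        "targetPorts": [{"number": 8000}],
        # The Endpoint Picker service the Gateway consults for this pool.
        "endpointPickerRef": {"name": "example-epp", "port": {"number": 9002}},
    },
}

route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "llm-route"},
    "spec": {
        "parentRefs": [{"name": gateway["metadata"]["name"]}],
        "rules": [
            {
                "backendRefs": [
                    {
                        "group": "inference.networking.k8s.io",
                        "kind": "InferencePool",
                        "name": pool["metadata"]["name"],
                    }
                ]
            }
        ],
    },
}


def request_path(gateway, route, pool):
    """Follow the references from Gateway to EPP; raise if a link is missing."""
    gw_name = gateway["metadata"]["name"]
    pool_name = pool["metadata"]["name"]
    parents = [ref["name"] for ref in route["spec"]["parentRefs"]]
    if gw_name not in parents:
        raise ValueError(f"route is not attached to Gateway {gw_name!r}")
    backends = [
        ref["name"]
        for rule in route["spec"]["rules"]
        for ref in rule["backendRefs"]
        if ref.get("kind") == "InferencePool"
    ]
    if pool_name not in backends:
        raise ValueError(f"route does not target InferencePool {pool_name!r}")
    epp_name = pool["spec"]["endpointPickerRef"]["name"]
    return [gw_name, route["metadata"]["name"], pool_name, epp_name]


if __name__ == "__main__":
    print(" -> ".join(request_path(gateway, route, pool)))
    print(json.dumps([gateway, route, pool], indent=2))
```

The helper only checks that the names line up locally; the compatible Gateway implementation is what actually reconciles these objects and configures the data plane.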

After completing these gateway setup steps, you will be able to create InferencePool objects on your cluster and route traffic to them.

Note: Setting up a Gateway generally requires cluster administration rights.

Supported Gateway Providers

llm-d requires that you select a Gateway implementation that supports the Gateway API Inference Extension. Your infrastructure may provide a compatible implementation by default, or you may choose to deploy one onto your cluster.

GKE Gateway - GKE implements the Gateway API through the GKE Gateway controller, which provisions Google Cloud Load Balancers for Pods in GKE clusters. The GKE Gateway controller supports weighted traffic splitting, mirroring, advanced routing, multi-cluster load balancing and more. Official GKE docs.
Istio - Istio is an open source service mesh and gateway implementation. It provides a fully compliant implementation of the Kubernetes Gateway API for cluster ingress traffic control. Official Istio docs.
Agentgateway - Agentgateway is a high-performance, Rust-based AI gateway for LLM, MCP, and A2A workloads that can also serve as a Gateway API and Inference Gateway implementation. Official Agentgateway docs.

Other Providers

For other compatible Gateway implementations not listed above, follow the installation instructions for your selected Gateway provider. Ensure the necessary CRDs for Gateway API and the Gateway API Inference Extension are installed.
