API Reference

Core Kubernetes APIs

The following Kubernetes APIs are defined in the inference.networking.k8s.io (v1) and llm-d.ai (v1alpha2) groups.

| Resource | API Group | Version | Description |
| --- | --- | --- | --- |
| InferencePool | inference.networking.k8s.io | v1 | Defines a pool of inference endpoints (model servers) to configure the Endpoint Picker (EPP) and Gateways for inference-optimized routing. |
| InferenceObjective | llm-d.ai | v1alpha2 | Defines performance goals (priority, latency) for specific model workloads within a pool. |
| InferenceModelRewrite | llm-d.ai | v1alpha2 | Specifies rules for rewriting model names in request bodies, enabling traffic splitting and canary rollouts. |
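
As a quick illustration of how the group and version columns are used, the sketch below builds the `apiVersion` string that a manifest for each resource would carry. Kubernetes forms this string as `<group>/<version>`. The helper names (`CORE_APIS`, `api_version`, `manifest_header`) are hypothetical and exist only for this example, and the `spec` of each resource is deliberately left out because its fields are documented on the individual reference pages, not here.

```python
# Group and version for each resource, copied from the table above.
# CORE_APIS and the helpers below are illustrative names, not part of the project.
CORE_APIS = {
    "InferencePool": ("inference.networking.k8s.io", "v1"),
    "InferenceObjective": ("llm-d.ai", "v1alpha2"),
    "InferenceModelRewrite": ("llm-d.ai", "v1alpha2"),
}


def api_version(kind: str) -> str:
    """Return the Kubernetes apiVersion string ("<group>/<version>") for a kind."""
    group, version = CORE_APIS[kind]
    return f"{group}/{version}"


def manifest_header(kind: str, name: str) -> dict:
    """Build only the identifying fields of a manifest; the spec is omitted."""
    return {
        "apiVersion": api_version(kind),
        "kind": kind,
        "metadata": {"name": name},
    }


if __name__ == "__main__":
    for kind in CORE_APIS:
        print(kind, "->", api_version(kind))
```

Note that the two groups are versioned independently: InferencePool is at `v1`, while the `llm-d.ai` resources are still at an alpha version.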

Component Configuration

These schemas define the internal configuration for project components and are typically provided via ConfigMaps or local files, rather than as standalone Kubernetes objects.

| Schema | API Group | Version | Description |
| --- | --- | --- | --- |
| EndpointPickerConfig | llm-d.ai | v1alpha1 | Defines the internal configuration for the Endpoint Picker (EPP), including plugins and request scheduling profiles. |

Recognized HTTP Headers

  • EPP HTTP Headers Reference: The EPP inspects specific HTTP headers to manage flow control and observability for inference requests.

Supported Request APIs

See Also

  • Glossary: Definitions of key terms and concepts used across this project.