NVIDIA MIG and GPU Optimization in Kubernetes
NVIDIA Multi-Instance GPU (MIG) is a technology that partitions a single physical GPU (on supported models such as the A100 and A30) into multiple smaller, hardware-isolated GPU instances. Multiple workloads or containers can then run side by side on one GPU, improving utilization for jobs that don't saturate the GPU's full compute capacity, which is especially valuable in multi-tenant environments.
With Kubernetes, MIG integration provides flexible, fine-grained GPU resource allocation. Kubernetes manages GPUs through the NVIDIA device plugin, which advertises resources such as nvidia.com/gpu; dedicated operators and controllers extend this mechanism to configure and schedule MIG-enabled GPUs.

GPU Optimization Techniques
Single Strategy
In this approach, all MIG devices on a GPU are created with the same size. For example, on an AWS p4d.24xlarge instance (8x A100 40GB GPUs), you could create 56 slices of 1g.5gb (7 per GPU) or 24 slices of 2g.10gb (3 per GPU).
Mixed Strategy
This technique allows for creating MIG devices of different sizes on the same GPU. For instance, an A100 80GB GPU can be partitioned into two 1g.10gb instances, one 2g.20gb instance, and one 3g.40gb instance.
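Both strategies are commonly declared in a mig-parted-style configuration, which the GPU Operator's MIG manager consumes via a ConfigMap. The sketch below mirrors the examples above and assumes A100 40GB GPUs for the single-strategy entry and A100 80GB GPUs for the mixed one:

```yaml
version: v1
mig-configs:
  # Single strategy: every GPU carved into identical 1g.5gb slices (A100 40GB)
  all-1g.5gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 7

  # Mixed strategy: different profile sizes on the same GPU (A100 80GB)
  mixed-example:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 2
        "2g.20gb": 1
        "3g.40gb": 1
```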
Kubernetes Operators for MIG
Two main operators facilitate MIG management in Kubernetes:
- NVIDIA GPU Operator: This operator automates the creation of MIG partitions based on node labels. It uses a ConfigMap to define named MIG profiles and applies the profile selected by each node's label (see the labeling example after this list).
- k8s-mig-operator: This is a Python-based operator that provides more granular control over MIG configurations. It allows for custom MIG profiles and supports toggling MIG on/off for individual GPUs.
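With the GPU Operator's MIG manager running, applying a profile is a matter of labeling the node. The all-1g.5gb profile name is one of the defaults shipped with the operator; substitute your own node and profile names:

```shell
# Ask the MIG manager to reconfigure this node's GPUs to the named profile
kubectl label nodes <node-name> nvidia.com/mig.config=all-1g.5gb --overwrite

# Check the reconfiguration status reported by the MIG manager
kubectl get node <node-name> \
  -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
```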
Key Concepts in NVIDIA MIG for Kubernetes
GPU Partitioning
MIG allows a single GPU to be split into several smaller, independent instances, each with its own memory, compute cores, and other resources.
Each MIG instance can be treated as a separate, discrete resource, enabling better isolation and multi-tenant support, conceptually similar to how virtual machines or containers slice up a host. Outside Kubernetes, partitioning can be driven manually with nvidia-smi, as sketched below.
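For context, this is roughly what manual partitioning looks like from a node's shell; in a Kubernetes cluster the GPU Operator automates these steps. The GPU index and the 1g.5gb profile here are illustrative:

```shell
# Enable MIG mode on GPU 0 (takes effect after a GPU reset)
sudo nvidia-smi -i 0 -mig 1

# Create two 1g.5gb GPU instances and their default compute instances (-C)
sudo nvidia-smi mig -cgi 1g.5gb,1g.5gb -C

# List the resulting MIG devices
nvidia-smi -L
```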
Dynamic Resource Allocation
With dynamic resource allocation, Kubernetes can allocate and deallocate GPU resources as needed. This provides flexibility in managing GPU workloads and ensures that resources are efficiently distributed among workloads based on real-time demand.
The MIG feature enables a more fine-grained allocation of GPU resources, allowing each container or pod to use a portion of the GPU instead of the whole GPU. This is especially beneficial in environments where GPU resources are a limited and valuable commodity.
Kubernetes can dynamically allocate MIG devices to pods using the resources section in pod specifications. For example:
```yaml
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
```
This requests one 1g.5gb MIG device for the pod.
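In context, a complete minimal pod manifest might look like this; the pod name and CUDA image tag are illustrative assumptions, not fixed requirements:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-demo                 # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-check
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed image tag
      command: ["nvidia-smi", "-L"]        # prints the MIG device visible to the pod
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1         # request one 1g.5gb MIG slice
```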
Benefits of MIG in Kubernetes
- Improved GPU Utilization: MIG allows multiple pods to share a single GPU, increasing overall utilization.
- Isolation: Each MIG instance is hardware-isolated, ensuring performance predictability.
- Flexibility: Administrators can create various MIG profiles to match different workload requirements.
- Cost Efficiency: By running more pods per GPU, organizations can optimize their hardware investments.
MIG technology, combined with Kubernetes operators, provides a powerful toolset for maximizing GPU resources in containerized environments, especially for AI and high-performance computing workloads.
Key Operators for Managing NVIDIA MIG in Kubernetes
To enable and manage NVIDIA MIG within Kubernetes clusters, several operators and tools are used. These operators automate the deployment, monitoring, and scaling of GPU resources (including MIG partitions).
1. NVIDIA GPU Operator:
- The NVIDIA GPU Operator is a Kubernetes operator that simplifies the management of NVIDIA GPUs in Kubernetes clusters. It automates the deployment of NVIDIA drivers, runtime, monitoring tools, and device plugins.
- The operator supports the configuration of MIG-enabled GPUs, dynamically allocating GPU instances to Kubernetes workloads.
- It ensures that all necessary components (such as the device plugin) are installed and configured properly to expose MIG partitions to workloads (an installation sketch follows this item).
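A typical installation uses Helm; the release name and namespace below are conventional choices, and mig.strategy selects how MIG devices are advertised:

```shell
# Add NVIDIA's Helm repo and install the GPU Operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update

# mig.strategy=mixed advertises per-profile resources (e.g. nvidia.com/mig-1g.5gb);
# mig.strategy=single advertises uniform MIG slices as plain nvidia.com/gpu
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set mig.strategy=mixed
```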
2. NVIDIA Device Plugin for Kubernetes:
- The NVIDIA Device Plugin is a Kubernetes device plugin that lets pods request GPUs. When MIG is enabled, the plugin advertises MIG instances as schedulable resources according to the configured single or mixed strategy.
- It facilitates the allocation of MIG instances to workloads and ensures pods are scheduled onto nodes with matching free instances. Once running, the advertised resources appear in each node's capacity, as shown below.
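Once the plugin is advertising MIG resources, a quick check confirms what a node exposes; the resource names and counts below are illustrative for a mixed-strategy A100 80GB node:

```shell
# List the MIG resources a node currently advertises
kubectl get node <node-name> -o jsonpath='{.status.allocatable}' \
  | tr ',' '\n' | grep nvidia.com/mig
# Illustrative output:
# "nvidia.com/mig-1g.10gb":"2"
# "nvidia.com/mig-2g.20gb":"1"
# "nvidia.com/mig-3g.40gb":"1"
```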
3. NVIDIA DCGM Exporter:
- The NVIDIA Data Center GPU Manager (DCGM) Exporter exposes GPU metrics for monitoring within Kubernetes, typically scraped by Prometheus. It tracks GPU health, performance, and utilization down to individual MIG instances, which is crucial for making informed resource-allocation decisions; a quick way to inspect these metrics is sketched below.
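A quick sanity check, assuming the exporter was deployed by the GPU Operator (the service name, namespace, and port below reflect that default install and may differ in your cluster):

```shell
# Port-forward the DCGM exporter service to this machine
kubectl -n gpu-operator port-forward svc/nvidia-dcgm-exporter 9400:9400 &

# Look for per-MIG-instance utilization metrics; MIG-enabled devices carry
# instance labels (e.g. GPU_I_PROFILE) alongside the usual GPU labels
curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL
```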
Conclusion
NVIDIA MIG, combined with Kubernetes, offers a powerful mechanism to optimize GPU usage in multi-tenant, resource-constrained environments. Partitioning GPUs into smaller, isolated instances improves resource utilization, fault isolation, and scheduling flexibility. With dynamic resource allocation, Kubernetes can adjust to workloads in real time, ensuring optimal performance and efficient scaling of GPU resources. The NVIDIA GPU Operator, together with the NVIDIA Device Plugin, automates and manages this complex ecosystem, making it easier to use MIG-enabled GPUs effectively in Kubernetes clusters.
Resources:
- https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
- https://docs.nvidia.com/launchpad/ai/fleet-command/latest/fc-step-09.html
- https://github.com/riskfuel/k8s-mig-operator
- https://towardsdatascience.com/how-to-increase-gpu-utilization-in-kubernetes-with-nvidia-mps-e680d20c3181
- https://www.leadergpu.com/articles/554-what-is-nvidia-mig
- https://aws.amazon.com/blogs/containers/maximizing-gpu-utilization-with-nvidias-multi-instance-gpu-mig-on-amazon-eks-running-more-pods-per-gpu-for-enhanced-performance/
- https://blog.realvarez.com/get-more-out-of-gpus-with-nvidia-multi-instance-gpu/
- https://blogs.nvidia.com/blog/multi-instance-gpus/
- https://catalog.ngc.nvidia.com/orgs/nvidia/teams/cloud-native/containers/k8s-mig-manager
- https://www.infracloud.io/blogs/guide-to-nvidia-gpu-operator-in-kubernetes/
- https://docs.nvidia.com/datacenter/cloud-native/kubernetes/latest/index.html