NVIDIA GPU Operator

GPU Operator Architecture

Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, InfiniBand adapters and other devices through the device plugin framework. However, configuring and managing nodes with these resources requires setting up multiple software components, such as drivers, container runtimes and other libraries, which is difficult and error-prone. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node labelling using GPU Feature Discovery (GFD), monitoring based on DCGM (Data Center GPU Manager), and others.
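Once the operator is running, the device plugin advertises GPUs to the scheduler as the extended resource `nvidia.com/gpu`, and workloads claim them through resource limits (Kubernetes uses the limit as the request when only a limit is given). A minimal sketch, assuming a throwaway test pod: the function below only prints a manifest. The pod name `cuda-vector-add`, the helper name `gpu_test_pod_manifest` and the image `k8s.gcr.io/cuda-vector-add:v0.1` are illustrative assumptions, not part of the original notes; any CUDA-enabled image can be substituted.

```shell
#!/usr/bin/env bash
# Print a Pod manifest that requests GPUs through the device plugin's
# extended resource "nvidia.com/gpu".
# Hypothetical defaults: pod name "cuda-vector-add", one GPU, and the image
# below (an assumption; substitute any CUDA-enabled image).
gpu_test_pod_manifest() {
  local name="${1:-cuda-vector-add}"
  local gpus="${2:-1}"
  local image="${3:-k8s.gcr.io/cuda-vector-add:v0.1}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: OnFailure
  containers:
    - name: ${name}
      image: "${image}"
      resources:
        limits:
          nvidia.com/gpu: ${gpus}
EOF
}
```

Usage: `gpu_test_pod_manifest | microk8s kubectl apply -f -`, then read the result with `microk8s kubectl logs cuda-vector-add`. Generating the manifest from a function keeps the GPU count and image as parameters instead of hard-coding them in a file.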

Install

Install in MicroK8s

Install on MicroK8s Infer Repo

microk8s enable gpu

# verify the installation (expected: 'all validations are successful')
# if kubectl is not configured for the MicroK8s cluster, use 'microk8s kubectl' instead
kubectl logs -n gpu-operator-resources -lapp=nvidia-operator-validator -c nvidia-operator-validator

# see all resources in the namespace
kubectl get all -n gpu-operator-resources

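# check that the GPU resource has been added (expected: 'nvidia.com/gpu: 1' under Capacity and Allocatable on a single-GPU node)
# arman-gpu is the node name here; replace it with your own (see: kubectl get nodes)
kubectl describe node arman-gpu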
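The manual checks above can be turned into a small script, for example to run after `microk8s enable gpu` in a provisioning step. A minimal sketch under stated assumptions: the function names `validator_ok`, `gpu_count` and `check_gpu_node` and the `KUBECTL` variable are hypothetical; the namespace, label and container name are taken from the commands above. It reads the allocatable GPU count with kubectl's JSONPath output, where the dots in `nvidia.com/gpu` must be escaped.

```shell
#!/usr/bin/env bash
# Scriptable versions of the validator and node-capacity checks.
# KUBECTL (hypothetical) defaults to "kubectl"; set KUBECTL="microk8s kubectl"
# if kubectl is not configured for the MicroK8s cluster. It is expanded
# unquoted on purpose so that a two-word command works.

# Succeed if the validator log read on stdin reports success.
validator_ok() {
  grep -q 'all validations are successful'
}

# Print the allocatable nvidia.com/gpu count of a node (0 if absent).
# The dots inside the resource name are escaped for kubectl's JSONPath.
gpu_count() {
  local node="$1" count
  count=$(${KUBECTL:-kubectl} get node "$node" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}')
  echo "${count:-0}"
}

# Run both checks against NODE; return non-zero on the first failure.
check_gpu_node() {
  local node="$1" n
  ${KUBECTL:-kubectl} logs -n gpu-operator-resources \
    -l app=nvidia-operator-validator -c nvidia-operator-validator \
    | validator_ok || { echo "validator has not succeeded" >&2; return 1; }
  n=$(gpu_count "$node")
  [ "$n" -ge 1 ] || { echo "no nvidia.com/gpu on $node" >&2; return 1; }
  echo "$node: $n GPU(s) allocatable"
}
```

Usage: `export KUBECTL="microk8s kubectl"`, then `check_gpu_node arman-gpu`. Reading `allocatable` rather than `capacity` reports what the scheduler can actually hand out to pods.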


Golden Codes - armanexplorer planet

