Releases · NVIDIA/k8s-device-plugin

Added --cdi-enabled flag to GPU Device Plugin. With this enabled, the device plugin will generate CDI specifications for available NVIDIA devices. Allocation will add CDI annotations (cdi.k8s.io/*) to the response. These are read by a CDI-enabled runtime to make the required modifications to a container being created.
Updated GFD subchart to version 0.8.0-rc.1
Bumped Golang version to 1.20.1
Bumped CUDA base images version to 12.1.0
Switched to klog for logging
Added a static deployment file for Microshift
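
The CDI annotations mentioned for --cdi-enabled can be pictured with a short Go sketch. It assumes only the cdi.k8s.io/ prefix from the note above and the CDI convention of fully-qualified device names (vendor/class=name) joined by commas. The helper names, the key suffix, and the nvidia.com/gpu kind are illustrative assumptions, not the plugin's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// cdiAnnotationPrefix is the annotation key prefix named in the release note.
const cdiAnnotationPrefix = "cdi.k8s.io/"

// qualifiedName builds a fully-qualified CDI device name of the form
// vendor/class=device, for example "nvidia.com/gpu=0".
func qualifiedName(vendor, class, device string) string {
	return vendor + "/" + class + "=" + device
}

// cdiAnnotation returns one annotation key/value pair for the given devices.
// keySuffix is a hypothetical identifier chosen by the caller.
func cdiAnnotation(keySuffix string, devices []string) (string, string) {
	return cdiAnnotationPrefix + keySuffix, strings.Join(devices, ",")
}

func main() {
	devices := []string{
		qualifiedName("nvidia.com", "gpu", "0"),
		qualifiedName("nvidia.com", "gpu", "1"),
	}
	key, value := cdiAnnotation("example-plugin_request-1", devices)
	fmt.Printf("%s: %s\n", key, value)
}
```

A CDI-enabled runtime would look for keys with this prefix and resolve each listed device against the generated CDI specifications.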

30 Nov 14:21

elezar

v0.13.0

1f8a485

v0.13.0

Full Changelog: v0.12.2...v0.13.0

Changes

Skip NVIDIA DGX Display devices when generating labels.
Fail on startup if no valid resources are detected
Bump GFD subchart to version 0.7.0

Changes from `v0.13.0-rc.3`

Use nodeAffinity instead of nodeSelector by default in daemonsets
Add machine-file-path option to GFD config flags
Mount /sys instead of /sys/class/dmi/id/product_name in GPU Feature Discovery daemonset
Bump GFD subchart to version 0.7.0-rc.3

Changes from `v0.13.0-rc.2`

Bump CUDA base image to 11.8.0
Use consistent indentation in YAML manifests
Fix bug from v0.13.0-rc.1 when using mig-strategy="mixed"
Add logged error message if setting up health checks fails
Support MIG devices with 1g.10gb+me profile
Distribute replicas evenly across GPUs during allocation
Bump GFD subchart to version 0.7.0-rc.2

Changes from `v0.13.0-rc.1`

Improve health checks to detect errors when waiting on device events
Log ECC error events detected during health check
Add the Git SHA to version information for the CLI and container images
Use NVML interfaces from go-nvlib to query devices
Refactor plugin creation from resources
Add a CUDA-based resource manager that can be used to expose integrated devices on Tegra-based systems
Bump GFD subchart to version 0.7.0-rc.1

Note:

The container image nvcr.io/nvidia/k8s-device-plugin:v0.13.0-ubi8 contains the following high-severity CVEs:

CVE-2022-42898 - Vulnerability found in os package type (rpm) - krb5-libs

07 Nov 10:58

elezar

v0.13.0-rc.3

b8682b1

v0.13.0-rc.3 Pre-release

Use nodeAffinity instead of nodeSelector by default in daemonsets
Add machine-file-path option to GFD config flags
Mount /sys instead of /sys/class/dmi/id/product_name in GPU Feature Discovery daemonset
Bump GFD subchart to version 0.7.0-rc.3

Full Changelog: v0.13.0-rc.2...v0.13.0-rc.3

21 Oct 09:07

elezar

v0.13.0-rc.2

a866314

v0.13.0-rc.2 Pre-release

Bump CUDA base image to 11.8.0
Use consistent indentation in YAML manifests
Fix bug from v0.13.0-rc.1 when using mig-strategy="mixed"
Add logged error message if setting up health checks fails
Support MIG devices with 1g.10gb+me profile
Distribute replicas evenly across GPUs during allocation
Bump GFD subchart to version 0.7.0-rc.2
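
The item "Distribute replicas evenly across GPUs during allocation" can be illustrated with a simple policy: take each replica from the GPU that currently has the most free replicas. The Go sketch below is an assumption about what "evenly" could mean in practice; the function name and the map-based bookkeeping are hypothetical and are not taken from the plugin.

```go
package main

import (
	"fmt"
	"sort"
)

// distribute picks `want` replicas, always taking the next one from the GPU
// that currently has the most free replicas, so load spreads across GPUs.
// free maps a GPU identifier to its number of unallocated replicas.
// It returns how many replicas were taken from each GPU, or an error if
// fewer than `want` replicas are free in total.
func distribute(free map[string]int, want int) (map[string]int, error) {
	// Copy the input so the caller's map is not modified.
	remaining := make(map[string]int, len(free))
	ids := make([]string, 0, len(free))
	for id, n := range free {
		remaining[id] = n
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic tie-breaking

	taken := make(map[string]int)
	for i := 0; i < want; i++ {
		best := -1
		for j, id := range ids {
			if remaining[id] > 0 && (best < 0 || remaining[id] > remaining[ids[best]]) {
				best = j
			}
		}
		if best < 0 {
			// Every GPU is exhausted; exactly i replicas were available.
			return nil, fmt.Errorf("requested %d replicas, only %d free", want, i)
		}
		remaining[ids[best]]--
		taken[ids[best]]++
	}
	return taken, nil
}

func main() {
	free := map[string]int{"GPU-a": 4, "GPU-b": 4, "GPU-c": 2}
	taken, err := distribute(free, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(taken)
}
```

With the inputs in main, the four replicas are split between GPU-a and GPU-b rather than all coming from one device.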

11 Oct 11:34

elezar

v0.13.0-rc.1

0930c36

v0.13.0-rc.1 Pre-release

Improve health checks to detect errors when waiting on device events
Log ECC error events detected during health check
Add the Git SHA to version information for the CLI and container images
Use NVML interfaces from go-nvlib to query devices
Refactor plugin creation from resources
Add a CUDA-based resource manager that can be used to expose integrated devices on Tegra-based systems
Bump GFD subchart to version 0.7.0-rc.1

12 Sep 14:03

klueska

v0.12.3

06c6e9a

v0.12.3

Bump CUDA base image to 11.7.1
Remove CUDA compat libs from the device-plugin image in favor of libs installed by the driver
Fix securityContext.capabilities indentation
Add namespace override for multi-namespace deployments

16 Jun 20:02

klueska

v0.12.2

6815626

v0.12.2

Fix example configmap settings in values.yaml file
Fix assertions for panicking on uniformity with migStrategy=single
Make priorityClassName configurable through helm
Move NFD serviceAccount info under 'master' in helm chart
Bump GFD subchart to version 0.6.1
Allow an empty config file and default to "version: v1"
Make config fallbacks for config-manager a configurable, ordered list
Add an 'empty' config fallback (but don't apply it by default)
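
The ordered fallback list for config-manager can be sketched as a loop that tries each strategy in turn and stops at the first one that yields a config. The strategy names used here ("named", "single", "empty") and their semantics are assumptions made for illustration; only the idea of an ordered, configurable list with an optional 'empty' fallback comes from the notes above.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveConfig returns the name of the config to use when none is requested
// explicitly. Fallbacks are tried in the order given.
// The strategy names and their behavior are illustrative assumptions.
func resolveConfig(available []string, fallbacks []string) (string, error) {
	for _, strategy := range fallbacks {
		switch strategy {
		case "named":
			// Use a config literally named "default", if one exists.
			for _, name := range available {
				if name == "default" {
					return name, nil
				}
			}
		case "single":
			// Use the only available config, if there is exactly one.
			if len(available) == 1 {
				return available[0], nil
			}
		case "empty":
			// Use an empty config, represented here by the empty string.
			return "", nil
		default:
			return "", fmt.Errorf("unknown fallback strategy %q", strategy)
		}
	}
	return "", errors.New("no fallback strategy produced a config")
}

func main() {
	name, err := resolveConfig([]string{"a100", "default"}, []string{"named", "single"})
	fmt.Println(name, err)
}
```

Because 'empty' always succeeds, it only makes sense as the last entry, which fits the note that it is not applied by default.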

13 Jun 21:53

klueska

v0.12.1

18119fc

v0.12.1

Exit the plugin and GFD sidecar containers on error instead of logging and continuing
Only force restart of daemonsets when using config files and allow overrides
Fix bug in calculation for GFD security context in helm chart
Fix bug prohibiting GFD from being started from the plugin helm chart

06 Jun 20:27

klueska

v0.12.0

e800686

v0.12.0

This release is a promotion of v0.12.0-rc.6 to v0.12.0

v0.12.0-rc.6

Send SIGHUP from GFD sidecar to GFD main container on config change
Reuse main container's securityContext in sidecar containers
Update GFD subchart to v0.6.0-rc.1
Bump CUDA base image version to 11.7.0
Add a flag called FailRequestsGreaterThanOne for TimeSlicing resources
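
A minimal Go sketch of what the FailRequestsGreaterThanOne flag implies: when it is set, a request for more than one replica of a time-sliced resource is rejected. The struct and function below are hypothetical and mirror only this one flag; they are not the plugin's types.

```go
package main

import "fmt"

// timeSlicing mirrors only the flag discussed here; other fields are omitted.
type timeSlicing struct {
	FailRequestsGreaterThanOne bool
}

// validateRequest rejects requests for more than one time-sliced replica
// when the flag is enabled, and accepts everything else.
func validateRequest(cfg timeSlicing, requested int) error {
	if cfg.FailRequestsGreaterThanOne && requested > 1 {
		return fmt.Errorf("request for %d time-sliced replicas rejected: at most 1 is allowed", requested)
	}
	return nil
}

func main() {
	cfg := timeSlicing{FailRequestsGreaterThanOne: true}
	fmt.Println(validateRequest(cfg, 1))
	fmt.Println(validateRequest(cfg, 2))
}
```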

v0.12.0-rc.5

Allow either an external ConfigMap name or a set of configs in helm
Handle cases where no default config is specified to config-manager
Update API used to pass config files to helm to use map instead of list
Fix bug that wasn't properly stopping plugins across a soft restart

v0.12.0-rc.4

Disable support for resource-renaming in the config (will no longer be part of this release)
Add field for TimeSlicing.RenameByDefault to rename all replicated resources to .shared
Refactor main to allow configs to be reloaded across a (soft) restart
Add support to helm to provide multiple config files for the config map
Add new config-manager binary to run as sidecar and update the plugin's configuration via a node label
Make GFD and NFD (optional) subcharts of the device plugin's helm chart

v0.12.0-rc.3

Add ability to parse Duration fields from config file
Omit either the Plugin or GFD flags from the config when not present
Fix bug when falling back to none strategy from single strategy
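
Parsing Duration fields from a config file is commonly done in Go by wrapping time.Duration in a type with a custom unmarshaller. The sketch below uses JSON from the standard library to stay self-contained; the config file format, the exampleConfig type, and the sleepInterval field name are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration wraps time.Duration so it can be decoded from strings like "30s".
type Duration time.Duration

// UnmarshalJSON accepts a quoted duration string and parses it with
// time.ParseDuration.
func (d *Duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return fmt.Errorf("duration must be a string: %w", err)
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = Duration(parsed)
	return nil
}

// exampleConfig is a hypothetical config with a single duration field.
type exampleConfig struct {
	SleepInterval Duration `json:"sleepInterval"`
}

// parseConfig decodes the hypothetical config from JSON bytes.
func parseConfig(data []byte) (exampleConfig, error) {
	var cfg exampleConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]byte(`{"sleepInterval": "1m30s"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(time.Duration(cfg.SleepInterval))
}
```

The same wrapper approach carries over to other decoders that support custom unmarshalling hooks.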

v0.12.0-rc.2

Move MigStrategy from Sharing.Mig.Strategy back to Flags.MigStrategy
Remove TimeSlicing.Strategy and any allocation policies built around it
Add support for specifying a config file to the helm chart

v0.12.0-rc.1

Add API for specifying time-slicing parameters to support GPU sharing
Add API for specifying explicit resource naming in the config file
Update config file to be used across plugin and GFD
Stop publishing images to dockerhub (now only published to nvcr.io)
Add NVIDIA_MIG_MONITOR_DEVICES=all to daemonset envvars when mig mode is enabled
Print the plugin configuration at startup
Add the ability to load the plugin configuration from a file
Remove deprecated tolerations for critical-pod
Drop critical-pod annotation (removed from 1.16+) in favor of priorityClassName
Pass all parameters as env in helm chart and example daemonset.yaml files for consistency
