Releases: NVIDIA/k8s-device-plugin
v0.14.0-rc.2
Full Changelog: v0.14.0-rc.1...v0.14.0-rc.2
Changes
- Fix bug from v0.14.0-rc.1 when using cdi-enabled=false
v0.14.0-rc.1
Full Changelog: v0.13.0...v0.14.0-rc.1
Changes
- Added --cdi-enabled flag to GPU Device Plugin. With this enabled, the device plugin will generate CDI specifications for available NVIDIA devices. Allocation will add CDI annotations (cdi.k8s.io/*) to the response. These are read by a CDI-enabled runtime to make the required modifications to a container being created.
- Updated GFD subchart to version 0.8.0-rc.1
- Bumped Golang version to 1.20.1
- Bumped CUDA base images version to 12.1.0
- Switched to klog for logging
- Added a static deployment file for Microshift
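To make the CDI annotation item above more concrete, here is a minimal Go sketch of how one cdi.k8s.io/* annotation could be assembled for a set of allocated devices. It assumes the general CDI convention of a key under the cdi.k8s.io/ prefix whose value is a comma-separated list of qualified device names (kind=name). The function name, the key suffix, and the device names are hypothetical and are not taken from the plugin's source.

```go
package main

import (
	"fmt"
	"strings"
)

// cdiAnnotation builds one annotation key/value pair for the given devices.
// pluginName and requestID are hypothetical inputs used only to make the key
// unique; kind is the CDI device kind, for example "nvidia.com/gpu".
func cdiAnnotation(pluginName, requestID, kind string, deviceIDs []string) (string, string) {
	names := make([]string, 0, len(deviceIDs))
	for _, id := range deviceIDs {
		// A qualified CDI device name has the form "<kind>=<device>".
		names = append(names, kind+"="+id)
	}
	key := "cdi.k8s.io/" + pluginName + "_" + requestID
	return key, strings.Join(names, ",")
}

func main() {
	key, value := cdiAnnotation("nvidia-device-plugin", "req-1", "nvidia.com/gpu", []string{"gpu0", "gpu1"})
	fmt.Printf("%s: %s\n", key, value)
}
```

A CDI-enabled runtime would read such an annotation and look up the named devices in the generated CDI specifications; the exact keys and device names the plugin emits may differ from this sketch.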
v0.13.0
Full Changelog: v0.12.2...v0.13.0
Changes
- Skip NVIDIA DGX Display devices when generating labels
- Fail on startup if no valid resources are detected
- Bump GFD subchart to version 0.7.0
Changes from v0.13.0-rc.3
- Use nodeAffinity instead of nodeSelector by default in daemonsets
- Add machine-file-path option to GFD config flags
- Mount /sys instead of /sys/class/dmi/id/product_name in GPU Feature Discovery daemonset
- Bump GFD subchart to version 0.7.0-rc.3
Changes from v0.13.0-rc.2
- Bump cuda base image to 11.8.0
- Use consistent indentation in YAML manifests
- Fix bug from v0.13.0-rc.1 when using mig-strategy="mixed"
- Add logged error message if setting up health checks fails
- Support MIG devices with 1g.10gb+me profile
- Distribute replicas evenly across GPUs during allocation
- Bump GFD subchart to version 0.7.0-rc.2
Changes from v0.13.0-rc.1
- Improve health checks to detect errors when waiting on device events
- Log ECC error events detected during health check
- Add the GIT sha to version information for the CLI and container images
- Use NVML interfaces from go-nvlib to query devices
- Refactor plugin creation from resources
- Add a CUDA-based resource manager that can be used to expose integrated devices on Tegra-based systems
- Bump GFD subchart to version 0.7.0-rc.1
Note: The container image nvcr.io/nvidia/k8s-device-plugin:v0.13.0-ubi8 contains the following high-severity CVEs:
- CVE-2022-42898 - Vulnerability found in os package type (rpm) - krb5-libs
v0.13.0-rc.3
- Use nodeAffinity instead of nodeSelector by default in daemonsets
- Add machine-file-path option to GFD config flags
- Mount /sys instead of /sys/class/dmi/id/product_name in GPU Feature Discovery daemonset
- Bump GFD subchart to version 0.7.0-rc.3
Full Changelog: v0.13.0-rc.2...v0.13.0-rc.3
v0.13.0-rc.2
- Bump cuda base image to 11.8.0
- Use consistent indentation in YAML manifests
- Fix bug from v0.13.0-rc.1 when using mig-strategy="mixed"
- Add logged error message if setting up health checks fails
- Support MIG devices with 1g.10gb+me profile
- Distribute replicas evenly across GPUs during allocation
- Bump GFD subchart to version 0.7.0-rc.2
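The item "Distribute replicas evenly across GPUs during allocation" can be illustrated with a small greedy sketch: for each requested replica, pick the GPU that currently has the fewest replicas in use and still has a free one. This is one plausible way to spread load and is not necessarily the algorithm the plugin implements. The function name and the map-based inputs are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// pickEvenly chooses n replicas and returns the GPU ID selected for each one.
// free maps a GPU ID to its number of unallocated replicas; used maps a GPU ID
// to the number of replicas already handed out. Both inputs are left unmodified.
func pickEvenly(free, used map[string]int, n int) ([]string, error) {
	gpus := make([]string, 0, len(free))
	f := make(map[string]int, len(free))
	u := make(map[string]int, len(free))
	for g, c := range free {
		gpus = append(gpus, g)
		f[g] = c
		u[g] = used[g] // a missing entry counts as zero
	}
	sort.Strings(gpus) // deterministic tie-breaking by GPU ID

	picked := make([]string, 0, n)
	for len(picked) < n {
		best := ""
		for _, g := range gpus {
			if f[g] == 0 {
				continue // no free replica left on this GPU
			}
			if best == "" || u[g] < u[best] {
				best = g
			}
		}
		if best == "" {
			return nil, fmt.Errorf("only %d of %d requested replicas available", len(picked), n)
		}
		f[best]--
		u[best]++
		picked = append(picked, best)
	}
	return picked, nil
}

func main() {
	free := map[string]int{"gpu0": 2, "gpu1": 4}
	used := map[string]int{"gpu0": 2, "gpu1": 0}
	picked, err := pickEvenly(free, used, 3)
	fmt.Println(picked, err) // [gpu1 gpu1 gpu0] <nil>
}
```

In the usage above, gpu1 starts with no replicas in use, so it receives the first two picks; once both GPUs are at two replicas each, the tie is broken by GPU ID.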
v0.13.0-rc.1
- Improve health checks to detect errors when waiting on device events
- Log ECC error events detected during health check
- Add the GIT sha to version information for the CLI and container images
- Use NVML interfaces from go-nvlib to query devices
- Refactor plugin creation from resources
- Add a CUDA-based resource manager that can be used to expose integrated devices on Tegra-based systems
- Bump GFD subchart to version 0.7.0-rc.1
v0.12.3
v0.12.2
- Fix example configmap settings in values.yaml file
- Fix assertions for panicking on uniformity with migStrategy=single
- Make priorityClassName configurable through helm
- Move NFD serviceAccount info under 'master' in helm chart
- Bump GFD subchart to version 0.6.1
- Allow an empty config file and default to "version: v1"
- Make config fallbacks for config-manager a configurable, ordered list
- Add an 'empty' config fallback (but don't apply it by default)
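The notes above describe config fallbacks for config-manager as a configurable, ordered list with an optional 'empty' entry. The sketch below shows one way such an ordered lookup could behave: use the requested config if it exists, otherwise walk the fallbacks in order. The function name, the treatment of fallback entries as config names, and the handling of "empty" as an entry that always succeeds are assumptions for illustration, not config-manager's actual logic.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveConfig returns the name of the config to apply.
// requested is the config asked for (for example via a node label), available
// is the set of known config names, and fallbacks is tried in order when the
// requested config cannot be used. The special entry "empty" (hypothetical
// handling) always matches and stands for running with an empty config.
func resolveConfig(requested string, available map[string]bool, fallbacks []string) (string, error) {
	if requested != "" && available[requested] {
		return requested, nil
	}
	for _, fb := range fallbacks {
		if fb == "empty" || available[fb] {
			return fb, nil
		}
	}
	return "", errors.New("no usable config and no matching fallback")
}

func main() {
	available := map[string]bool{"default": true, "a100": true}

	name, err := resolveConfig("a100", available, []string{"default", "empty"})
	fmt.Println(name, err) // a100 <nil>

	name, err = resolveConfig("missing", available, []string{"default", "empty"})
	fmt.Println(name, err) // default <nil>

	name, err = resolveConfig("", map[string]bool{}, []string{"default", "empty"})
	fmt.Println(name, err) // empty <nil>
}
```

Because the list is ordered, placing "empty" last makes it a final resort, while leaving it out (matching the note that it is not applied by default) turns a missing config into an error.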
v0.12.1
- Exit the plugin and GFD sidecar containers on error instead of logging and continuing
- Only force restart of daemonsets when using config files and allow overrides
- Fix bug in calculation for GFD security context in helm chart
- Fix bug prohibiting GFD from being started from the plugin helm chart
v0.12.0
This release is a promotion of v0.12.0-rc.6 to v0.12.0
v0.12.0-rc.6
- Send SIGHUP from GFD sidecar to GFD main container on config change
- Reuse main container's securityContext in sidecar containers
- Update GFD subchart to v0.6.0-rc.1
- Bump CUDA base image version to 11.7.0
- Add a flag called FailRequestsGreaterThanOne for TimeSlicing resources
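As the name FailRequestsGreaterThanOne suggests, the flag lets the plugin reject a request for more than one replica of a time-sliced resource. A minimal sketch of such a check follows. Only the flag name comes from the release notes; the struct, the function, and the error text are hypothetical.

```go
package main

import "fmt"

// TimeSlicing is a hypothetical stand-in for the plugin's time-slicing
// settings; only the field name is taken from the release notes.
type TimeSlicing struct {
	FailRequestsGreaterThanOne bool
}

// checkRequest returns an error when the flag is set and a container asks for
// more than one replica of a time-sliced resource.
func checkRequest(ts TimeSlicing, requested int) error {
	if ts.FailRequestsGreaterThanOne && requested > 1 {
		return fmt.Errorf("request for %d time-sliced replicas rejected: at most 1 is allowed", requested)
	}
	return nil
}

func main() {
	strict := TimeSlicing{FailRequestsGreaterThanOne: true}
	fmt.Println(checkRequest(strict, 1) == nil)        // true
	fmt.Println(checkRequest(strict, 2) != nil)        // true
	fmt.Println(checkRequest(TimeSlicing{}, 2) == nil) // true
}
```

The motivation for rejecting such requests is that several replicas of a time-sliced GPU do not grant a proportionally larger share of it, so a request for more than one is usually a misunderstanding.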
v0.12.0-rc.5
- Allow either an external ConfigMap name or a set of configs in helm
- Handle cases where no default config is specified to config-manager
- Update API used to pass config files to helm to use map instead of list
- Fix bug that wasn't properly stopping plugins across a soft restart
v0.12.0-rc.4
- Disable support for resource-renaming in the config (will no longer be part of this release)
- Add field for TimeSlicing.RenameByDefault to rename all replicated resources to .shared
- Refactor main to allow configs to be reloaded across a (soft) restart
- Add support to helm to provide multiple config files for the config map
- Add new config-manager binary to run as sidecar and update the plugin's configuration via a node label
- Make GFD and NFD (optional) subcharts of the device plugin's helm chart
v0.12.0-rc.3
- Add ability to parse Duration fields from config file
- Omit either the Plugin or GFD flags from the config when not present
- Fix bug when falling back to none strategy from single strategy
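Parsing Duration fields from a config file is commonly done in Go with a wrapper type that implements a custom unmarshaler. The sketch below shows that general idiom using encoding/json and time.ParseDuration. The type names, the timeout field, and the choice of JSON are assumptions for the example and do not reflect the plugin's actual config schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration wraps time.Duration so it can be read from strings such as "1m30s".
type Duration time.Duration

// UnmarshalJSON accepts a JSON string and parses it with time.ParseDuration.
func (d *Duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return fmt.Errorf("duration must be a string: %w", err)
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = Duration(parsed)
	return nil
}

// exampleConfig is a hypothetical config with a single duration field.
type exampleConfig struct {
	Timeout Duration `json:"timeout"`
}

// parseTimeout decodes the hypothetical config and returns its timeout.
func parseTimeout(data string) (time.Duration, error) {
	var cfg exampleConfig
	if err := json.Unmarshal([]byte(data), &cfg); err != nil {
		return 0, err
	}
	return time.Duration(cfg.Timeout), nil
}

func main() {
	d, err := parseTimeout(`{"timeout": "1m30s"}`)
	fmt.Println(d, err) // 1m30s <nil>
}
```

Keeping the field as a human-readable string in the file avoids ambiguity about units, which a bare integer would leave open.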
v0.12.0-rc.2
- Move MigStrategy from Sharing.Mig.Strategy back to Flags.MigStrategy
- Remove TimeSlicing.Strategy and any allocation policies built around it
- Add support for specifying a config file to the helm chart
v0.12.0-rc.1
- Add API for specifying time-slicing parameters to support GPU sharing
- Add API for specifying explicit resource naming in the config file
- Update config file to be used across plugin and GFD
- Stop publishing images to dockerhub (now only published to nvcr.io)
- Add NVIDIA_MIG_MONITOR_DEVICES=all to daemonset envvars when mig mode is enabled
- Print the plugin configuration at startup
- Add the ability to load the plugin configuration from a file
- Remove deprecated tolerations for critical-pod
- Drop critical-pod annotation (removed from 1.16+) in favor of priorityClassName
- Pass all parameters as env in helm chart and example daemonset.yaml files for consistency