20.08
Pre-release
DeepOps 20.08 Release Notes
NOTE: Use the 20.08.1 release instead of this one; it includes various bug fixes.
What's New
- DGX A100 support
- NVIDIA HPC SDK
- Spack package manager
- HPL Burn-in test
- MPI Operator
Changes
- Slurm 20.02.4, Pyxis v0.8.0, Enroot v3.1.1
- Kubernetes v1.17.9 (Kubespray v2.13.3), Helm 3, GPU Operator v0.6.0
- Kubeflow v1.1.0 w/ MPI Operator (kfctl -> v1.1.0, istio_dex -> v1.0.2, istio -> v1.1.0)
- DGX OS 4.5
- DGX role updated to current versions/packages
- K8S DCGM Exporter 1.7.2 (port switch from 9101 to 9400)
- Bug fixes and enhancements
- Default NFS configurations have changed
Bugs/Enhancements
- General Kubeflow installation and polling improvements (along with Jenkins tests)
- Kubeflow deletion now actually deletes Kubeflow along with Istio, cert-manager, etc.
- Kubeflow installation now automatically installs the MPI Operator
- DCGM/Grafana dashboard updates
- General cleanup and version pinning in K8S monitoring deployment script
- Improved Jenkins testing (new tests: spack, kubeflow, centos tests; additional debugging/scale-tests/fixes)
- Peg Rook/Ceph versions
- Updated/improved/spell-checked documentation (slurm-perf, kubeflow, kubernetes, Lmod, Spack, EasyBuild)
- Slurm MPI now defaults to pmix if available
- golang galaxy role bumped to 2.4.0
- Improved Trident usability
- New default config variables (install_chrony, ...)
- General reorg of Slurm role and slurm-cluster.yml
- Dedicated lmod playbook
- Replaced a few helm repos with stable version
- GPU plugin is now installed using helm install
Upgrade Steps
If you are upgrading to this version of DeepOps from a previous release, you will need to follow the upgrade section of the Slurm or Kubernetes Deployment Guides. In addition, the setup.sh script must be re-run and any new variables in the config.example files should be added to the existing config. For a full diff from release 20.06, run:
git diff 20.08 20.06 -- config.example/
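One way to spot variables that exist in a config.example file but are missing from your own config is a small shell helper like the sketch below. It is not part of DeepOps: the function names list_yaml_keys and missing_keys are hypothetical, the file paths in the usage comment are assumptions about a typical checkout, and it only compares uncommented top-level keys, so variables that ship commented out in config.example still need the git diff to be reviewed by hand.

```shell
# list_yaml_keys FILE
# Print the uncommented, top-level "key:" names found in a YAML file.
# Indented (nested) keys and lines starting with "#" are ignored.
list_yaml_keys() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*:' "$1" | cut -d: -f1 | sort -u
}

# missing_keys EXAMPLE_FILE EXISTING_FILE
# Print every top-level key that appears in EXAMPLE_FILE but not in EXISTING_FILE.
missing_keys() {
    list_yaml_keys "$1" | while IFS= read -r key; do
        grep -Eq "^${key}:" "$2" || printf '%s\n' "$key"
    done
}

# Example usage (paths are assumptions; adjust to your checkout):
#   missing_keys config.example/group_vars/all.yml config/group_vars/all.yml
```

Any key the helper prints is a candidate to copy from the example file into the existing config, after checking that its default value suits your cluster.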
It is also necessary to upgrade helm on your provisioner node. This can be done manually using ./scripts/install_helm.sh as a reference.
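After upgrading, it can help to confirm that the helm binary on the provisioner node is a Helm 3 client. The helper below is a hedged sketch, not part of DeepOps: helm_major_version is a hypothetical name, and it assumes the version string looks like v3.x.y, optionally with a prefix such as the "Client: v2.x.y" form printed by older clients.

```shell
# helm_major_version VERSION_STRING
# Print the major version number found in the first line of a Helm
# version string; print nothing if the string does not look like "vN.x.y".
helm_major_version() {
    printf '%s\n' "$1" | head -n 1 | sed -n 's/^[^v]*v\([0-9][0-9]*\)\..*/\1/p'
}

# Example usage (assumes helm is on PATH):
#   major=$(helm_major_version "$(helm version --short)")
#   [ "$major" = "3" ] || echo "helm is not version 3; see ./scripts/install_helm.sh"
```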