Deep Learning Training Service v1.4.0
Job Manager
Improve 95th percentile job creation time (from job submission to "scheduling") from 400s to 46s.
Speed up job initialization by prebuilding and copying required apt packages from an init container
Per-user password for SSH login to user jobs
Azure blobfuse plugin(s) for a job
Custom docker registry secret(s) for a job
Scheduling jobs on pure CPU machines
VC machine hard assignment
Provide consistent environment variables for training in both interactive and non-interactive SSH
RESTful API
Improve 95th percentile latency for job info and permission-related RESTful APIs from 2000ms to <500ms.
Web Portal (Dashboard)
Speed up page loading for "View and Manage Jobs" - "View and Manage Jobs V2"
Dashboard as a Kubernetes service
User Synchronization
Automate the user/group permission update process
Storage Manager
Scan NFS and send an alert email for over-sized (boundary) paths when NFS storage usage exceeds the threshold.
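As a rough illustration of the scan described above, the sketch below reports over-sized directories only when overall usage has crossed a threshold. It is not the Storage Manager's actual code: the function names (`dir_size`, `usage_ratio`, `find_oversized`), the per-directory size limit, and the choice to inspect only immediate subdirectories are all assumptions, and the alert email step is omitted.

```python
import os
import shutil


def dir_size(path):
    """Sum the sizes (bytes) of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; ignore it in this sketch.
                pass
    return total


def usage_ratio(mount_point):
    """Fraction of the filesystem at mount_point that is in use."""
    usage = shutil.disk_usage(mount_point)
    return usage.used / usage.total


def find_oversized(root, used_ratio, usage_threshold, size_limit_bytes):
    """Return [(path, size)] for immediate subdirectories of root larger than
    size_limit_bytes, but only when used_ratio exceeds usage_threshold
    (hypothetical parameters, not taken from the service's configuration)."""
    if used_ratio <= usage_threshold:
        return []
    oversized = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            size = dir_size(path)
            if size > size_limit_bytes:
                oversized.append((path, size))
    return oversized
```

A caller would typically pass `usage_ratio(mount_point)` as `used_ratio` and hand the returned list to whatever alerting mechanism is in place.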
Repair Manager
Detect and send alert email for uncorrectable ECC errors
Fundamental
Fix occasional NFS mount failures upon machine restart