Releases: clearml/clearml-agent
PyPI v0.16.3
Features and Bug Fixes
- Update PyJWT requirement (v2.0.0 breaks interface)
- Update other requirements constraints
- Change k8s pod naming scheme in k8s glue to include queue name, conform queue name to k8s standard
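
To illustrate what "conform queue name to k8s standard" can involve, here is a minimal sketch, not the glue's actual code. It assumes the DNS label rules (lowercase alphanumerics and `-`, at most 63 characters, alphanumeric at both ends). The function name and the `trains-` prefix are hypothetical.

```python
import re


def conform_to_k8s_name(queue_name: str, prefix: str = "trains-", max_len: int = 63) -> str:
    """Build a DNS-label-style name from a queue name (illustrative sketch only)."""
    # Lowercase, then replace every run of disallowed characters with one dash.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", queue_name.lower())
    # Prepend the (hypothetical) prefix and truncate to the assumed length limit.
    name = (prefix + cleaned)[:max_len]
    # A label must start and end with an alphanumeric character.
    return name.strip("-")
```

For example, a queue called `GPU Queue_1` would become `trains-gpu-queue-1` under these assumptions.
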
PyPI v0.16.2
Features
- conda
  - Add `agent.package_manager.conda_env_as_base_docker` allowing "docker_cmd" to contain link to a full pre-packaged conda environment (tar.gz created by `conda-pack`). Use `TRAINS_CONDA_ENV_PACKAGE` environment variable to specify `conda tar.gz` file.
  - Add conda support for read-only pre-built environment (pass conda folder as `docker_cmd` on Task)
  - Improve trying to find conda executable
- k8s glue
  - Add support for limited number of services exposing ports
  - Add support for k8s pod custom user properties
  - Allow selecting external `trains.conf` file for the pod itself
  - Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress/ELB)
- Allow specifying `cudatoolkit` version in the "installed packages" section when using conda as package manager clearml/clearml#229
- Add `agent.package_manager.force_repo_requirements_txt`. If True, "Installed Packages" on Task are ignored, and only repository `requirements.txt` is used
- Pass `TRAINS_DOCKER_IMAGE` into docker for interactive sessions
- Add `torchcsprng` and `torchtext` to PyTorch resolving
Bug Fixes
- When logging, suppress "\r" when reading a current chunk of a file/stream. Add `agent.suppress_carriage_return` (default True) to support previous behavior
- Make sure `TRAINS_AGENT_K8S_HOST_MOUNT` is used only once per mount
- Fix k8s glue script to trains-agent default docker script
- Fix apply git diff from submodule only
- conda
  - Fix conda pip freeze to be consistent with trains 0.16.3
  - Fix conda environment support for trains 0.16.3 full env. Add `agent.package_manager.conda_full_env_update` to allow conda to update back the requirements (default False, to preserve previous behavior)
  - Fix running from conda environment - `conda.sh` not found in first conda PATH match
- Fix docker mode ubuntu/debian support by making sure not to ask for input (fix `tzdata` install)
- Fix repository detection - ignore environment `SSH_AUTH_SOCK`, only check if git user/pass are configured
- git diff
  - Fix support for non-ascii diff
  - Fix diff with empty line at the end will cause corrupt diff apply message
  - Allow zero context diffs (useful when blind patching repository)
- Fix `daemon --stop` when agent UID cannot be located
- Fix nvidia docker support on some linux distros (SUSE)
- Fix nvidia pytorch dockers support
- Fix torch CUDA 11.1 support
- Fix requirements dict with a null `pip` entry: treat it as None and install from the repository's `requirements.txt`
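
The carriage-return suppression mentioned in the first bug fix can be pictured with a small sketch. This is one plausible reading, not the agent's implementation: for every line in a chunk, only the text after the last "\r" is kept, so progress-bar style output collapses to its final state. The function name is hypothetical.

```python
def suppress_carriage_return(chunk: str) -> str:
    """Keep only the text after the last carriage return on each line (sketch)."""
    cleaned = []
    for line in chunk.split("\n"):
        # Drop trailing carriage returns first so "\r\n" endings
        # and unfinished progress updates do not blank the line.
        line = line.rstrip("\r")
        # Everything before the last remaining "\r" was overwritten on screen.
        cleaned.append(line.rsplit("\r", 1)[-1])
    return "\n".join(cleaned)
```

With this sketch, a chunk such as `"10%\r50%\r100%\ndone\n"` is logged as `"100%\ndone\n"`.
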
PyPI v0.16.1
Features
- Add `sdk.metrics.plot_max_num_digits` configuration option to reduce plot storage size
- Add `agent.package_manager.post_packages` and `agent.package_manager.post_optional_packages` configuration options to control packages install order (e.g. horovod)
- Add `agent.git_host` configuration option for limiting git credential usage for a specific host (overridable using `TRAINS_AGENT_GIT_HOST` environment variable)
- Add `agent.force_git_ssh_port` configuration option to control `https` to `ssh` link conversion for non standard `ssh` ports
- Add requirements detection features
  - Improve support for detecting new pip version (20+) supporting `package @ scheme://link`
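
As a rough picture of the `https` to `ssh` link conversion with a non-standard port, consider the sketch below. It is an assumption-laden illustration rather than the agent's code. The function name, the default `git` user, and the exact output format are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def https_to_ssh(url: str, ssh_port: Optional[int] = None, user: str = "git") -> str:
    """Rewrite an http(s) git link as an ssh:// link, optionally with a port (sketch)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return url  # leave links that are not http(s) untouched
    host = parsed.hostname or ""  # hostname drops any embedded user:password
    port = f":{ssh_port}" if ssh_port else ""
    return f"ssh://{user}@{host}{port}{parsed.path}"
```

Under these assumptions, `https_to_ssh("https://github.com/org/repo.git", ssh_port=2222)` yields `ssh://git@github.com:2222/org/repo.git`.
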
Bug Fixes
- Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a `git+http` link is enough to make sure all requirements are met/installed clearml/clearml#196
- Fix incorrect check for spaces in current execution folder
- Fix requirements detection
  - Update torch version after using downloaded / system pre-installed version
  - Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version)
PyPI v0.16.0
Features
- Add `agent.docker_init_bash_script` configuration section to allow finer control over docker startup script
- Changed default docker image from `nvidia/cuda` to `nvidia/cuda:10.1-runtime-ubuntu18.04` to support `cudnn` frameworks (e.g. TF)
- Improve support for dockers with preinstalled `conda` environment
- Improve trains-agent-docker spinning
- Add `daemon --order-fairness` for round-robin queue pulling
- Add `daemon --stop` to terminate a running agent (assuming other arguments are the same)
  - If no additional arguments, Agents are terminated in lexicographical order
- Support cleanup of all log files on termination unless executed with `--debug`
- Add error message when Trains API Server is not accessible on startup
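
The difference between priority-ordered and round-robin queue pulling (the idea behind `--order-fairness`) can be sketched as follows. This is a simplified model with in-memory queues, not the daemon's implementation. All function and variable names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


def pull_task(queues: Dict[str, Deque[str]], order: List[str],
              start: int = 0) -> Optional[Tuple[str, int]]:
    """Scan queues beginning at index `start`; return (task, next_start) or None."""
    n = len(order)
    for offset in range(n):
        idx = (start + offset) % n
        if queues[order[idx]]:
            # The next scan resumes after the queue that was just served.
            return queues[order[idx]].popleft(), (idx + 1) % n
    return None


def drain(queues: Dict[str, Deque[str]], order: List[str], fairness: bool) -> List[str]:
    """Pull until all queues are empty and return the tasks in pull order."""
    pulled: List[str] = []
    start = 0
    while True:
        # Without fairness every scan restarts at the first (highest priority) queue.
        result = pull_task(queues, order, start if fairness else 0)
        if result is None:
            return pulled
        task, start = result
        pulled.append(task)
```

In this model, two queues holding `h1, h2` and `l1, l2` are drained as `h1, h2, l1, l2` in priority order, and as `h1, l1, h2, l2` with fairness enabled.
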
Bug Fixes
- Fix GPU Windows monitoring support clearml/clearml#177
- Fix `.git-credentials` and `.gitconfig` mapping into docker
- Fix non-root docker image usage
- Fix docker to use `UTF-8` encoding, so prints won't break it
- Fix `--debug` to set all loggers to `DEBUG`
- Fix task status change to `queued` should never happen during Task runtime
- Fix `requirement_parser` to support `package @ git+http` lines
- Fix GIT user/password in requirements and support for `-e git+http` lines
- Fix configuration wizard to generate `trains.conf` matching latest Trains definitions
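
For readers unfamiliar with `package @ git+http` requirement lines, the sketch below shows one simple way to split such a line into its name and link. It does not reproduce the agent's `requirement_parser`. The regular expression and function name are assumptions, and extras and environment markers are deliberately ignored.

```python
import re
from typing import Optional, Tuple

# name, optional spaces, "@", optional spaces, then a link without whitespace
_DIRECT_REF = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*@\s*(?P<url>\S+)\s*$"
)


def split_direct_reference(line: str) -> Optional[Tuple[str, str]]:
    """Return (package, link) for a 'package @ link' line, else None (sketch)."""
    match = _DIRECT_REF.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("url")
```

Ordinary pinned requirements such as `numpy==1.19.0` and editable `-e git+http` lines do not match this pattern and return `None`.
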
PyPI v0.15.1
Features
- Add Trains Agent Daemon and Services docker files
Bug Fixes
- Fix initialization wizard (allow at most two verification retries, then print error)
- Add warning on `--gpus` with no detected CUDA version #24
- Add `agent.force_git_ssh_protocol` configuration option to force all git links to `ssh://` #16
- Add git user/pass permission into pip package installation from Git repository #22
PyPI v0.15.0
Features
- Add daemon Services Mode (`daemon --services-mode`) where the daemon spins a task in its own docker and verifies start-up and shut-down. This allows multiple tasks to be launched simultaneously on the same machine (currently in CPU mode only), where each task service will register itself as a worker for the lifetime of the task
- Enhance `build --docker` mode
  - Add `--install-globally` option to install required packages in the docker's system python
  - Add `--entry-point` option to allow automatic task cloning when running the docker
- Support PyTorch Nightly builds using the `agent.torch_nightly` configuration flag. If `true`, the agent looks for a nightly build when a stable torch wheel is not found
- Add environment variables support for git user/password
  - Using `TRAINS_AGENT_GIT_USER` / `TRAINS_AGENT_GIT_PASS`
  - Pass git credentials to dockerized experiment execution
- Support running code from module (i.e. `-m` in execution entry point)
- Add daemon `--create-queue` to automatically create a queue and use it if queue name doesn't exist in the server
- Move `--gpus` and `--cpu-only` to worker args (used by daemon, execute and build)
Bug Fixes
- Fix init wizard, correctly display the input servers #19
- Fix version control links in requirements when using `conda`
- Fix `build --docker` mode standalone docker execution
- Improve docker host-mount support, use `TRAINS_AGENT_DOCKER_HOST_MOUNT` environment variable
- Support `pip` v20.1 local/http package reference in `pip freeze`
- Fix detached mode to correctly use cache folder slots
- Fix `CUDA_VISIBLE_DEVICES` should never be set to "all" (Trains Slack channel thread)
- Do not monitor GPU when running with `--cpu-only`
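
The `CUDA_VISIBLE_DEVICES` fix makes sense because that variable expects device indices or UUIDs, while "all" is a value understood by the NVIDIA container runtime through `NVIDIA_VISIBLE_DEVICES`. The sketch below illustrates that distinction. It is an assumption about the intent, not the agent's code, and the function name is hypothetical.

```python
from typing import Dict


def gpu_environment(gpus: str) -> Dict[str, str]:
    """Build GPU-related environment variables for a `--gpus` value (sketch)."""
    # "all" is meaningful to the NVIDIA container runtime.
    env = {"NVIDIA_VISIBLE_DEVICES": gpus}
    if gpus != "all":
        # Only explicit device lists (e.g. "0,1") are passed on to CUDA.
        env["CUDA_VISIBLE_DEVICES"] = gpus
    return env
```
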
PyPI v0.14.1
Features and Bug Fixes
- Add daemon detached mode (`--detached`, `-d`) that runs the agent as daemon in the background and returns immediately
- Auto mount `~/.git-credentials` into docker container (if file exists)
- Add `TRAINS_AGENT_EXTRA_PYTHON_PATH` environment variable to allow adding additional python path during experiment execution (helpful when using extra un-tracked modules)
- Fix "run as user" feature (using `TRAINS_AGENT_EXEC_USER` environment variable)
- Fix PyTorch support to ignore minor versions when looking for package to install/download
- Fix experiment execution output handling
PyPI v0.14.0
Features and Bug Fixes
- Add support for `trains-agent execute --id <experiment-id> --docker` that allows executing a specific experiment inside a docker container
- Add support for `trains-agent execute --id <template-experiment-id> --clone` that clones the provided experiment and executes the cloned experiment
- Add support for `APIClient.models.delete()` to allow programmatically deleting a model clearml/clearml-server#32
- Add daemon support for passing storage-related OS environment variables to experiments executed inside a docker container (supported by `trains>=0.13.3`):
  - AWS: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION`
  - Azure: `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY`
  - Google: `GOOGLE_APPLICATION_CREDENTIALS`
- Fix git checkout with submodules clearml/clearml#112
- Prefer docker image from command line over the one specified in experiment
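
Passing the storage-related variables listed above into a container can be pictured as building `docker run -e NAME=value` arguments for whichever of them are set on the host. The following is a minimal sketch under that assumption, not the daemon's code. The function and constant names are hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the release note above.
STORAGE_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AZURE_STORAGE_ACCOUNT",
    "AZURE_STORAGE_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def docker_env_args(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return `-e NAME=value` arguments for the storage variables that are set (sketch)."""
    environ = os.environ if environ is None else environ
    args: List[str] = []
    for name in STORAGE_ENV_VARS:
        value = environ.get(name)
        if value:  # skip variables that are missing or empty
            args += ["-e", f"{name}={value}"]
    return args
```
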
PyPI v0.13.3
Features and Bug Fixes
- Allow providing queue names instead of queue IDs in daemon mode
- Docker mode improvements
  - Support running as a specific user inside a docker using the `TRAINS_AGENT_EXEC_USER` environment flag
  - Pass correct GPU limit when skipping gpus flag
  - Add `--force-current-version` daemon command-line flag
- Add K8s/trains glue service example
- Add K8s support in daemon mode
- Running inside a K8s pod
- Mounting dockerized experiment folders to host
- Allow a specific network for the docker
- Add default storage environment vars (for AWS, GS and Azure) to generated agent configuration
- Improve Unicode/UTF stdout handling
PyPI v0.13.2
Features and Bug Fixes
- Pre-install `numpy` if it exists in the requirements
- Add experiment archiving example
- Add `.bashrc` reloading before running trains-agent in the AWS dynamic cluster management service
- Add support for pulling recursive git modules as well as main project
- Limit `virtualenv` version to `<20` due to an import issue in v20.0.0
- Fix `pip` install/upgrade with limit in `conda`
- Fix daemon monitor to not stop experiments if network is down