Releases: clearml/clearml-agent
PyPI v2.0.4 - ClearML
PyPI v2.0.3 - ClearML
New Features and Bug Fixes
- Fix null ptr ref when skip python env enabled (#240, thanks @benled1!)
- Add support for Azure DevOps Git repositories authentication with Azure PAT and MS Entra Token using the `agent.git_use_azure_pat` and `agent.git_use_ms_entra_token` configuration options (and environment variables `CLEARML_AGENT_GIT_USE_AZURE_PAT` and `CLEARML_AGENT_GIT_USE_MS_ENTRA_TOKEN`)
PyPI v2.0.2 - ClearML
Bug Fixes
- Fix YAML dump failure causes agent to abort task execution (error log includes `expected SCALAR, SEQUENCE-START, MAPPING-START, or ALIAS`)
PyPI v2.0.1 - ClearML
Bug Fixes
- Fix regression issue `AttributeError: 'NoneType' object has no attribute 'pending'` caused by accessing a non-existing report (#241)
PyPI v2.0.0 - ClearML
New Features
- Add command line arguments for `k8s_glue_example.py` (#196, thanks @IlyaMescheryakov1402!)
- Add initial support for `--break-system-packages` version detection, also make sure to use `rm /usr/lib/python3.*/EXTERNALLY-MANAGED`
- Integrate docker port mapping to control non `network=host` port mapping, including port reassigning for multiple agents running on the same machine
- Add support for container rulebook overrides (`force_container_rules: true`) and container rulebook task update (`update_back_task: true`). This allows users to override container arguments forcefully based on the task's properties (repository, tags, project, user etc.), as well as offer additional defaults based on python required packages or python versions
- Add the `CLEARML_AGENT_ABORT_CALLBACK_CMD` and `CLEARML_AGENT_ABORT_CALLBACK_TIMEOUT` (default 180 seconds) environment variables to define a callback command to be called on abort status change
- Add support for `${CLEARML_TASK.yyy}` as docker arguments parsed based on Task values
- Add support for the `CLEARML_AGENT_QUEUE_POLL_FREQ_SEC` and `CLEARML_AGENT_STATUS_REPORT_FREQ_SEC` environment variables to customize agent behavior
- Add support for the `agent.translate_ssl_replacement_scheme` configuration option and `CLEARML_AGENT_SSH_URL_REPLACEMENT_SCHEME` environment variable to support translating SSH URLs to HTTP instead of HTTPS
- Set `urllib3` log level when connecting to server to `WARNING` by default to provide visibility on certificate issues (use the `CLEARML_AGENT_URLLIB3_CONNECT_LEVEL` environment variable to customize this behavior)
- Optimize dynamic GPU to query only relevant workers (requires ClearML Server v2.0.0 or higher)
- Bump `urllib3` to "<3" to support new `urllib3` v2
- Add note to `--use-owner-token` help (#235)
- Update GPU Fractions information for more GPU types
- Add `CLEARML_AGENT_CONFIG_VERBOSE` for verbose configuration file loading
- Add default support for dns i.e. rocky/centos/fedora containers
- Support `NVIDIA_VISIBLE_DEVICES` containing volume mounts
- Add better debug logging when task session creation fails
- Add `is_daemon` indication in status report
- Reduce required packages
- Add SECURITY.md
Bug Fixes
- Fix `uv_replace_pip` feature
- Fix UV cache based on sync/pip-replacement
- Fix if UV fails but lock file is missing, revert to UV as pip drop in replacement
- Fix use UV bin instead of UV python package to avoid nested `VIRTUAL_ENV` issues
- Fix UV pip freeze fails
- Fix UV as pip drop-in replacement print
- Fix cached venv tries to reinstall priority packages even though they are preinstalled
- Fix pip freeze dump to comply with YAML fancy print
- Fix pip requirements print dump should be sorted
- Fix force the stop command to avoid a potential race
- Fix make sure that if we fail to kill a child process we continue to try the rest
- Fix untitled file based on binary is now `py`/`sh` based on requested binary
- Fix session should retry on Any error if send fails
- Fix potential issue with agent not sending queues in status report
- Fix do not set task to Aborted if it is already set to Failed
- Fix installing venv from the agent's python binary when the selected python failed - this could be the cause of missing pip or venv in the selected python
- Fix fallback to system path `python3` if we fail to have pip used with the selected python - this could happen if preinstalled python is in path and it does not contain pip package (e.g. NIM containers)
- Fix fallback to system python should not change the python bin inside the new venv
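The fix about continuing after a failed child-process kill describes a common pattern: one failure should not stop the loop from trying the remaining processes. The sketch below illustrates that pattern only; `kill_all` is a hypothetical helper, not the agent's code, and the injectable `kill` parameter exists purely to make the example easy to exercise.

```python
import os
import signal


def kill_all(pids, kill=os.kill, sig=signal.SIGTERM):
    """Try to signal every pid; return the pids that could not be signalled."""
    failed = []
    for pid in pids:
        try:
            kill(pid, sig)
        except OSError:
            # A failure on one child must not stop us from trying the others.
            failed.append(pid)
    return failed
```

Returning the list of failures lets the caller decide whether to retry or escalate to a stronger signal.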
PyPI v1.9.3 - ClearML
New Features
- Add support for `uv` as package manager (#218, thanks @mads-oestergaard!)
- Add `agent.docker_args_filters` to configuration docs, to enforce filter whitelist on docker arguments allowing only those matching these filters to be used when running containers
- Add support for Python 3.13
- Remove Python 3.5 support
- Add `win32file` on windows (`pywin32` dependency)
- Scan more Python 3 versions
- Support ignoring `kubectl` errors
- Support creating queue with tags
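The whitelist behavior described for `agent.docker_args_filters` can be sketched as a regular-expression filter over docker arguments. This is an illustration under assumptions, not the agent's matching logic: `filter_docker_args` is hypothetical, and it assumes each argument is a single string (for example `--env=MODE=test`) that is kept if any pattern matches it.

```python
import re


def filter_docker_args(docker_args, filters):
    """Keep only the arguments that match at least one whitelist pattern."""
    compiled = [re.compile(pattern) for pattern in filters]
    return [arg for arg in docker_args if any(p.search(arg) for p in compiled)]


# Illustrative patterns; the real option may expect a different pattern style.
allowed = filter_docker_args(
    ["--env=MODE=test", "--privileged", "--shm-size=8g"],
    filters=[r"^--env=", r"^--shm-size="],
)
```

Arguments passed as separate flag and value elements (such as `-e VAR=1`) would need pairing logic that this sketch omits.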
Bug Fixes
- Fix managed python environment inside container (PEP 668)
- Fix default value handling in `merge_dicts()` utility function
- Fix python 3.6 compatibility (no `:=` operator)
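For context on the PEP 668 fix: PEP 668 marks a Python installation as externally managed by placing an `EXTERNALLY-MANAGED` file in the interpreter's standard library directory. A minimal way to detect that marker could look like the sketch below; `is_externally_managed` is a hypothetical name and this is not necessarily how the agent checks for it.

```python
import os
import sysconfig


def is_externally_managed():
    """Return True if the running interpreter carries a PEP 668 marker file."""
    stdlib_dir = sysconfig.get_path("stdlib")
    return os.path.isfile(os.path.join(stdlib_dir, "EXTERNALLY-MANAGED"))


if is_externally_managed():
    print("pip may refuse system-wide installs unless --break-system-packages is used")
```

The check applies to the interpreter running the script, so it would have to be executed with the Python binary selected inside the container.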
PyPI v1.9.2 - ClearML
New Features and Bug Fixes
- Handle OSError when checking for is_file (#215, thanks @materight!)
- Add support for pip legacy resolver for versions specified using the `agent.package_manager.pip_legacy_resolver` configuration option
- Add skip existing packages
- Fix report index not advancing in resource monitoring causes more than one GPU not to be reported
- Fix use `req_token_expiration_sec` and not the default value when creating a task session
- Fix reload method is found in the config object causing periodic agent error printouts
PyPI v1.9.1 - ClearML
New Features and Bug Fixes
- Add default pip version support for Python 3.12
PyPI v1.9.0 - ClearML
New Features
- Add NO_DOCKER flag to clearml-agent-services entrypoint (#206, thanks @valentinschabschneider!)
- Use `venv` module if `virtualenv` is not supported
- Find the correct python version when using a pre-installed python environment
- Add `/bin/bash` support in the task's `script.binary` property
property - Add support for
.ipynb
script entry files (install nbconvert in runtime, convert file to python and execute the python script). IncludesCLEARML_AGENT_FORCE_TASK_INIT
patching of.ipynb
files (post python conversion) - Add
CLEARML_MULTI_NODE_SINGLE_TASK
(values -1, 0, 1, 2) for easier multi-node single Task workloads - Add default docker
agent.default_docker.match_rules
configuration option supported by enterprise servers (note: matching_rules are ignored if--docker container
is passed in command line) - Add
-m module args
in script entry now supports standalone script. Standalone script is placed in a file specified by theworking_dir
setting in the<dir>:<target_file>
format (e.g.:standalone.py
), or in untitled.py if not specified - Add
K8S_GLUE_POD_USE_IMAGE_ENTRYPOINT
env var to allow running k8s pods without overriding the image entrypoint (useful for agents using prebuilt images in k8s) - Add venv cache mount override for non-root containers (use:
agent.docker_internal_mounts.venvs_cache
) - Add
/bin/bash -c "command"
support. Taskbinary
should be set to/bin/bash
andentry_point
should be set to-c command
- Add queue priority info to CLI help (#211)
- Add support for tasks containing only bash script or python module command
- Add support for skipping container apt installs using `CLEARML_AGENT_SKIP_CONTAINER_APT` env var in k8s
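To illustrate the `/bin/bash -c "command"` item above, the sketch below shows one way a task's `binary` and `entry_point` values could be combined into a command line. `build_argv` is a hypothetical helper written for this example; how the agent actually assembles the command is not described in these notes.

```python
import shlex


def build_argv(binary, entry_point):
    """Combine a task's binary and entry point into an argv list (illustrative only)."""
    if entry_point.startswith("-c "):
        # Everything after "-c" is passed to the shell as one command string.
        return [binary, "-c", entry_point[3:].strip()]
    # Otherwise treat the entry point as a script path plus arguments.
    return [binary] + shlex.split(entry_point)


argv = build_argv("/bin/bash", "-c echo hello && echo done")
```

Keeping the text after `-c` as a single element matters because the shell expects the whole command, including operators such as `&&`, in one argument.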
Bug Fixes
- Fix git fetch did not update new tags #209
- Fix file mode should be optional in configuration `files` section
- Fix `-m module $env` to support parsing `$env` before launching
- Fix setting tasks that were just marked as `aborted` to `started` - only force task to `started` after dequeuing it otherwise do nothing
- Fix slurm multi-node rank detection
- Fix pass `--docker only` (i.e. no default container image) when using `--dynamic-gpus` feature
- Fix logger object was used even if `None`
- Fix a race condition where in rare conditions popping a task from a queue that was aborted did not set it to started before the watchdog killed it (not applicable in k8s/slurm)
- Fix multi-node support to only send pip freeze update, only set task as started and only update task status on exit for RANK 0
- Fix do not cache venv cache if venv/python skip env var was set
- Fix use same state transition in k8s if supported by the server (instead of stopping the task before re-enqueue)
- Fix failed Task in services mode logged as "User aborted" instead of failed, add Task state reason string
- Fix remove task from pending queue and set to failed in k8s when applying the pod template fails
PyPI v1.8.1 - ClearML
New Features and Bug Fixes
- Add option to set daemon polling interval (#197, thanks @ilouzl!)
- Add Python 3.12 support
- Fix git pulling on cached invalid git entry. On error, re-clone the entire repository again (enable using `agent.vcs_cache.clone_on_pull_fail: true`)
- Fix conda env should not be cached if installing into base conda or conda existing env exists
- Fix cached repositories were not passing user/token when pulling
- Fix when disabling vcs cache do not add vcs mount point to container
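The `agent.vcs_cache.clone_on_pull_fail` behavior above amounts to "pull, and on error discard the cached entry and clone again". The following is a rough sketch of that flow, not the agent's code: `update_cached_repo` and its injectable `run` and `remove` parameters are hypothetical and exist only to keep the example self-contained.

```python
import shutil
import subprocess


def update_cached_repo(url, path, run=subprocess.check_call, remove=shutil.rmtree):
    """Pull a cached clone; if the pull fails, drop the cache entry and re-clone."""
    try:
        run(["git", "-C", path, "pull"])
        return "pulled"
    except (subprocess.CalledProcessError, OSError):
        # The cached entry is unusable: remove it and start from a fresh clone.
        remove(path, ignore_errors=True)
        run(["git", "clone", url, path])
        return "recloned"
```

A failure of the fresh clone is deliberately left to propagate, since at that point there is no cached copy to fall back on.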