
Releases: Netflix/metaflow

2.2.6 (Jan 26th, 2021)

Metaflow 2.2.6 Release Notes

The Metaflow 2.2.6 release is a minor patch release.

Features

Support AWS Fargate as compute backend for Metaflow tasks launched on AWS Batch

At AWS re:Invent 2020, AWS announced support for AWS Fargate as a compute backend (in addition to EC2) for AWS Batch. With this feature, Metaflow users can now also submit their Metaflow jobs to AWS Batch Job Queues which are connected to AWS Fargate Compute Environments. By setting the environment variable METAFLOW_ECS_FARGATE_EXECUTION_ROLE, users can configure the ecsTaskExecutionRole for the AWS Batch container and AWS Fargate agent. PR: #402
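Configuration happens entirely through the environment. A minimal sketch of setting the variable before launching a run (the role ARN below is a placeholder, not a real account):

```python
import os

# Placeholder ARN: point this at the ecsTaskExecutionRole you created for
# AWS Fargate / AWS Batch in your own account.
os.environ["METAFLOW_ECS_FARGATE_EXECUTION_ROLE"] = (
    "arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
)
```

Equivalently, the variable can be exported in the shell before invoking the flow.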

Support shared_memory, max_swap, swappiness attributes for Metaflow tasks launched on AWS Batch

The @batch decorator now supports the shared_memory, max_swap, and swappiness attributes for Metaflow tasks launched on AWS Batch, providing a greater degree of control over memory management. PR: #408
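On the AWS Batch side, these three attributes correspond to the sharedMemorySize, maxSwap, and swappiness fields of a job definition's linuxParameters block. A rough sketch of the mapping (the helper name is hypothetical, for illustration only, not Metaflow's internal API):

```python
def batch_linux_parameters(shared_memory=None, max_swap=None, swappiness=None):
    """Build the linuxParameters block of an AWS Batch job definition.

    Hypothetical helper: sharedMemorySize and maxSwap are in MiB,
    swappiness ranges from 0 to 100, mirroring the AWS Batch API.
    """
    params = {}
    if shared_memory is not None:
        params["sharedMemorySize"] = shared_memory
    if max_swap is not None:
        params["maxSwap"] = max_swap
    if swappiness is not None:
        params["swappiness"] = swappiness
    return params
```

In a flow, the same values are passed directly to the decorator, e.g. @batch(shared_memory=512, max_swap=1024, swappiness=10).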

Support wider very-wide workflows on top of AWS Step Functions

Metaflow flows deployed on AWS Step Functions can now fan out to a larger number of parallel branches than before. PR: #403

Bug Fixes

Assign tags to Run objects generated through AWS Step Functions executions

Run objects generated by flows executed on top of AWS Step Functions were missing the tags assigned to the flow, even though the tags were correctly persisted to tasks. This release fixes the issue and brings the tagging behavior in line with local flow executions. PR: #386

Pipe all workflow set-up logs to stderr

Execution set-up logs for @conda and IncludeFile were being piped to stdout, which made manipulating the output of commands like python flow.py step-functions create --only-json a bit difficult. This release moves the workflow set-up logs to stderr. PR: #379
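The fix boils down to reserving stdout for machine-readable output. A small illustration of the pattern (the function and messages are illustrative, not Metaflow's internals):

```python
import io
import sys
from contextlib import redirect_stderr, redirect_stdout

def emit(json_payload, setup_log):
    # Set-up logs go to stderr; only the JSON payload reaches stdout, so
    # piping the output of `step-functions create --only-json` stays clean.
    print(setup_log, file=sys.stderr)
    print(json_payload)

# Capture both streams to show the separation.
out, err = io.StringIO(), io.StringIO()
with redirect_stdout(out), redirect_stderr(err):
    emit('{"StartAt": "start"}', "Bootstrapping conda environment...")
```

Anything consuming stdout (e.g. jq, or a file redirect) now sees only the JSON.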

Handle null assignment to IncludeFile properly

A workflow executed without a required IncludeFile parameter would fail when the parameter was referenced inside the flow. This release fixes the issue by assigning a null value to the parameter in such cases. PR: #421

2.2.5 (Nov 11th, 2020)

Metaflow 2.2.5 Release Notes

The Metaflow 2.2.5 release is a minor patch release.

  • Features

    • Log metaflow_version: and runtime: tags for all executions
  • Bug Fixes

    • Handle inconsistently cased file system issue when creating @conda environments on macOS for linux-64

Features

Log metaflow_version: and runtime: tags for all executions

The metaflow_version: and runtime: tags are now available for all packaged and remote executions as well. This ensures that every run logged by Metaflow has the metaflow_version and runtime system tags. PR: #376, #375

Bug Fixes

Handle inconsistently cased file system issue when creating @conda environments on macOS for linux-64

Conda at times fails to correctly set up environments for linux-64 packages on macOS due to inconsistently cased filesystems. Environment creation is needed to collect the metadata for correctly setting up the conda environment on AWS Batch. This fix simply ignores the error checks that conda performs while setting up environments on macOS when the intended destination is AWS Batch. PR: #377

2.2.4 (Oct 28th, 2020)

Metaflow 2.2.4 Release Notes

The Metaflow 2.2.4 release is a minor patch release.

  • Features

    • Metaflow is now compliant with AWS GovCloud & AWS CN regions
  • Bug Fixes

    • Address a bug with overriding the default value for IncludeFile
    • Port AWS region check for Amazon DynamoDB from curl to requests

Features

Metaflow is now compliant with AWS GovCloud & AWS CN regions

AWS GovCloud & AWS CN users can now enjoy all the features of Metaflow within their region partition with no change on their end. PR: #364

Bug Fixes

Address a bug with overriding the default value for IncludeFile

Metaflow v2.1.0 introduced a bug in the IncludeFile functionality that prevented users from overriding the specified default value. This release fixes the issue. PR: #346

Port AWS region check for Amazon DynamoDB from curl to requests

Metaflow's AWS Step Functions integration relies on Amazon DynamoDB to manage foreach constructs. Metaflow was shelling out to curl at runtime to detect the region for DynamoDB, but some Docker images don't have curl installed by default; moving to requests (already a Metaflow dependency) fixes the issue. PR: #343
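The region check can be sketched as follows; the injected fetch callable stands in for a requests.get(...) against the EC2 instance-identity document (the function name and structure are illustrative, not Metaflow's actual code):

```python
import json

# On EC2, the instance-identity document is served at:
# http://169.254.169.254/latest/dynamic/instance-identity/document
def detect_aws_region(fetch):
    """Return the AWS region from the instance-identity document.

    `fetch` is a callable returning the document body as a string, e.g.
    lambda: requests.get(IDENTITY_DOC_URL, timeout=2).text -- requests is
    already a Metaflow dependency, so no extra tooling (like curl) is needed.
    """
    return json.loads(fetch())["region"]
```

Injecting the fetcher keeps the parsing logic testable without network access.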

2.2.3 (Sep 8th, 2020)

Metaflow 2.2.3 Release Notes

The Metaflow 2.2.3 release is a minor patch release.

  • Bug Fixes
    • Fix #305: Default 'help' for parameters was not handled properly
    • Pin the conda library versions for metaflow default dependencies based on the Python version
    • Add conda bin path to the PATH environment variable during Metaflow step execution
    • Fix a typo in metaflow/debug.py

Bug Fixes

Fix #305: Default 'help' for parameters was not handled properly

Fixes issue #305, where a flow would fail because IncludeFile's default value for the help argument was None. PR: #318

Pin the conda library versions for metaflow default dependencies based on the Python version.

The previously pinned library versions do not work with Python 3.8. There are now two sets of version combinations, covering Python 2.7, 3.5, 3.6, 3.7, and 3.8. PR: #308

Add conda bin path to the PATH environment variable during Metaflow step execution

Previously, executables installed in the conda environment were not visible inside Metaflow steps. This release fixes the issue by appending the conda bin path to the PATH environment variable. PR: #307

Fix a typo in metaflow/debug.py

A typo fix. PR: #304

2.2.2 (Aug 20th, 2020)

Metaflow 2.2.2 Release Notes

The Metaflow 2.2.2 release is a minor patch release.

  • Bug Fixes
    • Fix a regression introduced in 2.2.1 related to Conda environments
    • Clarify Pandas requirements for Tutorial Episode 04
    • Fix an issue with the metadata service

Bug Fixes

Fix a regression with Conda

Metaflow 2.2.1 included a commit that was merged too early and broke the use of Conda. This release reverts that commit.

Clarify Pandas version needed for Episode 04

Recent versions of Pandas are not backward compatible with the one used in the tutorial; a small comment was added to warn of this fact.

Fix an issue with the metadata service

In some cases, the metadata service would not properly create runs or tasks.

PRs #296, #297, #298

2.2.1 (Aug 17th, 2020)

Metaflow 2.2.1 Release Notes

The Metaflow 2.2.1 release is a minor patch release.

  • Features
    • Add include parameter to merge_artifacts.
  • Bug Fixes
    • Fix a regression introduced in 2.1 related to S3 datatools
    • Fix an issue where Conda execution would fail if the Conda environment was not writeable
    • Fix the behavior of uploading artifacts to the S3 datastore in case of retries

Features

Add include parameter for merge_artifacts

You can now explicitly specify the artifacts to be merged via the include parameter of the merge_artifacts method, as opposed to only specifying the ones that should not be merged.

Bug Fixes

Fix a regression with datatools

Fixes the regression described in #285.

Fix an issue with Conda in certain environments

In some cases, Conda is installed system wide and the user cannot write to its installation directory. This was causing issues when trying to use the Conda environment. Fixes #179.

Fix an issue with the S3 datastore in case of retries

Retries were not properly handled when uploading artifacts to the S3 datastore. This fix addresses this issue.

PRs #282, #286, #287, #288, #289, #290, #291

2.2.0 (Aug 4th, 2020)

Metaflow 2.2.0 Release Notes

The Metaflow 2.2.0 release is a minor release and introduces Metaflow's support for R lang.

Features

Support for R lang.

This release provides an idiomatic API to access Metaflow from R. It piggybacks on the Python implementation as its backend, providing most of the functionality previously accessible to the Python community. With this release, R users can structure their code as a Metaflow flow. Metaflow snapshots the code, data, and dependencies automatically in a content-addressed datastore, allowing workflows to be resumed, past results to be reproduced, and anything about the workflow to be inspected, e.g. in a notebook or the RStudio IDE. Additionally, without any changes to their workflows, users can now execute code on AWS Batch and interact with Amazon S3 seamlessly.

PR #263 and PR #214 .

2.1.1 (Jul 30th, 2020)

Metaflow 2.1.1 Release Notes

The Metaflow 2.1.1 release is a minor patch release.

  • Bug Fixes
    • Handle race condition for /step endpoint of metadata service.

Bug Fixes

Handle race condition for /step endpoint of metadata service.

The foreach step in AWS Step Functions launches multiple AWS Batch tasks, each of which tries to register the step metadata if it doesn't already exist. This can result in a race condition and cause the task to fail. This patch properly handles the 409 response from the service.
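The essence of the patch can be sketched as follows, with an injected post callable standing in for the metadata-service client (names and status handling are illustrative, not Metaflow's actual code):

```python
def register_step(post):
    """Register step metadata with the metadata service.

    `post` is a callable returning an HTTP status code, standing in for the
    real service client. A 409 means a sibling foreach task won the race and
    the step is already registered -- that is success, not an error.
    """
    status = post()
    if status in (200, 201, 409):
        return True
    raise RuntimeError("metadata service returned %s" % status)
```

Treating 409 as success makes registration idempotent across concurrently launched tasks.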

PR #258 & PR #260

2.1.0 (Jul 29th, 2020)

Metaflow 2.1.0 Release Notes

The Metaflow 2.1.0 release is a minor release and introduces Metaflow's integration with AWS Step Functions.

  • Features
    • Add capability to schedule Metaflow flows with AWS Step Functions.
  • Improvements
    • Fix log indenting in Metaflow.
    • Throw exception properly if fetching code package from Amazon S3 on AWS Batch fails.
    • Remove millisecond information from timestamps returned by Metaflow client.
    • Handle CloudWatchLogs resource creation delay gracefully.

Features

Add capability to schedule Metaflow flows with AWS Step Functions.

Netflix uses an internal DAG scheduler to orchestrate most machine learning and ETL pipelines in production. Metaflow users at Netflix can seamlessly deploy and schedule their flows to this scheduler. Now, with this release, we are introducing a similar integration with AWS Step Functions where Metaflow users can easily deploy & schedule their flows by simply executing

python myflow.py step-functions create

which will create an AWS Step Functions state machine for them. With this feature, Metaflow users can now enjoy all the features of Metaflow along with a highly available, scalable, maintenance-free production scheduler without any changes in their existing code.

We are also introducing a new decorator, @schedule, which allows Metaflow users to instrument time-based triggers via Amazon EventBridge for their flows deployed on AWS Step Functions.

With this integration, Metaflow users can inspect their flows deployed on AWS Step Functions as before and debug and reproduce results from AWS Step Functions on their local laptop or within a notebook.

Documentation
Launch Blog Post

PR #211 addresses Issue #2 .

Improvements

Fix log indenting in Metaflow.

Metaflow was inadvertently removing leading whitespace from user-visible logs on the console. Now Metaflow presents user-visible logs with the correct formatting.

PR #244 fixed issue #223.

Throw exception properly if fetching code package from Amazon S3 on AWS Batch fails.

Due to misconfigured permissions, AWS Batch might not be able to fetch the code package from Amazon S3 for user code execution. In such scenarios, it wasn't apparent to the user where the code package was being pulled from, making it difficult to triage permission issues. Now the Amazon S3 file location is part of the exception stack trace.

PR #243 fixed issue #232.

Remove millisecond information from timestamps returned by Metaflow client.

Metaflow uses time to store the created_at and finished_at information for the Run object returned by the Metaflow client. time unfortunately does not support the %f directive, making these fields difficult to parse with datetime or time. Since Metaflow doesn't expose timings at millisecond granularity, this PR drops the %f directive.
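To illustrate why this matters: a second-granularity timestamp with no %f round-trips cleanly through time. The exact format string below is illustrative, not necessarily the one Metaflow uses:

```python
import time

# Second granularity only: no %f (sub-second) directive, so both
# time.strptime and datetime.strptime can parse the string back.
FMT = "%Y-%m-%dT%H:%M:%SZ"

stamp = time.strftime(FMT, time.gmtime(0))   # format the Unix epoch
parsed = time.strptime(stamp, FMT)           # round-trip without errors
```

With a %f in the format, time.strftime would emit the directive literally and the round-trip would break.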

PR #227 fixed issue #224.

Handle CloudWatchLogs resource creation delay gracefully.

When launching jobs on AWS Batch, the CloudWatchLogStream might not be immediately created (and may never be created if, say, we fail to pull the Docker image for any reason). Metaflow will now simply retry again the next time.

PR #209.

2.0.5 (Apr 30th, 2020)

Metaflow 2.0.5 Release Notes

The Metaflow 2.0.5 release is a minor patch release.

  • Improvements
    • Fix logging of prefixes in datatools.S3._read_many_files.
    • Increase retry count for AWS Batch logs streaming.
    • Upper-bound pylint version to < 2.5.0 for compatibility issues.

Improvements

Fix logging of prefixes in datatools.S3._read_many_files

Avoid a cryptic error message when datatools.S3._read_many_files is unsuccessful by converting prefixes from a generator to a list.

Increase retry count for AWS Batch logs streaming.

Modify the retry behavior for log fetching on AWS Batch by adding jitter to the exponential backoffs and resetting the retry counter after every successful request.

Additionally, the Metaflow task now fails when we cannot stream the task logs back to the user's terminal, even if the AWS Batch task succeeds.
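The retry schedule described above can be sketched as exponential backoff with full jitter; the constants are illustrative, and Metaflow's actual values may differ:

```python
import random

def backoff_delays(attempts, base=1.0, cap=60.0, seed=None):
    """Exponential backoff with full jitter.

    Attempt i sleeps a uniform amount in [0, min(cap, base * 2**i)].
    On a successful request the caller starts over from attempt 0,
    which is the "reset the retry counter" behavior described above.
    """
    rng = random.Random(seed)
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]
```

Jitter spreads concurrent retries apart, which matters when many AWS Batch tasks poll CloudWatch Logs at once.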

Upper-bound pylint version to < 2.5.0.

pylint version 2.5.0 marks Metaflow's self.next() syntax as an error, so python helloworld.py run would fail at the pylint check step unless run with --no-pylint. This version upper-bound makes pip downgrade pylint during Metaflow installation if pylint==2.5.0 is already installed.