
Releases: Netflix/metaflow

2.3.3 (Jul 29th, 2021)

30 Jul 00:24
9f832e6

Metaflow 2.3.3 Release Notes

The Metaflow 2.3.3 release is a patch release.

Features

Support resource tags for Metaflow's integration with AWS Batch

Metaflow now supports setting resource tags on AWS Batch jobs and propagating them to the underlying Amazon ECS tasks. The following tags are now attached to AWS Batch jobs:

  • metaflow.flow_name
  • metaflow.run_id
  • metaflow.step_name
  • metaflow.user / metaflow.owner
  • metaflow.version
  • metaflow.production_token

To enable this feature, set METAFLOW_BATCH_EMIT_TAGS to True, either as an environment variable or in your Metaflow config. Keep in mind that the IAM role submitting the jobs to AWS Batch (MetaflowUserRole, StepFunctionsRole) needs the Batch:TagResource permission.
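A minimal sketch of both halves of the setup, assuming you launch flows from a Python wrapper script. `tagged_env` is a hypothetical helper, not part of Metaflow. It returns a copy of the environment with METAFLOW_BATCH_EMIT_TAGS set, which is equivalent to `export METAFLOW_BATCH_EMIT_TAGS=True` in a shell. `TAGGING_POLICY` is an illustrative IAM policy document granting the tagging action. Its `"*"` resource is a placeholder that you would narrow for your account.

```python
import os


def tagged_env(base_env=None):
    """Return a copy of the environment with AWS Batch resource tagging enabled.

    Hypothetical helper: Metaflow itself only reads METAFLOW_BATCH_EMIT_TAGS
    (or the matching key in the Metaflow config).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["METAFLOW_BATCH_EMIT_TAGS"] = "True"  # environment variables are strings
    return env


# Illustrative IAM policy for the role that submits jobs to AWS Batch
# (e.g. MetaflowUserRole or StepFunctionsRole). "*" is a placeholder; narrow it.
TAGGING_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["batch:TagResource"], "Resource": "*"}
    ],
}

# Usage (assumes a flow.py that uses @batch in the working directory):
#   subprocess.run([sys.executable, "flow.py", "run"], env=tagged_env(), check=True)
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the launched flow.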

Bug Fixes

Properly handle None as a parameter default for AWS Step Functions executions

Prior to this release, a parameter specification like

Parameter(name="test_param", type=int, default=None)

would result in the following error, even though a default had been specified:

Flow failed:
    The value of parameter test_param is ambiguous. It does not have a default and it is not required.

This release fixes this behavior, so the flow now executes as it would locally.

Fix return value of IncludeFile artifacts

When accessed through the Metaflow Client, the IncludeFile parameter returned JSONified metadata about the file rather than the file contents. This release fixes that behavior by returning the file contents instead, just like any other Metaflow data artifact.

2.3.2 (Jun 29th, 2021)

29 Jun 22:17
bd9b594

Metaflow 2.3.2 Release Notes

The Metaflow 2.3.2 release is a minor release.


Features

step-functions trigger command now supports --run-id-file option

As with run, you can now pass the --run-id-file option to step-functions trigger. Metaflow will then write the ID of the triggered run to the specified file. This is useful if you have additional scripts that need the run ID to inspect the run or wait until it finishes.
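A minimal sketch of such a wrapper script, assuming the flow has already been deployed with step-functions create. The helper names are hypothetical. Surrounding whitespace in the file is stripped defensively.

```python
import subprocess
import sys
from pathlib import Path


def trigger_command(flow_file, run_id_file, python=sys.executable):
    """Build argv for `python <flow_file> step-functions trigger --run-id-file <path>`."""
    return [python, flow_file, "step-functions", "trigger",
            "--run-id-file", str(run_id_file)]


def read_run_id(run_id_file):
    """Read the run ID that Metaflow wrote to run_id_file."""
    return Path(run_id_file).read_text().strip()


def trigger_and_get_run_id(flow_file, run_id_file):
    """Trigger the deployed flow and return the ID of the run it started."""
    subprocess.run(trigger_command(flow_file, run_id_file), check=True)
    return read_run_id(run_id_file)
```

You could then hand the returned ID to the Metaflow Client, for example `Run(f"YourFlow/{run_id}")`, where `YourFlow` stands for your flow's class name.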

2.3.1 (Jun 23rd, 2021)

23 Jun 18:29
4b82c58

Metaflow 2.3.1 Release Notes

The Metaflow 2.3.1 release is a minor release.

Features

Performance optimizations for merge_artifacts

Prior to this release, FlowSpec.merge_artifacts loaded all of the merged artifacts into memory after performing the hash-based consistency checks. This release avoids the memory and compute costs of decompressing, de-pickling, re-pickling, and recompressing each merged artifact, which improves the performance of merge_artifacts.
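The idea behind this optimization can be sketched in plain Python. The model below is hypothetical and much simpler than Metaflow's datastore. Each join input maps artifact names to the SHA-1 key of a compressed, pickled blob. Consistency is checked by comparing keys, and merging copies keys instead of loading, unpickling, re-pickling, and recompressing the values.

```python
import hashlib
import pickle
import zlib


def store(blobs, value):
    """Pickle and compress a value, save it under its SHA-1 key, and return the key."""
    data = zlib.compress(pickle.dumps(value))
    key = hashlib.sha1(data).hexdigest()
    blobs[key] = data
    return key


def merge_by_key(branches):
    """Merge {name: key} maps from join inputs without touching the stored bytes.

    Raises ValueError when two inputs hold different values for the same name,
    the kind of conflict merge_artifacts reports.
    """
    merged = {}
    for refs in branches:
        for name, key in refs.items():
            if name in merged and merged[name] != key:
                raise ValueError("conflicting values for artifact %r" % name)
            merged[name] = key  # copy the reference, not the value
    return merged
```

Because equal values hash to the same key, identical artifacts from different branches are stored once and merged without any deserialization.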

2.3.0 (May 27th, 2021)

27 May 19:13
0b5cf10

Metaflow 2.3.0 Release Notes

The Metaflow 2.3.0 release is a minor release.

Features

Coordinate larger Metaflow projects with @project

It's not uncommon for multiple people to work on the same workflow simultaneously. Metaflow makes this possible by keeping executions isolated through independently stored artifacts and namespaces. However, by default, all AWS Step Functions deployments are bound to the name of the workflow. If multiple people call step-functions create independently, each deployment overwrites the previous one.
In the early stages of a project, this simple model is convenient, but as the project grows, it becomes desirable for multiple people to test their own AWS Step Functions deployments without interference. Or, as a single developer, you may want to experiment with multiple independent AWS Step Functions deployments of your workflow.
This release introduces a @project decorator to address this need. The @project decorator is applied at the FlowSpec level to bind a flow to a specific project. All flows with the same project name belong to the same project.

from metaflow import FlowSpec, step, project, current

@project(name='example_project')
class ProjectFlow(FlowSpec):

    @step
    def start(self):
        print('project name:', current.project_name)
        print('project branch:', current.branch_name)
        print('is this a production run?', current.is_production)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    ProjectFlow()

python flow.py run

When executed outside AWS Step Functions, the flow works exactly as before, and project_name, branch_name, and is_production become available in the current object.

On AWS Step Functions, however, step-functions create will create a new workflow example_project.user.username.ProjectFlow (where username is your user name) with a user-specific isolated namespace and a separate production token.

To deploy experimental (test) versions that can run in parallel with production, you can deploy custom branches with --branch:

python flow.py --branch foo step-functions create

To deploy a production version, use the --production flag (or pair it with --branch if you want to run multiple variants in production):

python flow.py --production step-functions create
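The following sketch models how deployment names are composed. It is a way to reason about the names, not Metaflow's code. The user case matches the text above. The `test.` and `prod.` prefixes for --branch and --production reflect Metaflow's @project documentation as we understand it, so check them against the name that step-functions create actually prints.

```python
def branch_name(user, branch=None, production=False):
    """Return the branch name a deployment would get (as in current.branch_name)."""
    if production:
        # Production, optionally with a named variant.
        return "prod.%s" % branch if branch else "prod"
    if branch:
        # Custom test branch selected with --branch.
        return "test.%s" % branch
    # Default: a per-user branch.
    return "user.%s" % user


def deployment_name(project, flow, user, branch=None, production=False):
    """Compose <project>.<branch_name>.<FlowName>."""
    return "%s.%s.%s" % (project, branch_name(user, branch, production), flow)
```

For example, `deployment_name("example_project", "ProjectFlow", "username")` yields `example_project.user.username.ProjectFlow`, the name given above.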

Note that the isolated namespaces offered by @project work best when your code is designed to respect these boundaries. For instance, when writing results to a table, you can use current.branch_name to choose the table to write to or you can disable writes outside production by checking current.is_production.
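A minimal sketch of both strategies. `target_table`, `should_write`, and the table name `results` are hypothetical. Inside a step, you would pass `current.branch_name` and `current.is_production`.

```python
def target_table(base_table, branch_name, is_production):
    """Pick a per-branch table so non-production deployments never touch production data."""
    if is_production:
        return base_table
    # Table names often cannot contain '.', so flatten e.g. 'user.alice'.
    return "%s__%s" % (base_table, branch_name.replace(".", "_"))


def should_write(is_production, allow_non_production=False):
    """Gate side effects: write only from production unless explicitly allowed."""
    return is_production or allow_non_production


# Inside a step (illustrative):
#   table = target_table("results", current.branch_name, current.is_production)
#   if should_write(current.is_production): ...
```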

Hyphenated-parameters support in AWS Step Functions

Prior to this release, hyphenated parameters could not be passed through the CLI to flows deployed on AWS Step Functions. Consider the following flow:

from metaflow import FlowSpec, Parameter, step

class ParameterFlow(FlowSpec):
    foo_bar = Parameter('foo-bar',
                        help='Learning rate',
                        default=0.01)

    @step
    def start(self):
        print('foo_bar is %f' % self.foo_bar)
        self.next(self.end)

    @step
    def end(self):
        print('foo_bar is still %f' % self.foo_bar)

if __name__ == '__main__':
    ParameterFlow()

You can now deploy such flows to AWS Step Functions as usual (with step-functions create) and trigger them through the CLI with hyphenated parameters:

python flow.py step-functions trigger --foo-bar 42

State Machine execution history logging for AWS Step Functions

Metaflow now logs the State Machine execution history of deployed flows in AWS CloudWatch Logs. You can enable this by passing the --log-execution-history flag when creating the state machine:

python flow.py step-functions create --log-execution-history

Note that you need to set METAFLOW_SFN_EXECUTION_LOG_GROUP_ARN, either as an environment variable or in your Metaflow config, to the ARN of your AWS CloudWatch Logs log group so that the execution history is piped to AWS CloudWatch Logs.

2.2.13 (May 19th, 2021)

20 May 00:53
bb2dd71

Metaflow 2.2.13 Release Notes

The Metaflow 2.2.13 release is a minor patch release.

Bug Fixes

Handle regression with @batch execution on certain docker images

Certain Docker images override the entrypoint by executing eval on the user-supplied command. The 2.2.10 release, which modified the entrypoint to support datastore-based logging, broke these images. This release fixes that regression.

2.2.12 (May 18th, 2021)

19 May 01:21
ae9dcab

Metaflow 2.2.12 Release Notes

The Metaflow 2.2.12 release is a minor patch release.

Features

Add capability to override AWS Step Functions state machine name while deploying flows to AWS Step Functions

Prior to this release, the state machines that Metaflow created when deploying flows to AWS Step Functions had the same name as the flow. With this release, you can override the name of the state machine by passing a --name argument: python flow.py step-functions --name foo create or python flow.py step-functions --name foo trigger.
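The examples above place --name on the step-functions group, before the create or trigger subcommand. Here is a minimal sketch of a hypothetical argv builder that follows that ordering. The `extra` argument carries subcommand options such as flow parameters.

```python
def sfn_command(flow_file, action, name=None, extra=(), python="python"):
    """Build argv for `python <flow_file> step-functions [--name NAME] <action> ...`.

    --name is an option of the step-functions group, so it precedes the subcommand.
    """
    cmd = [python, flow_file, "step-functions"]
    if name is not None:
        cmd += ["--name", name]
    cmd.append(action)
    cmd.extend(extra)  # e.g. flow parameters passed to trigger
    return cmd
```

Since the notes show --name on both create and trigger, reusing the same value for both keeps them pointed at the same state machine.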

Introduce heartbeats for Metaflow flows

Metaflow now registers heartbeats at the run level and the task level for all flow executions (except flows running on AWS Step Functions, where only task-level heartbeats are captured). This provides the metadata needed to determine whether a run or task has been lost. Subsequent releases of Metaflow will expose this information through the client.

Bug Fixes

Handle regression with Click >=8.0.x

Click 8.0.0 broke certain idempotency assumptions in Metaflow; PR #526 addresses this.

2.2.11 (Apr 30th, 2021)

30 Apr 21:25
8fac145

Metaflow 2.2.11 Release Notes

The Metaflow 2.2.11 release is a minor patch release.

Bug Fixes

Fix regression that broke compatibility with Python 2.7

shlex.quote, introduced in #493, is not available in Python 2.7, so pipes.quote is now used on Python 2.7.
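Here is the usual compatibility pattern for this fix. It is an illustration, not Metaflow's exact code. `to_shell` is a hypothetical helper. The `pipes` module was removed in Python 3.13, which does no harm here because the fallback only runs where `shlex.quote` is missing.

```python
try:
    # Python 3.3+: shlex.quote is the public API.
    from shlex import quote
except ImportError:
    # Python 2.7: shlex has no quote(); pipes.quote provides the same behavior.
    from pipes import quote


def to_shell(args):
    """Join arguments into a single, safely quoted shell command string."""
    return " ".join(quote(str(a)) for a in args)
```

For example, `to_shell(["echo", "hello world"])` produces `echo 'hello world'`.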

Fix a corner case when converting options to command-line arguments

Some plugins may need to escape shell variables when using them in command lines. This patch makes that possible.

Fix a bug in case of a hard crash in a step

In some cases, a hard crash in a step would prevent the step's status from being reported properly.

The Conda environment now delegates to the default environment properly for get_environment_info

The Conda environment now delegates get_environment_info to the DEFAULT_ENVIRONMENT as opposed to the MetaflowEnvironment. This does not change the current default behavior.

2.2.10 (Apr 22nd, 2021)

22 Apr 20:56
015e1c9

Metaflow 2.2.10 Release Notes

The Metaflow 2.2.10 release is a minor patch release.

Features

AWS Logs Group, Region and Stream are now available in metadata for tasks executed on AWS Batch

For tasks that execute on AWS Batch, Metaflow now records where the AWS Batch instance writes the container logs in AWS Logs. This can be handy for locating the logs through the Client API:

from metaflow import Step

Step('Flow/42/a').task.metadata_dict['aws-batch-awslogs-group']
Step('Flow/42/a').task.metadata_dict['aws-batch-awslogs-region']
Step('Flow/42/a').task.metadata_dict['aws-batch-awslogs-stream']

PR: #478
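A minimal sketch of turning this metadata into a CloudWatch Logs query. It assumes `metadata` is the `metadata_dict` of a task that ran on AWS Batch. `awslogs_location` and `get_log_events_kwargs` are hypothetical helpers. The keyword names match boto3's `get_log_events` call.

```python
def awslogs_location(metadata):
    """Extract (group, region, stream) from a task's metadata_dict."""
    return (
        metadata["aws-batch-awslogs-group"],
        metadata["aws-batch-awslogs-region"],
        metadata["aws-batch-awslogs-stream"],
    )


def get_log_events_kwargs(metadata, limit=100):
    """Build keyword arguments for boto3's CloudWatch Logs get_log_events call."""
    group, _region, stream = awslogs_location(metadata)
    return {"logGroupName": group, "logStreamName": stream, "limit": limit}


# Usage (requires metaflow, boto3, and AWS credentials):
#   from metaflow import Step
#   import boto3
#   md = Step('Flow/42/a').task.metadata_dict
#   _, region, _ = awslogs_location(md)
#   client = boto3.client("logs", region_name=region)
#   events = client.get_log_events(**get_log_events_kwargs(md))["events"]
```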

Execution logs are now available for all tasks in Metaflow universe

All Metaflow runtime and task logs are now published to the datastore by a sidecar process, and the logs shown on the console are streamed directly from the datastore. For Metaflow's cloud integrations (currently AWS), Metaflow writes compute task logs (AWS Batch) directly into the datastore (Amazon S3), regardless of where the flow is launched from (a user's laptop or AWS Step Functions). This has multiple benefits:

  • Metaflow no longer relies on AWS CloudWatch to fetch AWS Batch execution logs to the console. AWS CloudWatch has rather low global API limits, which have caused multiple issues for our users in the past.
  • Logs for AWS Step Functions executions are now also available in Amazon S3 and can be fetched with python flow.py logs 42/start or Step('Flow/42/start').task.stdout. PR: #449

Bug Fixes

Fix regression with ping/ endpoint for Metadata service

Fix a regression introduced in v2.2.9 where the endpoint used to determine the version of the deployed Metadata service was erroneously moved from ping to ping/. PR: #484

Fix the behaviour of --namespace= CLI args when executing a flow

python flow.py run --namespace= now correctly makes the global namespace visible within the flow execution. PR: #461

Metaflow 2.2.9 (April 19th, 2021)

19 Apr 22:28

Metaflow 2.2.9 Release Notes

The Metaflow 2.2.9 release is a minor patch release.

Bugs

Remove pinned pylint dependency

The pylint dependency is no longer pinned to a specific version. See PR #462.

Improve handling of / in image parameter for batch

You are now able to specify docker images of the form foo/bar/baz:tag in the batch decorator. See PR #466.
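Image references with several slashes break naive parsing that splits on the first ':' or '/'. The function below is an illustrative sketch, not Metaflow's implementation from PR #466. It treats the text after the last ':' as the tag only when that ':' follows the last '/', so a registry port is not mistaken for a tag. Digest references (`@sha256:...`) are out of scope.

```python
def split_image(image):
    """Split a Docker image reference into (repository, tag), tag None if absent."""
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    if colon > last_slash:
        # The colon is in the final path component, so it introduces a tag.
        return image[:colon], image[colon + 1:]
    # No tag, or the only colon belongs to a registry host:port.
    return image, None
```

For example, `split_image("foo/bar/baz:tag")` returns `("foo/bar/baz", "tag")`.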

List custom FlowSpec parameters in the intended order

The order in which parameters are specified by the user in the FlowSpec is now preserved when displaying them with --help. See PR #456.

2.2.8 (Mar 15th, 2021)

15 Mar 19:56
dac1301

Metaflow 2.2.8 Release Notes

The Metaflow 2.2.8 release is a minor patch release.

Bugs

Fix @environment behavior for conflicting attribute values

In some cases, Metaflow incorrectly handled environment variables passed through the @environment decorator. When @environment was applied to multiple steps, the environment available to each step was the union of the attributes of all the @environment decorators, which is incorrect. For example, consider the following workflow:

from metaflow import FlowSpec, step, batch, environment
import os

class LinearFlow(FlowSpec):

    @environment(vars={'var':os.getenv('var_1')})
    @step
    def start(self):
        print(os.getenv('var'))
        self.next(self.a)

    @environment(vars={'var':os.getenv('var_2')})
    @step
    def a(self):
        print(os.getenv('var'))
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    LinearFlow()

var_1=foo var_2=bar python flow.py run

will result in

Metaflow 2.2.7.post10+gitb7d4c48 executing LinearFlow for user:savin
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
2021-03-12 20:46:04.161 Workflow starting (run-id 6810):
2021-03-12 20:46:04.614 [6810/start/86638 (pid 10997)] Task is starting.
2021-03-12 20:46:06.783 [6810/start/86638 (pid 10997)] foo
2021-03-12 20:46:07.815 [6810/start/86638 (pid 10997)] Task finished successfully.
2021-03-12 20:46:08.390 [6810/a/86639 (pid 11003)] Task is starting.
2021-03-12 20:46:10.649 [6810/a/86639 (pid 11003)] foo
2021-03-12 20:46:11.550 [6810/a/86639 (pid 11003)] Task finished successfully.
2021-03-12 20:46:12.145 [6810/end/86640 (pid 11009)] Task is starting.
2021-03-12 20:46:15.382 [6810/end/86640 (pid 11009)] Task finished successfully.
2021-03-12 20:46:15.563 Done!

Note the output of step a, which should have been bar. PR #452 fixes the issue.
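The difference between the buggy and fixed behavior can be captured in a toy model. This is not Metaflow's code. The "first value wins" merge is an assumption, chosen because it reproduces the symptom in the log above. Each step maps to the variables its own @environment declares.

```python
def union_env(step_envs):
    """Buggy model: every step sees one merged dict where the first value wins."""
    merged = {}
    for env in step_envs.values():
        for key, value in env.items():
            merged.setdefault(key, value)
    return {step: dict(merged) for step in step_envs}


def per_step_env(step_envs):
    """Fixed model: each step sees only the variables its own decorator declares."""
    return {step: dict(env) for step, env in step_envs.items()}


envs = {"start": {"var": "foo"}, "a": {"var": "bar"}, "end": {}}
# union_env(envs)["a"]["var"] is "foo" (the bug); per_step_env gives "bar".
```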

Fix environment is not callable error when using @environment

Using @environment often resulted in a pylint error, E1102: environment is not callable (not-callable). Users worked around this by launching their flows with --no-pylint. PR #451 fixes this issue.