[Core feature] Reuse same literals in the dynamic task  #6032

@pingsutw

Description

Motivation: Why do you think this is important?

Running the workflow below uploads the input DataFrame of conc_prediction to the connected blob storage once per predict_wf call (14 times for range(1, 15)), each time under a different filename. Each instance of predict_wf gets its own copy of the input.

import pandas as pd
from flytekit import dynamic, task, workflow


@task()
def load_model(name: str) -> pd.DataFrame:
    return pd.DataFrame({name: [1, 2, 3, 4, 5]})


@task()
def predict_df(model: pd.DataFrame, n: int):
    print(model)
    print(n)


@workflow
def predict_wf(n: int, model: pd.DataFrame):
    predict_df(model=model, n=n)


@dynamic()
def conc_prediction(input: pd.DataFrame):
    for n in range(1, 15):
        predict_wf(model=input, n=n)


@workflow
def wf():
    output = load_model(name="foo")
    conc_prediction(input=output)

Goal: What should the final outcome look like, ideally?

We should serialize the pandas DataFrame and upload it only once. Each predict_wf should then reuse the same input (Parquet file).

We could probably add a local cache for the dynamic workflow: if a Python value has already been serialized, we can load its literal from the cache instead of serializing and uploading it again.
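A minimal sketch of the caching idea, assuming a cache keyed on object identity. `LiteralCache`, `fake_upload`, and the `s3://bucket/...` URIs are hypothetical names made up for illustration and are not part of flytekit; a plain dict stands in for the DataFrame so the example runs without any dependencies.

```python
from typing import Any, Callable, Dict, List, Tuple


class LiteralCache:
    """Hypothetical per-execution cache mapping a Python value to its serialized literal."""

    def __init__(self, serialize: Callable[[Any], str]) -> None:
        self._serialize = serialize
        # Key on object identity and type. The value itself is stored next to
        # the literal so its id() cannot be reused by another object while
        # the cache is alive.
        self._entries: Dict[Tuple[int, type], Tuple[Any, str]] = {}

    def to_literal(self, value: Any) -> str:
        key = (id(value), type(value))
        hit = self._entries.get(key)
        if hit is not None:
            return hit[1]  # already serialized: reuse the stored literal
        literal = self._serialize(value)
        self._entries[key] = (value, literal)
        return literal


uploads: List[str] = []


def fake_upload(value: Any) -> str:
    """Stand-in for writing a DataFrame to Parquet and uploading it (assumption)."""
    uri = f"s3://bucket/literal-{len(uploads)}.parquet"
    uploads.append(uri)
    return uri


cache = LiteralCache(fake_upload)
df = {"foo": [1, 2, 3, 4, 5]}  # stand-in for the pandas DataFrame

# Mirrors the loop in conc_prediction: the same object is passed every time.
uris = [cache.to_literal(df) for n in range(1, 15)]
```

With this sketch every iteration gets the same URI and `fake_upload` runs only once. Keying on identity avoids hashing the DataFrame contents, but it would return a stale literal if the value were mutated between calls, so a real implementation might need a content hash or an explicit invalidation rule.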

Describe alternatives you've considered

NA

Propose: Link/Inline OR Additional context

NA

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Projects

Status

In progress
