Open
flyteorg/flytekit
#3307
Labels: enhancement, flytekit, good first issue
Description
Motivation: Why do you think this is important?
Running this workflow uploads the input DataFrame of conc_prediction to the connected blob storage once per predict_wf call (about 15 times here), each time under a different filename. Every instance of predict_wf gets its own copy of the input.
import pandas as pd
from flytekit import dynamic, task, workflow

@task()
def load_model(name: str) -> pd.DataFrame:
    return pd.DataFrame({name: [1, 2, 3, 4, 5]})

@task()
def predict_df(model: pd.DataFrame, n: int):
    print(model)
    print(n)

@workflow
def predict_wf(n: int, model: pd.DataFrame):
    predict_df(model=model, n=n)

@dynamic()
def conc_prediction(input: pd.DataFrame):
    # Each call below re-serializes and re-uploads the same DataFrame.
    for n in range(1, 15):
        predict_wf(model=input, n=n)

@workflow
def wf():
    output = load_model(name="foo")
    conc_prediction(input=output)
Goal: What should the final outcome look like, ideally?
We should serialize the pandas DataFrame and upload it only once. Each predict_wf should reuse the same input (Parquet file).
We could probably add a local cache for the dynamic workflow: if a Python value has already been serialized, we can load its literal from the cache instead of serializing it again.
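A minimal sketch of the caching idea, written without flytekit so it runs on its own. Everything named here (`LiteralCache`, `get_or_serialize`, `fake_upload`, the `s3://bucket/...` URIs) is hypothetical and only illustrates the approach; it is not flytekit's API, and a real implementation would need to hook into wherever the dynamic workflow converts Python values to literals.

```python
from typing import Any, Callable, Dict, Tuple


class LiteralCache:
    """Hypothetical per-execution cache; not part of flytekit's API."""

    def __init__(self) -> None:
        # Maps id(value) -> (value, literal). Holding a reference to the
        # value keeps it alive, so its id cannot be reused by another object.
        self._entries: Dict[int, Tuple[Any, Any]] = {}

    def get_or_serialize(self, value: Any, serialize: Callable[[Any], Any]) -> Any:
        entry = self._entries.get(id(value))
        if entry is not None and entry[0] is value:
            # Same Python object seen before: reuse the stored literal.
            return entry[1]
        literal = serialize(value)
        self._entries[id(value)] = (value, literal)
        return literal


uploads = []


def fake_upload(value: Any) -> str:
    # Stand-in for "serialize to Parquet and upload to blob storage".
    uri = f"s3://bucket/upload-{len(uploads)}.parquet"
    uploads.append(uri)
    return uri


cache = LiteralCache()
df = {"foo": [1, 2, 3, 4, 5]}  # stand-in for the pandas DataFrame

# Mirrors the loop in conc_prediction: many calls, one shared input.
literals = [cache.get_or_serialize(df, fake_upload) for _ in range(1, 15)]
```

Keying on object identity avoids hashing the DataFrame contents, which could be expensive for large inputs. The trade-off is that two equal but distinct DataFrames would still be uploaded separately, and a value mutated in place after caching would return a stale literal.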
Describe alternatives you've considered
NA
Propose: Link/Inline OR Additional context
NA
Are you sure this issue hasn't been raised already?
- Yes
Have you read the Code of Conduct?
- Yes
Status
In progress