[BUG] Flyte Admin Creates New Workflow CRDs When Attempting To Create Duplicate Executions #6577

@Sovietaced

Flyte & Flytekit version

Flyte v1.15.X

Describe the bug

We had alerts fire because the P99 workflow acceptance latency exceeded 30s. We use these alerts to detect problems with Flyte Propeller. The reported latency was extremely high and did not match up with any recent executions.

After some investigation I found a workflow that had completed several hours earlier and was somehow re-evaluated by Flyte Propeller. Further investigation showed that Flyte Scheduler had triggered the workflow, since it was on a scheduled launch plan.
It appears that the first workflow execution failed and was recorded in Propeller's terminated tracking store. A couple of hours later a second, duplicate execution was triggered. This created a new CRD but did not create a new DB model, since the previous execution already existed. Propeller ignored the second workflow CRD because it was still tracked as terminated. Only once the statically configured LRU cache evicted the entry was the second CRD finally processed, which produced the very large acceptance latency.
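The sequence described above can be illustrated with a small model. This is only a sketch of the reported behavior, not Flyte Propeller's actual code: `TerminatedLRU`, `should_process`, the cache capacity, and the execution names are all hypothetical.

```python
from collections import OrderedDict


class TerminatedLRU:
    """Hypothetical fixed-capacity LRU set of execution IDs seen as terminated."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._ids = OrderedDict()

    def add(self, execution_id):
        # Insert (or refresh) the ID as the most recently used entry.
        self._ids[execution_id] = None
        self._ids.move_to_end(execution_id)
        while len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, execution_id):
        return execution_id in self._ids


def should_process(crd_name, terminated):
    """Assumed controller rule: skip any CRD still tracked as terminated."""
    return crd_name not in terminated


terminated = TerminatedLRU(capacity=2)
terminated.add("sched-exec-1")  # first execution failed and was recorded

# A duplicate CRD with the same name appears later and is skipped.
skipped_at_first = not should_process("sched-exec-1", terminated)

# Unrelated executions terminate and push the old entry out of the cache.
terminated.add("other-a")
terminated.add("other-b")

# Only after eviction is the duplicate CRD picked up.
processed_after_eviction = should_process("sched-exec-1", terminated)
```

In this model the delay before the duplicate CRD is processed depends only on how quickly other entries fill the cache, which would be consistent with an acceptance latency unrelated to any recent execution.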

Expected behavior

I would expect that a duplicate execution request would not create a duplicate workflow CRD.
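One way to picture the expected behavior is to reject the duplicate before anything is submitted to the cluster. The sketch below is an assumption-laden illustration, not Flyte Admin's implementation: `FakeAdmin`, `ExecutionAlreadyExists`, and the in-memory `db_rows` and `crds_created` stand-ins are hypothetical.

```python
class ExecutionAlreadyExists(Exception):
    """Raised when an execution with the same ID is already recorded."""


class FakeAdmin:
    """Hypothetical stand-in for the service that creates executions."""

    def __init__(self):
        self.db_rows = set()     # stand-in for execution rows in the database
        self.crds_created = []   # stand-in for workflow CRDs submitted to the cluster

    def create_execution(self, execution_id):
        # Check for an existing record before creating any CRD. A real service
        # would likely rely on a unique constraint rather than check-then-insert.
        if execution_id in self.db_rows:
            raise ExecutionAlreadyExists(execution_id)
        self.db_rows.add(execution_id)
        self.crds_created.append(execution_id)


admin = FakeAdmin()
admin.create_execution("sched-exec-1")

duplicate_rejected = False
try:
    admin.create_execution("sched-exec-1")  # duplicate request
except ExecutionAlreadyExists:
    duplicate_rejected = True

# The duplicate request left no second CRD behind.
crd_count = len(admin.crds_created)
```

Recording the execution first has its own trade-off: if CRD creation then fails, the record would need to be rolled back or marked as failed.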

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Metadata

Assignees

Labels

bug: Something isn't working

Type

No type

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
