Only restart failed libc++ jobs, not cancelled ones. #146397

Merged
merged 1 commit into from
Jul 2, 2025

EricWF
Member

@EricWF EricWF commented Jun 30, 2025

Despite the error message for preempted jobs containing the word "canceled" ("The operation was canceled"), GitHub reports these jobs as workflow "failures".

This is important, because if we fail to distinguish between "failed" and "cancelled" jobs, the restarter will fight to restart jobs a user intentionally cancelled (either by pressing the "cancel" button, or by pushing an update to a PR).
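
A minimal sketch of the per-check-run filter this change applies, assuming check-run objects shaped like the GitHub REST API's (with `status` and `conclusion` fields). The helper name `isRestartCandidate` and the example objects are hypothetical; the workflow script inlines this test in its loop rather than using a named function.

```javascript
// Hypothetical helper mirroring the per-check-run filter in the workflow script.
// Only completed runs whose conclusion is 'failure' are inspected further:
// a preempted job reports 'failure', while a run the user cancelled
// (cancel button, or a newer push to the PR) reports 'cancelled'.
function isRestartCandidate(checkRun) {
  if (checkRun.status !== 'completed') {
    return false; // still queued or running: nothing to decide yet
  }
  return checkRun.conclusion === 'failure';
}

// Illustrative inputs, not real API responses.
const preempted = { id: 1, status: 'completed', conclusion: 'failure' };
const userCancelled = { id: 2, status: 'completed', conclusion: 'cancelled' };
const inProgress = { id: 3, status: 'in_progress', conclusion: null };

console.log([preempted, userCancelled, inProgress].map(isRestartCandidate));
// prints [ true, false, false ]
```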

This reverts commit 3ea7fc7. This also reverts earlier attempts to solve this problem by matching the messages to detect manual cancellations.
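
With the manual-cancellation messages no longer matched, each failing check run's annotations only need to be sorted into "preempted" or "genuinely failed" using the two regular expressions from the diff below. The following sketch assumes annotation objects with `annotation_level` and `message` fields, as returned by the Checks API, and checks for a preemption message first (the order the removed `restart-test` job used). The function name `classifyAnnotations` is hypothetical.

```javascript
// Regular expressions copied from the updated workflow script.
const failure_regex = /(Process completed with exit code 1.)/;
const preemption_regex = /(The runner has received a shutdown signal)|(The operation was canceled)/;

// Hypothetical helper: classify one check run from its annotations.
// Returns 'preempted', 'failed', or 'unknown'.
function classifyAnnotations(annotations) {
  // Only failure-level annotations carry the messages of interest.
  const failures = annotations.filter((a) => a.annotation_level === 'failure');
  if (failures.some((a) => preemption_regex.test(a.message))) {
    return 'preempted';
  }
  if (failures.some((a) => failure_regex.test(a.message))) {
    return 'failed';
  }
  return 'unknown';
}

console.log(classifyAnnotations([
  { annotation_level: 'failure', message: 'The operation was canceled.' },
])); // prints preempted
console.log(classifyAnnotations([
  { annotation_level: 'failure', message: 'Process completed with exit code 1.' },
])); // prints failed
```

A message such as "The run was canceled by ..." now classifies as `unknown`; such runs are expected to be skipped earlier anyway, because their conclusion is `cancelled` rather than `failure`.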

This change also removes ldionne's test workflow (the `restart-test` job), as it's hard to keep correctly in sync.
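
The decision step at the end of the removed `restart-test` job should restart only when at least one job was preempted and none failed legitimately. There it is written as `if (preemptions)` and `if (legitimate_failures)`; both are arrays, and every array (even an empty one) is truthy in JavaScript, so both conditions are always true. A sketch of the same decision with explicit length checks, using a hypothetical `shouldRestart` helper:

```javascript
// Hypothetical helper: decide whether to re-run the failed jobs of a workflow run.
// `preemptions` and `legitimateFailures` are arrays of check runs.
function shouldRestart(preemptions, legitimateFailures) {
  // Test lengths explicitly: `if ([])` is true in JavaScript,
  // so `if (preemptions)` cannot tell an empty list from a non-empty one.
  return preemptions.length > 0 && legitimateFailures.length === 0;
}

console.log(shouldRestart([{ id: 1 }], []));          // prints true
console.log(shouldRestart([{ id: 1 }], [{ id: 2 }])); // prints false
console.log(shouldRestart([], []));                   // prints false
console.log(Boolean([]));                             // prints true
```

Separately, the `break` statements in that job's loop exit the `for` loop after the first check run that matches either regular expression, so later check runs are never examined.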

This change does not attempt to address the maintainability or testability of this script, which continues to be an issue. If asked to address these issues, my plan is to rewrite the script in Python (which most people are more familiar with) and turn this action into a Docker container action, using a container image with the Python script and its dependencies built in. Let me know if that's a direction we're interested in heading.

@EricWF EricWF requested a review from ldionne June 30, 2025 17:50
@llvmbot llvmbot added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. github:workflow labels Jun 30, 2025
@EricWF EricWF requested a review from boomanaiden154 June 30, 2025 17:50
@llvmbot
Member

llvmbot commented Jun 30, 2025

@llvm/pr-subscribers-github-workflow

Author: Eric (EricWF)

Full diff: https://github.com/llvm/llvm-project/pull/146397.diff

1 File Affected:

  • (modified) .github/workflows/libcxx-restart-preempted-jobs.yaml (+4-92)
diff --git a/.github/workflows/libcxx-restart-preempted-jobs.yaml b/.github/workflows/libcxx-restart-preempted-jobs.yaml
index 06ac6a2b7291b..accb84efb5c90 100644
--- a/.github/workflows/libcxx-restart-preempted-jobs.yaml
+++ b/.github/workflows/libcxx-restart-preempted-jobs.yaml
@@ -20,7 +20,7 @@ permissions:
 
 jobs:
   restart:
-    if: github.repository_owner == 'llvm' && (github.event.workflow_run.conclusion == 'failure' || github.event.workflow_run.conclusion == 'cancelled')
+    if: github.repository_owner == 'llvm' && (github.event.workflow_run.conclusion == 'failure')
     name: "Restart Job"
     permissions:
       statuses: read
@@ -35,8 +35,8 @@ jobs:
             // The "The run was canceled by" message comes from a user manually canceling a workflow
             // the "higher priority" message comes from github canceling a workflow because the user updated the change.
             // And the "exit code 1" message indicates a genuine failure.
-            const failure_regex = /(Process completed with exit code 1.)|(Canceling since a higher priority waiting request)|(The run was canceled by)/
-            const preemption_regex = /(The runner has received a shutdown signal)/
+            const failure_regex = /(Process completed with exit code 1.)/
+            const preemption_regex = /(The runner has received a shutdown signal)|(The operation was canceled)/
 
             const wf_run = context.payload.workflow_run
             core.notice(`Running on "${wf_run.display_title}" by @${wf_run.actor.login} (event: ${wf_run.event})\nWorkflow run URL: ${wf_run.html_url}`)
@@ -77,7 +77,7 @@ jobs:
                 console.log('Check run was not completed. Skipping.');
                 continue;
               }
-              if (check_run.conclusion != 'failure' && check_run.conclusion != 'cancelled') {
+              if (check_run.conclusion != 'failure') {
                 console.log('Check run had conclusion: ' + check_run.conclusion + '. Skipping.');
                 continue;
               }
@@ -156,91 +156,3 @@ jobs:
                 run_id: context.payload.workflow_run.id
               })
             await create_check_run('success', 'Restarted workflow run due to preempted job')
-
-  restart-test:
-    if: github.repository_owner == 'llvm' && (github.event.workflow_run.conclusion == 'failure' || github.event.workflow_run.conclusion == 'cancelled') && github.event.actor.login == 'ldionne' # TESTING ONLY
-    name: "Restart Job (test)"
-    permissions:
-      statuses: read
-      checks: write
-      actions: write
-    runs-on: ubuntu-24.04
-    steps:
-      - name: "Restart Job (test)"
-        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea #v7.0.1
-        with:
-          script: |
-            const FAILURE_REGEX = /(Process completed with exit code 1.)|(Canceling since a higher priority waiting request)|(The run was canceled by)/
-            const PREEMPTION_REGEX = /(The runner has received a shutdown signal)|(The operation was canceled)/
-
-            function log(msg) {
-              core.notice(msg)
-            }
-
-            const wf_run = context.payload.workflow_run
-            log(`Running on "${wf_run.display_title}" by @${wf_run.actor.login} (event: ${wf_run.event})\nWorkflow run URL: ${wf_run.html_url}`)
-
-            log('Listing check runs for suite')
-            const check_suites = await github.rest.checks.listForSuite({
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              check_suite_id: context.payload.workflow_run.check_suite_id,
-              per_page: 100 // FIXME: We don't have 100 check runs yet, but we should handle this better.
-            })
-
-            preemptions = [];
-            legitimate_failures = [];
-            for (check_run of check_suites.data.check_runs) {
-              log(`Checking check run: ${check_run.id}`);
-              if (check_run.status != 'completed') {
-                log('Check run was not completed. Skipping.');
-                continue;
-              }
-
-              if (check_run.conclusion != 'failure' && check_run.conclusion != 'cancelled') {
-                log(`Check run had conclusion: ${check_run.conclusion}. Skipping.`);
-                continue;
-              }
-
-              annotations = await github.rest.checks.listAnnotations({
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                check_run_id: check_run.id
-              })
-
-              preemption_annotation = annotations.data.find(function(annotation) {
-                return annotation.annotation_level == 'failure' &&
-                       annotation.message.match(PREEMPTION_REGEX) != null;
-              });
-              if (preemption_annotation != null) {
-                log(`Found preemption message: ${preemption_annotation.message}`);
-                preemptions.push(check_run);
-                break;
-              }
-
-              failure_annotation = annotations.data.find(function(annotation) {
-                return annotation.annotation_level == 'failure' &&
-                       annotation.message.match(FAILURE_REGEX) != null;
-              });
-              if (failure_annotation != null) {
-                log(`Found legitimate failure annotation: ${failure_annotation.message}`);
-                legitimate_failures.push(check_run);
-                break;
-              }
-            }
-
-            if (preemptions) {
-              log('Found some preempted jobs');
-              if (legitimate_failures) {
-                log('Also found some legitimate failures, so not restarting the workflow.');
-              } else {
-                log('Did not find any legitimate failures. Restarting workflow.');
-                await github.rest.actions.reRunWorkflowFailedJobs({
-                  owner: context.repo.owner,
-                  repo: context.repo.repo,
-                  run_id: context.payload.workflow_run.id
-                })
-              }
-            } else {
-              log('Did not find any preempted jobs. Not restarting the workflow.');
-            }

@llvmbot
Member

llvmbot commented Jun 30, 2025

@llvm/pr-subscribers-libcxx

@EricWF EricWF requested review from a team and removed request for ldionne and boomanaiden154 June 30, 2025 19:01
@EricWF EricWF merged commit d78036f into llvm:main Jul 2, 2025
10 checks passed
Labels
github:workflow libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

4 participants