Skip to content

Conversation

@jfversluis
Copy link
Member

@jfversluis jfversluis commented Dec 16, 2025

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Description of Change

This PR adds automatic category detection for UI tests in pull requests. When a PR adds new UI tests with [Category(...)] attributes, only the corresponding test categories will run (though all matrix jobs still spin up, non-matching ones complete quickly with 0 tests).

How it works:

  1. Discovery stage (discover_ui_test_categories) - Only runs for PRs:

    • Runs the detect-ui-test-categories.ps1 script
    • Parses the git diff to find added lines with [Category(UITestCategories.X)] patterns
    • Outputs the detected categories as a comma-separated list
  2. Test stages - All stages now depend on discovery:

    • Capture detected categories from the discovery stage output
    • Each matrix job runs a "Calculate test filter" step that:
      • For non-PR builds or when no categories detected: runs all categories
      • For PR builds with detected categories: only runs categories that match
    • Uses _NO_MATCHING_CATEGORY_ as filter for non-matching jobs (runs 0 tests but completes quickly)

Files changed:

  • eng/scripts/detect-ui-test-categories.ps1 - New PowerShell script for category detection
  • eng/pipelines/common/ui-tests.yml - Updated pipeline with discovery stage and filtering logic

Test Validation

This PR includes a dummy Button test to validate the detection is working. The discovery stage should detect "Button" category, and only the Button matrix jobs should run actual tests.

Issues Fixed

Reduces UI test run time for PRs by only running relevant test categories.

- Update android_ui_tests, android_ui_tests_coreclr, ios_ui_tests_mono,
  ios_ui_tests_nativeaot, winui_ui_tests, and mac_ui_tests stages
- Add dependency on discover_ui_test_categories stage
- Add stage variables to capture detected categories from discovery
- Add Calculate test filter step to each job that:
  - For PRs with detected categories: filters to only matching categories
  - For non-PR builds or when no categories detected: runs all categories
  - Uses _NO_MATCHING_CATEGORY_ filter when no categories match (runs 0 tests)
- Update testFilter parameter to use EffectiveTestFilter variable
Remove stage-level variables and use direct stage dependency expression
in job-level variables instead. This ensures the detected categories
are properly accessible within each matrix job.

Changed from:
  stage:
    variables:
      detectedCategories: $[ stageDep... ]
    jobs:
      - job:
        variables:
          DETECTED_CATEGORIES: $(detectedCategories)

To:
  stage:
    jobs:
      - job:
        variables:
          DETECTED_CATEGORIES: $[ stageDep... ]
The testFilter parameter uses $(CATEGORYGROUP) which is already wired up
correctly in the ui-tests-steps.yml template. Instead of creating a new
EffectiveTestFilter variable and trying to pass it, simply overwrite
the existing CATEGORYGROUP variable with the filtered value.

This leverages the existing parameter passing mechanism that already works.
Matrix variables are readonly in Azure Pipelines and cannot be overwritten.
Create a new EFFECTIVE_FILTER variable instead and use that for testFilter.
Adopted the approach from PR #32935:
- Pass CATEGORYGROUP, DETECTED_CATEGORIES via env block
- Move intersection logic into ui-tests-steps.yml PowerShell
- Remove duplicate 'Calculate test filter' steps from all stages
- Read variables from environment instead of pipeline expressions

This simplifies the pipeline by consolidating the filtering logic
in one place rather than repeating it in every test stage.
When there are no matching categories for a PR, exit with success
immediately instead of running tests with a fake filter. This truly
skips the test execution rather than running the test framework
with no tests matching.
- Added early 'Check if category should run' step in ui-tests-steps.yml
- Set SHOULD_RUN_TESTS variable based on category match
- Added condition to skip: artifact downloads, clean bot, KVM enable,
  notification center disable, defaults modify, .NET install, PATH add,
  screen resolution, node install, Appium install, test execution

This saves time for PR builds where only specific test categories are
modified, by skipping setup for non-matching category groups.
Skip expensive provisioning steps (JDK, Android SDK, .NET SDK, Xcode,
simulator runtimes) when SHOULD_RUN_TESTS is False. This saves significant
build time when a UI test category doesn't match the detected categories
from the PR changes.

Uses a single early check step that sets SKIP_PROVISIONING variable,
which is then checked by only the expensive steps.
@jfversluis jfversluis force-pushed the feature/detect-uitest-categories branch from 6a983cf to f060cc5 Compare December 17, 2025 13:52
- Add discover_ui_test_categories dependency to CV2 and CARV2 stages
- Add DETECTED_CATEGORIES variable to CV2 and CARV2 jobs
- Update early check to use testFilter parameter as fallback when
  CATEGORYGROUP is not set (for non-matrix stages like CV2/CARV2)

Now CV2 will only run if CollectionView is detected in PR changes,
and CARV2 will only run if CarouselView is detected.
When CATEGORYGROUP variable is not defined (CV2/CARV2 stages without
matrix), Azure Pipelines passes the literal string '$(CATEGORYGROUP)'
instead of empty. Now we check for both empty AND the literal
unexpanded variable string before falling back to testFilter.
@jfversluis jfversluis added this to the .NET 10.0 SR3 milestone Dec 18, 2025
@jfversluis jfversluis added the p/0 Current heighest priority issues that we are targeting for a release. label Dec 18, 2025
@jfversluis jfversluis marked this pull request as ready for review December 18, 2025 12:54
Copilot AI review requested due to automatic review settings December 18, 2025 12:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces automatic UI test category detection for pull requests to optimize test execution time. When a PR adds new UI tests with [Category(...)] attributes, the pipeline detects these categories and runs only the relevant test matrix jobs, while non-matching jobs complete quickly with 0 tests.

Key Changes:

  • Adds a discovery stage that parses git diffs to detect new test categories in PRs
  • Implements early filtering logic to skip provisioning and test execution for non-matching category groups
  • Updates all test stages to depend on category discovery and use detected categories for filtering
  • Includes temporary dummy tests in ButtonUITests and LabelUITests to validate the detection mechanism

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
eng/scripts/detect-ui-test-categories.ps1 New PowerShell script that parses git diffs to extract Category attributes from added test code
eng/pipelines/common/ui-tests.yml Adds discovery stage and updates test stage dependencies to use detected categories
eng/pipelines/common/ui-tests-steps.yml Implements early category check and dynamic test filter calculation based on detected categories
eng/pipelines/common/provision.yml Adds skip logic to bypass provisioning when no tests will run for a category group
src/Controls/tests/TestCases.Shared.Tests/Tests/ButtonUITests.cs Adds dummy test to validate Button category detection
src/Controls/tests/TestCases.Shared.Tests/Tests/LabelUITests.cs Adds dummy test to validate Label category detection

Comment on lines +29 to +34
try {
git fetch origin "${targetBranch}" --no-tags --prune --depth=200 | Out-Null
} catch {
Write-Warning "Failed to fetch origin/${targetBranch}: $($_.Exception.Message). Falling back to running all categories."
return
}
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script uses Write-Warning but then returns early, which means detection silently falls back to running all categories. This could mask real problems like git configuration issues or network failures. Consider logging more explicitly what fallback behavior will occur, or consider whether some failures (like git fetch failures) should fail the pipeline rather than silently fall back.

Copilot uses AI. Check for mistakes.
Comment on lines +76 to +77
Write-Warning "Unrecognized category expression '$rawValue'. Falling back to running all categories."
return
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When an unrecognized category expression is found (line 76), the script returns early and falls back to running all categories. This is overly conservative - if a developer adds a new test with a malformed category attribute, it would be better to fail the pipeline or at least log more prominently. The current behavior means malformed category attributes will go unnoticed and cause all tests to run, defeating the purpose of the optimization.

Suggested change
Write-Warning "Unrecognized category expression '$rawValue'. Falling back to running all categories."
return
$message = "Unrecognized category expression '$rawValue'. Expected formats: UITestCategories.<Name>, nameof(UITestCategories.<Name>), or a quoted category string. Failing category detection."
Write-Error $message
throw $message

Copilot uses AI. Check for mistakes.
return
}

$categoryPattern = '^[+]{1}\s*\[Category\((?<value>[^\)]*)\)\]'
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern uses a single + quantifier which requires exactly one + character. In git diff output, added lines are prefixed with a single +, but the pattern ^[+]{1} is more verbose than necessary. The pattern could be simplified to ^\+ or ^[+] for better readability, though the current pattern is functionally correct.

Suggested change
$categoryPattern = '^[+]{1}\s*\[Category\((?<value>[^\)]*)\)\]'
$categoryPattern = '^\+\s*\[Category\((?<value>[^\)]*)\)\]'

Copilot uses AI. Check for mistakes.
Comment on lines +220 to +228
foreach ($cat in $categoryList) {
$cat = $cat.Trim()
foreach ($det in $detectedList) {
$det = $det.Trim()
if ($cat -eq $det) {
$matchingCategories += $cat
break
}
}
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The category comparison uses case-sensitive string equality ($cat -eq $det). However, the HashSet in the PowerShell detection script (detect-ui-test-categories.ps1 line 57) uses case-insensitive comparison ([System.StringComparer]::OrdinalIgnoreCase). This inconsistency could cause matching failures if category names differ in case. Use case-insensitive comparison here as well with -ieq or .ToLower() for consistency.

Copilot uses AI. Check for mistakes.
vmImage: ubuntu-latest
steps:
- checkout: self
fetchDepth: 0
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checkout step uses fetchDepth: 0 which fetches the entire git history. This is quite expensive and can slow down the discovery job significantly for repositories with long history. Consider using a more reasonable depth (e.g., fetchDepth: 200) since the script already uses --depth=200 when fetching the target branch at line 30. This would be consistent and still provide enough history for most PR comparisons.

Suggested change
fetchDepth: 0
fetchDepth: 200

Copilot uses AI. Check for mistakes.
Comment on lines +75 to +79
condition: |
or(
ne(variables['Build.Reason'], 'PullRequest'),
in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped')
)
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition uses in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped') which allows the build to proceed even if discovery was skipped. However, for non-PR builds, the discovery stage has condition: eq(variables['Build.Reason'], 'PullRequest') which means it will be skipped, not produce any output variables. The test stages depend on these output variables (DETECTED_CATEGORIES), so they need to handle the case where these variables are empty or undefined. Verify that the downstream logic correctly handles undefined DETECTED_CATEGORIES for non-PR builds.

Suggested change
condition: |
or(
ne(variables['Build.Reason'], 'PullRequest'),
in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped')
)
condition: succeeded()

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +59
foreach ($cat in $categoryList) {
$cat = $cat.Trim()
foreach ($det in $detectedList) {
$det = $det.Trim()
if ($cat -eq $det) {
$hasMatch = $true
Write-Host "Match found: '$cat'"
break
}
}
if ($hasMatch) { break }
}
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The category comparison at lines 52 and 224 uses case-sensitive string equality ($cat -eq $det). However, the HashSet used for collecting categories in the PowerShell detection script uses case-insensitive comparison (line 57: [System.StringComparer]::OrdinalIgnoreCase). This inconsistency could cause matching failures if category names differ in case between the detection script output and the matrix CATEGORYGROUP values. Use case-insensitive comparison here as well with -ieq or .ToLower() for consistency.

Copilot uses AI. Check for mistakes.
Comment on lines 196 to 243
$testFilter = ""
$testConfigrationArgs = "${{ parameters.testConfigurationArgs }}"
"${{ parameters.testFilter }}".Split(",") | ForEach {
$testFilter += "TestCategory=" + $_ + "|"
# Get test filter from environment variable (passed from matrix via env block)
$testFilterParam = $env:CATEGORY_GROUP
$detectedCategories = $env:DETECTED_CATEGORIES
$isPR = $env:BUILD_REASON -eq "PullRequest"
Write-Host "Category Group from matrix: '$testFilterParam'"
Write-Host "Detected Categories: '$detectedCategories'"
# Fallback to parameter for non-matrix builds
if (-not $testFilterParam) {
$testFilterParam = "${{ parameters.testFilter }}"
Write-Host "Using testFilter parameter: '$testFilterParam'"
}
# For PRs with detected categories, use only matching categories
if ($isPR -and -not [string]::IsNullOrWhiteSpace($detectedCategories) -and -not [string]::IsNullOrWhiteSpace($testFilterParam)) {
$categoryList = $testFilterParam -split ","
$detectedList = $detectedCategories -split ","
$matchingCategories = @()
foreach ($cat in $categoryList) {
$cat = $cat.Trim()
foreach ($det in $detectedList) {
$det = $det.Trim()
if ($cat -eq $det) {
$matchingCategories += $cat
break
}
}
}
$testFilterParam = $matchingCategories -join ","
Write-Host "Running matching categories: $testFilterParam"
}
Write-Host "Final test filter: '$testFilterParam'"
if ($testFilterParam) {
$testFilterParam.Split(",") | ForEach {
$testFilter += "TestCategory=" + $_ + "|"
}
$testFilter = $testFilter.TrimEnd("|")
}
$testFilter = $testFilter.TrimEnd("|")
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is duplicated category matching logic between the early check step (lines 20-69) and the test filter calculation step (lines 200-233). Both steps perform the same category matching operation with identical logic. Consider extracting this into a shared function or removing one of the checks. Since the early check already determines whether tests should run and sets SHOULD_RUN_TESTS, the test filter calculation step could be simplified to only handle the filter string construction when SHOULD_RUN_TESTS is True.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +69
- pwsh: |
$testFilterParam = $env:CATEGORY_GROUP
$testFilterFallback = $env:TEST_FILTER_PARAM
$detectedCategories = $env:DETECTED_CATEGORIES
$isPR = $env:BUILD_REASON -eq "PullRequest"
Write-Host "=== Early Category Check ==="
Write-Host "Build Reason: $env:BUILD_REASON"
Write-Host "Category Group (from matrix): '$testFilterParam'"
Write-Host "Test Filter (from parameter): '$testFilterFallback'"
Write-Host "Detected Categories: '$detectedCategories'"
# Use testFilter parameter as fallback when CATEGORYGROUP is not set or not expanded
# When CATEGORYGROUP variable doesn't exist, it shows as literal '$(CATEGORYGROUP)'
$categoryGroupNotSet = [string]::IsNullOrWhiteSpace($testFilterParam) -or $testFilterParam -eq '$(CATEGORYGROUP)'
if ($categoryGroupNotSet -and -not [string]::IsNullOrWhiteSpace($testFilterFallback)) {
$testFilterParam = $testFilterFallback
Write-Host "Using testFilter parameter as category: '$testFilterParam'"
}
$shouldRun = $true
# For PRs with detected categories, check if this category group has any matches
if ($isPR -and -not [string]::IsNullOrWhiteSpace($detectedCategories) -and -not [string]::IsNullOrWhiteSpace($testFilterParam)) {
$categoryList = $testFilterParam -split ","
$detectedList = $detectedCategories -split ","
$hasMatch = $false
foreach ($cat in $categoryList) {
$cat = $cat.Trim()
foreach ($det in $detectedList) {
$det = $det.Trim()
if ($cat -eq $det) {
$hasMatch = $true
Write-Host "Match found: '$cat'"
break
}
}
if ($hasMatch) { break }
}
if (-not $hasMatch) {
$shouldRun = $false
Write-Host "##[warning]No matching categories - SKIPPING all steps for this job"
Write-Host "Category group '$testFilterParam' does not contain any detected categories: $detectedCategories"
}
}
Write-Host "Should run tests: $shouldRun"
Write-Host "##vso[task.setvariable variable=SHOULD_RUN_TESTS]$shouldRun"
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The early check step could skip the job entirely if SHOULD_RUN_TESTS is False, but instead it sets a variable and forces all subsequent steps to check this condition. Azure Pipelines supports job-level conditions that could terminate the job earlier. Consider using a job-level condition expression or setting the job result to "Skipped" when no matching categories are found, rather than requiring every subsequent step to check SHOULD_RUN_TESTS.

Copilot uses AI. Check for mistakes.
@build-analysis build-analysis bot mentioned this pull request Dec 18, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

p/0 Current heighest priority issues that we are targeting for a release.

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

2 participants