Remove path parts from component label suffixes #38527

Closed
evan-bradley opened this issue Mar 11, 2025 · 9 comments · Fixed by #38622
Labels: ci-cd (CI, CD, testing, build issues), enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)


@evan-bradley
Contributor

evan-bradley commented Mar 11, 2025

Component(s)

No response

Is your feature request related to a problem? Please describe.

In our "create component labels" script, a few component paths are just over GitHub's 50-character limit for labels. From a recent workflow run:

'extension/encoding/awscloudwatchmetricstreamsencoding' exceeds GitHubs 50-character limit on labels, skipping
'receiver/hostmetrics/internal/scraper/filesystemscraper' exceeds GitHubs 50-character limit on labels, skipping
'receiver/hostmetrics/internal/scraper/memoryscraper' exceeds GitHubs 50-character limit on labels, skipping
'receiver/hostmetrics/internal/scraper/networkscraper' exceeds GitHubs 50-character limit on labels, skipping
'receiver/hostmetrics/internal/scraper/pagingscraper' exceeds GitHubs 50-character limit on labels, skipping
'receiver/hostmetrics/internal/scraper/processesscraper' exceeds GitHubs 50-character limit on labels, skipping
'receiver/hostmetrics/internal/scraper/processscraper' exceeds GitHubs 50-character limit on labels, skipping
'receiver/hostmetrics/internal/scraper/systemscraper' exceeds GitHubs 50-character limit on labels, skipping

Describe the solution you'd like

For each of these, the last part of the path ends in a suffix that can be removed: the same word already appears earlier in the path, so it is still clear what type of component it is.

So extension/encoding/awscloudwatchmetricstreamsencoding could become extension/encoding/awscloudwatchmetricstreams and receiver/hostmetrics/internal/scraper/filesystemscraper could become receiver/hostmetrics/internal/scraper/filesystem.

This would get us under the 50-character limit for now and wouldn't require too many changes to how the script operates.

Describe alternatives you've considered

Support a field in mdatagen that allows us to manually shorten these, then generate the authoritative list of labels from mdatagen. The 50-character limit could be enforced during the generation step.

Additional context

No response

evan-bradley added the ci-cd, enhancement, good first issue, and help wanted labels Mar 11, 2025
@evan-bradley
Contributor Author

evan-bradley commented Mar 11, 2025

I came up with the following process to generate shorter label names. All components would have labels under the 50-character limit if we applied this.

LABEL_NAME=$(echo "${COMPONENT}" | sed -E 's%^(.+)/(.+)\1%\1/\2%')
OIFS=${IFS}

IFS='/'
for SEGMENT in ${COMPONENT}
do
    # If a component is named pkg/mypackage and the segment is mypackage, it will always match. Skip the last part of the path.
    r="/${SEGMENT}\$"
    if [[ "${COMPONENT}" =~ ${r} ]]; then
        break
    fi
    LABEL_NAME=$(echo "${LABEL_NAME}" | sed -E "s%^(.+)${SEGMENT}\$%\1%")
done

IFS=${OIFS}

if (( "${#LABEL_NAME}" > 50 )); then
    echo "'${LABEL_NAME}' exceeds GitHubs 50-character limit on labels, skipping"
    continue
fi
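
As a rough illustration of what the process above produces, here is a minimal sketch that restates it as a function and runs it on two of the skipped paths. The function name `shorten_label` is hypothetical, and the `case` pattern stands in for the `[[ ... =~ ... ]]` check so the sketch also runs under plain `sh`; it is not the script that was eventually merged.

```shell
# Hypothetical helper: prints the shortened label for one component path.
# The body runs in a subshell so the IFS change does not leak to the caller.
shorten_label() (
    component=$1
    # Drop a trailing copy of the first path part, e.g. receiver/kafkareceiver.
    label=$(printf '%s\n' "$component" | sed -E 's%^(.+)/(.+)\1%\1/\2%')
    IFS='/'
    for segment in $component; do
        # Stop at the last part of the path: it would always match itself.
        case "$component" in
            */"$segment") break ;;
        esac
        # Strip this segment if the label ends with it.
        label=$(printf '%s\n' "$label" | sed -E "s%^(.+)${segment}\$%\1%")
    done
    printf '%s\n' "$label"
)

for path in \
    extension/encoding/awscloudwatchmetricstreamsencoding \
    receiver/hostmetrics/internal/scraper/filesystemscraper
do
    label=$(shorten_label "$path")
    if [ "${#label}" -le 50 ]; then
        printf '%s -> %s\n' "$path" "$label"
    else
        printf '%s is still over the limit\n' "$label"
    fi
done
```

Both paths come out as the labels proposed in the issue description, `extension/encoding/awscloudwatchmetricstreams` and `receiver/hostmetrics/internal/scraper/filesystem`, and both fit within 50 characters.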

However, while this handles the path --> label conversion, we need something afterward that can do label --> path, which is much harder since our path names have inconsistencies that make it basically impossible to determine which parts of the path were removed from the label.

I think our best bet will be to change label generation from an on-demand process to something that is generated and checked in to the repo. If we have a file that contains each path and its corresponding label on each line, we could do a lookup when converting in either direction.
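
A lookup against such a file could be sketched as follows. This is only an illustration under assumptions: the file is taken to hold one `<path> <label>` pair per line separated by a space, the sample file is a temporary stand-in for whatever would be checked in, and the helper names `label_for_path` and `path_for_label` are hypothetical.

```shell
# Temporary stand-in for the checked-in mapping file (assumed format:
# "<component path> <label>", one pair per line).
labels_file=$(mktemp)
trap 'rm -f "$labels_file"' EXIT
cat > "$labels_file" <<'EOF'
extension/encoding/awscloudwatchmetricstreamsencoding extension/encoding/awscloudwatchmetricstreams
receiver/hostmetrics/internal/scraper/filesystemscraper receiver/hostmetrics/internal/scraper/filesystem
EOF

# path -> label: match the first column, print the second.
label_for_path() {
    awk -v path="$1" '$1 == path { print $2; exit }' "$labels_file"
}

# label -> path: match the second column, print the first.
path_for_label() {
    awk -v label="$1" '$2 == label { print $1; exit }' "$labels_file"
}

label_for_path receiver/hostmetrics/internal/scraper/filesystemscraper
path_for_label extension/encoding/awscloudwatchmetricstreams
```

Because both directions read the same file, no attempt is needed to reconstruct which parts of a path were removed; an unknown path or label simply prints nothing.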

@PhilemonBrain

@evan-bradley, I am new to Go but confident I can give this a good shot. Can I get it assigned to me?

@PhilemonBrain

/assign

@evan-bradley
Contributor Author

Sure thing @PhilemonBrain. This will almost entirely consist of working in Bash/GitHub Actions/Makefiles, so no Go knowledge is necessary. 🙂

You'll want to take a look at the following:

  1. Adding a target to the Makefile at the root of the repo that converts the paths in the .github/CODEOWNERS file to a space-delimited text file, probably somewhere in the .github directory.
  2. Adding a check to the checks job of our build-and-test workflow in .github/workflows that verifies this list is up-to-date, similar to the checks for Go mod dependency changes or make gendistributions.
  3. Updating all workflows that deal with component labels to use this list for lookups.
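
The first two steps could be sketched roughly like this. Everything here is an assumption for illustration: the CODEOWNERS excerpt and the `@example/owners` handle are made up, `generate_labels` is a placeholder that leaves each label equal to its path (the shortening step would go in its place), and `labels_up_to_date` is a hypothetical helper. In CI the same idea is often expressed by running the generating target and then `git diff --exit-code`.

```shell
workdir=$(mktemp -d)

# Hypothetical CODEOWNERS excerpt; the owner handle is made up.
cat > "$workdir/CODEOWNERS" <<'EOF'
# component paths and their owners
receiver/kafkareceiver/ @example/owners
exporter/fileexporter/ @example/owners
EOF

# Placeholder generator: prints "<path> <label>" for each entry, with the
# label left equal to the path. Comment lines and lines without owners are skipped.
generate_labels() {
    awk '!/^#/ && NF > 1 { sub(/\/$/, "", $1); print $1, $1 }' "$1"
}

# Succeeds only if the checked-in file matches freshly generated output.
# Usage: labels_up_to_date <checked-in file> <generator command...>
labels_up_to_date() (
    checked_in=$1
    shift
    fresh=$(mktemp)
    "$@" > "$fresh"
    cmp -s "$checked_in" "$fresh"
    status=$?
    rm -f "$fresh"
    exit "$status"
)

# Step 1: generate the list once and treat it as the checked-in copy.
generate_labels "$workdir/CODEOWNERS" > "$workdir/component_labels"
if labels_up_to_date "$workdir/component_labels" generate_labels "$workdir/CODEOWNERS"; then
    echo "component_labels is up to date"
fi

# Step 2: a component is added without regenerating, so the check fails.
echo 'processor/filterprocessor/ @example/owners' >> "$workdir/CODEOWNERS"
if ! labels_up_to_date "$workdir/component_labels" generate_labels "$workdir/CODEOWNERS"; then
    echo "component_labels is stale; regenerate and commit it"
fi
```

Comparing against regenerated output, rather than only checking that the file exists, is what catches a component that was added or renamed without updating the list.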

Thanks in advance for looking at this!

@PhilemonBrain

PhilemonBrain commented Mar 11, 2025

I would love to work on this, but I sincerely think it might be a bit too much for me 🥲. Thanks @evan-bradley. I will pick another issue.

@evan-bradley
Contributor Author

No problem, thanks for your interest!

@gabgg71
Contributor

gabgg71 commented Mar 12, 2025

@evan-bradley I would love to work on this. Could you please assign it to me?

@evan-bradley
Contributor Author

Can do. Thanks for looking into it!

@axw
Contributor

axw commented Mar 19, 2025

Thanks for fixing this @gabgg71!

Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this issue Apr 24, 2025
#### Description
This change generates labels for all components, including those whose paths are longer than 50 characters:

- The root Makefile gains a phony target that generates the .github/component_labels file. The file is space-delimited and maps each component path to its label. For paths over 50 characters, the label is the path shortened by removing repeated patterns in the string.
- The build-and-test workflow gains a step, under the checks job, that verifies that the .github/component_labels file exists.
- The scripts used in the workflows now use the .github/component_labels file as the reference for mapping between component paths and their labels.

#### Link to tracking issue
Fixes open-telemetry#38527