CI/CD conventions for metrics

### Area(s)

area:cicd

### Is your change request related to a problem? Please describe.

This issue is to discuss attributes specific to metrics, as part of the CI/CD Working Group and the Semantic Conventions WG.
A challenge specific to metrics can be the time series cardinality when a CI/CD system records metrics for individual builds.

### Describe the solution you'd like

Following https://github.com/open-telemetry/semantic-conventions/pull/1075 (adjusting the vocabulary below to align with #1075), we should define metric attributes for:
* duration of pipelineRuns (by status, pipeline)
* count of pipelineRuns (by status, pipeline)
* count of agents
* queue length of pending pipelineRuns
* duration for how long a pipelineRun is in the queue before starting execution
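
As a rough illustration of the aggregate metrics listed above, here is a minimal Python sketch that groups counts and durations by pipeline and status, and queue time by pipeline. The `PipelineRun` fields and the `aggregate` helper are hypothetical names chosen for this example; they are not taken from #1075.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineRun:
    # Hypothetical record of one finished run; timestamps are in seconds.
    pipeline: str
    status: str
    queued_at: float
    started_at: float
    finished_at: float


def aggregate(runs):
    """Aggregate runs into low-cardinality series keyed by pipeline and status."""
    count = defaultdict(int)             # (pipeline, status) -> number of runs
    duration_sum = defaultdict(float)    # (pipeline, status) -> total run time
    queue_time_sum = defaultdict(float)  # pipeline -> total time spent queued
    for run in runs:
        key = (run.pipeline, run.status)
        count[key] += 1
        duration_sum[key] += run.finished_at - run.started_at
        queue_time_sum[run.pipeline] += run.started_at - run.queued_at
    return count, duration_sum, queue_time_sum
```

The number of series here is bounded by the number of pipelines times the number of statuses, regardless of how many runs are recorded.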

Additionally, it should be possible to opt in to metrics specific to a particular pipelineRun.
These could be metrics about the agent which executes a pipelineRun, the OS, the network, the JVM, the number of failed/total tests …
We need to specify the attribute which links these metrics to the pipelineRun, e.g. `pipeline.run.id`.

Metrics specific to a pipelineRun are of high cardinality. We should document this as a warning and give guidance on how these metrics can be encoded efficiently in the OTel protocol, i.e. by using resource attributes instead of metric attributes wherever possible.
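
To illustrate why resource attributes are cheaper on the wire, the following Python sketch builds a heavily simplified, OTLP/JSON-like structure (timestamps, scope and units are omitted) in two ways: with the run ID repeated on every data point, and with the run ID stated once on the resource. The metric names and the run ID value are made up for the example.

```python
import json

RUN_ID = "run-0001"  # hypothetical pipelineRun identifier
METRICS = ["example.cpu", "example.memory", "example.tests.failed"]  # made-up names


def kv(key, value):
    """One OTLP-style key/value attribute (string values only in this sketch)."""
    return {"key": key, "value": {"stringValue": value}}


def encode(resource_attrs, points):
    """Build a simplified ResourceMetrics-like dict.

    points is a list of (metric_name, point_attrs, value) tuples.
    """
    metrics = {}
    for name, attrs, value in points:
        metric = metrics.setdefault(name, {"name": name, "gauge": {"dataPoints": []}})
        metric["gauge"]["dataPoints"].append(
            {"attributes": [kv(k, v) for k, v in attrs.items()], "asDouble": value}
        )
    return {
        "resource": {"attributes": [kv(k, v) for k, v in resource_attrs.items()]},
        "scopeMetrics": [{"metrics": list(metrics.values())}],
    }


# Variant 1: the run ID is repeated as an attribute on every data point.
per_point = encode({}, [(m, {"pipeline.run.id": RUN_ID}, 1.0) for m in METRICS])
# Variant 2: the run ID is stated once, on the resource.
per_resource = encode({"pipeline.run.id": RUN_ID}, [(m, {}, 1.0) for m in METRICS])

per_point_occurrences = json.dumps(per_point).count(RUN_ID)
per_resource_occurrences = json.dumps(per_resource).count(RUN_ID)
```

In the first variant the run ID is serialized once per data point; in the second it appears once per resource, however many metrics the run reports.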


### Describe alternatives you've considered

Span metrics could be used for the duration and count of pipelineRuns; however, this relies on the pipelineRuns having completed.
This is a limitation inherent in using traces to represent pipelineRuns: a span can only be sent once it is complete.
Because of this limitation, it could be preferable for the CI/CD system to expose metrics about the duration, count and status of pipelineRuns directly. These metrics could also account for in-progress builds.
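
The difference can be sketched in Python as follows. `PipelineRunTracker` is a hypothetical class for illustration only: it keeps an up/down count of runs that are currently executing, which a span-derived metric could not report until those runs finish.

```python
from collections import Counter


class PipelineRunTracker:
    """Hypothetical in-process tracker maintained by the CI/CD system itself."""

    def __init__(self):
        self.active = Counter()    # pipeline -> runs currently executing
        self.finished = Counter()  # (pipeline, status) -> completed runs

    def start(self, pipeline):
        # Visible immediately, before any span for this run could be exported.
        self.active[pipeline] += 1

    def finish(self, pipeline, status):
        self.active[pipeline] -= 1
        self.finished[(pipeline, status)] += 1


tracker = PipelineRunTracker()
tracker.start("build")
tracker.start("build")
tracker.finish("build", "success")
# One run has completed; one is still in progress and is already counted.
```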

### Additional context

CI/CD metrics were discussed at the KubeCon March 2024 SemConv users meeting.
High cardinality was highlighted as an issue for per-build metrics.
Notes on how to deal with cardinality:
* Could we use Exemplars? We could link to the build trace from some metrics.
  This added information might make it easier to identify pipelineRuns that need investigation.
* Using a resource attribute for the build ID is fine for the OTel protocol,
  but backends (e.g. Prometheus) would still have the cardinality issue when storing the time series
  (metric and resource attributes would be flattened into time series).
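
The flattening concern in the last note can be sketched in Python as below. This assumes a backend that merges resource and metric attributes into a single label set per series; the `flatten` and `series_count` helpers and the metric name are hypothetical.

```python
def flatten(metric_name, resource_attrs, point_attrs):
    """Merge resource and data point attributes into one series identity."""
    labels = {**resource_attrs, **point_attrs}
    return (metric_name, tuple(sorted(labels.items())))


def series_count(samples):
    """Count the distinct time series produced by (name, resource, point) samples."""
    return len({flatten(*sample) for sample in samples})


# 100 runs, each carrying its own run ID as a resource attribute.
with_run_id = [
    ("example.run.duration", {"pipeline.run.id": f"run-{i}"}, {"status": "success"})
    for i in range(100)
]
# The same 100 runs without a per-run identifier.
without_run_id = [
    ("example.run.duration", {}, {"status": "success"}) for _ in range(100)
]
```

Under this assumption, `series_count(with_run_id)` grows with the number of runs, while `series_count(without_run_id)` stays constant.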
