
Generate the batch processor config from schema #13155

Draft
wants to merge 1 commit into
base: main
from

@ptodev ptodev commented Jun 4, 2025

Description

This follows up on #10694. Unlike the previous PR, this one is much smaller because it only adds a schema for the batch processor. Also, thanks to @omissis's amazing support, go-jsonschema has a lot more features now, and we might be able to use the upstream go-jsonschema without needing a temporary fork 🙏

I'm hoping that we can implement this for the batch processor first, and then gradually expand into more and more components. If for some reason the autogeneration becomes an issue, it can be disabled by commenting it out in the metadata.yaml file and editing config.go manually.

The PR is still in draft stage because there are a few TODOs for which I need feedback from maintainers.

Link to tracking issue

Fixes #9769

Testing

There is a test in mdatagen, but the things it tests are limited to the functionality required for the batch processor.

Documentation

I only edited the mdatagen readme file.

@@ -1,70 +1,74 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0
// Code generated by github.com/atombender/go-jsonschema, DO NOT EDIT.
Author

Is it a problem that the file doesn't have this header:

// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

Author

Do you need me to add more tests now for use cases not covered by the batch processor? Or are you ok with adding these gradually in subsequent PRs?

@@ -200,3 +200,5 @@ telemetry:
# Optional: array of attributes that were defined in the attributes section that are emitted by this metric.
# Note: Only the following attribute types are supported: <string|int|double|bool>
attributes: [string]

# TODO: Add the new "config" field here too.
Author

I'll need to add a schema for the "config" field before opening the PR for reviews.

// When this is set to zero, batched data will be sent immediately.
Timeout time.Duration `mapstructure:"timeout"`
// Prevent unkeyed literal initialization.
_ struct{} `mapstructure:"_"`
Author

It would be better if this field didn't have the mapstructure:"_" struct tag. I could try to add a feature to go-jsonschema that disables struct tags on a per-attribute level, something like disableStructTags:

    _:
      description: >-
        Prevent unkeyed literal initialization.
      type: integer
      goJSONSchema:
        identifier: _
        type: struct{}
        nillable: true
        disableStructTags: true

if v, ok := raw["send_batch_max_size"]; !ok || v == nil {
plain.SendBatchMaxSize = 0.0
}
if v, ok := raw["send_batch_size"]; !ok || v == nil {
Author

Since the schema contains minimum: 0, normally go-jsonschema would also generate code like this:

if 0 > plain.SendBatchSize {
	return fmt.Errorf("field %s: must be >= %v", "send_batch_size", 0)
}

It doesn't do it here because the type is overridden to be uint32. That's not a problem right now, since an unsigned value can never be below 0, but it might become a problem if we wanted the minimum to be larger than 0. I should try to fix this upstream.
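
To make the concern concrete, here is a minimal hand-written sketch of the check that would be missing if the schema minimum were greater than 0 while the Go type stayed uint32. The function name validateSendBatchSize and the minimum of 10 are hypothetical and only for illustration; this is not output from go-jsonschema.

```go
package main

import "fmt"

// minSendBatchSize is a hypothetical non-zero minimum, chosen only for illustration.
const minSendBatchSize uint32 = 10

// validateSendBatchSize mirrors the shape of the "minimum" check quoted above,
// but compares against a uint32 value. With a minimum of 0 this check could
// never fail for an unsigned type; with a larger minimum it becomes necessary.
func validateSendBatchSize(v uint32) error {
	if v < minSendBatchSize {
		return fmt.Errorf("field %s: must be >= %v", "send_batch_size", minSendBatchSize)
	}
	return nil
}

func main() {
	fmt.Println(validateSendBatchSize(5))    // below the hypothetical minimum
	fmt.Println(validateSendBatchSize(8192)) // within bounds
}
```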

var _ component.Config = (*Config)(nil)

// Validate checks if the processor configuration is valid
func (cfg *Config) Validate() error {
Author

To leverage go-jsonschema's validation abilities, it would be nice to marshal the config to JSON and unmarshal it again. Then we could see whether boundary checks such as minimum and maximum for integers work ok. But to do this, I suppose we need json struct tags. Would that be ok? Otherwise we'd need a new feature in go-jsonschema to handle this without relying on JSON, which would be useful for mapstructure users.
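
A rough sketch of that round trip, assuming the struct carries json tags. The trimmed-down Config, its hand-written UnmarshalJSON, and the helper validateViaJSON are all hypothetical stand-ins; the real generated code may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, trimmed-down stand-in for the generated struct.
// It uses a signed int so that the minimum check can actually fail.
type Config struct {
	SendBatchSize int `json:"send_batch_size" mapstructure:"send_batch_size"`
}

// UnmarshalJSON is hand-written here to imitate a generated boundary check.
func (c *Config) UnmarshalJSON(b []byte) error {
	type plain Config // method-less alias, avoids infinite recursion
	var p plain
	if err := json.Unmarshal(b, &p); err != nil {
		return err
	}
	if 0 > p.SendBatchSize {
		return fmt.Errorf("field %s: must be >= %v", "send_batch_size", 0)
	}
	*c = Config(p)
	return nil
}

// validateViaJSON marshals cfg and unmarshals the result into a fresh value,
// so that the checks inside UnmarshalJSON run against cfg's current contents.
func validateViaJSON(cfg *Config) error {
	b, err := json.Marshal(cfg)
	if err != nil {
		return err
	}
	var out Config
	return json.Unmarshal(b, &out)
}

func main() {
	fmt.Println(validateViaJSON(&Config{SendBatchSize: 100}))
	fmt.Println(validateViaJSON(&Config{SendBatchSize: -1}))
}
```

Without the json tag, the marshaled key would be SendBatchSize rather than send_batch_size, which is why the round trip depends on those tags.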

MetadataCardinalityLimit: defaultMetadataCardinalityLimit,
}
cfg := &Config{}
json.Unmarshal([]byte("{}"), cfg) // Unmarshal empty JSON to get default values
Author

There's no need to have json struct tags if all we need is to fill in the defaults.
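
As a sketch of why this works: an UnmarshalJSON of the shape quoted earlier looks up keys in a raw map and assigns a default when a key is absent, so decoding "{}" touches no struct tags at all. The Config below is a hypothetical single-field stand-in, and the default of 8192 and the helper defaultConfig are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical stand-in with mapstructure tags only (no json tags).
type Config struct {
	SendBatchSize uint32 `mapstructure:"send_batch_size"`
}

// UnmarshalJSON imitates the default-filling pattern: if a key is missing or
// null in the raw input, the field is set to its default value.
func (c *Config) UnmarshalJSON(b []byte) error {
	var raw map[string]interface{}
	if err := json.Unmarshal(b, &raw); err != nil {
		return err
	}
	type plain Config // method-less alias, avoids infinite recursion
	var p plain
	if err := json.Unmarshal(b, &p); err != nil {
		return err
	}
	if v, ok := raw["send_batch_size"]; !ok || v == nil {
		p.SendBatchSize = 8192 // illustrative default value
	}
	*c = Config(p)
	return nil
}

// defaultConfig decodes an empty JSON object to obtain a Config holding defaults.
func defaultConfig() (*Config, error) {
	cfg := &Config{}
	if err := json.Unmarshal([]byte("{}"), cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	cfg, err := defaultConfig()
	fmt.Println(cfg.SendBatchSize, err)
}
```

Unlike the snippet in the diff, this sketch checks the error returned by json.Unmarshal rather than discarding it.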

omissis commented Jun 5, 2025

Hi, thanks for the mention. Just a word of advice: some of the PRs I merged to add support for this project caused some BC breaks that I am going to address in the next release (v0.21.0), which will revert some of the behaviors to those of the previous version of the library. The new behaviors will need to be explicitly activated. This will probably break your code, but fixing it will just be a matter of activating a flag. Please use main if you want to incorporate those changes early on.

codecov bot commented Jun 5, 2025

Codecov Report

Attention: Patch coverage is 62.19512% with 62 lines in your changes missing coverage. Please review.

Project coverage is 91.09%. Comparing base (b1ce36b) to head (89b57be).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/mdatagen/internal/configgen.go | 65.38% | 28 Missing and 8 partials ⚠️ |
| cmd/mdatagen/internal/command.go | 0.00% | 17 Missing and 1 partial ⚠️ |
| processor/batchprocessor/config.go | 66.66% | 5 Missing and 3 partials ⚠️ |

❌ Your patch check has failed because the patch coverage (62.19%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13155      +/-   ##
==========================================
- Coverage   91.27%   91.09%   -0.18%     
==========================================
  Files         508      510       +2     
  Lines       28736    28883     +147     
==========================================
+ Hits        26228    26312      +84     
- Misses       1992     2043      +51     
- Partials      516      528      +12     


Successfully merging this pull request may close these issues.

Improve otel collector configuration w/ JSON schema