[service/telemetry] Add Configurable Log Rotation Support Using Lumberjack #13084

Open
wants to merge 19 commits into
base: main
from

Conversation

bvsvas
@bvsvas bvsvas commented May 24, 2025

Description

This PR introduces optional log file rotation support to the OpenTelemetry Collector's internal telemetry logging system using the lumberjack log rolling library. The enhancement enables better control over log growth and retention without requiring external tools like logrotate.

Link to tracking issue

Issue #10768

Fixes #

  • Log Rotation Enabled via Config:
    • New rotation block under service::telemetry::logs allows controlling rotation behavior:
service:
  telemetry:
    logs:
      output_paths: ["collector.log"]
      rotation:
        enabled: true
        max_megabytes: 100   # Max file size in MB before rotating (optional)
        max_backups: 3       # Max number of old files to retain (optional)
        max_age: 28          # Max days to keep old logs (optional)
        compress: true       # Whether to gzip old logs (optional)
  • Integrated via zap.Sink Registration:

    • A unique zap.Sink is registered for each log file using a UUID-prefixed lumberjack scheme.
    • This avoids conflicts during parallel test runs and ensures test isolation.
  • Dynamic Output Path Handling:

    • Applies rotation only if a valid file-based output_paths entry is provided.
    • Ignores console targets like "stdout", "stderr", and "console".
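
The path handling described above can be sketched as follows. This is a minimal illustration, not the PR's code: `fileOutputPaths` and `consoleTargets` are hypothetical names, and the exact, case-sensitive match on "stdout", "stderr", and "console" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// consoleTargets holds the output_paths values that the PR description says
// are ignored by rotation. Exact, case-sensitive matching is an assumption.
var consoleTargets = map[string]bool{"stdout": true, "stderr": true, "console": true}

// fileOutputPaths is a hypothetical helper. It returns the entries of
// outputPaths that are neither empty nor console targets, i.e. the entries
// that rotation would apply to.
func fileOutputPaths(outputPaths []string) []string {
	var files []string
	for _, p := range outputPaths {
		p = strings.TrimSpace(p)
		if p == "" || consoleTargets[p] {
			continue
		}
		files = append(files, p)
	}
	return files
}

func main() {
	fmt.Println(fileOutputPaths([]string{"stderr", "collector.log"}))
}
```

With only console targets configured the helper returns an empty slice, which corresponds to the case where rotation is not applied at all.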

Note:
Backward-compatible: If rotation.enabled is false or the block is omitted, log behavior remains unchanged.
Only affects file-based logging, no impact on default stderr logging or console environments.
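
As a rough sketch of how the `rotation` block and the backward-compatibility note fit together, the struct below mirrors the YAML keys from the description. Only `MaxAge` and `Compress` appear in the reviewed code; the other field names, the `Validate` method, and the `rotationActive` helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// LogsRotationConfig mirrors the YAML keys shown in the PR description.
// Field names other than MaxAge and Compress are assumed, not taken from the PR.
type LogsRotationConfig struct {
	Enabled      bool `mapstructure:"enabled"`
	MaxMegabytes int  `mapstructure:"max_megabytes"` // max file size in MB before rotating
	MaxBackups   int  `mapstructure:"max_backups"`   // max number of old files to retain
	MaxAge       int  `mapstructure:"max_age"`       // max days to keep old logs
	Compress     bool `mapstructure:"compress"`      // gzip rotated files
}

// Validate is a hypothetical check: a disabled or absent block is always
// accepted, and an enabled block must not contain negative limits.
func (c *LogsRotationConfig) Validate() error {
	if c == nil || !c.Enabled {
		return nil
	}
	if c.MaxMegabytes < 0 || c.MaxBackups < 0 || c.MaxAge < 0 {
		return errors.New("rotation: max_megabytes, max_backups and max_age must not be negative")
	}
	return nil
}

// rotationActive reports whether rotation should be applied. It is false both
// when the block is omitted (nil) and when enabled is false, which is the
// backward-compatible path described in the note above.
func rotationActive(c *LogsRotationConfig) bool {
	return c != nil && c.Enabled
}

func main() {
	cfg := &LogsRotationConfig{Enabled: true, MaxMegabytes: 100, MaxBackups: 3, MaxAge: 28, Compress: true}
	fmt.Println(rotationActive(cfg), cfg.Validate())
	fmt.Println(rotationActive(nil))
}
```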

Testing

  • Unit tests added for:
    • newLogger with and without rotation enabled.
    • UUID-prefixed zap.Sink registration per test to avoid global zap sink collision.
    • Log file rotation behavior, file rollover validation.
  • Verified compatibility across platforms (including Windows).

Documentation

Documentation is part of the LogsRotationConfig struct (refer to the v0.3.0.go file).
I still need to check whether any README.md should be updated.

@bvsvas bvsvas requested a review from a team as a code owner May 24, 2025 06:51
@bvsvas bvsvas requested a review from mx-psi May 24, 2025 06:51
@bvsvas
Author

bvsvas commented May 24, 2025

/label service/telemetry

@bvsvas
Author

bvsvas commented May 26, 2025

@mx-psi - Hope all is good here. Just wanted to follow up on this PR. Let me know if there’s anything I should update.
Thanks for your time!

codecov bot commented May 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.61%. Comparing base (9b4911b) to head (14f700f).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13084      +/-   ##
==========================================
+ Coverage   91.59%   91.61%   +0.02%     
==========================================
  Files         505      506       +1     
  Lines       28479    28602     +123     
==========================================
+ Hits        26085    26205     +120     
- Misses       1880     1882       +2     
- Partials      514      515       +1     

@bvsvas
Author

bvsvas commented May 27, 2025

/label collector-telemetry

@bvsvas
Author

bvsvas commented May 27, 2025

@open-telemetry/collector-approvers - I would really appreciate it if someone could take a look and provide feedback or guidance. Your input will help move this forward!

@srinivasvenkatabevara

srinivasvenkatabevara commented May 28, 2025

Added coverage for the lumberjack logger's close method in the service's Shutdown.
makeLogger was split out of newLogger so that the error path of its lumberjack sink registration can be tested.

Because the rotationSchema can now be set to a predictable value, the new TestRegisterLumberjackSink_ReturnsError test can cover zap.RegisterSink failures by pre-registering a conflicting sink.

newLogger remains backward compatible.
@srinivasvenkatabevara

@open-telemetry/collector-approvers - Could you please take a look when you get a chance?

@humbe

humbe commented May 29, 2025

This change also addresses this issue: #7352

Contributor

@iblancasa iblancasa left a comment

Thanks for your proposal, but I have some concerns about introducing a library whose last release was in 2023: https://github.com/natefinch/lumberjack/releases/tag/v2.2.1

# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
The logger now supports configurable file rotation via `rotation` settings, including max size, age, backup count, and compression.
Contributor

Some of the notes you added in the PR description would be useful to have here as well, for example that this only affects file-based logging.

done

@@ -278,6 +279,44 @@ func TestServiceTelemetry(t *testing.T) {
}
}

func TestServiceShutdown_LumberjackLoggerClose(t *testing.T) {
// Create a temporary file for logging to ensure lumberjack logger is initialized.
Contributor

I think we can get rid of all those comments.

done

// MaxAge is the maximum number of days to retain old log files. The age is based on the timestamp encoded in the rotated filenames.
MaxAge int `mapstructure:"max_age"`
// Compress determines if the rotated log files should be compressed using gzip.
// This can save disk space, especially for verbose logs.
Contributor

Suggested change
// This can save disk space, especially for verbose logs.

done

Comment on lines 141 to 142
// The attributes in cfg.Resource are added as resource attributes for logs exported through the LoggerProvider
// To make sure they are also exposed in logs written to stdout, we add them as fields to the Zap core
Contributor

If you want to split this into a function, please keep the comment from lines 130 to 133 here. The reason is that it only applies to the statement that follows, not to the others.

done

}

// GetLumberjackLogger returns the global lumberjack logger instance.
func GetLumberjackLogger() *lumberjack.Logger {
Contributor

I think this should be more like GetRotatedLogger or something (sorry, I'm not great with naming). The name should be more general: if, for whatever reason, we need to switch to another library in the future, a library-specific name means we will have to modify more code, and that can lead to more errors or issues.

Done

goleak.VerifyTestMain(m)
// goleak.VerifyTestMain(m)
// Ignore lumberjack millRun goroutine in all tests
goleak.VerifyTestMain(m,
Contributor

Yes

@srinivasvenkatabevara

Thanks for your proposal, but I have some concerns about introducing a library whose last release was in 2023: https://github.com/natefinch/lumberjack/releases/tag/v2.2.1

Thanks for the valid concern about lumberjack's maintenance status. Here's my perspective:

  • This is an optional dependency - only loaded when rotation is enabled (rotation.enabled), so it's not loaded by default.
  • Already used - This library is already in use in some exporters within the collector-contrib repo.
  • Widely adopted - While the last release was in 2023, lumberjack is actively used in major Go projects such as Kubernetes, OSE, etc.

I confirmed this approach in the Collector SIG meeting on May 21st, before contributing the PR, and did not receive any concerns.

Strategy:

  • It delivers the log rotation feature that users are requesting, using the lumberjack library.
  • Meanwhile, we can monitor lumberjack status and migrate to alternatives if needed.

The optional nature and existing usage minimize risk while providing immediate user value. It provides a low-friction way to support log rotation now while keeping future options open.

@thrinadhk

This is a much-needed use case and a long-pending issue: #7352, #7507, #10768

Hope to see this change included in an upcoming OpenTelemetry Collector release soon.

@iblancasa
Contributor

  • This is an optional dependency - only loaded when rotation is enabled (rotation.enabled), so it's not loaded by default.

Not sure about this. When rotation.enabled is false, the feature will not be used, but the library will still be linked into the executable, which can expose it to vulnerabilities in the package (that is OK and can happen with any package, but not with an unmaintained one).

  • Already used - This library is already in use in some exporters within the collector-contrib repo.

Since the "good practice" is building your own distribution, if you include those components as part of your distribution, it's up to you. But in this case it would be part of core, which makes me feel a bit uncomfortable.

The optional nature and existing usage minimize risk while providing immediate user value. It provides a low-friction way to support log rotation now while keeping future options open.

Not sure about this statement because of everything mentioned above.

@srinivasvenkatabevara

srinivasvenkatabevara commented Jun 2, 2025

linked

Yes, it gets linked in the executable. When not enabled, lumberjack's lifecycle flow won't be triggered. And lumberjack hasn't announced end-of-support but is indeed slow in maintenance.

Since this is a critical, long-pending core feature, do you have any alternative suggestions to replace lumberjack?

@humbe

humbe commented Jun 2, 2025

Thanks for your proposal, but I have some concerns about introducing a library whose last release was in 2023: https://github.com/natefinch/lumberjack/releases/tag/v2.2.1

This library is being used by the Kubernetes project: https://github.com/kubernetes/kubernetes/blob/master/go.mod#L215
It also does not have any known vulnerabilities.

@ChrsMark
Member

ChrsMark commented Jun 4, 2025

I wonder if and why we need the Collector to natively support log rotation when it can be handled by external tools like logrotate?

@mx-psi
Member

mx-psi commented Jun 4, 2025

I wonder if and why we need the Collector to natively support log rotation when it can be handled by external tools like logrotate?

I would also like to understand the benefits of having them inside the Collector

@srinivasvenkatabevara

I wonder if and why we need the Collector to natively support log rotation when it can be handled by external tools like logrotate?

There are a few reasons we found it valuable to have rotation in the Collector itself:

  • Distroless images: Common in production for security, and they don’t support external tools like logrotate.
  • Platform Agnostic: Unlike OS-dependent logrotate, a built-in solution is cross-platform.
  • Self-managed logging: Since the Collector generates its own logs, it’s safer and more practical to manage them internally without relying on external agents.
  • Community demand: This has been a recurring ask (see issues #7352, #7507, #10768).

While external tools are valid, an internal option adds flexibility and addresses these known needs.
Hope that clarifies the motivation!

@rohitkumarcs

When will this feature be available?

@rogercoll
Contributor

For posterity: judging by the activity in lumberjack issues, it doesn't look actively maintained. If we ever need to replace it (e.g., due to a high-severity vulnerability), here are two simple implementations worth taking a look at: https://github.com/influxdata/telegraf/blob/master/internal/rotate/file_writer.go and https://github.com/newrelic/infrastructure-agent/blob/master/pkg/log/rotate.go
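
To give a sense of how small a replacement could be, here is a minimal standard-library sketch of size-based rotation. It is not lumberjack and not either of the linked implementations: `rotatingWriter`, the numbered backup naming (`path.1`, `path.2`, ...), and `runDemo` are all hypothetical, and the sketch omits age-based cleanup, compression, and locking.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rotatingWriter is a hypothetical, minimal size-based rotating file writer.
// It is not safe for concurrent use and expects maxBackups to be at least 1.
type rotatingWriter struct {
	path       string
	maxBytes   int64 // rotate before the active file would grow past this
	maxBackups int   // rotated files to keep: path.1 (newest) ... path.N
	file       *os.File
	size       int64
}

func newRotatingWriter(path string, maxBytes int64, maxBackups int) (*rotatingWriter, error) {
	w := &rotatingWriter{path: path, maxBytes: maxBytes, maxBackups: maxBackups}
	if err := w.open(); err != nil {
		return nil, err
	}
	return w, nil
}

// open opens (or creates) the active file for appending and records its size.
func (w *rotatingWriter) open() error {
	f, err := os.OpenFile(w.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return err
	}
	w.file, w.size = f, info.Size()
	return nil
}

// rotate closes the active file, shifts path.i to path.(i+1) so the oldest
// backup is dropped, moves the active file to path.1, and opens a new one.
func (w *rotatingWriter) rotate() error {
	if err := w.file.Close(); err != nil {
		return err
	}
	// Errors are ignored here because older backups may not exist yet.
	os.Remove(fmt.Sprintf("%s.%d", w.path, w.maxBackups))
	for i := w.maxBackups - 1; i >= 1; i-- {
		os.Rename(fmt.Sprintf("%s.%d", w.path, i), fmt.Sprintf("%s.%d", w.path, i+1))
	}
	if err := os.Rename(w.path, w.path+".1"); err != nil {
		return err
	}
	return w.open()
}

// Write rotates first if p would push a non-empty file past maxBytes.
func (w *rotatingWriter) Write(p []byte) (int, error) {
	if w.size > 0 && w.size+int64(len(p)) > w.maxBytes {
		if err := w.rotate(); err != nil {
			return 0, err
		}
	}
	n, err := w.file.Write(p)
	w.size += int64(n)
	return n, err
}

func (w *rotatingWriter) Close() error { return w.file.Close() }

// runDemo writes eight 10-byte lines with a 25-byte limit and two backups,
// then returns how many backup files are left in the temporary directory.
func runDemo() (int, error) {
	dir, err := os.MkdirTemp("", "rotate-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	w, err := newRotatingWriter(filepath.Join(dir, "collector.log"), 25, 2)
	if err != nil {
		return 0, err
	}
	defer w.Close()
	for i := 0; i < 8; i++ {
		if _, err := fmt.Fprintf(w, "line %04d\n", i); err != nil {
			return 0, err
		}
	}
	backups, err := filepath.Glob(filepath.Join(dir, "collector.log.*"))
	return len(backups), err
}

func main() { fmt.Println(runDemo()) }
```

Because the type satisfies `io.Writer`, callers only depend on the standard interface, which is the same reason a library-neutral name was suggested for the accessor earlier in the review.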
