-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[service/telemetry] Add Configurable Log Rotation Support Using Lumberjack #13084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
[service/telemetry] Add log rotation support using lumberjack Rotate log file output using lumberjack.
Ignore reserve keywords like console, stdout and stderr
/label service/telemetry |
@mx-psi - Hope all good here? Just wanted to follow up on this PR. Let me know if there’s anything I should update. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #13084 +/- ##
==========================================
+ Coverage 91.59% 91.61% +0.02%
==========================================
Files 505 506 +1
Lines 28479 28602 +123
==========================================
+ Hits 26085 26205 +120
- Misses 1880 1882 +2
- Partials 514 515 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
/label collector-telemetry |
@open-telemetry/collector-approvers- Would really appreciate it if someone could take a look and provide feedback or guidance. Your input will help move this forward! |
To cover Lumberjack logger close method in service's Shutdown
makeLogger was split from newLogger to enable testing its lumberjack sink registration error path. By allowing a predictable rotationSchema, the new TestRegisterLumberjackSink_ReturnsError test can now cover zap.RegisterSink failures by pre-registering a conflicting sink. Keeping newLogger backward compatible.
@open-telemetry/collector-approvers - Could you please take a look when you get a chance? |
This change also addresses this issue: #7352 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your proposal but I have some concerns about introducing the usage of a library which last release was in 2023 https://github.com/natefinch/lumberjack/releases/tag/v2.2.1
# These lines will be padded with 2 spaces and then inserted directly into the document. | ||
# Use pipe (|) for multiline entries. | ||
subtext: | | ||
The logger now supports configurable file rotation via `rotation` settings, including max size, age, backup count, and compression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some comments you added in the PR description that can be interesting to have here. Like this only affects file-based logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
service/service_test.go
Outdated
@@ -278,6 +279,44 @@ func TestServiceTelemetry(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestServiceShutdown_LumberjackLoggerClose(t *testing.T) { | |||
// Create a temporary file for logging to ensure lumberjack logger is initialized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can get rid of all those comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// MaxAge is the maximum number of days to retain old log files. The age is based on the timestamp encoded in the rotated filenames. | ||
MaxAge int `mapstructure:"max_age"` | ||
// Compress determines if the rotated log files should be compressed using gzip. | ||
// This can save disk space, especially for verbose logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This can save disk space, especially for verbose logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
service/telemetry/logger.go
Outdated
// The attributes in cfg.Resource are added as resource attributes for logs exported through the LoggerProvider | ||
// To make sure they are also exposed in logs written to stdout, we add them as fields to the Zap core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to split this to a function, please, keep the comment from lines 130 to 133 here. The reason is that this only applies to the statement behind but not to the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
service/telemetry/logger.go
Outdated
} | ||
|
||
// GetLumberjackLogger returns the global lumberjack logger instance. | ||
func GetLumberjackLogger() *lumberjack.Logger { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be more like GetRotatedLogger
or something (sorry I'm not great with naming). I think the name should be more general because if in the future, for whatever reason, we need to switch to other library, we will need to modify more stuff... and this can lead to more errors/issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
goleak.VerifyTestMain(m) | ||
// goleak.VerifyTestMain(m) | ||
// Ignore lumberjack millRun goroutine in all tests | ||
goleak.VerifyTestMain(m, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add why
https://github.com/natefinch/lumberjack/issues?q=is%3Aissue%20state%3Aopen%20leak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Thanks for the valid concern about
I confirmed this approach in the Collector SIG meeting on May 21st, before PR contribution, not received any concerns. Strategy:
The optional nature and existing usage minimize risk while providing immediate user value. It provides a low-friction way to support log rotation now while keeping future options open. |
Not sure about this. When
Since the "good practice" is building your own distribution, if you include those components as part of your distribution, it's up to you. But in this case... it would be part of core... what makes me feel a bit uncomfortable.
Not sure about this statement becase everything mentioned above. |
Yes, it gets linked in the executable. When not enabled, lumberjack's lifecycle flow won't be triggered. And lumberjack hasn't announced end-of-support but is indeed slow in maintenance. Since this is a core critical feature and long-pending one, do you have any alternative suggestions to replace lumberjack? |
This library is being used by the Kubernetes project: https://github.com/kubernetes/kubernetes/blob/master/go.mod#L215 |
I wonder if and why we need the Collector to natively support log rotation when it can be handled by external tools like logrotate? |
I would also like to understand the benefits of having them inside the Collector |
There are a few reasons we found it valuable to have rotation in the Collector itself:
While external tools are valid, an internal option adds flexibility and addresses these known needs. |
When this feature will be available? |
For posterity: judging by the activity in lumberjack issues, it doesn't look actively maintained. If we ever need to replace it (e.g., due to a high vulnerability), here's are two simple implementations worth taking a look: https://github.com/influxdata/telegraf/blob/master/internal/rotate/file_writer.go and https://github.com/newrelic/infrastructure-agent/blob/master/pkg/log/rotate.go |
Description
This PR introduces optional log file rotation support to the OpenTelemetry Collector's internal telemetry logging system using the lumberjack log rolling library. The enhancement enables better control over log growth and retention without requiring external tools like logrotate.
Link to tracking issue
Issue# 10768
Fixes #
service::telemetry::logs
allows controlling rotation behavior:Integrated via zap.Sink Registration:
Dynamic Output Path Handling:
Note:
Backward-compatible: If rotation.enabled is false or the block is omitted, log behavior remains unchanged.
Only affects file-based logging, no impact on default stderr logging or console environments.
Testing
Documentation
Documentation is part of
LogsRotationConfig
struct (refer to the v0.3.0.go file).Will have to find if there is any README.md should be updated.