# [telemetrySettting] Create sampled Logger #8134
```diff
@@ -83,9 +83,9 @@ type queuedRetrySender struct {
 	requestUnmarshaler internal.RequestUnmarshaler
 }

-func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg QueueSettings, rCfg RetrySettings, reqUnmarshaler internal.RequestUnmarshaler, nextSender requestSender, logger *zap.Logger) *queuedRetrySender {
+func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg QueueSettings, rCfg RetrySettings, lCfg SampledLoggerSettings, reqUnmarshaler internal.RequestUnmarshaler, nextSender requestSender, logger *zap.Logger) *queuedRetrySender {
 	retryStopCh := make(chan struct{})
-	sampledLogger := createSampledLogger(logger)
+	newLogger := createSampledLogger(logger, lCfg)
 	traceAttr := attribute.String(obsmetrics.ExporterKey, id.String())

 	qrs := &queuedRetrySender{
@@ -95,7 +95,7 @@ func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg Queue
 		cfg:                qCfg,
 		retryStopCh:        retryStopCh,
 		traceAttribute:     traceAttr,
-		logger:             sampledLogger,
+		logger:             newLogger,
 		requestUnmarshaler: reqUnmarshaler,
 	}
@@ -104,7 +104,7 @@ func newQueuedRetrySender(id component.ID, signal component.DataType, qCfg Queue
 		cfg:        rCfg,
 		nextSender: nextSender,
 		stopCh:     retryStopCh,
-		logger:     sampledLogger,
+		logger:     newLogger,
 		// Following three functions actually depend on queuedRetrySender
 		onTemporaryFailure: qrs.onTemporaryFailure,
 	}
@@ -266,12 +266,16 @@ func NewDefaultRetrySettings() RetrySettings {
 	}
 }

-func createSampledLogger(logger *zap.Logger) *zap.Logger {
+func createSampledLogger(logger *zap.Logger, lCfg SampledLoggerSettings) *zap.Logger {
```
Review thread on the `createSampledLogger` signature:

**Reviewer:** @codeboten, do you know why we create a sampled logger here? Shouldn't loggers be created based on the telemetry configuration alone? Instead of this PR, I would consider removing this sampled logger altogether.

**Reviewer:** This was added in #2020 by @tigrannajaryan. The PR description says

**Reviewer:** Right, and shouldn't this be a collector-wide setting, instead of something exclusive to this helper?

**Reviewer:** I think this was meant to be specific to this helper, to avoid flooding the logs in the case where all requests fail and you log once per request. The contributing guidelines say

I guess at the time this was written, metrics were not really available, and thus we went with sampled logs. (Not sure if this is a good reason to keep this today; just trying to reconstruct the history so we can make a better decision.)

**Reviewer:** Same here: I'm sure this was a good decision back then; I'm not so sure the next iteration should be to expose a setting for this, instead of using a metric and/or exposing log sampling as part of the collector's telemetry config.

**@tigrannajaryan:** I think a possible future option is to supply 2 loggers to every component: one that has no sampling (outputs 100% of logs) and another that is sampled. Components will use the non-sampled logger for everything that is critical to be printed and is known to be non-repeating, so it can't flood the logs (e.g. startup and shutdown messages). Messages that can repeat for every data item or request will use the sampled logger. The sampled logger can derive its sampling ratio from a central setting (somewhere in the

I do not think merely passing a single sampled logger to every component to be used for every message is good enough, since it can result in critical one-off messages being lost.

On a side note: the memory usage by the sampled logger that this PR attempts to address is puzzling. Maybe try to figure out why and reduce that, instead of disabling sampling?

**@antonjim-te:** Hi @tigrannajaryan, answering the side note. As you can see on the graph (https://user-images.githubusercontent.com/5960625/255850931-f9b3e891-2695-4aac-b8d6-9d60cb0dfb9d.png), the memory problem is coming from the logger sampling. We also tried modifying the configuration of

But memory consumption was still high. Thank you.

**@dmitryax:** I agree with @tigrannajaryan. This doesn't seem like a good option to expose in the exporters' configuration interface. @antonjim-te, can you please try initializing one sampled logger instance per service and see if that helps with memory utilization? It can be created in

**@antonjim-te:** Hi @dmitryax, I will create a ticket in our internal process and add it to our next sprint. I have a good feeling about this suggestion, because it will create just one sampled logger for all the exporters instead of creating one per

Thank you, team. I will keep you updated.

**@antonjim-te:** Hi team, the new approach also solved the memory incident. Please take a look at the new PR changes. Thank you in advance.
The body of `createSampledLogger` continues:

```diff
 	if logger.Core().Enabled(zapcore.DebugLevel) {
 		// Debugging is enabled. Don't do any sampling.
 		return logger
 	}

+	if !lCfg.Enabled {
+		return logger
+	}
+
 	// Create a logger that samples all messages to 1 per 10 seconds initially,
 	// and 1/100 of messages after that.
 	opts := zap.WrapCore(func(core zapcore.Core) zapcore.Core {
```
A gRPC client testdata config gains the new key:

```diff
@@ -25,3 +25,5 @@ keepalive:
   timeout: 30s
   permit_without_stream: true
 balancer_name: "round_robin"
+sampled_logger:
+  enabled: false
```
An HTTP client testdata config gets the same addition:

```diff
@@ -23,3 +23,5 @@ headers:
   header1: 234
   another: "somevalue"
 compression: gzip
+sampled_logger:
+  enabled: false
```
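Both testdata files exercise the same new key. Inferring from the YAML and the new `lCfg SampledLoggerSettings` parameter in the Go diff, the settings struct presumably looks something like the sketch below; the field tag and the default value are assumptions, not confirmed by the diff.

```go
package main

import "fmt"

// SampledLoggerSettings is an assumed shape for the struct that the
// `sampled_logger` YAML block would unmarshal into; the real PR may differ.
type SampledLoggerSettings struct {
	// Enabled toggles wrapping the exporter logger with a sampler core.
	Enabled bool `mapstructure:"enabled"`
}

func main() {
	// Assumption: a default of true would preserve the collector's previous
	// always-sampled behavior, which is why the testdata opts out explicitly.
	cfg := SampledLoggerSettings{Enabled: true}
	if cfg.Enabled {
		fmt.Println("exporter logs will be sampled")
	}
}
```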