Skip to content

fixing missing logs while pod log file being rolled over #1690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pszkamruk-splunk
Copy link
Contributor

@pszkamruk-splunk pszkamruk-splunk commented Mar 5, 2025

Description:
This PR contains changes related to the bug raised by one of our customer. They mentioned that not all logs were being collected from the log files due to the files being rotated.
The changes to fix above issue have been added by using featureGates.fixMissedLogsDuringLogRotation flag. Setting that flag to true will enable the fix and will allow connector to collect logs also from files which are being rolled over.

image

Testing: checked manually, no duplicated logs, no missed logs

@pszkamruk-splunk pszkamruk-splunk requested review from a team as code owners March 5, 2025 13:55
@dmitryax
Copy link
Contributor

dmitryax commented Mar 5, 2025

@pszkamruk-splunk, have you verified that this actually solves the problem? Do you know what's the container runtime and k8s cluster the customer has? AFAIK containerd should be doing copy/truncate, in which case this change isn't needed, see: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/fileconsumer/design.md#rotated-files-that-end-up-out-of-the-matching-pattern. Also, there is relevant issue where it's ended up to be another problem

@pszkamruk-splunk
Copy link
Contributor Author

@pszkamruk-splunk, have you verified that this actually solves the problem? Do you know what's the container runtime and k8s cluster the customer has? AFAIK containerd should be doing copy/truncate, in which case this change isn't needed, see: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/fileconsumer/design.md#rotated-files-that-end-up-out-of-the-matching-pattern. Also, there is relevant issue where it's ended up to be another problem

I tested it locally with loggen pods, and as shown in the attached picture we are collecting events from rotated files. If the fix is not enabled (featureGates.fixMissedLogsDuringLogRotation: false) then from time to time I was observing some dropped logs.
I saw docs regarding file rotation but I was not aware of that issue.
I asked support to confirm what container runtime is being used by customer.

My proposal is to proceed with this change and see if it will help on customer environment, and then we can decide if we want to keep it or remove it.

@pszkamruk-splunk
Copy link
Contributor Author

@dmitryax
The agency runs OpenShift 4 - specifically v4.16 and at the time this ticket was opened v4.12. All versions of OpenShift 4.x leverage cri-o as the container runtime

@dmitryax
Copy link
Contributor

dmitryax commented Mar 7, 2025

My proposal is to proceed with this change and see if it will help on customer environment, and then we can decide if we want to keep it or remove it.

Sounds good to me. Please address my other comment above, and we will merge it

@pszkamruk-splunk pszkamruk-splunk force-pushed the dropped-logs-while-logs-rolling-fix branch from cc432ae to 72b28c9 Compare March 7, 2025 10:18
@dmitryax dmitryax merged commit 6bf2080 into signalfx:main Mar 7, 2025
75 of 77 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants