-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[pkg/stanza] [receiver/windowseventlog] Fix: Windows Event Max Read (ERRNO 1734) #38149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/stanza] [receiver/windowseventlog] Fix: Windows Event Max Read (ERRNO 1734) #38149
Conversation
6508e01
to
283f772
Compare
@pjanotti Could you take a look at this whenever you have a second? I'd be more than happy to answer any questions regarding it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @BominRahmani - I haven't heard of this issue before. An initial review below, will look into more detail later.
} | ||
if len(events) == n+1 { | ||
i.updateBookmarkOffset(ctx, event) | ||
if err := i.subscription.bookmark.Update(event); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the issue that triggered this addition here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that when i closed the subscription and re-opened, without this addition, it would go over all the events again, the bookmark functionality wasn't working as expected. However with this addition it works. Now that I look at updateBookmarkOffset, there is probably a more elegant/less redundant way of dealing with this that i can look into
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the code use EvtSeek
as proposed in the other comment this may not be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand this: it is the same bookmark handle, passed by value, this second call should be redundant. It is in the end asking to update the same bookmark to the same event twice.
The bookmark ownership now is also divided between Input and Subscription. It seems to make sense to pass the ownership of it to the Subscription instead. If we understand what is going on here we could perhaps do that in a follow-up PR, but, right now this seems to indicate that we are missing something.
err := evtNext(s.handle, uint32(maxReads), &eventHandles[0], 0, 0, &eventsRead) | ||
|
||
if errors.Is(err, ErrorInvalidOperation) && eventsRead == 0 { | ||
return nil, nil | ||
} | ||
|
||
if err != nil && errors.Is(err, windows.RPC_S_INVALID_BOUND) { | ||
// close current subscription | ||
if closeErr := s.Close(); closeErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is necessary to close the subscription? Can't you just retry reducing the number of events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When EvtNext Fails, it still updates internal read position associated with the event handle. I think if we don't close and re-open the subscription this would lead to lost events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This other fix for the same issue used EvtSeek of closing and re-opening the subscription. PTAL https://github.com/osquery/osquery/pull/6660/files#diff-ef502ac70422248e983fd638d90b6cfd6dc7a1153666d2c46cf60caac6787a26R164
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked more into EvtSeek and tested it out a bit, using EvtSeek with our bookmark position doesn't seem to work, when the RPC_S_INVALID_BOUND
error is encountered, i believe the event handle that the bookmark refers to is invalidated in the case of encountering this error, this is why the github PR that you linked has them using a position counter instead. I find this approach of using a position counter to be a lot more hacky than just closing and re-opening the subscription.
3178d8b
to
60a298b
Compare
@pjanotti Any chance you can take another look today? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be important to have a test in which we could trigger the actual error. The C++ code is good but not deterministic. something deterministic can be added to the tests directly. My guess is that you need something like 32+ events at maximum size to trigger the issue.
change_type: bug_fix | ||
|
||
# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver) | ||
component: pkg/stanza |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djaglowski what is the typical practice here? List only the stanza package or also include all receivers consuming it (including indirectly).
err := evtNext(s.handle, uint32(maxReads), &eventHandles[0], 0, 0, &eventsRead) | ||
|
||
if errors.Is(err, ErrorInvalidOperation) && eventsRead == 0 { | ||
return nil, nil | ||
} | ||
|
||
if err != nil && errors.Is(err, windows.RPC_S_INVALID_BOUND) { | ||
// close current subscription | ||
if closeErr := s.Close(); closeErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This other fix for the same issue used EvtSeek of closing and re-opening the subscription. PTAL https://github.com/osquery/osquery/pull/6660/files#diff-ef502ac70422248e983fd638d90b6cfd6dc7a1153666d2c46cf60caac6787a26R164
} | ||
if len(events) == n+1 { | ||
i.updateBookmarkOffset(ctx, event) | ||
if err := i.subscription.bookmark.Update(event); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the code use EvtSeek
as proposed in the other comment this may not be needed.
3d99abd
to
28558e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BominRahmani I'm still worried about the bookmark ownership, please see below.
events, actualMaxReads, err := i.subscription.Read(i.currentMaxReads) | ||
|
||
// Update the current max reads if it changed | ||
if actualMaxReads < i.currentMaxReads { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actualMaxReads
should only be used if err == nil
, even if in the PR implementation took care to return the value currentMaxReads it seems something that can be easily broken in future changes
// Update the current max reads if it changed | ||
if actualMaxReads < i.currentMaxReads { | ||
i.currentMaxReads = actualMaxReads | ||
i.Logger().Debug("Encountered RPC_S_INVALID_BOUND, reducing batch size", zap.Int("current batch size", i.currentMaxReads), zap.Int("original batch size", i.maxReads)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
i.Logger().Debug("Encountered RPC_S_INVALID_BOUND, reducing batch size", zap.Int("current batch size", i.currentMaxReads), zap.Int("original batch size", i.maxReads)) | |
i.Logger().Debug("Encountered RPC_S_INVALID_BOUND, reduced batch size", zap.Int("current_batch_size", i.currentMaxReads), zap.Int("original_batch_size", i.maxReads)) |
} | ||
if len(events) == n+1 { | ||
i.updateBookmarkOffset(ctx, event) | ||
if err := i.subscription.bookmark.Update(event); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand this: it is the same bookmark handle, passed by value, this second call should be redundant. It is in the end asking to update the same bookmark to the same event twice.
The bookmark ownership now is also divided between Input and Subscription. It seems to make sense to pass the ownership of it to the Subscription instead. If we understand what is going on here we could perhaps do that in a follow-up PR, but, right now this seems to indicate that we are missing something.
@pjanotti Regarding the almost redundant usage, its a bit tricky, It is the same bookmark handle passed by value initially, however when the RPC_S_INVALID_BOUNDS error is encountered, the program will pass the subscription bookmark when re-opening the subscription, which can now have a stale/incorrect handle. |
@BominRahmani let's move ahead with this PR and do a follow-up to cleanup the ownership of the bookmark. The main thing in my mind regarding the bookmark is that if you look to its native API it is not connected to the query handle and in principle should not be affected by the EvtNext failure. Anyway, we look deeply at it on a follow-up PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - there should be a follow-up to clean-up the ownership and tracking of the bookmark.
eb3572d
to
92292f4
Compare
Description
When reading a batch of large event logs, the Windows function EvtNext returns errno 1734 (0x6C6) is triggered.
This issue is explained pretty aptly here.
elastic/beats#3076
Link to tracking issue
Fixes
Testing
Testing this is a bit difficult without using a live environment. I was able to create a C++ script that reliably triggers this error by creating "randomish" windows event logs.
You can run this script after registering the event log "ChaoticEvent" on powershell with admin privs