[pkg/stanza] [receiver/windowseventlog] Fix: Windows Event Max Read (ERRNO 1734) #38149

Merged

Conversation

BominRahmani
Contributor

@BominRahmani BominRahmani commented Feb 24, 2025

Description

When reading a batch of large events, the Windows function EvtNext fails with errno 1734 (0x6C6, RPC_S_INVALID_BOUND).
This issue is explained well in elastic/beats#3076.

Link to tracking issue

Fixes

Testing

Testing this is a bit difficult without using a live environment. I was able to create a C++ program that reliably triggers this error by writing "randomish" Windows event log entries.

#include <windows.h>
#include <string>
#include <iostream>
#include <vector>
#include <thread>
#include <random>
#include <cmath>
#include <algorithm>  // std::generate
#include <chrono>     // std::chrono::milliseconds, std::chrono::hours
#include <cstdlib>    // system

// Prime number configuration for chaotic sizes
const int PRIMES[] = {317, 4093, 5113, 6131, 7151, 8171, 9203};
const int NUM_EVENTS = 100000;
const int MAX_TOTAL_SIZE = 31744;

std::random_device rd;
std::mt19937 gen(rd());

int get_random_prime() {
    static std::uniform_int_distribution<> dis(0, sizeof(PRIMES)/sizeof(PRIMES[0])-1);
    return PRIMES[dis(gen)];
}

struct EventConfig {
    int string_count;
    int string_size;
    int binary_size;
};

EventConfig generate_event_config() {
    EventConfig cfg;
    
    do {
        cfg.string_count = 3 + (gen() % 5);  // 3-7 strings
        cfg.string_size = get_random_prime();
        cfg.binary_size = get_random_prime();
    } while ((cfg.string_count * cfg.string_size) + cfg.binary_size > MAX_TOTAL_SIZE);

    return cfg;
}

std::vector<BYTE> create_chaotic_binary(int size) {
    std::vector<BYTE> data(size);
    std::generate(data.begin(), data.end(), [&]{ return static_cast<BYTE>(gen() % 256); });
    return data;
}


std::string create_chaotic_string(int size) {
    std::string str;
    str.reserve(size);
    
    // Mix of printable chars and random nulls
    std::uniform_int_distribution<> char_gen(32, 126); // Printable ASCII
    std::uniform_int_distribution<> null_gen(0, 19);   // 1-in-20 (5%) chance of null
    
    for(int i = 0; i < size; i++) {
        if(null_gen(gen) == 0) {
            str += '\0';
        } else {
            str += static_cast<char>(char_gen(gen));
        }
    }
    return str;
}

bool write_event(HANDLE hLog, DWORD eventId) {
    EventConfig cfg = generate_event_config();
    
    // Generate chaotic strings with XML structure
    std::vector<const char*> strings;
    std::vector<std::string> stringStorage;
    
    stringStorage.emplace_back("<Event><System><Provider Name='ChaoticEvent'/>");
    stringStorage.emplace_back("<EventID>" + std::to_string(eventId) + "</EventID>");
    
    for(int i = 2; i < cfg.string_count; i++) {
        stringStorage.push_back(create_chaotic_string(cfg.string_size));
    }
    
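    // Note: ReportEventA takes null-terminated strings, so each string is
    // effectively cut at its first embedded null when passed via c_str().
    for(auto& s : stringStorage) {
        strings.push_back(s.c_str());
    }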

    // Create binary payload with prime size
    std::vector<BYTE> binaryData = create_chaotic_binary(cfg.binary_size);

    // Report the event
    return ReportEventA(
        hLog,
        EVENTLOG_INFORMATION_TYPE,
        0,
        eventId,
        NULL,
        static_cast<WORD>(strings.size()),      // wNumStrings
        static_cast<DWORD>(binaryData.size()),  // dwDataSize
        strings.data(),
        binaryData.data()
    );
}

void cleanup_log() {
    system("wevtutil cl Application /bu:backup.evtx 2>nul");  // Backup and clear log
    system("wevtutil set-log Application /retention:true /maxsize:1073741824 /q");
}

int main() {
    cleanup_log();
    HANDLE hLog = RegisterEventSourceA(NULL, "ChaoticEvent");
    
    if(!hLog) {
        std::cerr << "Run as Administrator! Error: " << GetLastError() << "\n";
        return 1;
    }

    std::cout << "Generating chaotic events...\n";
    std::cout << "Each event has:\n"
              << "- Random number of strings (3-7)\n"
              << "- Prime-numbered sizes (e.g., 317, 4093, 6131 bytes)\n"
              << "- Random binary payloads with null bytes\n"
              << "- Total size < 31KB\n";

    int success_count = 0;
    for(int i = 0; i < NUM_EVENTS; i++) {
        if(write_event(hLog, 1000 + i)) {
            success_count++;
        } else {
            DWORD err = GetLastError();
            std::cerr << "Write failed at " << i << " (Error: " << err << ")\n";
            
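            // Adaptive backoff, then retry the same index. Skip the progress
            // report below: after i--, i+1 can be 0 (division by zero).
            std::this_thread::sleep_for(std::chrono::milliseconds(50 * (err == 1734 ? 10 : 1)));
            i--;
            continue;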
        }

        if((i+1) % 1000 == 0) {
            std::cout << "Progress: " << (i+1) << "/" << NUM_EVENTS 
                      << " (" << (success_count * 100 / (i+1)) << "% success)\n";
        }
    }

    DeregisterEventSource(hLog);
    std::cout << "Generation complete. Keep this terminal open!\n";
    std::this_thread::sleep_for(std::chrono::hours(1)); // Prevent log rotation
    return 0;
}

You can run this program after registering the event source "ChaoticEvent" from a PowerShell session with administrator privileges:

New-EventLog -LogName "Application" -Source "ChaoticEvent"

@BominRahmani BominRahmani changed the title Fix: Windows Event Max Read (ERRNO 1734) [pkg/stanza] [receiver/windowseventlog] Fix: Windows Event Max Read (ERRNO 1734) Mar 10, 2025
@BominRahmani BominRahmani marked this pull request as ready for review March 10, 2025 12:35
@BominRahmani BominRahmani requested a review from a team as a code owner March 10, 2025 12:35
@BominRahmani BominRahmani force-pushed the fix/windows-event-max-read branch 3 times, most recently from 6508e01 to 283f772 Compare March 11, 2025 07:55
@BominRahmani
Contributor Author

@pjanotti Could you take a look at this whenever you have a second? I'd be more than happy to answer any questions regarding it.

Contributor

@pjanotti pjanotti left a comment

Thanks for the PR @BominRahmani - I haven't heard of this issue before. An initial review below, will look into more detail later.

}
if len(events) == n+1 {
i.updateBookmarkOffset(ctx, event)
if err := i.subscription.bookmark.Update(event); err != nil {
Contributor

What was the issue that triggered this addition here?

Contributor Author

I noticed that when I closed the subscription and re-opened it without this addition, it would go over all the events again; the bookmark functionality wasn't working as expected. With this addition it works. Now that I look at updateBookmarkOffset, there is probably a more elegant, less redundant way of dealing with this that I can look into.

Contributor

If the code uses EvtSeek as proposed in the other comment, this may not be needed.

Contributor

I still don't understand this: it is the same bookmark handle, passed by value, this second call should be redundant. It is in the end asking to update the same bookmark to the same event twice.

The bookmark ownership now is also divided between Input and Subscription. It seems to make sense to pass the ownership of it to the Subscription instead. If we understand what is going on here we could perhaps do that in a follow-up PR, but, right now this seems to indicate that we are missing something.

err := evtNext(s.handle, uint32(maxReads), &eventHandles[0], 0, 0, &eventsRead)

if errors.Is(err, ErrorInvalidOperation) && eventsRead == 0 {
return nil, nil
}

if err != nil && errors.Is(err, windows.RPC_S_INVALID_BOUND) {
// close current subscription
if closeErr := s.Close(); closeErr != nil {
Contributor

Why is it necessary to close the subscription? Can't you just retry with a reduced number of events?

Contributor Author

When EvtNext fails, it still updates the internal read position associated with the event handle. I think that if we don't close and re-open the subscription, this would lead to lost events.

Contributor

This other fix for the same issue used EvtSeek instead of closing and re-opening the subscription. PTAL https://github.com/osquery/osquery/pull/6660/files#diff-ef502ac70422248e983fd638d90b6cfd6dc7a1153666d2c46cf60caac6787a26R164

Contributor Author

I looked more into EvtSeek and tested it out a bit. Using EvtSeek with our bookmark position doesn't seem to work: when the RPC_S_INVALID_BOUND error is encountered, I believe the event handle that the bookmark refers to is invalidated. This is why the PR that you linked uses a position counter instead. I find the position-counter approach a lot more hacky than just closing and re-opening the subscription.
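
The close-and-re-open approach discussed in this thread can be sketched with a simulated subscription. This is an illustration only: `fakeSubscription`, `next`, `reopen`, and `read` are hypothetical stand-ins, not the pkg/stanza types, and halving the batch size is an assumption about the reduction strategy. The fake advances its read position even on failure, modeling the EvtNext behavior described above.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidBound stands in for windows.RPC_S_INVALID_BOUND (1734).
var errInvalidBound = errors.New("RPC_S_INVALID_BOUND (1734)")

// fakeSubscription is a hypothetical model of an event subscription.
// Any read asking for more than limit events fails, and the read
// position still advances on failure.
type fakeSubscription struct {
	events   []string
	pos      int // internal read position
	bookmark int // index of the next unprocessed event
	limit    int // largest batch that succeeds
}

func (s *fakeSubscription) next(n int) ([]string, error) {
	start, end := s.pos, s.pos+n
	if end > len(s.events) {
		end = len(s.events)
	}
	s.pos = end // the position moves even when the call fails
	if n > s.limit {
		return nil, errInvalidBound
	}
	return s.events[start:end], nil
}

// reopen models closing the subscription and re-opening it at the bookmark.
func (s *fakeSubscription) reopen() { s.pos = s.bookmark }

// read retries with a halved batch size (an assumed strategy) until the
// read succeeds, and returns the batch size that finally worked.
func read(s *fakeSubscription, maxReads int) ([]string, int, error) {
	for {
		events, err := s.next(maxReads)
		if err == nil {
			s.bookmark = s.pos // remember how far we have read
			return events, maxReads, nil
		}
		if !errors.Is(err, errInvalidBound) || maxReads <= 1 {
			return nil, maxReads, err
		}
		s.reopen() // without this, the skipped events would be lost
		maxReads /= 2
	}
}

func main() {
	s := &fakeSubscription{
		events: []string{"e0", "e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8", "e9"},
		limit:  3,
	}
	events, used, err := read(s, 8)
	fmt.Println(events, used, err)
}
```

In this model, removing the `reopen` call makes the first successful read start past the events consumed by the failed attempts, which is the lost-events concern raised above.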

@BominRahmani BominRahmani force-pushed the fix/windows-event-max-read branch from 3178d8b to 60a298b Compare March 11, 2025 19:26
@BominRahmani BominRahmani requested a review from pjanotti March 11, 2025 20:06
@BominRahmani
Contributor Author

@pjanotti Any chance you can take another look today?

Contributor

@pjanotti pjanotti left a comment

It would be important to have a test in which we could trigger the actual error. The C++ code is good but not deterministic. Something deterministic can be added to the tests directly. My guess is that you need something like 32+ events at maximum size to trigger the issue.

change_type: bug_fix

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: pkg/stanza
Contributor

@djaglowski what is the typical practice here? List only the stanza package, or also include all receivers consuming it (including indirectly)?

@BominRahmani BominRahmani force-pushed the fix/windows-event-max-read branch 4 times, most recently from 3d99abd to 28558e9 Compare March 18, 2025 13:41
@BominRahmani BominRahmani requested a review from pjanotti March 18, 2025 14:34
Contributor

@pjanotti pjanotti left a comment

@BominRahmani I'm still worried about the bookmark ownership, please see below.

events, actualMaxReads, err := i.subscription.Read(i.currentMaxReads)

// Update the current max reads if it changed
if actualMaxReads < i.currentMaxReads {
Contributor

actualMaxReads should only be used if err == nil. Even if the PR implementation takes care to return the value currentMaxReads, it seems like something that can be easily broken in future changes.

// Update the current max reads if it changed
if actualMaxReads < i.currentMaxReads {
i.currentMaxReads = actualMaxReads
i.Logger().Debug("Encountered RPC_S_INVALID_BOUND, reducing batch size", zap.Int("current batch size", i.currentMaxReads), zap.Int("original batch size", i.maxReads))
Contributor

nit:

Suggested change
- i.Logger().Debug("Encountered RPC_S_INVALID_BOUND, reducing batch size", zap.Int("current batch size", i.currentMaxReads), zap.Int("original batch size", i.maxReads))
+ i.Logger().Debug("Encountered RPC_S_INVALID_BOUND, reduced batch size", zap.Int("current_batch_size", i.currentMaxReads), zap.Int("original_batch_size", i.maxReads))

}
if len(events) == n+1 {
i.updateBookmarkOffset(ctx, event)
if err := i.subscription.bookmark.Update(event); err != nil {
Contributor

I still don't understand this: it is the same bookmark handle, passed by value, this second call should be redundant. It is in the end asking to update the same bookmark to the same event twice.

The bookmark ownership now is also divided between Input and Subscription. It seems to make sense to pass the ownership of it to the Subscription instead. If we understand what is going on here we could perhaps do that in a follow-up PR, but, right now this seems to indicate that we are missing something.

@BominRahmani
Contributor Author

@pjanotti
I definitely agree that bookmark ownership would probably make more sense in the Subscription, and I can probably handle that in a follow-up PR.

Regarding the almost redundant usage, it's a bit tricky. It is the same bookmark handle, passed by value initially. However, when the RPC_S_INVALID_BOUND error is encountered, the program passes the subscription bookmark when re-opening the subscription, and that bookmark can now have a stale or incorrect handle.

@BominRahmani BominRahmani requested a review from pjanotti March 18, 2025 20:01
@pjanotti
Contributor

@BominRahmani let's move ahead with this PR and do a follow-up to clean up the ownership of the bookmark.

The main thing in my mind regarding the bookmark is that, if you look at its native API, it is not connected to the query handle and in principle should not be affected by the EvtNext failure. Anyway, we can look more deeply at it in a follow-up PR.

Contributor

@pjanotti pjanotti left a comment

LGTM - there should be a follow-up to clean-up the ownership and tracking of the bookmark.

@BominRahmani BominRahmani force-pushed the fix/windows-event-max-read branch from eb3572d to 92292f4 Compare March 30, 2025 23:25
@dehaansa dehaansa added the ready to merge Code review completed; ready to merge by maintainers label Mar 31, 2025
@djaglowski djaglowski merged commit 4930224 into open-telemetry:main Mar 31, 2025
180 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 31, 2025
dmathieu pushed a commit to dmathieu/opentelemetry-collector-contrib that referenced this pull request Apr 8, 2025
Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this pull request Apr 24, 2025
5 participants