[Filebeat/Journald] might miss some entries when reading from small files #46933

@belimawr

First things first: I've only seen this happen when using small journal files, hand-crafted for testing, and passing a glob like /tmp/foo/bar/*/* to the Journald input. Part of the problem is caused by journalctl returning a cursor that does not perfectly match the entries read.

The first and, so far, only report of this problem came from a flaky test: [Flaky test] filebeat/input/journald TestDoubleStarCanBeUsed. At this point I'm not 100% sure this affects normal use of the Journald input.

The problem

The Journald input does its best to read the whole journal stored on the host, including past boots. To do so, it starts the journalctl binary twice:

  • First, without the --follow argument. This allows us to read all past journal entries, including past boots.
  • Then, when the first invocation returns, we restart journalctl with the cursor of the last event and --follow.
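The two invocations above can be sketched as two argument lists. This is only an illustration, not the input's actual code: the function names backfillArgs and followArgs are hypothetical, and the choice of --output=json, --no-pager and --after-cursor (all existing journalctl flags) is an assumption about how the cursor and output format are passed.

```go
package main

import "fmt"

// backfillArgs builds the arguments for the first invocation. There is no
// --follow, so journalctl exits once it has printed every stored entry.
// (Hypothetical helper; the flag selection is an assumption.)
func backfillArgs(base []string) []string {
	args := append([]string{}, base...) // copy so the caller's slice is untouched
	return append(args, "--output=json", "--no-pager")
}

// followArgs builds the arguments for the second invocation: resume after
// the last cursor seen during the backfill and keep following new entries.
func followArgs(base []string, cursor string) []string {
	args := backfillArgs(base)
	if cursor != "" {
		args = append(args, "--after-cursor", cursor)
	}
	return append(args, "--follow")
}

func main() {
	// The directory and the cursor value are placeholders.
	base := []string{"--directory", "/var/log/journal"}
	fmt.Println(backfillArgs(base))
	fmt.Println(followArgs(base, "example-cursor"))
}
```

If the cursor handed to the second invocation points past entries that were never read from the first invocation's stdout, those entries fall between the two runs.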

The test in question creates a folder structure that looks like this:

% tree /tmp/TestDoubleStarCanBeUsed2504014044
/tmp/TestDoubleStarCanBeUsed2504014044
├── 001
├── 002
├── 003
│   └── journal1.journal
├── 004
│   └── journal2.journal
├── 005
│   └── journal3.journal
└── 006

7 directories, 3 files

The first invocation of journalctl will read all entries, 10 from each journal. However, journalctl can exit before the input has had a chance to fully read its stdout. This happens on the following line:

data, err := reader.ReadBytes('\n')

Even if we keep trying to read, there is no more data to be read, so not all of the lines are accessible. Once we restart with the cursor and --follow, journalctl does not return the missing lines.

The problem seems to be caused by the combination of using a glob to read from a folder and the files being small and hand-crafted. (The correct approach would be to just pass the folder, now that the input can correctly use the --directory flag.)

All other tests that read a single journal file are stable, and I have not seen issues when reading from an active journal.

To stop the flakiness and unblock CI, #46913 has been opened. It loosens the requirements for the test: as long as there are entries from at least two journals, the test passes.
