Description
First things first: I've only seen this happen when using small, handcrafted test journal files and passing a glob like /tmp/foo/bar/*/* to the Journald input. Part of the problem is caused by journalctl returning a cursor that does not perfectly match the entries read.
The first and, so far, only report of this problem came from a flaky test: [Flaky test] filebeat/input/journald TestDoubleStarCanBeUsed. At this point I'm not 100% sure this affects normal use of the Journald input.
The problem
The Journald input does its best to read the entire journal stored on the host, including past boots. To do so, it starts the journalctl binary twice:
- First without the --follow argument, which allows us to read all past journal entries, including past boots.
- Then, when the invocation above returns, we restart journalctl with the cursor of the last event and --follow.
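The two invocations described above can be sketched as a small argument builder. This is only an illustration, not the code used by the input: the function name journalctlArgs is hypothetical, and the choice of --output=json, --no-pager and --after-cursor (rather than --cursor) is an assumption about how the cursor is passed.

```go
package main

// journalctlArgs builds the argument list for one journalctl invocation.
// Hypothetical helper for illustration only; the real input may pass a
// different set of flags.
func journalctlArgs(dir, cursor string, follow bool) []string {
	args := []string{"--output=json", "--no-pager"}
	if dir != "" {
		// Read journal files from a specific directory.
		args = append(args, "--directory="+dir)
	}
	if cursor != "" {
		// Assumption: resume strictly after the last entry that was read.
		args = append(args, "--after-cursor="+cursor)
	}
	if follow {
		// Keep running and emit new entries as they are appended.
		args = append(args, "--follow")
	}
	return args
}
```

Under these assumptions, the first phase would call journalctlArgs(dir, "", false) and, once that process exits, the second phase would call journalctlArgs(dir, lastCursor, true). Any entry the first phase failed to deliver but that sits before lastCursor would be skipped by the second phase, which matches the symptom described in this issue.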
The test in question creates a folder structure that looks like this:
% tree /tmp/TestDoubleStarCanBeUsed2504014044
/tmp/TestDoubleStarCanBeUsed2504014044
├── 001
├── 002
├── 003
│   └── journal1.journal
├── 004
│   └── journal2.journal
├── 005
│   └── journal3.journal
└── 006

7 directories, 3 files
The first invocation of journalctl reads all entries, 10 from each journal. However, journalctl can exit before the input has had a chance to fully read its stdout. This happens on the following line:
data, err := reader.ReadBytes('\n')
Even if we keep trying to read, there is no more data to read, so not all the lines are accessible. Once we restart with the cursor and --follow, journalctl does not return the missing lines.
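As background on the ReadBytes call quoted above: bufio.Reader.ReadBytes can return data together with an error, for example when the stream ends without a trailing newline. The sketch below shows a read loop that keeps such a final line. It is not a fix for this issue (as noted, retrying the read yields nothing more), and the names readAllLines and exampleTruncatedStream are hypothetical.

```go
package main

import (
	"bufio"
	"errors"
	"io"
	"strings"
)

// readAllLines reads r until EOF and returns every line, including a final
// line that is not newline-terminated.
func readAllLines(r io.Reader) ([]string, error) {
	reader := bufio.NewReader(r)
	var lines []string
	for {
		data, err := reader.ReadBytes('\n')
		// ReadBytes may return data alongside an error, so keep the data
		// before inspecting err.
		if len(data) > 0 {
			lines = append(lines, strings.TrimRight(string(data), "\n"))
		}
		if errors.Is(err, io.EOF) {
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
	}
}

// exampleTruncatedStream simulates a producer that exits without
// terminating its last line.
func exampleTruncatedStream() ([]string, error) {
	return readAllLines(strings.NewReader("{\"n\":1}\n{\"n\":2}\n{\"n\":3}"))
}
```

With a loop like this, the unterminated third line is still returned. If lines are missing even then, as reported here, the data most likely never reached the pipe in the first place.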
The problem seems to be caused by the combination of using a glob to read from a folder (the correct approach would be to pass just the folder, now that the input can correctly use the --directory flag) and the files being small and handcrafted.
All other tests that read a single journal file are stable, and I have not seen issues when reading from an active journal.
To stop the flakiness and unblock CI, #46913 has been opened. It loosens the requirements for the test: as long as there are entries from at least two journals, the test passes.
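The loosened condition can be pictured roughly as follows. This is a sketch of the idea only, not the change made in #46913: the event type, its Journal field and the function name are hypothetical.

```go
package main

// event is a hypothetical, simplified view of an entry read by the input.
// Journal identifies the journal file the entry came from.
type event struct {
	Journal string
	Message string
}

// fromAtLeastTwoJournals reports whether the events originate from two or
// more distinct journals, regardless of how many entries each contributed.
func fromAtLeastTwoJournals(events []event) bool {
	seen := map[string]struct{}{}
	for _, e := range events {
		seen[e.Journal] = struct{}{}
		if len(seen) >= 2 {
			return true
		}
	}
	return false
}
```

A check of this shape tolerates missing entries from any one journal, which is why it removes the flakiness without addressing the underlying cause.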