Description
At the moment of writing this only applies to the unreleased versions:
- 8.18 and 9.x for the file identity migration
- 9.1.x and main for the take over
If a file is moved while Filebeat is offline and file identity needs to be migrated, the file will be re-ingested.
The take over mode has the same issue: the "original" file was ingested by the log input and then moved while Filebeat was offline. Once Filebeat restarts with the Filestream take over mode, the "new" file does not have its state migrated because its path has changed.
Warning
There is a similar issue that causes data re-ingestion because of the store cleanup: #43649. Because of the order of operations when the Filestream input starts, the store cleanup bug removes the state before the file identity migration runs.
So to reproduce this bug, the store cleanup needs to be disabled by setting clean_removed: false.
Steps to reproduce from main
- Create a log file with at least 1 KB of data:
    docker run -it --rm mingrammer/flog -b 1024 > /tmp/flog.log
- Start Filebeat with the following configuration (filebeat.yml):
    filebeat.inputs:
      - type: filestream
        id: oops-I-re-ingested-a-file
        paths:
          - /tmp/*.log
        file_identity.native: ~
        clean_removed: false
    output.console:
      enabled: true
      pretty: true
- Wait until all data has been ingested
- Stop Filebeat
- Move /tmp/flog.log to /tmp/flog-1.log:
    mv /tmp/flog.log /tmp/flog-1.log
- Update the configuration file (remove the file_identity setting, as fingerprint is the new default for main). The resulting filebeat.yml:
    filebeat.inputs:
      - type: filestream
        id: oops-I-re-ingested-a-file
        paths:
          - /tmp/*.log
        #file_identity.native: ~
        clean_removed: false
    output.console:
      enabled: true
      pretty: true
- Start Filebeat
- The file will be fully re-ingested
The cause
The problem comes from comparing the path stored in the resource metadata against the files currently found by the file watcher, specifically these four lines of code:
beats/filebeat/input/filestream/prospector.go
Lines 143 to 146 in 8920a05
fd, ok := files[fm.Source]
if !ok {
	return "", fm
}
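The failure mode can be illustrated with a minimal, self-contained sketch. Only the map lookup by `fm.Source` mirrors the snippet above; the types `fileMeta` and `fileDescriptor`, the function `migrateKey`, and the key format are hypothetical stand-ins, not the actual Beats code.

```go
package main

import "fmt"

// fileMeta is a hypothetical stand-in for the metadata stored in the registry.
type fileMeta struct {
	Source         string // path recorded when the file was last ingested
	IdentifierName string // file identity that produced the registry key
}

// fileDescriptor is a hypothetical stand-in for a file reported by the file watcher.
type fileDescriptor struct {
	Filename string
}

// migrateKey mimics the lookup from the snippet above: the registry entry is
// matched to a file on disk only through the path stored in fm.Source.
// It returns an empty key when no migration can happen.
func migrateKey(files map[string]fileDescriptor, fm fileMeta, newIdentity string) (string, fileMeta) {
	fd, ok := files[fm.Source]
	if !ok {
		// The old path is gone (for example, the file was renamed while
		// Filebeat was offline), so the state is left untouched.
		return "", fm
	}
	fm.IdentifierName = newIdentity
	// Illustrative key only; the real key format is an assumption here.
	return newIdentity + "::" + fd.Filename, fm
}

func main() {
	// State written while the file still lived at /tmp/flog.log.
	fm := fileMeta{Source: "/tmp/flog.log", IdentifierName: "native"}

	// After the restart the file watcher only sees the new path.
	files := map[string]fileDescriptor{
		"/tmp/flog-1.log": {Filename: "/tmp/flog-1.log"},
	}

	key, _ := migrateKey(files, fm, "fingerprint")
	fmt.Printf("migrated key: %q\n", key)
}
```

In this sketch the moved file never gets a migrated key, so under the new file identity it looks like a brand-new file and is read from the beginning.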
There is also a chicken-and-egg problem here: if a file is moved while Filebeat is offline, the registry still holds the old path in the metadata. Even with the files returned by the filewatcher, there is no guarantee we can rebuild the old registry key, because it was generated by the old file identity, which is not known.
We could iterate over the files the filewatcher found and check the registry for keys generated by the "old file identity" (either assuming it is native, or trying both native and path), but that is not 100% reliable either, because we know inodes can change and so can paths. So no matter what the previous file identity was, there is always a chance the migration will not be deterministic.
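A rough sketch of that "guess the old key" approach shows where it breaks down. Everything here is hypothetical: `foundFile`, `candidateKeys`, `findOldState`, and the key formats are illustrative assumptions, not the Beats implementation.

```go
package main

import "fmt"

// foundFile is a hypothetical view of a file reported by the filewatcher.
type foundFile struct {
	Path   string
	Inode  uint64
	Device uint64
}

// candidateKeys builds the registry keys the file might have had under the
// old identities. The key formats are assumptions made for illustration.
func candidateKeys(f foundFile) []string {
	return []string{
		fmt.Sprintf("native::%d-%d", f.Inode, f.Device),
		"path::" + f.Path,
	}
}

// findOldState tries every candidate key against the registry, modelled
// here as a map from key to read offset.
func findOldState(registry map[string]int64, f foundFile) (string, int64, bool) {
	for _, key := range candidateKeys(f) {
		if offset, ok := registry[key]; ok {
			return key, offset, true
		}
	}
	return "", 0, false
}

func main() {
	// State stored before the restart, under the native identity.
	registry := map[string]int64{"native::42-2049": 1024}

	// Renamed on the same filesystem: the inode is preserved, the guess works.
	renamed := foundFile{Path: "/tmp/flog-1.log", Inode: 42, Device: 2049}
	_, _, ok := findOldState(registry, renamed)
	fmt.Println("renamed file matched:", ok)

	// Moved in a way that changed the inode (for example across filesystems):
	// neither the native key nor the path key matches any stored state.
	recreated := foundFile{Path: "/tmp/flog-1.log", Inode: 77, Device: 2049}
	_, _, ok = findOldState(registry, recreated)
	fmt.Println("recreated file matched:", ok)
}
```

The guess only succeeds while the attribute used by the old identity is unchanged. The opposite failure is also possible: if an inode is reused by an unrelated file, the native candidate key can match state that belongs to a different file.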