[Filestream] Files can be re-ingested on start up because of clean_removed: true (that's the default) #43649

@belimawr

Description

When Filebeat is restarted, it can re-ingest files if they have been rotated and the rotated paths are also monitored by Filestream.

Given the following Filestream configuration:

filebeat.inputs:
 - type: filestream
   id: oops-I-re-ingested-a-file
   paths:
     - /tmp/*.log

output.console:
  enabled: true
  pretty: true

And a file /tmp/flog.log with some data.

  1. Create a file with some data: docker run -it --rm mingrammer/flog -n 2 > /tmp/flog.log
  2. Start Filebeat with the configuration above
  3. Wait until the file is fully ingested (no more events on the console)
  4. Stop Filebeat
  5. Move /tmp/flog.log to /tmp/flog-1.log: mv /tmp/flog.log /tmp/flog-1.log
  6. Start Filebeat

Once Filebeat is restarted the file is re-ingested.

However, if a new /tmp/flog.log is created after moving /tmp/flog.log to /tmp/flog-1.log (the contents do not matter), as in a common log rotation strategy, no data is duplicated.

The actual issue comes from how the store clean up is implemented:

if p.cleanRemoved {
	cleaner.CleanIf(func(v loginp.Value) bool {
		var fm fileMeta
		err := v.UnpackCursorMeta(&fm)
		if err != nil {
			// remove faulty entries
			return true
		}
		_, ok := files[fm.Source]
		return !ok
	})
}

It checks whether meta.source from the registry entry matches any of the files currently discovered by the filewatcher. If there is no match, the entry is removed, even when the same file still exists under a new path.
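
To make the path-only comparison concrete, here is a minimal, self-contained sketch of the same check outside of Filebeat. It is not the actual Filestream code: fileMeta is reduced to the Source field, the registry is a plain map, and staleKeys and the key "inode-42" are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// fileMeta is a simplified stand-in for the registry metadata; only the
// Source path matters for the clean-up check discussed above.
type fileMeta struct {
	Source string
}

// staleKeys (hypothetical helper) returns the registry keys that a
// path-only check would remove: every entry whose Source is not among
// the currently discovered paths.
func staleKeys(registry map[string]fileMeta, discovered map[string]struct{}) []string {
	var stale []string
	for key, fm := range registry {
		if _, ok := discovered[fm.Source]; !ok {
			stale = append(stale, key)
		}
	}
	sort.Strings(stale) // deterministic order for printing
	return stale
}

func main() {
	// State saved while /tmp/flog.log was being ingested.
	// The key is a made-up identifier for the file.
	registry := map[string]fileMeta{
		"inode-42": {Source: "/tmp/flog.log"},
	}

	// Steps 5 and 6: the file was moved, only the new path is discovered.
	afterMove := map[string]struct{}{
		"/tmp/flog-1.log": {},
	}
	fmt.Println(staleKeys(registry, afterMove)) // [inode-42]

	// Rotation case: a new /tmp/flog.log exists next to the moved file.
	afterRotation := map[string]struct{}{
		"/tmp/flog-1.log": {},
		"/tmp/flog.log":   {},
	}
	fmt.Println(staleKeys(registry, afterRotation)) // []
}
```

In the first case the entry is dropped because its recorded path no longer exists, so the moved file is treated as new. In the second case the old path is still present among the discovered files, so the entry survives the clean-up, which seems consistent with no data being duplicated in that scenario.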
