[receiver/filelog] Intelligent File Detection and Reading for Rotated Files #22998

@Mrod1598

Component(s)

receiver/filelog

Is your feature request related to a problem? Please describe.

Yes. The problem arises when detecting and reading files in an environment that has no concept of a "current" file with a
fixed name. Instead, there is a large pool of timestamped files that rotates continuously, and it has been challenging to
accurately identify and read the current file within that pool. Because these files cannot be filtered effectively, the
receiver has to read more than just the current file in order to check that none of the others have been updated, which
leads to excessive CPU usage.

Describe the solution you'd like

We propose using a sequence of ordering and filtering rules to determine the most recent file. Where multiple groups of files are needed, it would be more effective to use multiple receivers.

We are also working from some assumptions:

  • It may be possible to support only one group, which would simplify the process, provided the user specifies a pattern that matches a single group.
  • The most recent file could be determined by an integer in the filename.
  • The filename format could be year, month, day, then sequence number.

Example:

err.2023053001.log
err.2023053002.log
err.2023053003.log
err.2023053101.log
err.2023053102.log
err.2023053103.log
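
As a rough illustration of the "integer in the filename" assumption, the following Python sketch picks the newest file from the example names above. The helper name `newest_file` and the exact pattern are hypothetical and are not part of the receiver.

```python
import re

# Hypothetical pattern: "err." + 10 digits (YYYYMMDD plus a 2-digit sequence) + ".log".
NAME_RE = re.compile(r"^err\.(?P<value>\d{10})\.log$")


def newest_file(names):
    """Return the name with the largest embedded integer, or None if nothing matches."""
    candidates = []
    for name in names:
        m = NAME_RE.match(name)
        if m:
            candidates.append((int(m.group("value")), name))
    return max(candidates)[1] if candidates else None


files = [
    "err.2023053001.log",
    "err.2023053002.log",
    "err.2023053003.log",
    "err.2023053101.log",
    "err.2023053102.log",
    "err.2023053103.log",
]
print(newest_file(files))  # err.2023053103.log
```

Because the date and sequence number are both fixed-width, comparing the whole 10-digit value as one integer gives the same order as sorting by date and then by sequence.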

The solution should provide the capability to define alternate ordering strategies with different parsing/sorting techniques
such as:

  • Timestamp only
  • Integer only
  • Timestamp & integer, with primary sort based on timestamp and secondary sort based on integer.
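
The three strategies can be pictured as different sort keys over the same filenames. This is a minimal Python sketch under assumed names (`PATTERN`, `timestamp_key`, `integer_key`, `timestamp_then_integer_key` are all hypothetical), with Python's `strptime` standing in for whatever timestamp parser the receiver would use.

```python
import re
from datetime import datetime

# Hypothetical pattern splitting the name into an 8-digit date and a 2-digit sequence.
PATTERN = re.compile(r"^err\.(?P<date>\d{8})(?P<seq>\d{2})\.log$")


def timestamp_key(name):
    """Timestamp only: compare by the parsed date."""
    return datetime.strptime(PATTERN.match(name).group("date"), "%Y%m%d")


def integer_key(name):
    """Integer only: compare by the sequence number."""
    return int(PATTERN.match(name).group("seq"))


def timestamp_then_integer_key(name):
    """Timestamp and integer: date is the primary key, sequence the secondary key."""
    return (timestamp_key(name), integer_key(name))


names = ["err.2023053102.log", "err.2023053003.log", "err.2023053101.log"]
by_seq = sorted(names, key=integer_key)
by_both = sorted(names, key=timestamp_then_integer_key)
print(by_both[-1])  # err.2023053102.log
```

Sorting by the sequence number alone would rank `err.2023053003.log` last here, which is why the combined key matters when the sequence resets each day.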

Lastly, we suggest a configuration section that applies these sorting methods in order of priority.
The proposed solution introduces a new top-level key, tentatively named file_name_filtering_rules, whose value is a list of
filtering rules that are applied in sequence.

A single rule will comprise the following fields:

regex: A regular expression with a single capture group called value. This will be used against each filename, and the
contents of value will be used for the rule.

sort_type: Determines how the captured values are compared and sorted. Valid entries are timestamp, integer, and
alphabetical.

format: If sort_type is timestamp, this field determines how to parse the timestamp. The stanza timestamp parsing logic can likely be applied here.

ascending: A boolean value which, if true, signals to sort in ascending order. If false, it sorts in descending order.
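
One possible reading of "applied in sequence" is that the first rule is the primary sort key and each later rule breaks ties. The Python sketch below models that reading. The `FilterRule` class and `apply_rules` function are hypothetical, `strptime` stands in for the stanza timestamp parser, and dropping names that fail to match is an assumption rather than something the proposal specifies.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FilterRule:
    """Hypothetical in-memory form of one rule; field names mirror the proposal."""
    regex: str
    sort_type: str                # "timestamp", "integer", or "alphabetical"
    ascending: bool = True
    format: Optional[str] = None  # only used when sort_type == "timestamp"

    def extract(self, filename):
        """Return the comparable value for filename, or None if the regex does not match."""
        m = re.search(self.regex, filename)
        if m is None:
            return None
        value = m.group("value")
        if self.sort_type == "timestamp":
            # Assumption: strptime stands in for the stanza timestamp parser.
            return datetime.strptime(value, self.format)
        if self.sort_type == "integer":
            return int(value)
        if self.sort_type == "alphabetical":
            return value
        raise ValueError(f"unknown sort_type: {self.sort_type!r}")


def apply_rules(filenames, rules):
    """Order filenames so the first rule is the primary key and later rules break ties."""
    # Assumption: names that any rule fails to match are dropped.
    ordered = [f for f in filenames if all(r.extract(f) is not None for r in rules)]
    # Stable sorts applied from the last rule to the first give the first rule priority.
    for rule in reversed(rules):
        ordered.sort(key=rule.extract, reverse=not rule.ascending)
    return ordered


rules = [
    FilterRule(regex=r"err\.(?P<value>\d{8})", sort_type="timestamp", format="%Y%m%d"),
    FilterRule(regex=r"err\.\d{8}(?P<value>\d{2})", sort_type="integer"),
]
names = ["err.2023053101.log", "err.2023053003.log", "err.2023053102.log", "notes.txt"]
ordered = apply_rules(names, rules)
print(ordered[-1])  # err.2023053102.log
```

With every rule ascending, the newest file ends up last in the list. A receiver could then read only that entry instead of polling the whole pool.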

Example Config:

filelog:
  include: [dir/Error.*.log]
  file_name_filtering_rules:
    - regex: '/dir/Error\.(?P<value>\d{8}).*'
      sort_type: timestamp
      format: '%Y%m%d'
      ascending: true
    - regex: '/dir/Error\.\d{8}(?P<value>\d{2}).*'
      sort_type: integer
      ascending: true
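
To see what each rule's value group would capture, the two patterns from the example config can be tried against a sample path. In this Python sketch the path `/dir/Error.2023053103.log` is hypothetical, and the `%Y%m%d` directive assumes a strptime-style year-month-day layout for the timestamp rule.

```python
import re
from datetime import datetime

# The two patterns from the example config.
DATE_RULE = re.compile(r"/dir/Error\.(?P<value>\d{8}).*")
SEQ_RULE = re.compile(r"/dir/Error\.\d{8}(?P<value>\d{2}).*")

path = "/dir/Error.2023053103.log"  # hypothetical path, not from the issue

date_value = DATE_RULE.match(path).group("value")  # first 8 digits: the date
seq_value = SEQ_RULE.match(path).group("value")    # next 2 digits: the sequence
parsed = datetime.strptime(date_value, "%Y%m%d")   # assumed year-month-day layout

print(date_value, seq_value, parsed.date())  # 20230531 03 2023-05-31
```

The first rule sees only the date portion and the second only the sequence number, so the pair together orders files by day and then by position within the day.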

Describe alternatives you've considered

No response

Additional context

No response
