Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
A third-party program (over which I have little to no control) outputs its logs as multi-line XML. If that wasn't bad enough, each log entry always starts with a newline but does not end with one.
The log will look something like this:
<?xml version="1.0" encoding="UTF-8"?><a><lot><of></of></lot><tags/></a>
<?xml version="1.0"?>
<a>
<lot>
<of></of>
</lot>
<tags/>
</a>
<?xml version="1.0" encoding="UTF-8"?><a><lot><of>
</of></lot><tags/></a>
<?xml version="1.0"?>
<a><lot><of></of></lot><tags/></a>
Attempted Solutions
I've written a set of transforms to read these logs, which works fine when sourcing a static file with a newline at the end:
sources:
  source_logs_xml:
    type: file
    include:
      - /var/log/log.xml
    multiline:
      start_pattern: "^<\?xml"
      mode: halt_with
      condition_pattern: "</a>$"
      timeout_ms: 1000
transforms:
  drop_empty_xml_logs:
    type: filter
    inputs:
      - source_logs_xml
    condition: "!is_nullish(.message)"
  transform_xml_logs:
    inputs:
      - drop_empty_xml_logs
    source: |-
      . |= object!(parse_xml!(.message))
      del(.message)
    type: remap
However, when reading the actual log file, the last line has no trailing newline, so the input received by the transform looks like this (ignoring empty logs):
<?xml version="1.0" encoding="UTF-8"?><a><lot><of></of></lot><tags/></a>
<?xml version="1.0"?>
<a>
<lot>
<of></of>
</lot>
<tags/>
</a>
<?xml version="1.0" encoding="UTF-8"?><a><lot><of>
</of></lot><tags/></a>
<?xml version="1.0"?>
<a><lot><of></of></lot><tags/></a>
As you'd expect, not much ends up as valid XML.
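To make the failure mode concrete, here is a rough Python model of a newline-delimited reader. It is only a sketch: `split_complete_lines` is a hypothetical helper, not Vector code, and it assumes the file source hands over only newline-terminated lines, which matches the behavior described above.

```python
def split_complete_lines(buffer: str) -> tuple[list[str], str]:
    """Return (newline-terminated lines, unterminated remainder)."""
    *lines, remainder = buffer.split("\n")
    return lines, remainder


# Each record starts with a newline and does not end with one.
record_1 = '\n<?xml version="1.0"?>\n<a>\n<tags/>\n</a>'
record_2 = '\n<?xml version="1.0"?>\n<a><tags/></a>'

# After the first write, the closing tag is still held back by the reader.
lines, pending = split_complete_lines(record_1)
print(lines)    # ['', '<?xml version="1.0"?>', '<a>', '<tags/>']
print(pending)  # </a>

# Only the leading newline of the next record releases it.
lines, pending = split_complete_lines(pending + record_2)
print(lines)    # ['</a>', '<?xml version="1.0"?>']
print(pending)  # <a><tags/></a>
```

If the next record arrives more than multiline.timeout_ms after the previous one, the multiline group has presumably already been flushed without its closing `</a>` line, which then shows up as a separate event.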
Proposal
As suggested by @jszwedko in #18341, a configurable timeout after which Vector considers the current line complete would alleviate this problem. In my use case, it could be set to a value slightly lower than multiline.timeout_ms to ensure the last line gets properly included.
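A minimal sketch of what such an idle timeout could look like, written in Python for illustration only. The class name `IdleFlushLineReader`, the `idle_timeout_ms` parameter, and the `feed`/`poll` methods are all hypothetical; none of them is an existing Vector option or API. Time is passed in explicitly to keep the example deterministic.

```python
class IdleFlushLineReader:
    """Hypothetical line reader that flushes a partial line after an idle period."""

    def __init__(self, idle_timeout_ms: int):
        self.idle_timeout_ms = idle_timeout_ms
        self.pending = ""      # unterminated tail of the data read so far
        self.last_data_ms = 0  # timestamp of the most recent read

    def feed(self, data: str, now_ms: int) -> list[str]:
        """Append newly read data and return all newline-terminated lines."""
        *lines, self.pending = (self.pending + data).split("\n")
        self.last_data_ms = now_ms
        return lines

    def poll(self, now_ms: int) -> list[str]:
        """Emit the partial line if no data arrived for idle_timeout_ms."""
        if self.pending and now_ms - self.last_data_ms >= self.idle_timeout_ms:
            line, self.pending = self.pending, ""
            return [line]
        return []


# 900 ms is an assumed value, chosen to sit just under multiline.timeout_ms (1000).
reader = IdleFlushLineReader(idle_timeout_ms=900)
print(reader.feed('\n<?xml version="1.0"?>\n<a>\n</a>', now_ms=0))
# ['', '<?xml version="1.0"?>', '<a>']
print(reader.poll(now_ms=500))  # []
print(reader.poll(now_ms=900))  # ['</a>']
```

One trade-off with this approach: a writer that pauses mid-line for longer than the idle timeout would have that line split in two, so the value would need to be chosen with the writer's behavior in mind.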
Alternatively, multiline could include the current line buffer when its timeout expires, but I find this solution less elegant and less logical.
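For comparison, this alternative can be sketched roughly as follows. This is a simplified model, not how Vector's multiline aggregation is implemented: `flush_group` and `is_valid_xml` are hypothetical helpers, and Python's standard `xml.etree.ElementTree` stands in for `parse_xml`.

```python
import xml.etree.ElementTree as ET


def flush_group(buffered_lines, pending_partial, include_partial):
    """Build the message emitted when the multiline timeout expires."""
    parts = list(buffered_lines)
    if include_partial and pending_partial:
        # Pull in the reader's unterminated tail instead of leaving it behind.
        parts.append(pending_partial)
    return "\n".join(parts)


def is_valid_xml(message):
    """Return True if the message parses as a single XML document."""
    try:
        ET.fromstring(message)
        return True
    except ET.ParseError:
        return False


# State at the moment the multiline timeout fires for a multi-line record.
buffered = ['<?xml version="1.0"?>', '<a>', '<tags/>']
pending = '</a>'  # held back because it has no trailing newline

print(is_valid_xml(flush_group(buffered, pending, include_partial=False)))  # False
print(is_valid_xml(flush_group(buffered, pending, include_partial=True)))   # True
```

The catch is that the aggregation stage would need access to the reader's partial-line buffer, which couples two stages that are otherwise independent.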
References
Version
vector 0.32.1 (x86_64-unknown-linux-gnu 9965884 2023-08-21 14:52:38.330227446)