[pkg/stanza] Fix gzip files not being read if nonstandard format (WIP) #40492
Conversation
For any lost soul that gets here, this sort of config worked for me:

```yaml
# This configuration is managed by Bindplane.
# Configuration: CloudTrail_test:23
receivers:
  filelog/source1_01JWS1EGG8EWTQ98KX096RPASN:
    compression: 'gzip'
    delete_after_read: false
    force_flush_period: 1s
    encoding: 'nop'
    include:
      - /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz
    include_file_name: true
    include_file_name_resolved: false
    include_file_path: false
    include_file_path_resolved: false
    max_concurrent_files: 1024
    max_log_size: 1000000000
    poll_interval: 500ms
    retry_on_failure:
      enabled: true
    start_at: beginning
processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - set(body, Decode(body, "utf-8"))
          - set(body, ParseJSON(body))
          # - set(attributes["debug.body"], body)
          # - set(body, body.Records)
exporters:
  file/DevNull:
    path: /Users/keithschmitt/bp_resources/test.txt
service:
  extensions:
    - file_storage
  pipelines:
    logs/source1_01JWS1EGG8EWTQ98KX096RPASN__DevNull-0:
      receivers:
        - filelog/source1_01JWS1EGG8EWTQ98KX096RPASN
      processors:
        - transform
      exporters:
        - file/DevNull
  telemetry:
    # logs:
    #   level: debug
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: localhost
                port: 8888
extensions:
  file_storage:
    directory: /Users/keithschmitt/bp_resources/storage
    timeout: 1s
```
So you're saying that adding […]?

Were there any interesting errors or other logs from the collector when run with the configuration that didn't work for you?
@andrzej-stencel exactly!
What was interesting is that the collector, even with debug logs enabled, doesn't appear to be emitting anything out of the ordinary. Debug logs configuration:

```yaml
filelog/source1_01JWS1EGG8EWTQ98KX096RPASN:
  compression: 'gzip'
  delete_after_read: false
  force_flush_period: 1s
  # encoding: 'nop'
  include:
    - /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz
  include_file_name: true
  include_file_name_resolved: false
  include_file_path: false
  include_file_path_resolved: false
  max_concurrent_files: 1024
  max_log_size: 1000000000
  poll_interval: 500ms
  retry_on_failure:
    enabled: true
  start_at: beginning
```

=> normal-looking logs
Important notes about this file:

I had a theory that since the gzipped file did not have a […]. These logs are coming from AWS CloudTrail, so I'm a tad hesitant to share them in a public message; feel free to reach out on the CNCF Slack if you'd like a copy of the file. Also, the header for this log seemed a tad strange, lacking some metadata. Overall that seemed like a red herring, but it may indicate some kind of custom compression happening 🤔
Still trying to understand what's happening, but the file is definitely not hitting my pipeline unless […].
Description
Work in progress, but I was noticing some weird behavior in `pkg/stanza`'s handling of gzipped files. Nonstandard gzipped files (or maybe custom-compressed, still not sure) from AWS CloudTrail were never being ingested; I did some digging and realized they were never picked up by the filelogreceiver.

What was interesting was that if I downloaded the file and ran `gunzip file.json.gz` => `vim file.json` (modify a character) => `gzip file.json`, it was properly parsed by the old implementation. This led me to believe that the file must be compressed by some custom tool while still being a valid gzip.

Link to tracking issue
Fixes
Testing
The test verifies that:
Documentation
Deeper code-level change... user impacts for n