[pkg/stanza] Fix gzip files not being read if nonstandard format (WIP) #40492

Closed

schmikei
Contributor

@schmikei schmikei commented Jun 4, 2025

Description

Work in progress. I noticed some odd behavior in how pkg/stanza reads gzipped files.

# This configuration is managed by Bindplane.
# Configuration: CloudTrail_test:23
receivers:
    filelog/source1_01JWS1EGG8EWTQ98KX096RPASN:
        compression: 'gzip'
        delete_after_read: false
        force_flush_period: 1s
        include:
          - /Users/keithschmitt/resources/*.gz
        include_file_name: true
        include_file_name_resolved: false
        include_file_path: false
        include_file_path_resolved: false
        max_concurrent_files: 1024
        poll_interval: 500ms
        retry_on_failure:
            enabled: true
        start_at: beginning
        # storage: file_storage
exporters:
    file/DevNull: 
      path: /Users/keithschmitt/bp_resources/test.txt
service:
    extensions:
        - file_storage
    pipelines:
        logs/source1_01JWS1EGG8EWTQ98KX096RPASN__DevNull-0:
            receivers:
                - filelog/source1_01JWS1EGG8EWTQ98KX096RPASN
            processors: []
            exporters:
                - file/DevNull
    telemetry:
        metrics:
            readers:
                - pull:
                    exporter:
                        prometheus:
                            host: localhost
                            port: 8888
extensions:
    file_storage:
        directory: /Users/keithschmitt/bp_resources/storage
        timeout: 1s

I noticed that a nonstandard gzipped file (or maybe custom-compressed, still not sure) from AWS CloudTrail was never being ingested. I did some digging and confirmed that it never made it through the filelogreceiver.

Interestingly, if I downloaded the file and ran gunzip file.json.gz => vim file.json (modify a character) => gzip file.json, the result was parsed properly by the old implementation. This led me to believe that the file is compressed in some custom way but is still a valid gzip.

➜ gzip -l /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz
  compressed uncompressed  ratio uncompressed_name
        1488         5956  75.0% /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json

This PR improves the gzip file reading behavior in the file consumer by:
  • Fixing incorrect offset tracking for gzip files by tracking decompressed bytes instead of file position
  • Optimizing buffer allocation by using file size information
  • Improving error handling and logging for gzip operations
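
To illustrate the first bullet, here is a minimal Python sketch (not the Go code in this PR) of why a position in the compressed file is not a usable offset into the content. The helper name `read_from_decompressed_offset` and the sample payload are assumptions made up for this example.

```python
import gzip
import io


def read_from_decompressed_offset(compressed: bytes, offset: int) -> bytes:
    """Return the content that follows `offset` decompressed bytes."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as reader:
        reader.read(offset)  # skip content that was already consumed
        return reader.read()


# Hypothetical sample content; real CloudTrail files look different.
payload = b'{"Records": []}' * 400
compressed = gzip.compress(payload)

# The compressed stream is much shorter than the content it holds, so a
# file position does not say how much content has been consumed.
compressed_size = len(compressed)
decompressed_size = len(payload)

# Resume reading after the first 1000 decompressed bytes.
remaining = read_from_decompressed_offset(compressed, 1000)
```

Tracking the offset in decompressed bytes keeps "how far we have read" in the same units as the tokens being emitted, which is the idea behind that bullet.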

Link to tracking issue

Fixes

Testing

The test verifies that:

  • We correctly read gzipped content
  • We properly handle offset tracking
  • We don't read more data than necessary
  • We correctly emit tokens in order

Documentation

Deeper code level change... user impacts for n

@schmikei
Contributor Author

schmikei commented Jun 5, 2025

For any lost soul who ends up here, this sort of config worked for me:

# This configuration is managed by Bindplane.
# Configuration: CloudTrail_test:23
receivers:
    filelog/source1_01JWS1EGG8EWTQ98KX096RPASN:
        compression: 'gzip'
        delete_after_read: false
        force_flush_period: 1s
        encoding: 'nop'
        include:
          - /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz
        include_file_name: true
        include_file_name_resolved: false
        include_file_path: false
        include_file_path_resolved: false
        max_concurrent_files: 1024
        max_log_size: 1000000000
        poll_interval: 500ms
        retry_on_failure:
            enabled: true
        start_at: beginning

processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
        - set(body, Decode(body, "utf-8"))
        - set(body, ParseJSON(body))

        # - set(attributes["debug.body"], body)
        # - set(body, body.Records)
exporters:
    file/DevNull: 
      path: /Users/keithschmitt/bp_resources/test.txt
service:
    extensions:
        - file_storage
    pipelines:
        logs/source1_01JWS1EGG8EWTQ98KX096RPASN__DevNull-0:
            receivers:
                - filelog/source1_01JWS1EGG8EWTQ98KX096RPASN
            processors:
                - transform
            exporters:
                - file/DevNull
    telemetry:
        # logs:
        #     level: debug
      
        metrics:
            readers:
                - pull:
                    exporter:
                        prometheus:
                            host: localhost
                            port: 8888
extensions:
    file_storage:
        directory: /Users/keithschmitt/bp_resources/storage
        timeout: 1s
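
As a rough analogy for the two transform statements in this config, the Python sketch below decodes raw bytes as UTF-8 and then parses the text as JSON. It is not how the Transform processor is implemented; the function name `decode_and_parse` and the sample body are assumptions for illustration, and the `Records` key is taken from the commented-out statement above.

```python
import json


def decode_and_parse(body: bytes) -> dict:
    """Mimic Decode(body, "utf-8") followed by ParseJSON(body)."""
    text = body.decode("utf-8")  # bytes -> str
    return json.loads(text)      # str -> structured value


# Hypothetical stand-in for the body emitted with encoding: 'nop'.
raw_body = b'{"Records": [{"eventName": "ConsoleLogin", "awsRegion": "us-east-1"}]}'

parsed = decode_and_parse(raw_body)
records = parsed["Records"]
```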

@schmikei schmikei closed this Jun 5, 2025
@andrzej-stencel
Member

So you're saying that adding encoding: nop to the File Log receiver config and later using Decode(body, "utf-8") in the Transform processor worked correctly to unzip and ingest the files that otherwise did not get ingested?

Were there any interesting errors or other logs from the collector when run with the configuration that didn't work for you?

@schmikei
Contributor Author

schmikei commented Jun 6, 2025

So you're saying that adding encoding: nop to the File Log receiver config and later using Decode(body, "utf-8") in the Transform processor worked correctly to unzip and ingest the files that otherwise did not get ingested?

@andrzej-stencel exactly!

Were there any interesting errors or other logs from the collector when run with the configuration that didn't work for you?

Interestingly, even with debug logs enabled, the collector doesn't appear to emit anything out of the ordinary, as far as I can tell.

Debug Logs

Configuration

    filelog/source1_01JWS1EGG8EWTQ98KX096RPASN:
        compression: 'gzip'
        delete_after_read: false
        force_flush_period: 1s
        # encoding: 'nop'
        include:
          - /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz
        include_file_name: true
        include_file_name_resolved: false
        include_file_path: false
        include_file_path_resolved: false
        max_concurrent_files: 1024
        max_log_size: 1000000000
        poll_interval: 500ms
        retry_on_failure:
            enabled: true
        start_at: beginning

=>

normal looking logs

2025-09:46 in opentelemetry-collector-contrib on  main [$?] via 🐹 v1.24.2 using ☁️  default/ took 6.9s
➜ ./bin/otelcontribcol_darwin_arm64 --config test.yaml
2025-06-06T09:47:25.079-0400	info	[email protected]/service.go:199	Setting up own telemetry...	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}}
2025-06-06T09:47:25.080-0400	debug	builders/builders.go:24	Alpha component. May change in the future.	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "file/DevNull", "otelcol.component.kind": "exporter", "otelcol.signal": "logs"}
2025-06-06T09:47:25.080-0400	debug	builders/builders.go:24	Beta component. May change in the future.	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs"}
2025-06-06T09:47:25.080-0400	debug	builders/extension.go:48	Beta component. May change in the future.	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "file_storage", "otelcol.component.kind": "extension"}
2025-06-06T09:47:25.080-0400	info	[email protected]/service.go:259	Starting otelcontribcol...	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "Version": "0.127.0-dev", "NumCPU": 12}
2025-06-06T09:47:25.080-0400	info	extensions/extensions.go:41	Starting extensions...	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}}
2025-06-06T09:47:25.080-0400	info	extensions/extensions.go:45	Extension is starting...	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "file_storage", "otelcol.component.kind": "extension"}
2025-06-06T09:47:25.080-0400	info	extensions/extensions.go:62	Extension started.	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "file_storage", "otelcol.component.kind": "extension"}
2025-06-06T09:47:25.080-0400	info	adapter/receiver.go:41	Starting stanza receiver	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs"}
2025-06-06T09:47:25.080-0400	debug	pipeline/directed.go:59	Starting operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "batching_log_emitter", "operator_type": "batching_log_emitter"}
2025-06-06T09:47:25.080-0400	debug	pipeline/directed.go:63	Started operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "batching_log_emitter", "operator_type": "batching_log_emitter"}
2025-06-06T09:47:25.080-0400	debug	pipeline/directed.go:59	Starting operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2025-06-06T09:47:25.081-0400	debug	archive	archive/archive.go:34	archiving is disabled. enable pollsToArchive and storage settings to save offsets on disk.	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "tracker": "fileTracker"}
2025-06-06T09:47:25.081-0400	debug	pipeline/directed.go:63	Started operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2025-06-06T09:47:25.081-0400	info	[email protected]/service.go:282	Everything is ready. Begin running and processing data.	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}}
2025-06-06T09:47:25.582-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:25.582-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:25.582-0400	info	fileconsumer/file.go:267	Started watching file	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "path": "/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"}
2025-06-06T09:47:26.082-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:26.082-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:26.582-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:26.582-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:27.081-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:27.081-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:27.582-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:27.582-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:28.082-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:28.082-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:28.582-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:28.582-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:29.081-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:29.082-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:29.582-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:29.582-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:30.082-0400	debug	fileconsumer/file.go:125	matched files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
2025-06-06T09:47:30.082-0400	debug	fileconsumer/file.go:157	Consuming files	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "component": "fileconsumer", "paths": ["/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz"]}
^C2025-06-06T09:47:32.780-0400	info	[email protected]/collector.go:358	Received signal from OS	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "signal": "interrupt"}
2025-06-06T09:47:32.780-0400	info	[email protected]/service.go:324	Starting shutdown...	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}}
2025-06-06T09:47:32.780-0400	info	adapter/receiver.go:68	Stopping stanza receiver	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs"}
2025-06-06T09:47:32.780-0400	debug	pipeline/directed.go:74	Stopping operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2025-06-06T09:47:32.780-0400	debug	pipeline/directed.go:78	Stopped operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "file_input", "operator_type": "file_input"}
2025-06-06T09:47:32.780-0400	debug	pipeline/directed.go:74	Stopping operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "batching_log_emitter", "operator_type": "batching_log_emitter"}
2025-06-06T09:47:32.780-0400	debug	pipeline/directed.go:78	Stopped operator	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}, "otelcol.component.id": "filelog/source1_01JWS1EGG8EWTQ98KX096RPASN", "otelcol.component.kind": "receiver", "otelcol.signal": "logs", "operator_id": "batching_log_emitter", "operator_type": "batching_log_emitter"}
2025-06-06T09:47:32.780-0400	info	extensions/extensions.go:69	Stopping extensions...	{"resource": {"service.instance.id": "1bc7e84b-668e-48dc-9294-91c93b1ff54f", "service.name": "otelcontribcol", "service.version": "0.127.0-dev"}}
2025-06-06T09:47:32.780-0400	info	[email protected]/service.go:33

Important notes about this file:

  • Single JSON object with no \n character in the uncompressed file
  • If I decompress the file with gunzip, add any character, and recompress it with gzip, it is ingested properly with the default encoding, which was the most bizarre thing
  • I checked the other encodings (utf-8, utf-16, ascii, etc.) and none of them worked, so we had to ingest the bytes directly and transform them later in the pipeline

I had a theory that, since the content of the gzipped file has no \n character, the splitFunc was causing our issue: we read the file but never emit anything because we never hit that character. That was my theory before I found the workaround, at least.
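
The theory can be sketched in Python with a hypothetical newline-delimited splitter. This is not stanza's actual splitFunc, and the `flush_at_eof` parameter is an assumption added only to show the contrast; the sketch just shows how a single JSON object with no trailing \n could sit in the buffer without ever being emitted.

```python
def split_on_newline(buffer: bytes, at_eof: bool, flush_at_eof: bool = False):
    """Hypothetical line splitter: return (tokens, leftover)."""
    # Everything before the last "\n" is complete; the rest is leftover.
    *lines, leftover = buffer.split(b"\n")
    tokens = [line for line in lines if line]
    # Only emit an unterminated tail when explicitly asked to flush.
    if at_eof and flush_at_eof and leftover:
        tokens.append(leftover)
        leftover = b""
    return tokens, leftover


# Hypothetical single-line content without a trailing newline.
single_line = b'{"Records": [{"eventName": "ConsoleLogin"}]}'

stuck_tokens, stuck_leftover = split_on_newline(single_line, at_eof=True)
flushed_tokens, _ = split_on_newline(single_line, at_eof=True, flush_at_eof=True)
terminated_tokens, _ = split_on_newline(single_line + b"\n", at_eof=True)
```

In this sketch the unterminated input produces no tokens unless the tail is flushed, while the same content followed by \n is emitted right away.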

These logs come from AWS CloudTrail, so I'm a tad hesitant to share them in a public message; feel free to reach out on the CNCF Slack if you'd like a copy of the file. I also noted that the gzip header of this file seemed a tad strange, lacking some metadata. Overall that seemed like a red herring, but maybe it indicates some kind of custom compression happening 🤔

➜ hexdump -C -n 32 /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz
00000000  1f 8b 08 00 00 00 00 00  00 ff ed 58 5b 73 da 38  |...........X[s.8|
00000010  14 fe 2b 1e 3d 6d 07 0c  be df 76 f6 81 db 02 01  |..+.=m....v.....|
00000020
➜ gzip -tv /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz
/Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz:	  OK
2025-10:06 in ~/bp_resources using ☁️  default/ took 19m 28.7s
➜ gunzip /Users/keithschmitt/bp_resources/776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json.gz

2025-10:07 in ~/bp_resources using ☁️  default/
➜ file 776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json
776660375188_CloudTrail_us-east-1_20250602T1905Z_mZfBOuEMPCmhbWpK.json: JSON data
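
For reference, the first ten bytes of the hexdump above can be decoded with a short Python sketch that follows the fixed header layout from RFC 1952. The function name `parse_gzip_header` is an assumption for this example; the input bytes are copied from the hexdump.

```python
import struct

# Flag bits defined by RFC 1952.
FLAG_NAMES = {0x01: "FTEXT", 0x02: "FHCRC", 0x04: "FEXTRA", 0x08: "FNAME", 0x10: "FCOMMENT"}


def parse_gzip_header(data: bytes) -> dict:
    """Decode the fixed 10-byte gzip header."""
    if len(data) < 10:
        raise ValueError("a gzip header is at least 10 bytes")
    # ID1, ID2, CM, FLG, MTIME (little-endian uint32), XFL, OS
    id1, id2, method, flags, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    return {
        "method": method,  # 8 means deflate
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
        "mtime": mtime,    # 0 means no timestamp stored
        "xfl": xfl,
        "os": os_code,     # 255 means unknown
    }


# First ten bytes from the hexdump above.
header = parse_gzip_header(bytes.fromhex("1f8b08000000000000ff"))
```

Decoded this way, the header uses deflate, sets no optional flags (so no stored file name or comment), has a zero timestamp, and reports an unknown OS. That is sparse compared with what the gzip CLI writes by default, but it is still a valid header, which matches the gzip -tv result.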

I'm still trying to understand what's happening, but the file definitely does not reach my pipeline without encoding: nop 🤔 Let me know if you have any further insights or would like the file to debug.


2 participants