[receiver/awsfirehosereceiver] Support receiving multiple record types using a single endpoint #35988

Open
kaiyan-sheng opened this issue Oct 24, 2024 · 8 comments

@kaiyan-sheng
Contributor

Component(s)

receiver/awsfirehose

Is your feature request related to a problem? Please describe.

When trying to ingest both logs and metrics from Firehose, I configured two receivers with the same endpoint:

receivers:
  awsfirehose/cwmetrics:
    endpoint: {{ (.Values.receiver).hostname | default "localhost"}}:4433
    include_metadata: true
    record_type: cwmetrics
  awsfirehose/cwlogs:
    endpoint: {{ (.Values.receiver).hostname | default "localhost"}}:4433
    include_metadata: true
    record_type: cwlogs

This gave me an error:

2024-10-23T20:01:59.169Z	error	graph/graph.go:426	Failed to start component	{"error": "listen tcp 127.0.0.1:4433: bind: address already in use", "type": "Receiver", "id": "awsfirehose/cwmetrics"}

To work around this issue, I have to use two different endpoints, one for metrics and one for logs.

I would like to have a single endpoint to ingest both metrics and logs.

Describe the solution you'd like

We should be able to inspect what the request from Firehose looks like and figure out whether it contains logs or metrics, and whether the metrics are in JSON or OTLP 1.0 format, and then apply the right unmarshaling.

Describe alternatives you've considered

No response

Additional context

No response

@kaiyan-sheng kaiyan-sheng added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Oct 24, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@Aneurysm9
Member

I would strongly recommend simply using different ports for each signal type. The collector requires unique component instances per signal type, even for components that support multiple signal types, and the mechanism used by the OTLP receiver to support multiple signal types on the same port is generally considered a hack and may not be supported going forward.

Beyond that, the type of configuration presented in the issue description here doesn't really fit in with the sharedcomponent model, which expects the same configuration of the component to be used for multiple signal types by allowing signal type-specific instances to share an HTTP handler.

@axw
Contributor

axw commented Nov 27, 2024

Beyond that, the type of configuration presented in the issue description here doesn't really fit in with the sharedcomponent model, which expects the same configuration of the component to be used for multiple signal types by allowing signal type-specific instances to share an HTTP handler.

Agreed, I think @kaiyan-sheng was just showing what she tried to configure. It fails of course because two instances of the receiver cannot bind to the same host:port.

The proposal would be to either get rid of the record_type config setting, or introduce something like record_type: auto, and determine the record type from the data: check for CloudWatch JSON/OTLP metrics, and otherwise treat the data as logs.

Minimal config would be something like:

receivers:
  awsfirehose:
    endpoint: hostname:1234

So there would be no requirement for multiple receiver instances.
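
If the record_type setting were instead kept and an auto value introduced, the configuration might look like this (auto is hypothetical, not an existing option):

receivers:
  awsfirehose:
    endpoint: hostname:1234
    record_type: auto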

@Aneurysm9
Member

The proposal would be to either get rid of the record_type config setting, or introduce something like record_type: auto, and determine the record type from the data: check for CloudWatch JSON/OTLP metrics, and otherwise treat the data as logs.

I don't think this is viable. There are already three different record types supported for two different signal types and more can be added in the future. These formats are not self-describing and I'm not sure it would be advisable to attempt to unmarshal every inbound request with every type of unmarshaller.

Rather than introducing more complexity locally, why can the existing capability to listen on different ports not be used?
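
For reference, the two-port workaround referred to here would look like this, based on the configuration from the issue description (the second port is illustrative):

receivers:
  awsfirehose/cwmetrics:
    endpoint: {{ (.Values.receiver).hostname | default "localhost"}}:4433
    include_metadata: true
    record_type: cwmetrics
  awsfirehose/cwlogs:
    endpoint: {{ (.Values.receiver).hostname | default "localhost"}}:4434
    include_metadata: true
    record_type: cwlogs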

@axw
Contributor

axw commented Nov 28, 2024

Having multiple ports is an option, it's what we're currently doing. We're looking for a way to simplify our user experience. The goal is to have an experience similar to configuring an OTel SDK/collector for sending logs/metrics/traces/profiles to a single OTLP endpoint, regardless of the signal type or protobuf/JSON encoding.

There's a bit of cognitive overhead in needing to know which URL/port to send to based on the signal type and encoding. Imagine you're creating a CloudWatch Metric Stream: if you change from the JSON encoding to OTLP 1.0, now you need to remember to use a different port. Instead of making it the user's problem, we'd like to offer a single Firehose endpoint which can determine the data format itself.

It is possible to deduce the type by sniffing the data without attempting a full unmarshal of each type -- we've implemented this. To be fair, though, that's not necessarily future-proof.
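
For illustration only, a minimal sketch of what such sniffing could look like; this is not the actual implementation, and it assumes CloudWatch metric-stream and CloudWatch Logs records arrive as JSON objects (possibly gzip-compressed), while OTLP 1.0 records are protobuf-encoded and do not start with '{':

package sniff

import "bytes"

// recordEncoding is a hypothetical label for the detected payload encoding.
type recordEncoding int

const (
    encodingUnknown  recordEncoding = iota
    encodingJSON     // CloudWatch JSON (metrics or logs)
    encodingProtobuf // assumed OTLP 1.0 metrics
)

// sniffEncoding guesses the encoding of a single decoded Firehose record from
// its leading bytes, without attempting a full unmarshal.
func sniffEncoding(record []byte) recordEncoding {
    trimmed := bytes.TrimLeft(record, " \t\r\n")
    switch {
    case len(trimmed) == 0:
        return encodingUnknown
    case trimmed[0] == '{':
        return encodingJSON // newline-delimited JSON objects
    case len(trimmed) >= 2 && trimmed[0] == 0x1f && trimmed[1] == 0x8b:
        return encodingJSON // gzip magic bytes, e.g. compressed CloudWatch Logs
    default:
        return encodingProtobuf // fall back to treating the payload as protobuf
    }
}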

@constanca-m
Contributor

constanca-m commented Dec 5, 2024

These formats are not self-describing and I'm not sure it would be advisable to attempt to unmarshal every inbound request with every type of unmarshaller.

@Aneurysm9 This would not happen. The way I envision this going would be:

  1. We unmarshal the request to map[string]interface{}.
  2. Then, based on this map, we determine which type of record it is. We can make use of the existing functions (see the sketch after this list):
    • If it is a CloudWatch metric, then it has these fields.
    • If it is a CloudWatch log, then it has these fields.
    • Otherwise, it is a log sent directly to Firehose.
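
For illustration, a rough sketch of that two-step check; the helper name is hypothetical and the field names are assumptions based on the CloudWatch metric-stream and CloudWatch Logs JSON formats linked above:

package detect

import "encoding/json"

// detectRecordType is a hypothetical helper: unmarshal the record into a
// generic map, then look for fields the existing unmarshalers already require.
func detectRecordType(record []byte) string {
    var fields map[string]interface{}
    if err := json.Unmarshal(record, &fields); err != nil {
        // Not a JSON object: treat it as a log sent directly to Firehose.
        return "firehoselogs"
    }
    // Assumed required fields of a CloudWatch metric-stream JSON record.
    if _, ok := fields["metric_name"]; ok {
        if _, ok := fields["namespace"]; ok {
            return "cwmetrics"
        }
    }
    // Assumed required fields of a CloudWatch Logs record.
    if _, ok := fields["logGroup"]; ok {
        if _, ok := fields["logEvents"]; ok {
            return "cwlogs"
        }
    }
    return "firehoselogs"
}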

Does this sound OK to you?

I am taking over PR #36184, and I have come across this need because if we add a new record type firehoselogs, all the other types become redundant.

Additionally, the code and comments seem to be out of date:

// createDefaultConfig creates a default config with the endpoint set
// to port 8443 and the record type set to the CloudWatch metric stream.
func createDefaultConfig() component.Config {
    return &Config{
        ServerConfig: confighttp.ServerConfig{
            Endpoint: testutil.EndpointForPort(defaultPort),
        },
    }
}

We do not have a default record_type right now. If we add this new type, we could have this default configuration again. We could still keep all these types and leave it to users to decide whether they would rather specify a record type themselves or use an automatic one.

What do you think @Aneurysm9 ?

@Aneurysm9
Member

These formats are not self-describing and I'm not sure it would be advisable to attempt to unmarshal every inbound request with every type of unmarshaller.

@Aneurysm9 This would not happen. The way I envision this going would be:

1. We unmarshal the request to `map[string]interface{}`.

2. Then, based on this map, we determine which type of record it is. We can make use of the existing functions:

   * If it is a CloudWatch metric, then it has [these fields](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/f55d810dea78c9b8bfe8ac4cd39247d5d514b113/receiver/awsfirehosereceiver/internal/unmarshaler/cwmetricstream/unmarshaler.go#L93).
   * If it is a CloudWatch log, then it has [these fields](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/f55d810dea78c9b8bfe8ac4cd39247d5d514b113/receiver/awsfirehosereceiver/internal/unmarshaler/cwlog/unmarshaler.go#L98).
   * Otherwise, it is a log sent directly to Firehose.

Does this sound OK to you?

No, this makes no sense to me at all. Why would we assume that the input can be unmarshalled to map[string]any? Maybe it's []map[string]any or map[any]any or []string. Why would we assume that, if it did, we could correctly identify the semantics of the input data from some subset of fields it may contain? The fields you reference are required fields on specific types of data that would only be present if an incoming message had already been unmarshalled to that type. They do not identify the type of data in any way.

The goal, as stated by @axw, is:

to have an experience similar to configuring an OTel SDK/collector for sending logs/metrics/traces/profiles to a single OTLP endpoint, regardless of the signal type or protobuf/JSON encoding.

That's not really possible here, though, since OTLP uses additional out-of-band information to convey the signal type (gRPC method or HTTP URI path) and content-type/encoding headers that are not present in requests from Firehose. This receiver must be pre-configured to know what to expect.

I am taking over the PR #36184, and I have come across this need because if we set a new record type firehoselogs, all the other types become redundant.

I don't understand this. Can you elaborate on how adding a new record type makes all other types redundant?

Additionally the code and comments seem to not be up to date:

opentelemetry-collector-contrib/receiver/awsfirehosereceiver/factory.go

Lines 76 to 84 in f55d810
// createDefaultConfig creates a default config with the endpoint set
// to port 8443 and the record type set to the CloudWatch metric stream.
func createDefaultConfig() component.Config {
    return &Config{
        ServerConfig: confighttp.ServerConfig{
            Endpoint: testutil.EndpointForPort(defaultPort),
        },
    }
}

We do not have a default record_type right now. If we add this new type, we could have this default configuration again. We could still keep all these types and leave it to users to decide whether they would rather specify a record type themselves or use an automatic one.

There is a default record type per signal. The only way this comment is out of sync with reality is that #35077 didn't update the comment to reflect that the defaults are now per-signal.

@atoulme atoulme removed the needs triage (New item requiring triage) label Mar 8, 2025
Contributor

github-actions bot commented May 7, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label May 7, 2025