Skip to content

Should language SDK generate service.instance.id? #3136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jack-berg opened this issue Jan 23, 2023 · 17 comments
Closed

Should language SDK generate service.instance.id? #3136

jack-berg opened this issue Jan 23, 2023 · 17 comments
Labels
spec:resource Related to the specification/resource directory

Comments

@jack-berg
Copy link
Member

The resource semantic conventions say the following about service.instance.id:

If the service has no inherent unique ID that can be used as the value of this attribute it is recommended to generate a random Version 1 or Version 4 RFC 4122 UUID (services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations).

The language is ambiguous as it's not clear "what" / "who" is supposed to generate the random Version 1 or Version 4 UUID. Is it the application owner? Is it the language SDK?

There's a request in opentelemetry-java that we generate service.instance.id. We discussed adding it in the Java SIG and there was debate about whether this was actually the intent of the spec, and if it is, whether its a good idea. The argument against including it can probably be summarized as:

  • Adding a random service.instance.id that is not persistent across restarts increases cardinality. While this is true I disagree that it is a problem, since I think the number of users negatively impacted would be small and they have options available to solve the problem.
  • Adding a random service.instance.id that doesn't reflect the identifier of the instance in its environment (i.e. is different than the pod uid) may be confusing to users. I disagree that this is a problem because users should set service.instance.id to an inherent unique ID if available.

Maybe I'm missing part of the debate but in any case it would be good if the spec could clear up this ambiguity.

This may be a duplicate of #1034, but that issue's description appears to be asking about alternates for service.instance.id where I'm asking about what is specifically responsible for generating service.instance.id in the current wording of the spec.

@svrnm
Copy link
Member

svrnm commented Jan 24, 2023

Adding my 2 cents to that:

I would prefer a solution where the SDK is providing the service.instance.id if it is not already defined by the end user (via environment variable, config or code). It's not something most end-users care about until it comes to observability where they want to say "give me all data for that one particular service instance" and then they figure out they have no unique identifier...

I also don't think it should be persistent across restarts, here's my rational for that: Right now if I have a highly scaled service with lots of nodes I can not identify which service is the troublemaker, except through other IDs that might be available if running on K8s (pod.id), some container runtime (container.id*) or a physical host (host.id), etc., but it's not their purpose to identify a service, which brings us back to a restart: if I (soft) restart a service within a container, the pod&container&host ID stays the same: So in the worst case I have a service that restarted a few times, but I can not uniquely identify an instance.

@Oberon00
Copy link
Member

Oberon00 commented Jan 24, 2023

This may be a duplicate of #1034, but that issue's description appears to be asking about alternates for service.instance.id where I'm asking about what is specifically responsible for generating service.instance.id in the current wording of the spec.

I think that is mostly what the discussion on #1034 is about. Repeating my comment from there: #1034 (comment):

IMHO this attribute is poorly defined right now as it may or may not be the same across service restarts, which IMHO can make quite a difference. It would be easiest if it MUST be the different for each restart, that way it could be used as primary key for all resources (not only service.*) sent by the same telemetry instance. On the other hand, maybe such an attribute would better be named telemetry.sdk.instance.id.

@tsloughter
Copy link
Member

As a datapoint, in the Erlang SDK we use the node name as the instance id resource attribute. If the node is run without distributed erlang enabled (so no nodename) we simply use a random integer as the id -- same as using the default otel trace id generator.

nodename is somename@IP or somename@hostname, so stays the same between restarts if the node is on the same ip/hostname.

@sirzooro
Copy link

I also don't think it should be persistent across restarts, here's my rational for that: Right now if I have a highly scaled service with lots of nodes I can not identify which service is the troublemaker, except through other IDs that might be available if running on K8s (pod.id), some container runtime (container.id*) or a physical host (host.id), etc., but it's not their purpose to identify a service, which brings us back to a restart: if I (soft) restart a service within a container, the pod&container&host ID stays the same: So in the worst case I have a service that restarted a few times, but I can not uniquely identify an instance.

I have opposite use case: I have exactly 2 instances of service X and 10 of Y. These instances are numbered (1-2, 1-10) and their numbers are statically assigned to them during deployment. It makes sense to put them in service.instance.id as-is, and use resource attributes from Kubernetes semantic conventions to store other details.

@Oberon00
Copy link
Member

I think both an ID that persists over restarts and one that doesn't can make sense to have, for different use cases. Also, the former is harder to impalement while the latter is trivial. There should probably be two different attributes here.

@svrnm
Copy link
Member

svrnm commented Jan 30, 2023

I think both an ID that persists over restarts and one that doesn't can make sense to have, for different use cases.
+1

Agreed, there should be 2 different attributes, the questions remains which one is then "service.instance.id" and which one is something else?

@jsuereth
Copy link
Contributor

I'm actually in favor of having the SDK generate this. The main concern, and others have raised this, is that we don't have a good specification for what to generate. I'm actually looking to tackle that and would be happy to brainstorm with you on what we could do.

@jsuereth
Copy link
Contributor

Copying this here from #3202.

My requirements for a solution to this bug:

  • Require service.instance.id to be generated by every SDK.
  • Outline an algorithm/logic for service.instance.id synthesis in a majority of cases. Allow UUID for unknown scenarios.
  • Allow users to override service.instance.id via configuration. Today that means OTEL_RESOURCE_ATTRIBUTES.
  • Ensure folks using Prometheus scrape/pull-based metrics of any sort w/ OTEL have an opportunity to see consistent service.instance.id. This may actually not be technically feasible

@jsuereth
Copy link
Contributor

@jmacd
Copy link
Contributor

jmacd commented Feb 15, 2023

@jsuereth I think your proposal looks good.

@jsuereth
Copy link
Contributor

Thanks @jmacd. I'll PR-ify it today/tomorrow.

jsuereth added a commit to jsuereth/opentelemetry-specification that referenced this issue Feb 16, 2023
@spedersen-emailage
Copy link

Wanted to revive this. A colleague found this issue as we were researching the use of service.instance.id and the best way to generate one. Has there been any other movement on this question?

@tigrannajaryan
Copy link
Member

Additionally we need to make it clear that later elements of collection pipeline may override the service.instance.id if they have a better value for it.

This is for example is the case in resourcedetection processor in the Collector.

@Oberon00
Copy link
Member

How do the later components know their value is better? Would there be any way to distinguish a user-defined instance.id from an SDK-generated one? Should there be?

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Jul 12, 2023

How do the later components know their value is better?

Judgement call by the component developer or by the end user. If you are certain the data you have is more "correct" than what is coming from the previous component then you overwrite it. In the Collector this is end-user configurable (see for example).

Would there be any way to distinguish a user-defined instance.id from an SDK-generated one? Should there be?

I don't expect there to be a way.

@dyladan
Copy link
Member

dyladan commented May 21, 2024

@jack-berg is this resolved by open-telemetry/semantic-conventions#312?

@dyladan dyladan added the triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details label May 21, 2024
@jack-berg
Copy link
Member Author

Yes! Closing.

@svrnm svrnm removed the triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details label Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:resource Related to the specification/resource directory
Projects
None yet
Development

No branches or pull requests