Boilerplate warning for config options that are very likely to capture sensitive and PII data #2713

Open
trask opened this issue May 1, 2025 · 9 comments
Labels
triage:deciding This issue needs more discussion or consideration.

Comments

@trask
Member

trask commented May 1, 2025

E.g. when documenting a config option to capture all database query parameters, how far should we go in warning the user about the potential repercussions?

Is something like this good?

WARNING: captured query parameters may contain sensitive information such as passwords, personally identifiable information or protected health info.

Or do we need to go even further? e.g.

WARNING: captured query parameters may contain sensitive information such as passwords, personally identifiable information or protected health info. Exposing such info may result in substantial fines and penalties or criminal liability. Consult your peers, superiors and a legal counsel before enabling this option.

@svrnm
Member

svrnm commented May 2, 2025

I think we should go further or say something like "we are not lawyers, but that could happen"

I suggested this a while back and still owe the write-up, but if we had a page on privacy and sensitive data on the website, we could link to it, e.g. https://opentelemetry.io/docs/security/collecting-sensitive-data

cc @open-telemetry/docs-approvers

@tiffany76
Contributor

Not related to query params exactly, but I'm adding for viz:

We have two small mentions about PII in the Collector security docs, but we don't include any clear warnings.

@chalin
Contributor

chalin commented May 2, 2025

I'd vote for something short and simple, and a link to more info.

@chalin
Contributor

chalin commented May 2, 2025

Ideally we'd like the same text used uniformly across the docs (and kept the same as it changes over time). On the website side of things we can handle that via includes. Not sure how to handle it in the repos like the spec, etc. We might need to brainstorm about that.
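As a rough illustration of the "includes" idea, the sketch below keeps the warning text in one place and appends it to any option flagged as capturing sensitive data. It assumes a docs generator written in Python; `ConfigOption`, `render_option_doc`, `WARNING_TEMPLATE` and the option names are hypothetical and not part of any OpenTelemetry tooling. The wording follows the first option proposed in this issue, and the link is the handling-sensitive-data page mentioned later in the thread.

```python
from dataclasses import dataclass
from typing import Optional

# Single source of truth for the boilerplate (hypothetical constant names).
WARNING_TEMPLATE = (
    "WARNING: {subject} may contain sensitive information such as passwords, "
    "personally identifiable information or protected health info."
)
MORE_INFO_URL = "https://opentelemetry.io/docs/security/handling-sensitive-data/"


@dataclass(frozen=True)
class ConfigOption:
    """Hypothetical description of one documented config option."""

    name: str
    description: str
    # What the option captures, e.g. "captured query parameters".
    # None means the option is not flagged and gets no warning.
    captured_data: Optional[str] = None


def render_option_doc(option: ConfigOption) -> str:
    """Render the docs text for an option, appending the shared warning if flagged."""
    lines = [f"{option.name}: {option.description}"]
    if option.captured_data is not None:
        lines.append(WARNING_TEMPLATE.format(subject=option.captured_data))
        lines.append(f"See {MORE_INFO_URL}")
    return "\n".join(lines)


if __name__ == "__main__":
    flagged = ConfigOption(
        name="capture_query_parameters",  # hypothetical option name
        description="Record all database query parameters.",
        captured_data="captured query parameters",
    )
    print(render_option_doc(flagged))
```

Because every flagged option goes through the same template, changing the wording later only touches one constant. This does not solve reuse across separate repos such as the spec, which would still need its own mechanism.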

@svrnm
Member

svrnm commented May 9, 2025

Working on open-telemetry/opentelemetry.io#6850

@jpkrohling jpkrohling changed the title from "Biolerplate warning for config options that are very likely to capture sensitive and PII data" to "Boilerplate warning for config options that are very likely to capture sensitive and PII data" May 12, 2025
@jpkrohling jpkrohling added the triage:deciding This issue needs more discussion or consideration. label May 12, 2025
@mtwo
Member

mtwo commented May 13, 2025

First option looks good, the second is a bit much

@danielgblanco
Contributor

I think option 1 for individual config options sounds good. However I think we should probably have a doc about OpenTelemetry and personal data, in general. Something as simple as explaining that OpenTelemetry tooling does not store personal data (because it doesn't store data), how semconv and instrumentation libraries default to not capturing data about fields that likely contain personal data unless configured so by the user, how we provide protocols for in-transit encryption of data, and how ultimately OpenTelemetry instrumentation libraries or custom instrumentation may capture personal data if configured to do so, which can fall under regulations and regulatory frameworks depending on the country, and which the user of the library is responsible for.
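To make the "off unless configured by the user" point concrete, here is a minimal sketch of opt-in parameter capture. It is not taken from any OpenTelemetry instrumentation library: `build_query_attributes` and its `capture_parameters` flag are hypothetical, and the attribute keys are used for illustration only.

```python
from typing import Any, Dict, Sequence


def build_query_attributes(
    statement: str,
    parameters: Sequence[Any],
    capture_parameters: bool = False,  # hypothetical opt-in flag, off by default
) -> Dict[str, str]:
    """Build span attributes for a database call.

    Parameter values are recorded only when the caller explicitly opts in,
    because they may contain passwords or personal data.
    """
    attributes = {"db.query.text": statement}
    if capture_parameters:
        for index, value in enumerate(parameters):
            # Illustrative attribute key; values are stringified as-is.
            attributes[f"db.query.parameter.{index}"] = str(value)
    return attributes


if __name__ == "__main__":
    query = "SELECT * FROM users WHERE email = ?"
    print(build_query_attributes(query, ["a@example.com"]))
    print(build_query_attributes(query, ["a@example.com"], capture_parameters=True))
```

With the default, only the parameterized statement is recorded. Once the flag is turned on, the values themselves leave the process, which is the point at which the short warning and the link to the general doc would apply.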

@svrnm
Member

svrnm commented May 28, 2025

I think option 1 for individual config options sounds good. However I think we should probably have a doc about OpenTelemetry and personal data, in general. Something as simple as explaining that OpenTelemetry tooling does not store personal data (because it doesn't store data), how semconv and instrumentation libraries default to not capturing data about fields that likely contain personal data unless configured so by the user, how we provide protocols for in-transit encryption of data, and how ultimately OpenTelemetry instrumentation libraries or custom instrumentation may capture personal data if configured to do so, which can fall under regulations and regulatory frameworks depending on the country, and which the user of the library is responsible for.

something like https://opentelemetry.io/docs/security/handling-sensitive-data/?

@danielgblanco
Contributor

I missed your comment above! Thanks @svrnm. I agree with what seems to be the consensus above, which is something short and simple when documenting config options, and a link to that doc.


7 participants