
Commit bcadb61

Sampling context improvements (#3847)
1 parent 7c70b9c commit bcadb61


18 files changed (+221, -151 lines)


MIGRATION_GUIDE.md

Lines changed: 103 additions & 96 deletions
```diff
@@ -20,102 +20,109 @@ Looking to upgrade from Sentry SDK 2.x to 3.x? Here's a comprehensive list of wh
 - Redis integration: In Redis pipeline spans there is no `span["data"]["redis.commands"]` that contains a dict `{"count": 3, "first_ten": ["cmd1", "cmd2", ...]}` but instead `span["data"]["redis.commands.count"]` (containing `3`) and `span["data"]["redis.commands.first_ten"]` (containing `["cmd1", "cmd2", ...]`).
 - clickhouse-driver integration: The query is now available under the `db.query.text` span attribute (only if `send_default_pii` is `True`).
 - `sentry_sdk.init` now returns `None` instead of a context manager.
-- The `sampling_context` argument of `traces_sampler` now additionally contains all span attributes known at span start.
-- If you're using the Celery integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `celery_job` dictionary anymore. Instead, the individual keys are now available as:
-
-| Dictionary keys | Sampling context key |
-| ---------------------- | -------------------- |
-| `celery_job["args"]` | `celery.job.args` |
-| `celery_job["kwargs"]` | `celery.job.kwargs` |
-| `celery_job["task"]` | `celery.job.task` |
-
-Note that all of these are serialized, i.e., not the original `args` and `kwargs` but rather OpenTelemetry-friendly span attributes.
-
-- If you're using the AIOHTTP integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `aiohttp_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:
-
-| Request property | Sampling context key(s) |
-| ---------------- | ------------------------------- |
-| `path` | `url.path` |
-| `query_string` | `url.query` |
-| `method` | `http.request.method` |
-| `host` | `server.address`, `server.port` |
-| `scheme` | `url.scheme` |
-| full URL | `url.full` |
-
-- If you're using the Tornado integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `tornado_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:
-
-| Request property | Sampling context key(s) |
-| ---------------- | --------------------------------------------------- |
-| `path` | `url.path` |
-| `query` | `url.query` |
-| `protocol` | `url.scheme` |
-| `method` | `http.request.method` |
-| `host` | `server.address`, `server.port` |
-| `version` | `network.protocol.name`, `network.protocol.version` |
-| full URL | `url.full` |
-
-- If you're using the generic WSGI integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `wsgi_environ` object anymore. Instead, the individual properties of the environment are accessible, if available, as follows:
-
-| Env property | Sampling context key(s) |
-| ----------------- | ------------------------------------------------- |
-| `PATH_INFO` | `url.path` |
-| `QUERY_STRING` | `url.query` |
-| `REQUEST_METHOD` | `http.request.method` |
-| `SERVER_NAME` | `server.address` |
-| `SERVER_PORT` | `server.port` |
-| `SERVER_PROTOCOL` | `server.protocol.name`, `server.protocol.version` |
-| `wsgi.url_scheme` | `url.scheme` |
-| full URL | `url.full` |
-
-- If you're using the generic ASGI integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `asgi_scope` object anymore. Instead, the individual properties of the scope, if available, are accessible as follows:
-
-| Scope property | Sampling context key(s) |
-| -------------- | ------------------------------- |
-| `type` | `network.protocol.name` |
-| `scheme` | `url.scheme` |
-| `path` | `url.path` |
-| `query` | `url.query` |
-| `http_version` | `network.protocol.version` |
-| `method` | `http.request.method` |
-| `server` | `server.address`, `server.port` |
-| `client` | `client.address`, `client.port` |
-| full URL | `url.full` |
-
-- If you're using the RQ integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `rq_job` object anymore. Instead, the individual properties of the job and the queue, if available, are accessible as follows:
-
-| RQ property | Sampling context key(s) |
-| --------------- | ---------------------------- |
-| `rq_job.args` | `rq.job.args` |
-| `rq_job.kwargs` | `rq.job.kwargs` |
-| `rq_job.func` | `rq.job.func` |
-| `queue.name` | `messaging.destination.name` |
-| `rq_job.id` | `messaging.message.id` |
-
-Note that `rq.job.args`, `rq.job.kwargs`, and `rq.job.func` are serialized and not the actual objects on the job.
-
-- If you're using the AWS Lambda integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `aws_event` and `aws_context` objects anymore. Instead, the following, if available, is accessible:
-
-| AWS property | Sampling context key(s) |
-| ------------------------------------------- | ----------------------- |
-| `aws_event["httpMethod"]` | `http.request.method` |
-| `aws_event["queryStringParameters"]` | `url.query` |
-| `aws_event["path"]` | `url.path` |
-| full URL | `url.full` |
-| `aws_event["headers"]["X-Forwarded-Proto"]` | `network.protocol.name` |
-| `aws_event["headers"]["Host"]` | `server.address` |
-| `aws_context["function_name"]` | `faas.name` |
-
-- If you're using the GCP integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `gcp_env` and `gcp_event` keys anymore. Instead, the following, if available, is accessible:
-
-| Old sampling context key | New sampling context key |
-| --------------------------------- | -------------------------- |
-| `gcp_env["function_name"]` | `faas.name` |
-| `gcp_env["function_region"]` | `faas.region` |
-| `gcp_env["function_project"]` | `gcp.function.project` |
-| `gcp_env["function_identity"]` | `gcp.function.identity` |
-| `gcp_env["function_entry_point"]` | `gcp.function.entry_point` |
-| `gcp_event.method` | `http.request.method` |
-| `gcp_event.query_string` | `url.query` |
+- The `sampling_context` argument of `traces_sampler` and `profiles_sampler` now additionally contains all span attributes known at span start.
+- The integration-specific content of the `sampling_context` argument of `traces_sampler` and `profiles_sampler` now looks different.
+- The Celery integration doesn't add the `celery_job` dictionary anymore. Instead, the individual keys are now available as:
+
+| Dictionary keys | Sampling context key | Example |
+| ---------------------- | --------------------------- | ------------------------------ |
+| `celery_job["args"]` | `celery.job.args.{index}` | `celery.job.args.0` |
+| `celery_job["kwargs"]` | `celery.job.kwargs.{kwarg}` | `celery.job.kwargs.kwarg_name` |
+| `celery_job["task"]` | `celery.job.task` | |
+
+Note that all of these are serialized, i.e., not the original `args` and `kwargs` but rather OpenTelemetry-friendly span attributes.
+
+- The AIOHTTP integration doesn't add the `aiohttp_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:
+
+| Request property | Sampling context key(s) |
+| ----------------- | ------------------------------- |
+| `path` | `url.path` |
+| `query_string` | `url.query` |
+| `method` | `http.request.method` |
+| `host` | `server.address`, `server.port` |
+| `scheme` | `url.scheme` |
+| full URL | `url.full` |
+| `request.headers` | `http.request.header.{header}` |
+
+- The Tornado integration doesn't add the `tornado_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:
+
+| Request property | Sampling context key(s) |
+| ----------------- | --------------------------------------------------- |
+| `path` | `url.path` |
+| `query` | `url.query` |
+| `protocol` | `url.scheme` |
+| `method` | `http.request.method` |
+| `host` | `server.address`, `server.port` |
+| `version` | `network.protocol.name`, `network.protocol.version` |
+| full URL | `url.full` |
+| `request.headers` | `http.request.header.{header}` |
+
+- The WSGI integration doesn't add the `wsgi_environ` object anymore. Instead, the individual properties of the environment are accessible, if available, as follows:
+
+| Env property | Sampling context key(s) |
+| ----------------- | ------------------------------------------------- |
+| `PATH_INFO` | `url.path` |
+| `QUERY_STRING` | `url.query` |
+| `REQUEST_METHOD` | `http.request.method` |
+| `SERVER_NAME` | `server.address` |
+| `SERVER_PORT` | `server.port` |
+| `SERVER_PROTOCOL` | `server.protocol.name`, `server.protocol.version` |
+| `wsgi.url_scheme` | `url.scheme` |
+| full URL | `url.full` |
+| `HTTP_*` | `http.request.header.{header}` |
+
+- The ASGI integration doesn't add the `asgi_scope` object anymore. Instead, the individual properties of the scope, if available, are accessible as follows:
+
+| Scope property | Sampling context key(s) |
+| -------------- | ------------------------------- |
+| `type` | `network.protocol.name` |
+| `scheme` | `url.scheme` |
+| `path` | `url.path` |
+| `query` | `url.query` |
+| `http_version` | `network.protocol.version` |
+| `method` | `http.request.method` |
+| `server` | `server.address`, `server.port` |
+| `client` | `client.address`, `client.port` |
+| full URL | `url.full` |
+| `headers` | `http.request.header.{header}` |
+
+- The RQ integration doesn't add the `rq_job` object anymore. Instead, the individual properties of the job and the queue, if available, are accessible as follows:
+
+| RQ property | Sampling context key | Example |
+| --------------- | ---------------------------- | ------------------------ |
+| `rq_job.args` | `rq.job.args.{index}` | `rq.job.args.0` |
+| `rq_job.kwargs` | `rq.job.kwargs.{kwarg}` | `rq.job.kwargs.my_kwarg` |
+| `rq_job.func` | `rq.job.func` | |
+| `queue.name` | `messaging.destination.name` | |
+| `rq_job.id` | `messaging.message.id` | |
+
+Note that `rq.job.args`, `rq.job.kwargs`, and `rq.job.func` are serialized and not the actual objects on the job.
+
+- The AWS Lambda integration doesn't add the `aws_event` and `aws_context` objects anymore. Instead, the following, if available, is accessible:
+
+| AWS property | Sampling context key(s) |
+| ------------------------------------------- | ------------------------------ |
+| `aws_event["httpMethod"]` | `http.request.method` |
+| `aws_event["queryStringParameters"]` | `url.query` |
+| `aws_event["path"]` | `url.path` |
+| full URL | `url.full` |
+| `aws_event["headers"]["X-Forwarded-Proto"]` | `network.protocol.name` |
+| `aws_event["headers"]["Host"]` | `server.address` |
+| `aws_context["function_name"]` | `faas.name` |
+| `aws_event["headers"]` | `http.request.header.{header}` |
+
+- The GCP integration doesn't add the `gcp_env` and `gcp_event` keys anymore. Instead, the following, if available, is accessible:
+
+| Old sampling context key | New sampling context key |
+| --------------------------------- | ------------------------------ |
+| `gcp_env["function_name"]` | `faas.name` |
+| `gcp_env["function_region"]` | `faas.region` |
+| `gcp_env["function_project"]` | `gcp.function.project` |
+| `gcp_env["function_identity"]` | `gcp.function.identity` |
+| `gcp_env["function_entry_point"]` | `gcp.function.entry_point` |
+| `gcp_event.method` | `http.request.method` |
+| `gcp_event.query_string` | `url.query` |
+| `gcp_event.headers` | `http.request.header.{header}` |
 
 
 ### Removed
```
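Because the integration objects are gone, a `traces_sampler` written against 2.x has to read the flat keys from the tables above instead. The following is a minimal sketch that assumes only those documented keys: `celery.job.task`, `celery.job.args.{index}`, `url.path`, `http.request.method`, and `http.request.header.{header}` with a lowercased header name (which is what the `_wsgi_common.py` helper in this commit produces). The task name, health-check path, probe user agent, and sample rates are hypothetical. Every key is read with `.get()` because the guide only promises each one "if available".

```python
from typing import Any, Dict, List

# Hypothetical values, for illustration only.
HEALTH_CHECK_PATH = "/healthz"
PROBE_USER_AGENT = "kube-probe"
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
NEWSLETTER_TASK = "myapp.tasks.send_email"


def job_args(sampling_context: Dict[str, Any], prefix: str) -> List[Any]:
    # Reassemble flattened positional keys (`celery.job.args.0`,
    # `celery.job.args.1`, ...) into a list, stopping at the first missing index.
    args: List[Any] = []
    while f"{prefix}.{len(args)}" in sampling_context:
        args.append(sampling_context[f"{prefix}.{len(args)}"])
    return args


def traces_sampler(sampling_context: Dict[str, Any]) -> float:
    # Celery: the task name and each argument are now separate keys.
    if sampling_context.get("celery.job.task") == NEWSLETTER_TASK:
        args = job_args(sampling_context, "celery.job.args")
        # Argument values are strings now, so compare against strings.
        return 0.01 if args[:1] == ["newsletter"] else 1.0

    # HTTP integrations: request details arrive as flat span-attribute keys.
    path = sampling_context.get("url.path", "")
    method = sampling_context.get("http.request.method")
    user_agent = sampling_context.get("http.request.header.user-agent", "")
    if path == HEALTH_CHECK_PATH or PROBE_USER_AGENT in user_agent:
        return 0.0  # never trace health checks or orchestrator probes
    if method in WRITE_METHODS:
        return 1.0  # trace every write request
    return 0.1  # sample everything else at 10%
```

The function is registered as usual with `sentry_sdk.init(traces_sampler=traces_sampler, ...)`, and per the guide the same `sampling_context` shape reaches `profiles_sampler`.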

sentry_sdk/integrations/_wsgi_common.py

Lines changed: 15 additions & 1 deletion
```diff
@@ -3,7 +3,7 @@
 
 import sentry_sdk
 from sentry_sdk.scope import should_send_default_pii
-from sentry_sdk.utils import AnnotatedValue, logger
+from sentry_sdk.utils import AnnotatedValue, logger, SENSITIVE_DATA_SUBSTITUTE
 
 try:
     from django.http.request import RawPostDataException
@@ -221,6 +221,20 @@ def _filter_headers(headers):
     }
 
 
+def _request_headers_to_span_attributes(headers):
+    # type: (dict[str, str]) -> dict[str, str]
+    attributes = {}
+
+    headers = _filter_headers(headers)
+
+    for header, value in headers.items():
+        if isinstance(value, AnnotatedValue):
+            value = SENSITIVE_DATA_SUBSTITUTE
+        attributes[f"http.request.header.{header.lower()}"] = value
+
+    return attributes
+
+
 def _in_http_status_code_range(code, code_ranges):
     # type: (object, list[HttpStatusCodeRange]) -> bool
     for target in code_ranges:
```
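`_request_headers_to_span_attributes` runs the headers through `_filter_headers`, then emits one `http.request.header.<lowercased name>` attribute per header; any value that `_filter_headers` turned into an `AnnotatedValue` (a sensitive header while `send_default_pii` is off) is replaced with `SENSITIVE_DATA_SUBSTITUTE`. The standalone sketch below reproduces that behavior without importing the SDK. The `"[Filtered]"` substitute and the three-entry sensitive-header set are assumptions standing in for the SDK's own constants, and `send_default_pii` is passed as a plain argument instead of being read from the client.

```python
from typing import Dict

# Assumption: stand-in for sentry_sdk.utils.SENSITIVE_DATA_SUBSTITUTE.
SENSITIVE_DATA_SUBSTITUTE = "[Filtered]"
# Hypothetical, abbreviated list; the SDK maintains its own sensitive-header set.
SENSITIVE_HEADERS = {"AUTHORIZATION", "COOKIE", "X_API_KEY"}


def request_headers_to_span_attributes(
    headers: Dict[str, str], send_default_pii: bool = False
) -> Dict[str, str]:
    attributes: Dict[str, str] = {}
    for header, value in headers.items():
        # Normalize the name the way _filter_headers does before the check.
        normalized = header.upper().replace("-", "_")
        if not send_default_pii and normalized in SENSITIVE_HEADERS:
            value = SENSITIVE_DATA_SUBSTITUTE
        # The key keeps the original name, lowercased, with dashes intact.
        attributes[f"http.request.header.{header.lower()}"] = value
    return attributes
```

Because the key survives and only the value is masked, a sampler can still test whether an `Authorization` header was present without ever seeing its value.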

sentry_sdk/integrations/aiohttp.py

Lines changed: 4 additions & 3 deletions
```diff
@@ -13,6 +13,7 @@
 from sentry_sdk.sessions import track_session
 from sentry_sdk.integrations._wsgi_common import (
     _filter_headers,
+    _request_headers_to_span_attributes,
     request_body_within_bounds,
 )
 from sentry_sdk.tracing import (
@@ -389,11 +390,11 @@ def _prepopulate_attributes(request):
         except ValueError:
             attributes["server.address"] = request.host
 
-    try:
+    with capture_internal_exceptions():
         url = f"{request.scheme}://{request.host}{request.path}"  # noqa: E231
         if request.query_string:
             attributes["url.full"] = f"{url}?{request.query_string}"
-    except Exception:
-        pass
+
+    attributes.update(_request_headers_to_span_attributes(dict(request.headers)))
 
     return attributes
```
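This hunk (and the ASGI one) swaps `try: ... except Exception: pass` for `with capture_internal_exceptions():`. Both keep a failure from breaking span creation; roughly, the SDK's helper also records the error through its own internal error handling instead of discarding it silently. The sketch below approximates that pattern with `contextlib.contextmanager` and the standard `logging` module; `suppress_and_log` and `url_attributes` are hypothetical names, not SDK API.

```python
import contextlib
import logging
from typing import Any, Dict, Iterator

logger = logging.getLogger("span_attributes_sketch")


@contextlib.contextmanager
def suppress_and_log() -> Iterator[None]:
    # Hypothetical stand-in for capture_internal_exceptions(): the exception
    # is logged instead of silently discarded, and is never re-raised.
    try:
        yield
    except Exception:
        logger.debug("failed to build span attribute", exc_info=True)


def url_attributes(scheme: str, host: str, path: str, query: Any) -> Dict[str, str]:
    attributes: Dict[str, str] = {}
    with suppress_and_log():
        url = f"{scheme}://{host}{path}"
        # As in the hunk, url.full is only set when a query string exists.
        if query:
            attributes["url.full"] = f"{url}?{query}"
    return attributes
```

As with the original `try` block, anything assigned before the failing statement stays in `attributes`, so the caller may receive a partially filled dict rather than an exception.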

sentry_sdk/integrations/asgi.py

Lines changed: 7 additions & 4 deletions
```diff
@@ -21,6 +21,7 @@
 )
 from sentry_sdk.integrations._wsgi_common import (
     DEFAULT_HTTP_METHODS_TO_CAPTURE,
+    _request_headers_to_span_attributes,
 )
 from sentry_sdk.sessions import track_session
 from sentry_sdk.tracing import (
@@ -32,6 +33,7 @@
 )
 from sentry_sdk.utils import (
     ContextVar,
+    capture_internal_exceptions,
     event_from_exception,
     HAS_REAL_CONTEXTVARS,
     CONTEXTVARS_ERROR_MESSAGE,
@@ -348,19 +350,20 @@ def _prepopulate_attributes(scope):
             try:
                 host, port = scope[attr]
                 attributes[f"{attr}.address"] = host
-                attributes[f"{attr}.port"] = port
+                if port is not None:
+                    attributes[f"{attr}.port"] = port
             except Exception:
                 pass
 
-    try:
+    with capture_internal_exceptions():
         full_url = _get_url(scope)
         query = _get_query(scope)
         if query:
             attributes["url.query"] = query
             full_url = f"{full_url}?{query}"
 
         attributes["url.full"] = full_url
-    except Exception:
-        pass
+
+    attributes.update(_request_headers_to_span_attributes(_get_headers(scope)))
 
     return attributes
```
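The new `if port is not None` guard matters because the ASGI specification allows `server` to be `[path, None]` when the app is served over a Unix socket, and OpenTelemetry attribute values should not be `None`. A standalone sketch of the same unpacking follows; `address_attributes` is a hypothetical name, and unlike the hunk's bare `except Exception` it catches only the errors a malformed value would raise.

```python
from typing import Any, Dict, Optional, Sequence


def address_attributes(prefix: str, value: Optional[Sequence[Any]]) -> Dict[str, Any]:
    # `client`/`server` may be missing, [host, port], or (for `server`)
    # [unix_socket_path, None].
    attributes: Dict[str, Any] = {}
    if not value:
        return attributes
    try:
        host, port = value
    except (TypeError, ValueError):
        return attributes  # not a two-item iterable
    attributes[f"{prefix}.address"] = host
    if port is not None:  # omit the port for Unix sockets instead of storing None
        attributes[f"{prefix}.port"] = port
    return attributes
```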

sentry_sdk/integrations/aws_lambda.py

Lines changed: 12 additions & 3 deletions
```diff
@@ -20,7 +20,10 @@
     reraise,
 )
 from sentry_sdk.integrations import Integration
-from sentry_sdk.integrations._wsgi_common import _filter_headers
+from sentry_sdk.integrations._wsgi_common import (
+    _filter_headers,
+    _request_headers_to_span_attributes,
+)
 
 from typing import TYPE_CHECKING
 
@@ -162,7 +165,7 @@ def sentry_handler(aws_event, aws_context, *args, **kwargs):
                 name=aws_context.function_name,
                 source=TRANSACTION_SOURCE_COMPONENT,
                 origin=AwsLambdaIntegration.origin,
-                attributes=_prepopulate_attributes(aws_event, aws_context),
+                attributes=_prepopulate_attributes(request_data, aws_context),
             ):
                 try:
                     return handler(aws_event, aws_context, *args, **kwargs)
@@ -468,6 +471,7 @@ def _event_from_error_json(error_json):
 
 
 def _prepopulate_attributes(aws_event, aws_context):
+    # type: (Any, Any) -> dict[str, Any]
     attributes = {
         "cloud.provider": "aws",
     }
@@ -486,10 +490,15 @@ def _prepopulate_attributes(aws_event, aws_context):
             url += f"?{aws_event['queryStringParameters']}"
         attributes["url.full"] = url
 
-    headers = aws_event.get("headers") or {}
+    headers = {}
+    if aws_event.get("headers") and isinstance(aws_event["headers"], dict):
+        headers = aws_event["headers"]
+
     if headers.get("X-Forwarded-Proto"):
         attributes["network.protocol.name"] = headers["X-Forwarded-Proto"]
     if headers.get("Host"):
         attributes["server.address"] = headers["Host"]
 
+    attributes.update(_request_headers_to_span_attributes(headers))
+
     return attributes
```
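The previous `aws_event.get("headers") or {}` only covered a missing or `None` value; a truthy value that is not a dict (for example, a list) would reach `headers.get(...)` and raise `AttributeError`. The sketch below isolates the new guard together with the two header lookups from this hunk, under a hypothetical function name; lookups stay case-sensitive exactly as in the hunk.

```python
from typing import Any, Dict


def lambda_header_attributes(aws_event: Dict[str, Any]) -> Dict[str, Any]:
    # Same guard as the hunk: accept `headers` only if it is a non-empty dict.
    headers: Dict[str, Any] = {}
    if aws_event.get("headers") and isinstance(aws_event["headers"], dict):
        headers = aws_event["headers"]

    attributes: Dict[str, Any] = {}
    if headers.get("X-Forwarded-Proto"):
        attributes["network.protocol.name"] = headers["X-Forwarded-Proto"]
    if headers.get("Host"):
        attributes["server.address"] = headers["Host"]
    return attributes
```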

sentry_sdk/integrations/celery/__init__.py

Lines changed: 10 additions & 3 deletions
```diff
@@ -20,7 +20,6 @@
     ensure_integration_enabled,
     event_from_exception,
     reraise,
-    _serialize_span_attribute,
 )
 
 from typing import TYPE_CHECKING
@@ -514,9 +513,17 @@ def sentry_publish(self, *args, **kwargs):
 
 
 def _prepopulate_attributes(task, args, kwargs):
+    # type: (Any, *Any, **Any) -> dict[str, str]
     attributes = {
         "celery.job.task": task.name,
-        "celery.job.args": _serialize_span_attribute(args),
-        "celery.job.kwargs": _serialize_span_attribute(kwargs),
     }
+
+    for i, arg in enumerate(args):
+        with capture_internal_exceptions():
+            attributes[f"celery.job.args.{i}"] = str(arg)
+
+    for kwarg, value in kwargs.items():
+        with capture_internal_exceptions():
+            attributes[f"celery.job.kwargs.{kwarg}"] = str(value)
+
     return attributes
```
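Instead of one serialized value each for `args` and `kwargs`, `_prepopulate_attributes` now emits one `str()`-converted attribute per argument, so every value is a plain string, a type OpenTelemetry span attributes accept. The sketch below restates that flattening outside Celery; `flatten_job_arguments` and its `prefix` parameter are hypothetical, and passing `"celery.job"` reproduces the keys from this hunk.

```python
import contextlib
from typing import Any, Dict, Mapping, Sequence


def flatten_job_arguments(
    prefix: str, args: Sequence[Any], kwargs: Mapping[str, Any]
) -> Dict[str, str]:
    # One string-valued attribute per argument, keyed by position or name.
    attributes: Dict[str, str] = {}
    for i, arg in enumerate(args):
        # Skip an argument whose str() raises, as the hunk's
        # capture_internal_exceptions() block does (minus the SDK's error reporting).
        with contextlib.suppress(Exception):
            attributes[f"{prefix}.args.{i}"] = str(arg)
    for name, value in kwargs.items():
        with contextlib.suppress(Exception):
            attributes[f"{prefix}.kwargs.{name}"] = str(value)
    return attributes
```

One consequence of this design: structured arguments reach the sampler as their `str()` form, so a dict argument `{"k": [1, 2]}` arrives as the string `"{'k': [1, 2]}"`, and a sampler has to compare strings rather than inspect the original objects.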
