feat: lambda support for DSM #622

michael-zhao459 · 2025-06-20T14:18:25Z

What does this PR do?

This PR adds lambda support for Data Streams Monitoring (DSM) and reworks the original implementation.

DSM context is passed through the trace propagation headers, so the code is refactored to use the existing extraction logic (dsm.py is deleted, since it was reinventing the wheel).
If DSM is enabled, custom DSM logic is applied to the extracted context afterwards.

Motivation

Remove redundant code. DSM customers want Lambda support; currently, context is not propagated correctly with Lambdas.

Testing Guidelines

Test case	Expected outcome	SQS	SNS	SNS -> SQS (arn from SQS is used)	Kinesis
Datadog Context propagated properly through stringValue	Data streams context propagated & data streams checkpoint set	✅	✅	✅	DNE
Datadog Context propagated properly through binaryValue	Data streams context propagated & data streams checkpoint set	✅	✅	✅	✅
No _datadog message attribute	Checkpoint set, no context propagation	✅	✅	✅	✅
Empty datadog message attribute	Checkpoint set, no context propagation	✅	✅	✅	✅
No data streams context in _datadog message attribute	Checkpoint set, no context propagation	✅	✅	✅	✅
Invalid datadog message attribute	Checkpoint set, no context propagation, debug logger called	✅	✅	✅	✅
source_arn is not found	No checkpoint set	✅	✅	✅	✅
Data streams disabled	No checkpoint set	✅	✅	✅	✅

The tests go through all of the SQS cases first, then all of the SNS cases, then all of the SNS -> SQS cases, then all of the Kinesis cases.
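
The cases in the table can be sketched with a small standalone helper. This is only an illustration, not the code in this PR: `read_datadog_attribute` is a hypothetical name, and the record layout (a `messageAttributes` entry named `_datadog` carrying either a `stringValue` or a base64-encoded `binaryValue`) is an assumption based on the SQS event shape.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)


def read_datadog_attribute(record):
    """Return the decoded `_datadog` payload of an SQS-style record, or None.

    Hypothetical helper for illustration only; it is not part of datadog_lambda.
    """
    attribute = record.get("messageAttributes", {}).get("_datadog")
    if not attribute:
        # Missing or empty attribute: nothing to propagate.
        return None
    try:
        if "stringValue" in attribute:
            raw = attribute["stringValue"]
        elif "binaryValue" in attribute:
            # Assumption: binary payloads arrive base64-encoded.
            raw = base64.b64decode(attribute["binaryValue"]).decode("utf-8")
        else:
            return None
        payload = json.loads(raw)
    except Exception as e:
        # Invalid attribute: log at debug level and carry on without context.
        logger.debug("Could not decode _datadog attribute: %s", e)
        return None
    # Only a JSON object can carry propagation keys.
    return payload if isinstance(payload, dict) else None
```

In each case where this sketch returns `None`, the table still expects a checkpoint to be set, just without a propagated context.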

Additional Notes

Types of Changes

Bug fix
New feature
Breaking change
Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

This PR's description is comprehensive
This PR contains breaking changes that are documented in the description
This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
This PR impacts documentation, and it has been updated (or a ticket has been logged)
This PR's changes are covered by the automated tests
This PR collects user input/sensitive content into Datadog
This PR passes the integration tests (ask a Datadog member to run the tests)

michael-zhao459 · 2025-06-25T12:39:25Z

datadog_lambda/tracing.py

+                if config.data_streams_enabled:
+                    from ddtrace.data_streams import PROPAGATION_KEY_BASE_64
+
+                    data_streams_ctx = {


I know the else is redundant, but Datadog complains about too many indents if I use only the if.

piochelepiotr · 2025-06-25T12:42:11Z

datadog_lambda/tracing.py

    except Exception as e:
        logger.debug("The trace extractor returned with error %s", e)
-        return extract_context_from_lambda_context(lambda_context)
+        return extract_context_from_lambda_context(lambda_context), None


we should not return None here

joeyzhao2018 · 2025-06-26T11:47:45Z

datadog_lambda/tracing.py

@@ -265,15 +265,27 @@ def extract_context_from_sqs_or_sns_event_or_context(event, lambda_context):
            if dd_json_data:
                dd_data = json.loads(dd_json_data)

+                data_streams_ctx = {}
+                if config.data_streams_enabled:
+                    from ddtrace.data_streams import PROPAGATION_KEY_BASE_64


My main concerns are:

Creating dictionary objects and bound methods for every invocation is inefficient.

The logic is very hard to follow and maintain, and it may introduce unexpected behaviors that are hard to debug in the future.

May I suggest the following alternative implementation? Let me know what you think.

    def _create_dsm_carrier_func(dd_data):
        """Create a carrier function for DSM context extraction."""
        def carrier_get(key):
            return dd_data.get(key) if dd_data else None
        return carrier_get

    # then, in the extraction functions:
    if config.data_streams_enabled:
        dsm_carrier = _create_dsm_carrier_func(dd_data)  # Pass the original dd_data
    else:
        dsm_carrier = None

I agree with the justifications you made for this change; I will change the code now!

datadog_lambda/tracing.py

joeyzhao2018

LGTM

piochelepiotr · 2025-07-08T15:39:20Z

pyproject.toml

@@ -28,7 +28,7 @@ classifiers = [
 python = ">=3.8.0,<4"
 datadog = ">=0.51.0,<1.0.0"
 wrapt = "^1.11.2"
-ddtrace = ">=2.20.0,<4"
+ddtrace = ">=3.10.0"


this changes the major version. Is that what we want to do? Also, should we keep <4?

My mistake on the <4. The code will break without ddtrace version 3.10.0.

piochelepiotr · 2025-07-08T15:40:20Z

tests/test_tracing.py

+        )
+
+    @patch("datadog_lambda.tracing._dsm_set_checkpoint")
+    def test_sqs_incorrect_datadog_message_attribute(self, mock_dsm_set_checkpoint):


nit: incorrect -> invalid

piochelepiotr · 2025-07-08T15:42:31Z

tests/test_tracing.py

+
+    @patch("datadog_lambda.tracing._dsm_set_checkpoint")
+    @patch("datadog_lambda.tracing.logger")
+    def test_sqs_invalid_datadog_message_attribute_raises_exception(


nit: remove raises_exception from the test name. We care about what the test case tests, not the logic under the hood. Maybe in the future, the code won't raise an exception, and it's still OK because the function accepts invalid datadog message attributes.

piochelepiotr · 2025-07-08T15:44:33Z

tests/test_tracing.py

+            event, self.lambda_context, parse_event_source(event)
+        )
+
+        mock_dsm_set_checkpoint.assert_called_once_with(None, "sqs", "")


I don't understand: are we testing mock_dsm_set_checkpoint or self.mock_checkpoint? We should test only one of them, and it should be consistent across all tests. If possible, it's better to test the lower-level one (self.mock_checkpoint).
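
One way to read this suggestion, as a rough sketch only: the wrapper `dsm_set_checkpoint`, its signature, and the tag strings below are all hypothetical stand-ins, not the functions in `datadog_lambda.tracing`. The point is that an assertion on the lower-level mock observes whether a checkpoint was really set.

```python
from unittest.mock import MagicMock


def dsm_set_checkpoint(processor, event_type, arn):
    """Hypothetical wrapper: skip the checkpoint when the ARN is missing."""
    if not arn:
        return
    # Hypothetical tag format, for illustration only.
    processor.set_checkpoint([f"type:{event_type}", f"topic:{arn}"])


# Lower-level mock: records whether a checkpoint was actually set.
processor = MagicMock()

# Empty ARN: the wrapper is called, but no checkpoint reaches the processor.
dsm_set_checkpoint(processor, "sqs", "")
processor.set_checkpoint.assert_not_called()

# Valid ARN: exactly one checkpoint is set.
dsm_set_checkpoint(processor, "sqs", "arn:aws:sqs:us-east-1:123456789012:queue")
processor.set_checkpoint.assert_called_once()
```

A test that only asserted the wrapper was called would pass for both calls, including the empty-ARN one where no checkpoint is set.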

datadog-datadog-prod-us1 bot reviewed Jun 20, 2025

purple4reina reviewed Jun 20, 2025
