Skip to content

Conversation

@therealvio
Copy link

@therealvio therealvio commented Jan 15, 2025

Purpose 🎯

These changes replace the standard library logging with structured logging using structlog. Structured logging provides easier-to-parse capabilities for toolings and humans when going through logs.

Context 🧠

  • Part 1 to satisfy Suggestion: Add Structured logging #76
    • I intend to contribute a follow-up PR that adds logging all connection test events for the purpose of fostering greater support for SLIs, as mentioned in this comment. For those unfamiliar, calculating the SLI is a matter of bad events/all valid events, and right now aws-alternat exercises a kind of "no news is good news" reporting.

Notes 📓

  • I have performed an in-place replacement of the default logger
  • No changes to the message contents have been, with some exceptions that have been explicitly pulled out into standalone commits for ease of reviewing.
  • Example screenshot of debug log messages through structlog
    image

Comment on lines 31 to 37
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.getLogger('boto3').setLevel(logging.CRITICAL)
logging.getLogger('botocore').setLevel(logging.CRITICAL)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this guide with regards to working with the standard library, but I wasn't sure how I could make this applicable here. Otherwise, third party deps will continue using the standard logger unless someone can figure out if wrapping the logger is possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My colleague suggests that we could use https://nhairs.github.io/python-json-logger/latest/ as the formatter for these handlers. e.g. something like:

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger('boto3').addHandler(handler)
logging.getLogger('botocore').addHandler(handler)

Do you think this would work?

Copy link
Author

@therealvio therealvio Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so! This would align with that guide, the section specifically being: "Rendering using logging-based formatters".

I will give it a go and push out a separate commit for this. May take me a bit because it's going to involve tweaking the configs based on what I have now, vs what the guide I am looking at is suggesting.

Copy link
Author

@therealvio therealvio Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An FYI: In the change, handler is a loaded term, since that's the entry point of the function. For now, I have opted for loghandler and we can come up with a better nameas we iterate and review.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have applied the suggestion regarding python-json-logger for stdlib calls within third party deps :)

Thanks for your patience!

@therealvio therealvio marked this pull request as ready for review January 15, 2025 07:59
@therealvio therealvio requested a review from a team as a code owner January 15, 2025 07:59
@therealvio therealvio changed the title feat: use structured logging with structlog feat: replace standard logging with structured logging Jan 15, 2025
@bwhaley
Copy link
Collaborator

bwhaley commented Jan 21, 2025

Thank you for the PR! I am a bit busy atm, but I hope to review & discuss in the coming days. Appreciate the patience!

Copy link
Collaborator

@bwhaley bwhaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple comments/ideas to consider.

Mind taking a peek at test_replace_route.py and adding a unit test?

}
try:
logger.info("Replacing existing route %s for route table %s", route_table_id, new_route_table)
slogger.info("Replacing existing route %s for route table %s", route_table_id, new_route_table)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line prints a dictionary (new_route_table), which would end up nested in json. The message here is sorta broken - "existing route {route_table_id}" is wrong, and then it prints the whole route table.

Let's update the message so it's more accurate and doesn't include a nested dictionary. It will print more cleanly in the output. It does break backward compatibility for folks monitoring for this specific phrasing, which is too bad, but worthwhile to fix. May as well do it now.

Suggested change
slogger.info("Replacing existing route %s for route table %s", route_table_id, new_route_table)
slogger.info("Updating route table %s to use NAT Gateway ID %s", route_table_id, nat_gateway_id)

Copy link
Collaborator

@bwhaley bwhaley Jan 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, most folks should be monitoring for ERROR status or the "Failed connectivity tests! Replacing route" message. This one is INFO.

Copy link
Author

@therealvio therealvio Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I will break this out into a separate commit.

Also, most folks should be monitoring for ERROR status or the "Failed connectivity tests! Replacing route" message. This one is INFO.

You'd think it would go without saying. Perhaps in the release we could include that as a sub dot-point? It feels weird having to tell people "how" to monitor things, but mistakes happen.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made this into a standalone commit including a breaking change footer for the commit.

I had to make some tweaks to your suggested log line because NAT Gateway ID assumes only NAT gateway IDs are being fed in. When the replace_route() function is called, it can use either a NAT gateway or EC2 instance ID. So instead, I reworded it as: Updating route table %s to use NAT target %s".

)

# logger is still needed to set the level for dependencies
logger = logging.getLogger()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line still needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snippet is selecting 4 lines, I gather this may be for the comment.

This relates to this comment regarding third party dependencies. structlog doesn't appear to completely drop-in to replace logger in this specific area. If it's not valuable, I am happy to drop it. My intention here was to describe why logger is still being used while slogger is in place.

I don't believe any of my commits, or this PR actually describes why they're staying, so a git log wouldn't help anyone in that area :(. I don't know if the commit will be a squash or not, but if it's a merge commit, then I can amend the commit message where the surrounding change happened if it helps?

Copy link
Collaborator

@bwhaley bwhaley Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm confused about is that logger = logging.getLogger() is still set here but is never used throughout the file. The levels are set using logging.getLogger below. But is the logger object still used in some way? I may be misunderstanding.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back on this, and recollecting:

What's going on here is described in the structlog docs regarding configuring the stdlib logger being used by third party dependencies like boto and botocore that use stdlib logger. In this case, we are configuring the log level for these dependencies.

I believe we solved the JSON formatting problem for a similar problem in this commentary thread. This is similar in nature.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes sense.

Comment on lines 31 to 37
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.getLogger('boto3').setLevel(logging.CRITICAL)
logging.getLogger('botocore').setLevel(logging.CRITICAL)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My colleague suggests that we could use https://nhairs.github.io/python-json-logger/latest/ as the formatter for these handlers. e.g. something like:

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger('boto3').addHandler(handler)
logging.getLogger('botocore').addHandler(handler)

Do you think this would work?

@therealvio
Copy link
Author

therealvio commented Jan 28, 2025

Thanks for the feedback thus far, I will go through these as I get time, worse comes to worse it may not be until next week if that's cool?

@therealvio
Copy link
Author

therealvio commented Jan 30, 2025

Mind taking a peek at test_replace_route.py and adding a unit test?

I may be missing something, was there something in particular you were looking for me to add?

Full transparency here: I never got the test suite to run, mostly because I never did testing in the Python ecosystem before. I am more than keen to check out any material that would be of help here. As long as I can get it running, I can extend the test suite without a fuss :)

For what it is worth: I mostly tested through running the SAM app locally, which served me well enough for this PR.

@bwhaley
Copy link
Collaborator

bwhaley commented Feb 4, 2025

The two things to add would be tests and the potential use of python-json-logger.

You can look at the GitHub workflow to see how the tests run. It's basically this

          pip install pip --upgrade
          pip install pyopenssl --upgrade
          pip install -r functions/replace-route/requirements.txt
          pip install -r functions/replace-route/tests/test_requirements.txt
          python -m pytest

@therealvio
Copy link
Author

To keep you in the loop: I am going to be slammed next week, and I am not sure if I will get time to take a look this week.

This is still on my mind, it just may be a while before I get back to it.

@therealvio therealvio force-pushed the feat/use-structured-logging branch from 53c2e62 to 79ef09b Compare November 11, 2025 00:06
therealvio and others added 4 commits November 11, 2025 14:36
This changes removes the addition of the eventPayload key that was
imposed in the previous commit. I would rather leave the decision to the
maintainers.

Whether it be reverting *this* commit, or accepting this one, or leaving
it to the future PR.
It's been a while since this branch and related item of work has been
picked up. So we may as well update the dependencies and do a
soft-restart here.
@therealvio
Copy link
Author

therealvio commented Nov 11, 2025

Hey @bwhaley, I am picking this back up after some time. I have made changes based on commentary though what are you looking for in the tests specifically that you would like added? I am happy to put something in, though I am not sure what we are looking to test for specifically.

Are you looking for something like what is described in this stack overflow thread about testing logging output, testing the structlog configuration and asserting against it using a basic test with some snippets, or something else?

Thanks!

@therealvio therealvio force-pushed the feat/use-structured-logging branch from 2bf4e2a to 670c84b Compare November 11, 2025 04:24
@therealvio therealvio requested a review from bwhaley November 11, 2025 04:24
Copy link
Collaborator

@bwhaley bwhaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple more comments. Kicking off tests.

)

# logger is still needed to set the level for dependencies
logger = logging.getLogger()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes sense.

Comment on lines -357 to +384
logger.error(f"Unable to handle unknown event type: {json.dumps(event)}")
slogger.error("Unable to handle unknown event type: %s", json.dumps(event))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this appear in the logs? Is the JSON value of event legible in the output? This seems like an unlikely error, and I'm not sure how to easily test it, but I would be curious if it's readable.

Copy link
Collaborator

@bwhaley bwhaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of structlog and orjson and python-json-logger actually adds a fair amount of bloat to the package. Currently, only boto3 is needed, which is included by default in the Lambda runtime. The others are not, so they need to be added as a layer to the Lambda function.

How did you test it?

While updating the logging to switch to structured logging, feedback was
provided that identified a case of inaccurate and broken logging when
`replace_route()` is invoked. Not only is the log message wrong by using
a route table id instead of the route id, but also the use of the
`new_route_table` variable would produce a dictionary in the log.

The log line has now been updated to accurately state what is going on,
and remove the use of a nested dictionary in the structured log.

BREAKING CHANGE: Users monitoring this message with the original
specific phrasing will experience broken monitors. It is recommended
that users should be monitoring generally for ERROR level logs,
or the "Failed connectivity tests! Replacing route" logs specifically.
@therealvio therealvio force-pushed the feat/use-structured-logging branch from 670c84b to 258b403 Compare December 18, 2025 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants