-
Notifications
You must be signed in to change notification settings - Fork 86
feat: replace standard logging with structured logging #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| logger = logging.getLogger() | ||
| logger.setLevel(logging.INFO) | ||
| logging.getLogger('boto3').setLevel(logging.CRITICAL) | ||
| logging.getLogger('botocore').setLevel(logging.CRITICAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this guide with regards to working with the standard library, but I wasn't sure how I could make this applicable here. Otherwise, third party deps will continue using the standard logger unless someone can figure out if wrapping the logger is possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My colleague suggests that we could use https://nhairs.github.io/python-json-logger/latest/ as the formatter for these handlers. e.g. something like:
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger('boto3').addHandler(handler)
logging.getLogger('botocore').addHandler(handler)
Do you think this would work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so! This would align with that guide, the section specifically being: "Rendering using logging-based formatters".
I will give it a go and push out a separate commit for this. May take me a bit because it's going to involve tweaking the configs based on what I have now, vs what the guide I am looking at is suggesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An FYI: In the change, handler is a loaded term, since that's the entry point of the function. For now, I have opted for loghandler and we can come up with a better nameas we iterate and review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have applied the suggestion regarding python-json-logger for stdlib calls within third party deps :)
Thanks for your patience!
|
Thank you for the PR! I am a bit busy atm, but I hope to review & discuss in the coming days. Appreciate the patience! |
bwhaley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple comments/ideas to consider.
Mind taking a peek at test_replace_route.py and adding a unit test?
functions/replace-route/app.py
Outdated
| } | ||
| try: | ||
| logger.info("Replacing existing route %s for route table %s", route_table_id, new_route_table) | ||
| slogger.info("Replacing existing route %s for route table %s", route_table_id, new_route_table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line prints a dictionary (new_route_table), which would end up nested in json. The message here is sorta broken - "existing route {route_table_id}" is wrong, and then it prints the whole route table.
Let's update the message so it's more accurate and doesn't include a nested dictionary. It will print more cleanly in the output. It does break backward compatibility for folks monitoring for this specific phrasing, which is too bad, but worthwhile to fix. May as well do it now.
| slogger.info("Replacing existing route %s for route table %s", route_table_id, new_route_table) | |
| slogger.info("Updating route table %s to use NAT Gateway ID %s", route_table_id, nat_gateway_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, most folks should be monitoring for ERROR status or the "Failed connectivity tests! Replacing route" message. This one is INFO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. I will break this out into a separate commit.
Also, most folks should be monitoring for ERROR status or the "Failed connectivity tests! Replacing route" message. This one is INFO.
You'd think it would go without saying. Perhaps in the release we could include that as a sub dot-point? It feels weird having to tell people "how" to monitor things, but mistakes happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made this into a standalone commit including a breaking change footer for the commit.
I had to make some tweaks to your suggested log line because NAT Gateway ID assumes only NAT gateway IDs are being fed in. When the replace_route() function is called, it can use either a NAT gateway or EC2 instance ID. So instead, I reworded it as: Updating route table %s to use NAT target %s".
| ) | ||
|
|
||
| # logger is still needed to set the level for dependencies | ||
| logger = logging.getLogger() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snippet is selecting 4 lines, I gather this may be for the comment.
This relates to this comment regarding third party dependencies. structlog doesn't appear to completely drop-in to replace logger in this specific area. If it's not valuable, I am happy to drop it. My intention here was to describe why logger is still being used while slogger is in place.
I don't believe any of my commits, or this PR actually describes why they're staying, so a git log wouldn't help anyone in that area :(. I don't know if the commit will be a squash or not, but if it's a merge commit, then I can amend the commit message where the surrounding change happened if it helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm confused about is that logger = logging.getLogger() is still set here but is never used throughout the file. The levels are set using logging.getLogger below. But is the logger object still used in some way? I may be misunderstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back on this, and recollecting:
What's going on here is described in the structlog docs regarding configuring the stdlib logger being used by third party dependencies like boto and botocore that use stdlib logger. In this case, we are configuring the log level for these dependencies.
I believe we solved the JSON formatting problem for a similar problem in this commentary thread. This is similar in nature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that makes sense.
| logger = logging.getLogger() | ||
| logger.setLevel(logging.INFO) | ||
| logging.getLogger('boto3').setLevel(logging.CRITICAL) | ||
| logging.getLogger('botocore').setLevel(logging.CRITICAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My colleague suggests that we could use https://nhairs.github.io/python-json-logger/latest/ as the formatter for these handlers. e.g. something like:
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger('boto3').addHandler(handler)
logging.getLogger('botocore').addHandler(handler)
Do you think this would work?
|
Thanks for the feedback thus far, I will go through these as I get time, worse comes to worse it may not be until next week if that's cool? |
I may be missing something, was there something in particular you were looking for me to add? Full transparency here: I never got the test suite to run, mostly because I never did testing in the Python ecosystem before. I am more than keen to check out any material that would be of help here. As long as I can get it running, I can extend the test suite without a fuss :) For what it is worth: I mostly tested through running the SAM app locally, which served me well enough for this PR. |
|
The two things to add would be tests and the potential use of python-json-logger. You can look at the GitHub workflow to see how the tests run. It's basically this |
|
To keep you in the loop: I am going to be slammed next week, and I am not sure if I will get time to take a look this week. This is still on my mind, it just may be a while before I get back to it. |
53c2e62 to
79ef09b
Compare
This changes removes the addition of the eventPayload key that was imposed in the previous commit. I would rather leave the decision to the maintainers. Whether it be reverting *this* commit, or accepting this one, or leaving it to the future PR.
It's been a while since this branch and related item of work has been picked up. So we may as well update the dependencies and do a soft-restart here.
|
Hey @bwhaley, I am picking this back up after some time. I have made changes based on commentary though what are you looking for in the tests specifically that you would like added? I am happy to put something in, though I am not sure what we are looking to test for specifically. Are you looking for something like what is described in this stack overflow thread about testing logging output, testing the structlog configuration and asserting against it using a basic test with some snippets, or something else? Thanks! |
2bf4e2a to
670c84b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple more comments. Kicking off tests.
| ) | ||
|
|
||
| # logger is still needed to set the level for dependencies | ||
| logger = logging.getLogger() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that makes sense.
| logger.error(f"Unable to handle unknown event type: {json.dumps(event)}") | ||
| slogger.error("Unable to handle unknown event type: %s", json.dumps(event)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this appear in the logs? Is the JSON value of event legible in the output? This seems like an unlikely error, and I'm not sure how to easily test it, but I would be curious if it's readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of structlog and orjson and python-json-logger actually adds a fair amount of bloat to the package. Currently, only boto3 is needed, which is included by default in the Lambda runtime. The others are not, so they need to be added as a layer to the Lambda function.
How did you test it?
While updating the logging to switch to structured logging, feedback was provided that identified a case of inaccurate and broken logging when `replace_route()` is invoked. Not only is the log message wrong by using a route table id instead of the route id, but also the use of the `new_route_table` variable would produce a dictionary in the log. The log line has now been updated to accurately state what is going on, and remove the use of a nested dictionary in the structured log. BREAKING CHANGE: Users monitoring this message with the original specific phrasing will experience broken monitors. It is recommended that users should be monitoring generally for ERROR level logs, or the "Failed connectivity tests! Replacing route" logs specifically.
670c84b to
258b403
Compare
Purpose 🎯
These changes replace the standard library logging with structured logging using structlog. Structured logging provides easier-to-parse capabilities for toolings and humans when going through logs.
Context 🧠
bad events/all valid events, and right nowaws-alternatexercises a kind of "no news is good news" reporting.Notes 📓
structlog