-
Notifications
You must be signed in to change notification settings - Fork 158
CSP: Implements the algorithm for matching URLs against source patterns. #1512
Conversation
…e spec. Will probably do a cleanup pass on top of it, it's rather scary.
Not much shorter, but a lot less scary.
our %-decoder, which may behave differently in some cases. Also pre-compute the canonicalized + split form on parse.
net/instaweb/rewriter/public/csp.h
Outdated
|
||
GoogleString scheme_part; // doesn't include : | ||
GoogleString host_part; | ||
GoogleString scheme_part; // doesn't include :, lowercased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename to lowercase_scheme_part?
net/instaweb/rewriter/public/csp.h
Outdated
GoogleString scheme_part; // doesn't include : | ||
GoogleString host_part; | ||
GoogleString scheme_part; // doesn't include :, lowercased. | ||
GoogleString host_part; // lowercased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename to lowercase_host_part?
net/instaweb/rewriter/public/csp.h
Outdated
GoogleString port_part; | ||
GoogleString path_part; | ||
std::vector<GoogleString> path_part; // normalized, separated by / | ||
bool path_exact_match; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe point to the RFC description here in a doc comment?
Maybe rename to parsed_path_exact_match and add a comment about how this interacts/needs interpretation with redirect urls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote the comments here to give a hopefully better overview. Not a fan of lowercased_ names due to length.
} | ||
|
||
bool operator==(const UrlData& other) const { | ||
return scheme_part == other.scheme_part && | ||
host_part == other.host_part && | ||
port_part == other.port_part && | ||
path_part == other.path_part; | ||
path_part == other.path_part && | ||
path_exact_match == other.path_exact_match; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I wonder if it would make sense to somehow force into consideration the context of the comparison (redirect Y/N)? Or alternatively doc that it doesn't do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for unit tests.
} | ||
result.mutable_url_data()->path_part.push_back(canon.substr(1)); | ||
} | ||
result.mutable_url_data()->path_exact_match = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to consider the context in which we are parsing this, to be able to determine this? (specifically, if we are considering a redirect url Y/N?)
bool CspSourceExpression::Matches( | ||
const GoogleUrl& origin_url, const GoogleUrl& url) const { | ||
// Implementation of the "Does url match expression in origin with | ||
// redirect count?" algorithm (where redirect count is 0 for our |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe link to the rfc section on this algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return false; | ||
} | ||
|
||
if (!origin_url.IsAnyValid() || !url.IsAnyValid()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to specifically handle blob / data / filesystem schemes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Actually I want to limit this to http[s], since I dropped support for ws/wss (as we don't use them and they were making for some complicated expressions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually probably want data:, too, assuming we actually fetch it (checking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't. Hmm.
// Other schemes have to match origin to be permitted. | ||
CheckMatch(true, "*", "gopher://origin", "gopher://www.example.com"); | ||
CheckMatch(false, "*", "gopher://origin", "weirder://www.example.com"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also test assumptions on data / blob / filesystem uri's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some.... Very good point.
net/instaweb/rewriter/csp.cc
Outdated
} | ||
} | ||
|
||
if (!expr_path.empty()) { // this would also be skipped for redirects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only spot that will need to be changed for redirects --- basically steps after redirect don't do the path check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: "This is the only spot that will need to be changed for redirects --- basically steps after redirect don't do the path check."
Ok, thanks for clearing that up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, thought of it some more, and actually integration with redirects is quite nightmarish.
Basically the problem case is: different pages can have different CSPs, and our caching would be completely unaware of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we could store a list of all the Location:
headers we followed along with the final response, maybe we could check CSP at output time?
based on review feedback
On Wed, Mar 15, 2017 at 10:29 AM, Otto van der Schaaf < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In net/instaweb/rewriter/csp.cc
<#1512 (comment)>
:
> +
+ if (expr_port.empty()) {
+ if (!HasDefaultPortForScheme(url)) {
+ return false;
+ }
+ } else {
+ // TODO(morlovich): Check whether the :80/:443 case is about effective
+ // or explicit port.
+ if (expr_port != "*"
+ && expr_port != IntegerToString(url.EffectiveIntPort())
+ && !(expr_port == "80" && url.EffectiveIntPort() == 443)) {
+ return false;
+ }
+ }
+
+ if (!expr_path.empty()) { // this would also be skipped for redirects
If we could store a list of all the Location: headers we followed along
with the final response, maybe we could check CSP at output time?
Yeah, that sounds workable. Would have to store them in metadata cache and
propagating that across the layers would be annoying and fiddly, but it
could work correctly.
|
Huibao: Could you take a look at this (and its successors); Josh seems to be way too busy. |
This also includes doing some canonicalization and representation tweaks on the parsed version to make comparisons easier.