This repository was archived by the owner on Apr 10, 2025. It is now read-only.

morlovich
Contributor

This also includes doing some canonicalization and representation tweaks on the parsed version to make comparisons easier.

…e spec.

Will probably do a cleanup pass on top of it, it's rather scary.
Not much shorter, but a lot less scary.
our %-decoder, which may behave differently in some cases.

Also pre-compute the canonicalized + split form on parse.
@morlovich morlovich requested review from jmarantz and oschaaf March 13, 2017 12:19

- GoogleString scheme_part;  // doesn't include :
- GoogleString host_part;
+ GoogleString scheme_part;  // doesn't include :, lowercased.
Member

maybe rename to lowercase_scheme_part?

- GoogleString scheme_part;  // doesn't include :
- GoogleString host_part;
+ GoogleString scheme_part;  // doesn't include :, lowercased.
+ GoogleString host_part;  // lowercased.
Member

maybe rename to lowercase_host_part?

  GoogleString port_part;
- GoogleString path_part;
+ std::vector<GoogleString> path_part;  // normalized, separated by /
+ bool path_exact_match;
Member

Maybe point to the RFC description here in a doc comment?
Maybe rename to parsed_path_exact_match and add a comment about how this interacts with, or needs to be interpreted for, redirect URLs?

Contributor Author

Rewrote the comments here to give a hopefully better overview. Not a fan of lowercased_ names due to length.
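
To make the representation under discussion concrete, here is a minimal sketch of the parsed form, not the PR's actual code. It assumes `std::string` stands in for `GoogleString`; the helpers `ToLowerAscii` and `MakeUrlData` are hypothetical names introduced only for illustration, while the field names follow the diff above.

```cpp
#include <string>
#include <vector>

// Sketch of the parsed, canonicalized source expression.
// std::string stands in for GoogleString (assumption).
struct UrlData {
  std::string scheme_part;             // doesn't include ':', lowercased
  std::string host_part;               // lowercased
  std::string port_part;
  std::vector<std::string> path_part;  // normalized, split on '/'
  bool path_exact_match = false;

  // Field-by-field comparison; per the thread this is only for unit tests.
  bool operator==(const UrlData& other) const {
    return scheme_part == other.scheme_part &&
           host_part == other.host_part &&
           port_part == other.port_part &&
           path_part == other.path_part &&
           path_exact_match == other.path_exact_match;
  }
};

// Hypothetical helper: lowercases ASCII letters only.
std::string ToLowerAscii(std::string s) {
  for (char& c : s) {
    if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
  }
  return s;
}

// Hypothetical helper: canonicalizes scheme and host once, at parse time,
// so later comparisons can be plain string equality.
UrlData MakeUrlData(const std::string& scheme, const std::string& host,
                    const std::string& port) {
  UrlData data;
  data.scheme_part = ToLowerAscii(scheme);
  data.host_part = ToLowerAscii(host);
  data.port_part = port;
  return data;
}
```

Lowercasing at parse time means two expressions that differ only in the case of the scheme or host compare equal without any case-insensitive comparison at match time.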

}

  bool operator==(const UrlData& other) const {
    return scheme_part == other.scheme_part &&
           host_part == other.host_part &&
           port_part == other.port_part &&
-          path_part == other.path_part;
+          path_part == other.path_part &&
+          path_exact_match == other.path_exact_match;
Member

So I wonder if it would make sense to somehow force the context of the comparison (redirect Y/N) into consideration? Or, alternatively, document that it doesn't do that?

Contributor Author

This is just for unit tests.

}
result.mutable_url_data()->path_part.push_back(canon.substr(1));
}
result.mutable_url_data()->path_exact_match =
Member

Do we need to consider the context in which we are parsing this, to be able to determine this? (specifically, if we are considering a redirect url Y/N?)
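
For readers following along, here is a rough sketch of how such a flag could be derived from the expression's path alone. It assumes the CSP path-matching rule that a path ending in `/` is a prefix match and anything else must match exactly. `ParsedPath` and `ParsePath` are hypothetical names, and whether the leading empty piece is kept is also an assumption; the truncated assignment in the diff above is not reproduced here.

```cpp
#include <string>
#include <vector>

// Hypothetical result type for the sketch.
struct ParsedPath {
  std::vector<std::string> segments;
  bool exact_match = false;
};

// Strictly splits `path` on '/' and records whether the expression demands
// an exact match (no trailing '/') or a prefix match (trailing '/').
ParsedPath ParsePath(const std::string& path) {
  ParsedPath result;
  result.exact_match = path.empty() || path.back() != '/';

  std::string::size_type start = 0;
  for (;;) {
    const std::string::size_type slash = path.find('/', start);
    if (slash == std::string::npos) {
      result.segments.push_back(path.substr(start));
      break;
    }
    result.segments.push_back(path.substr(start, slash - start));
    start = slash + 1;
  }

  // For a prefix match the piece after the final '/' is empty; drop it so
  // only the meaningful pieces are compared later.
  if (!result.exact_match) result.segments.pop_back();
  return result;
}
```

Under these assumptions the flag depends only on the expression text, so redirect context would only come into play later, at match time.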

bool CspSourceExpression::Matches(
const GoogleUrl& origin_url, const GoogleUrl& url) const {
// Implementation of the "Does url match expression in origin with
// redirect count?" algorithm (where redirect count is 0 for our
Member

Maybe link to the rfc section on this algorithm?

Contributor Author

Done

return false;
}

if (!origin_url.IsAnyValid() || !url.IsAnyValid()) {
Member

@oschaaf oschaaf Mar 13, 2017

Do we need to specifically handle blob / data / filesystem schemes here?

Contributor Author

Hmm. Actually I want to limit this to http[s], since I dropped support for ws/wss (as we don't use them and they were making for some complicated expressions)

Contributor Author

Actually probably want data:, too, assuming we actually fetch it (checking)

Contributor Author

We don't. Hmm.

// Other schemes have to match origin to be permitted.
CheckMatch(true, "*", "gopher://origin", "gopher://www.example.com");
CheckMatch(false, "*", "gopher://origin", "weirder://www.example.com");
}
Member

Maybe also test assumptions on data / blob / filesystem URIs?

Contributor Author

Adding some.... Very good point.
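
The scheme rule exercised by the `"*"` tests above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `WildcardAllowsScheme` is a hypothetical name, schemes are assumed to be already lowercased, and only http and https are treated as always permitted, following the author's note that ws/wss support was dropped.

```cpp
#include <string>

// Hypothetical helper: does the "*" source expression permit `url_scheme`
// for a page whose origin uses `origin_scheme`?
// Assumes both arguments are already lowercased and exclude the ':'.
bool WildcardAllowsScheme(const std::string& origin_scheme,
                          const std::string& url_scheme) {
  // http and https are always covered by the wildcard (assumption based on
  // the thread: ws/wss are deliberately left out).
  if (url_scheme == "http" || url_scheme == "https") return true;
  // Any other scheme, including data:, blob: and filesystem:, is permitted
  // only when it matches the origin's scheme.
  return url_scheme == origin_scheme;
}
```

With this rule a `data:` URL is not matched by `"*"` on an https page, which is the kind of assumption the requested extra tests would pin down.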

}
}

if (!expr_path.empty()) { // this would also be skipped for redirects
Contributor Author

This is the only spot that will need to be changed for redirects --- basically steps after redirect don't do the path check.

Member

re: "This is the only spot that will need to be changed for redirects --- basically steps after redirect don't do the path check."
Ok, thanks for clearing that up
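
The redirect rule described above can be sketched as a path check that takes the redirect count as a parameter. This is a minimal illustration, assuming paths are already normalized and split into non-empty segments; `PathMatches` and its parameter names are hypothetical, and percent-decoding is assumed to have happened at parse time.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: checks the path portion of a source expression
// against a URL's path. Both paths are pre-split into segments.
bool PathMatches(const std::vector<std::string>& expr_path,
                 bool expr_exact_match,
                 const std::vector<std::string>& url_path,
                 int redirect_count) {
  // After a redirect the path check is skipped entirely.
  if (redirect_count > 0) return true;
  // An expression without a path places no constraint on the URL's path.
  if (expr_path.empty()) return true;
  // The expression cannot match a URL with fewer segments than it has.
  if (expr_path.size() > url_path.size()) return false;
  // Exact match: segment counts must agree. Prefix match: extra URL
  // segments after the expression's segments are fine.
  if (expr_exact_match && expr_path.size() != url_path.size()) return false;
  return std::equal(expr_path.begin(), expr_path.end(), url_path.begin());
}
```

Keeping the redirect count as an explicit argument confines the redirect special case to a single early return, which mirrors the point that only this one spot would need to change.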

Contributor Author

Hmm, thought of it some more, and actually integration with redirects is quite nightmarish.
Basically the problem case is: different pages can have different CSPs, and our caching would be completely unaware of it.

Member

If we could store a list of all the Location: headers we followed along with the final response, maybe we could check CSP at output time?
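
One way to picture that suggestion is the sketch below. It is purely illustrative: `CachedFetch`, `UrlCheck`, and `AllowedAtOutputTime` are hypothetical names, not existing PageSpeed types. The cached response carries every URL visited, and each page's policy is applied to the whole chain when the resource is about to be used.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical cache entry: the initially requested URL followed by the
// target of each Location: header that was followed.
struct CachedFetch {
  std::vector<std::string> url_chain;
};

// Hypothetical per-page policy check: receives a URL and the number of
// redirects taken before reaching it.
using UrlCheck = std::function<bool(const std::string& url, int redirect_count)>;

// Returns true only if every hop of the stored chain passes the policy of
// the page that wants to use the resource. Because this runs at output
// time, the same cached fetch can be accepted for one page and rejected
// for another.
bool AllowedAtOutputTime(const CachedFetch& fetch, const UrlCheck& page_policy) {
  for (std::size_t i = 0; i < fetch.url_chain.size(); ++i) {
    if (!page_policy(fetch.url_chain[i], static_cast<int>(i))) return false;
  }
  return true;
}
```

The cost of this design is that the cache has to keep the chain alongside the response, and every use of a cached resource pays for a policy check.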

@morlovich
Contributor Author

morlovich commented Mar 15, 2017 via email

@morlovich morlovich requested a review from huibaolin March 27, 2017 15:04
@morlovich
Contributor Author

Huibao: Could you take a look at this (and its successors)? Josh seems to be way too busy.
CL roadmap:
https://github.com/pagespeed/mod_pagespeed/wiki/Design-Doc:-Brainstorming-PageSpeed-Optimization-Products-and-Content-Security-Policy#implementation-status

@morlovich morlovich merged commit 1f18e93 into master Jul 27, 2017
@morlovich morlovich deleted the morlovich-csp-urls2 branch July 27, 2017 20:51
3 participants