CSP: Implements the algorithm for matching URLs against source patterns. #1512

morlovich · 2017-03-13T12:13:17Z

This also includes doing some canonicalization and representation tweaks on the parsed version to make comparisons easier.

oschaaf · 2017-03-13T13:47:48Z

net/instaweb/rewriter/public/csp.h


-    GoogleString scheme_part;  // doesn't include :
-    GoogleString host_part;
+    GoogleString scheme_part;  // doesn't include :, lowercased.


Maybe rename to lowercase_scheme_part?

oschaaf · 2017-03-13T13:48:02Z

net/instaweb/rewriter/public/csp.h

-    GoogleString scheme_part;  // doesn't include :
-    GoogleString host_part;
+    GoogleString scheme_part;  // doesn't include :, lowercased.
+    GoogleString host_part;    // lowercased.


Maybe rename to lowercase_host_part?

oschaaf · 2017-03-13T13:59:22Z

net/instaweb/rewriter/public/csp.h

    GoogleString port_part;
-    GoogleString path_part;
+    std::vector<GoogleString> path_part;  // normalized, separated by /
+    bool path_exact_match;


Maybe point to the RFC description here in a doc comment?
Maybe rename to parsed_path_exact_match and add a comment about how this interacts/needs interpretation with redirect urls?

Rewrote the comments here to give a hopefully better overview. Not a fan of lowercased_ names due to length.
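As a rough illustration of the representation discussed above, the sketch below splits an already-canonicalized pattern path into components and records whether the match must be exact. `ParsedPath` and `ParsePatternPath` are hypothetical names, `std::string` stands in for `GoogleString`, and the rule that a trailing `/` turns off exact matching is taken from the CSP path-matching algorithm rather than from this PR's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the path fields of UrlData in this PR.
struct ParsedPath {
  std::vector<std::string> components;  // split on '/', leading '/' dropped
  bool exact_match = false;             // false when the pattern ends in '/'
};

// Splits an already-canonicalized path such as "/a/b/" into components.
// Assumption: a non-empty canonical path always starts with '/'.
ParsedPath ParsePatternPath(const std::string& path) {
  ParsedPath result;
  if (path.empty()) return result;
  assert(path.front() == '/');
  result.exact_match = path.back() != '/';
  std::string::size_type start = 1;  // skip the leading '/'
  while (start <= path.size()) {
    std::string::size_type slash = path.find('/', start);
    if (slash == std::string::npos) slash = path.size();
    result.components.push_back(path.substr(start, slash - start));
    start = slash + 1;
  }
  return result;
}
```

With this sketch, a pattern path ending in `/` keeps a trailing empty component, which marks it as a prefix pattern rather than an exact one.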

oschaaf · 2017-03-13T14:01:10Z

net/instaweb/rewriter/public/csp.h

    }

    bool operator==(const UrlData& other) const {
      return scheme_part == other.scheme_part &&
             host_part == other.host_part &&
             port_part == other.port_part &&
-             path_part == other.path_part;
+             path_part == other.path_part &&
+             path_exact_match == other.path_exact_match;


So I wonder if it would make sense to somehow force the context of the comparison (redirect or not) into consideration? Or, alternatively, document that it doesn't do that?

This is just for unit tests.

oschaaf · 2017-03-13T14:03:45Z

net/instaweb/rewriter/csp.cc

+      }
+      result.mutable_url_data()->path_part.push_back(canon.substr(1));
+    }
+    result.mutable_url_data()->path_exact_match =


Do we need to consider the context in which we are parsing this in order to determine this? (Specifically, whether or not we are considering a redirect URL?)

oschaaf · 2017-03-13T14:05:50Z

net/instaweb/rewriter/csp.cc

+bool CspSourceExpression::Matches(
+    const GoogleUrl& origin_url, const GoogleUrl& url) const {
+  // Implementation of the "Does url match expression in origin with
+  // redirect count?" algorithm (where redirect count is 0 for our


Maybe link to the rfc section on this algorithm?

oschaaf · 2017-03-13T16:27:18Z

net/instaweb/rewriter/csp.cc

+    return false;
+  }
+
+  if (!origin_url.IsAnyValid() || !url.IsAnyValid()) {


Do we need to specifically handle blob / data / filesystem schemes here?

Hmm. Actually, I want to limit this to http[s], since I dropped support for ws/wss (we don't use them, and they were making for some complicated expressions).

Actually, we probably want data: too, assuming we actually fetch it (checking).

We don't. Hmm.

oschaaf · 2017-03-13T16:29:05Z

net/instaweb/rewriter/csp_test.cc

+  // Other schemes have to match origin to be permitted.
+  CheckMatch(true, "*", "gopher://origin", "gopher://www.example.com");
+  CheckMatch(false, "*", "gopher://origin", "weirder://www.example.com");
+}


Maybe also test assumptions on data / blob / filesystem URIs?

Adding some. Very good point.
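The wildcard behaviour exercised by these tests can be sketched as follows. This is only an illustration under stated assumptions: `WildcardAllowsScheme` is a hypothetical helper, schemes are assumed to be lowercased already, and only http and https are treated as always permitted (the specification's list of network schemes may be broader, and the PR's actual code may differ).

```cpp
#include <string>

// Hypothetical sketch of how a "*" source expression treats schemes.
// Assumption: both arguments are already lowercased and exclude the ':'.
bool WildcardAllowsScheme(const std::string& origin_scheme,
                          const std::string& url_scheme) {
  // Assumed set of always-permitted schemes for "*".
  if (url_scheme == "http" || url_scheme == "https") {
    return true;
  }
  // Any other scheme (data, blob, filesystem, gopher, ...) is permitted
  // only when it is the same as the scheme of the origin.
  return url_scheme == origin_scheme;
}
```

Under these assumptions, a data: URL is not matched by `*` on an https page, which is the kind of case the additional tests would pin down.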

morlovich · 2017-03-13T17:09:00Z

net/instaweb/rewriter/csp.cc

+    }
+  }
+
+  if (!expr_path.empty()) {  // this would also be skipped for redirects


This is the only spot that will need to be changed for redirects --- basically steps after redirect don't do the path check.

re: "This is the only spot that will need to be changed for redirects --- basically steps after redirect don't do the path check."
Ok, thanks for clearing that up

Hmm, thought of it some more, and actually integration with redirects is quite nightmarish.
Basically the problem case is: different pages can have different CSPs, and our caching would be completely unaware of it.

If we could store a list of all the Location: headers we followed along with the final response, maybe we could check CSP at output time?
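To make the redirect remark concrete, here is a minimal sketch of a path check that is skipped once a redirect has been followed. It loosely follows the path-matching steps of the CSP specification; `SplitOnSlash`, `PathMatches`, and the `redirect_count` parameter are hypothetical names, and percent-decoding of the URL's path pieces is left out for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits strictly on '/', keeping empty pieces.
std::vector<std::string> SplitOnSlash(const std::string& s) {
  std::vector<std::string> pieces;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type slash = s.find('/', start);
    if (slash == std::string::npos) {
      pieces.push_back(s.substr(start));
      break;
    }
    pieces.push_back(s.substr(start, slash - start));
    start = slash + 1;
  }
  return pieces;
}

// Returns true if url_path is acceptable for expr_path. The check is
// skipped entirely when a redirect was followed (redirect_count > 0).
bool PathMatches(const std::string& expr_path, const std::string& url_path,
                 int redirect_count) {
  if (redirect_count > 0 || expr_path.empty()) return true;
  if (expr_path == "/" && url_path.empty()) return true;
  const bool exact = expr_path.back() != '/';
  std::vector<std::string> a = SplitOnSlash(expr_path);
  const std::vector<std::string> b = SplitOnSlash(url_path);
  if (a.size() > b.size()) return false;
  if (exact && a.size() != b.size()) return false;
  if (!exact) a.pop_back();  // drop the empty piece after the final '/'
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) return false;
  }
  return true;
}
```

For example, the pattern path `/a/` accepts `/a/b.js` as a prefix match, while `/a/b.js` rejects `/a/c.js` unless the URL was reached through a redirect.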


morlovich · 2017-03-15T16:44:17Z

On Wed, Mar 15, 2017 at 10:29 AM, Otto van der Schaaf wrote:

In net/instaweb/rewriter/csp.cc:

+
+  if (expr_port.empty()) {
+    if (!HasDefaultPortForScheme(url)) {
+      return false;
+    }
+  } else {
+    // TODO(morlovich): Check whether the :80/:443 case is about effective
+    // or explicit port.
+    if (expr_port != "*"
+        && expr_port != IntegerToString(url.EffectiveIntPort())
+        && !(expr_port == "80" && url.EffectiveIntPort() == 443)) {
+      return false;
+    }
+  }
+
+  if (!expr_path.empty()) {  // this would also be skipped for redirects

If we could store a list of all the Location: headers we followed along with the final response, maybe we could check CSP at output time?

Yeah, that sounds workable. Would have to store them in metadata cache and propagating that across the layers would be annoying and fiddly, but it could work correctly.
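For reference, the port check in the hunk quoted in the email above can be restated as a standalone function. This is a sketch only: `IsDefaultPort` is a hypothetical replacement for `HasDefaultPortForScheme(url)`, whose exact behaviour is assumed here, and `std::to_string` stands in for `IntegerToString`.

```cpp
#include <string>

// Hypothetical stand-in for HasDefaultPortForScheme(url); assumes only
// http (80) and https (443) have default ports.
bool IsDefaultPort(const std::string& scheme, int port) {
  return (scheme == "http" && port == 80) ||
         (scheme == "https" && port == 443);
}

// Mirrors the port logic of the quoted hunk using plain values.
bool PortMatches(const std::string& expr_port, const std::string& url_scheme,
                 int url_effective_port) {
  if (expr_port.empty()) {
    // No port in the expression: the URL must use its scheme's default.
    return IsDefaultPort(url_scheme, url_effective_port);
  }
  if (expr_port == "*") return true;
  if (expr_port == std::to_string(url_effective_port)) return true;
  // The :80 -> :443 allowance; the TODO in the hunk notes it is unclear
  // whether this should look at the effective or the explicit port.
  return expr_port == "80" && url_effective_port == 443;
}
```

Note that the allowance is one-directional: an expression with port 80 accepts a URL on 443, but an expression with port 443 does not accept a URL on 80.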

morlovich · 2017-03-27T15:05:31Z

Huibao: Could you take a look at this (and its successors); Josh seems to be way too busy.
CL roadmap:
https://github.com/pagespeed/mod_pagespeed/wiki/Design-Doc:-Brainstorming-PageSpeed-Optimization-Products-and-Content-Security-Policy#implementation-status

morlovich added 4 commits March 13, 2017 08:10

First implementation of the URL-match-expr algorithm, largely from the spec. Will probably do a cleanup pass on top of it, it's rather scary.

556effb

Simplify a bit by dropping ws:/wss: support. Not much shorter, but a lot less scary.

4403a8d

Normalize some in our representation, improve testing of case sensitivity.

9b88208

Consistently canonicalize things the GURL way, rather than just using our %-decoder, which may behave differently in some cases. Also pre-compute the canonicalized + split form on parse.

c5d3157

morlovich requested review from jmarantz and oschaaf March 13, 2017 12:19

oschaaf reviewed Mar 13, 2017

morlovich commented Mar 13, 2017

Comment tweaks plus a couple of additional tests, based on review feedback

12cf40f

morlovich requested a review from huibaolin March 27, 2017 15:04

jmarantz approved these changes Jul 27, 2017

morlovich merged commit 1f18e93 into master Jul 27, 2017

morlovich deleted the morlovich-csp-urls2 branch July 27, 2017 20:51
