Design Doc: Detecting and Fixing Reflows

Detecting and fixing reflows

Srihari Sukumaran, 2012-03-05

Last updated 2013-04-16

Based on discussions with Pulkit Goyal, Rahul Bansal, Ram Ramani and Kishore Simbili

Objective

There are two causes of reflow in pages rewritten by PSA filters:

Deferring the execution of javascript (the deferjs filter) can cause reflows during page rendering.
Insertion of non-cacheable elements by blink client-side javascript.

The primary objective here is to prevent such reflows. Any solution will involve trying to identify (and remember) dom changes due to script execution. We would like to have a single solution to fix both kinds of reflows. It is likely that a solution can have other applications, e.g., identifying image dimensions when these are not specified (this is required for spriting).

Design

We would like the solution to work in blink flow and the proxy fetch flow (used when blink is not enabled). There are two components to the solution:

An off-line module that makes use of a headless browser to render the page and inspect the dom before and after scripts execute. From this it will identify nodes (mostly ‘div’) whose (rendered) size has changed due to scripts, and keep this information in property cache.
A FixReflow filter (acting at request processing time) that will retrieve the node-size information from property cache and add appropriate size attributes to the corresponding nodes.

We give some details of these two components before describing how these can be used in blink flow and proxy fetch flow.

Off-line headless browser module (identify reflows)

The headless browser render will be done with javascript disabled and with an extension js. In the extension js the following will be done in order:

Go over the dom and annotate (in the html) nodes of interest with their rendered size.
Execute the scripts on the page.
Go over the dom and for nodes with annotation, if the rendered size now is different from that in the annotation, then add "identifier for node : size after scripts" to the output json.

There are two (related) points to note:

Nodes of interest: We could consider all nodes here, or for simplicity and performance only look at ‘div’ nodes, since all the reflow observed so far should be fixable by adding appropriate size attributes to some ‘div’.
How to identify a node in the output json: This is hard in general. Initially we plan to use node ‘id’ or ‘class’ (if the class value is used only for that node; this seems common for ad divs, a common source of reflow problems) to identify a node. Thus other nodes with size changes cannot be captured in the output json. Hence "identifier for node" will be something like "‘id’|‘class’,<value of the id/class>". Clearly this approach has two problems:

Not all reflows can be handled.
This could break pages when another node with the same ‘class’ value is present in a later version of the page.

We need to experiment and check how often the second problem happens.
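
The three steps and the identifier rule above can be sketched as follows. This is only a model of the logic in Python, not the extension js itself: the node records, the "key" field and the function names are hypothetical, and the sizes before and after script execution are passed in as plain dictionaries rather than measured in a browser.

```python
from collections import Counter


def node_identifier(node, class_counts):
    """Return "id,<value>" or "class,<value>", or None if the node cannot be identified."""
    if node.get("id"):
        return "id," + node["id"]
    cls = node.get("class")
    # A class is only usable when exactly one node of interest carries it.
    if cls and class_counts[cls] == 1:
        return "class," + cls
    return None


def find_reflows(nodes, size_before, size_after):
    """Map identifier -> size after scripts, for nodes whose rendered size changed."""
    class_counts = Counter(n["class"] for n in nodes if n.get("class"))
    output = {}
    for node in nodes:
        before = size_before.get(node["key"])
        after = size_after.get(node["key"])
        if before is None or after is None or before == after:
            continue
        ident = node_identifier(node, class_counts)
        if ident is not None:  # changed nodes without an identifier are dropped
            output[ident] = after
    return output


# Hypothetical page: two ad divs resize, one of two "story" divs resizes.
nodes = [
    {"key": 0, "id": "top-ad"},
    {"key": 1, "class": "sidebar-ad"},
    {"key": 2, "class": "story"},
    {"key": 3, "class": "story"},
    {"key": 4, "id": "footer"},
]
before = {0: (728, 0), 1: (300, 0), 2: (600, 100), 3: (600, 100), 4: (960, 40)}
after = {0: (728, 90), 1: (300, 250), 2: (600, 180), 3: (600, 100), 4: (960, 40)}
reflows = find_reflows(nodes, before, after)
print(reflows)
```

The resized "story" div is absent from the output because its class is shared, which illustrates the first problem listed above.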

Once the headless browser response is received, the contents of output json will be written to property cache.

In property cache, there will be a PropertyValue (in render cohort) whose value will, conceptually, be a list of "identifier for node : size" entries. Concretely we could represent this as a protobuf or simply as string (comma separated list of identifier:size pairs). The property value will also need an expiration semantics in property cache.
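
As an illustration of the string representation, here is a minimal sketch of encoding and decoding the property value. Several details are assumptions made for the example: because the identifier format above itself contains a comma, entries are separated by ';' instead; sizes are written as "<width>x<height>"; id and class values are assumed not to contain ';'; and expiration is modelled as a simple time-to-live check. The function names are hypothetical.

```python
def encode_node_sizes(sizes):
    """Encode {identifier: (width, height)} as "identifier:WxH" entries joined by ';'."""
    return ";".join(
        f"{ident}:{width}x{height}" for ident, (width, height) in sorted(sizes.items())
    )


def decode_node_sizes(value):
    """Inverse of encode_node_sizes; an empty string decodes to an empty dict."""
    sizes = {}
    for entry in filter(None, value.split(";")):
        # Split on the last ':' so that a ':' inside an id value is preserved.
        ident, _, size = entry.rpartition(":")
        width, _, height = size.partition("x")
        sizes[ident] = (int(width), int(height))
    return sizes


def is_expired(write_time_s, ttl_s, now_s):
    """Assumed expiration semantics: the value is stale once ttl_s seconds have passed."""
    return now_s - write_time_s >= ttl_s


encoded = encode_node_sizes({"id,top-ad": (728, 90), "class,sidebar-ad": (300, 250)})
print(encoded)
print(decode_node_sizes(encoded))
```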

FixReflow filter

If the property cache page for the requested url does not contain a value for "NodeAndSize", this filter does nothing. Otherwise, in the filter’s StartElement, if the element is a node listed in the property value, the size from the value is added as a style attribute.
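
The per-element logic of the filter can be sketched as below. This is a Python model rather than the filter itself; the function name is hypothetical, and expressing the size as width and height in pixels and appending to an existing style attribute are assumptions of the sketch.

```python
def fix_reflow_attrs(attrs, node_sizes):
    """Return the attributes for an element, with a size style added if it is a known node.

    attrs: attribute dict of the element seen in StartElement.
    node_sizes: {identifier: (width, height)} decoded from the "NodeAndSize" value.
    """
    size = None
    if "id" in attrs:
        size = node_sizes.get("id," + attrs["id"])
    if size is None and "class" in attrs:
        size = node_sizes.get("class," + attrs["class"])
    if size is None:
        return attrs  # not a recorded node: leave the element untouched

    width, height = size
    fixed = dict(attrs)
    existing = fixed.get("style", "").strip()
    if existing and not existing.endswith(";"):
        existing += ";"
    fixed["style"] = f"{existing}width:{width}px;height:{height}px;"
    return fixed


sizes = {"id,top-ad": (728, 90)}
print(fix_reflow_attrs({"id": "top-ad"}, sizes))
print(fix_reflow_attrs({"id": "top-ad", "style": "color:red"}, sizes))
print(fix_reflow_attrs({"id": "footer"}, sizes))
```

With an empty node_sizes map every element is returned unchanged, which corresponds to the case where the filter does nothing.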

We now consider how the above are invoked in the PSA rewriting flows.

Proxy fetch flow

For a url for which the deferjs filter is enabled, if the property value for "NodeAndSize" is not found in property cache, the off-line headless browser module is triggered. For such urls (handled in proxy fetch flow) the headless browser can fetch content from the url with the deferjs filter enabled (via a request header), so that when the fetch request (from the headless browser) comes back to PSS, “deferjs” is applied. (Thus no buffering of html is needed.)

If the property value for "NodeAndSize" is found in property cache, then the FixReflow filter is applied in the request path.
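
The decision made for each request in this flow can be summarised by the following sketch. The function name and the returned action strings are hypothetical labels used only for illustration.

```python
def plan_proxy_fetch(deferjs_enabled, node_and_size):
    """Decide what the proxy fetch flow does for one request.

    node_and_size: the "NodeAndSize" property value, or None if it is not in property cache.
    """
    if node_and_size is not None:
        return "apply_fix_reflow_filter"
    if deferjs_enabled:
        # No recorded sizes yet: ask the off-line module to compute them.
        return "trigger_offline_headless_render"
    return "no_action"


print(plan_proxy_fetch(True, None))
print(plan_proxy_fetch(True, "id,top-ad:728x90"))
```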

Blink flow

In the blink (off-line) flow certain PSA filters (including deferjs) are applied, and the html is buffered and passed to the headless browser with blink js as extension js. The output from this is the above the fold html and data for the rest of the page. These are cached for use in the blink request flow.

The plan for fixing reflow in this flow is the following:

The functionality of the off-line reflow identification module is combined with the blink render request. That is, both the reflow js and the blink js are run as extension js. The js is sequenced such that the reflow js acts after the blink js stitches in the non-cacheable elements, and is thus able to identify any dom element size changes due to this as well.

In the blink flow, we can obtain two advantages from the fact that the above the fold html is cached:

The extension js (i.e., reflow and blink js run in headless browser) can insert the size attributes at the end of the render. The cached html will have reflows (above the fold) fixed. Thus in the blink flow, the identification and fixing of reflow occur at the same place. (We do not use property cache and the FixReflow filter.)
More interestingly, since the identification and fixing of reflow both happen in a single headless render, we can fix reflows due to elements with no unique id or class name. We set a synthetic id for elements when we record their sizes before the scripts run (step 1 in the off-line headless browser module). Then we can check size changes for all these elements (step 3) and at this point also remove the synthetic id attribute.
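
The synthetic id approach can be modelled as in the sketch below. It is a Python stand-in for what the extension js would do in the headless browser: the attribute name, the node records and the function names are hypothetical, script execution is simulated by changing a node's size directly, and for brevity any existing style attribute is overwritten.

```python
SYNTHETIC_ATTR = "data-reflow-id"  # hypothetical name for the synthetic id attribute


def annotate(nodes):
    """Step 1: tag every node with a synthetic id and record its size before scripts run."""
    recorded = {}
    for index, node in enumerate(nodes):
        synthetic_id = f"reflow-{index}"
        node["attrs"][SYNTHETIC_ATTR] = synthetic_id
        recorded[synthetic_id] = node["size"]
    return recorded


def fix_and_clean(nodes, recorded):
    """Step 3: fix nodes whose size changed, and remove the synthetic id from all nodes."""
    for node in nodes:
        synthetic_id = node["attrs"].pop(SYNTHETIC_ATTR)
        if node["size"] != recorded[synthetic_id]:
            width, height = node["size"]
            node["attrs"]["style"] = f"width:{width}px;height:{height}px;"


nodes = [
    {"attrs": {}, "size": (300, 0)},  # no id or class at all
    {"attrs": {"class": "story"}, "size": (600, 100)},
]
recorded = annotate(nodes)
nodes[0]["size"] = (300, 250)  # stands in for step 2: scripts resize the first node
fix_and_clean(nodes, recorded)
print(nodes)
```

The first node has neither id nor class, yet it still receives its final size, and no synthetic attribute remains in the html that gets cached.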

Design alternatives considered

For headless browser rendering, a design with two render requests (one each with scripts enabled and not enabled) and then an explicit dom walk based diff of the render responses was rejected due to complexity (the dom walk and diff is difficult to get correct) and performance (two requests).
Using something like xpath to identify nodes uniquely was not considered due to fragility (the xpath for a node need not be the same across versions of the page).

Headless browser interface

With the functionality described here we have three clients for headless renders: blink, critical image finder and identification of reflows. Of these, if blink is enabled, it alone makes a headless browser request (since critical image finder is not active and the reflow identification js will be included in the blink render request). When blink is not enabled (i.e., proxy fetch flow) we should combine the critical image finder and reflow identification render requests.
