This repository was archived by the owner on Apr 10, 2025. It is now read-only.

Design Doc: UserAgent classes for fragmenting cached rewritten HTML

Jeff Kaufman edited this page Jan 9, 2017 · 1 revision

UserAgent classes for fragmenting cached rewritten HTML

Anupama Dutta, 2013-05-16 (draft)

This document discusses a plan for cache-key-generation for the rewritten HTML caching feature. The solution proposed should be able to generate a cache-hit for ~94% of the HTML requests (based on internal logs data). In this solution, the cache fragmentation for a single URL will also be capped at 15 (to cover various device types and user-agent-based optimization possibilities).

Overview

The cache key generation logic is intended to be placed in a configuration file when using external caching solutions (e.g. nginx proxy_cache or varnish) and used within pagespeed when using internal caching solutions.

The cache-key will contain the HTML URL by default, and will also contain a UserAgent specific piece as determined by our generation logic. The goal of this document is to decide on the UserAgent classes that will be embedded into the cache key, and the UserAgent classes that will need to be forced to bypass the cache.

The number of UA classes actually used in the cache key should not be excessively large, because this could result in:

  1. cache explosion and inefficient use of the existing cache storage, due to several alternatives being present for the same HTML

  2. purging complexity for the webmaster when they need to provide an HTML (URL) specific purge UI/API to their users. For example, our cdn partner provides a purge UI for their customers (publishers) to purge specific URLs from the cache, and all the versions of a URL would need to be purged if downstream caching for pagespeed is enabled.
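To make the purging cost concrete, here is a minimal sketch of enumerating every cached variant of one URL. The key layout (`device|capabilities|url`), the function name `purge_keys`, and the sample class values are assumptions for illustration only, not a format defined by pagespeed or any cache.

```python
from itertools import product

def purge_keys(url, device_types, capability_lists):
    """List every cache key that a purge of `url` would need to cover.

    The "device|capabilities|url" layout is a hypothetical key format.
    """
    return [f"{device}|{caps}|{url}"
            for device, caps in product(device_types, capability_lists)]

# Hypothetical UA classes: 3 device types x 5 capability lists.
devices = ["desktop", "tablet", "mobile"]
capability_lists = ["", "ll", "ll,ii", "ll,ii,dj", "ll,ii,dj,jw,ws"]
keys = purge_keys("http://example.com/page.html", devices, capability_lists)
```

With 3 device types and 5 capability lists, a single-URL purge fans out to 15 cache entries, which is why the number of classes needs a cap.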

Capabilities dependent on UserAgent

Currently, different parts of the pagespeed code help us determine the following capabilities for a given UserAgent:

  1. is a bot?
  2. device type? (desktop/tablet/mobile?)
  3. supports lazyload images?
  4. supports image inlining?
  5. supports beaconing? (right now same as image inlining support)
  6. supports defer javascript?
  7. supports webp?
  8. supports webp-lossless?
  9. supports split html?
  10. flush-early prefetching mechanism? (link/image?)
  11. supports insert-dns-prefetch? (prefetch/dns-prefetch tags?)

Capabilities to be ignored in first iteration

A few of the above features can be ignored in the cache-key-construction in the first iteration of the downstream caching feature:

  • Beaconing (point 5 above) will need complex handling wrt caching of rewritten HTML, since instrumented pages ought not to be cached and beacon requests could potentially need to trigger purges for the cached HTML.

  • Split-html and Flush-early (points 9 and 10 above) are not yet supported in MPS/NPS and the current cached rewritten HTML approach will be first deployed in these.

  • Since Insert-dns-prefetch (point 11) does not have a clear relation to the other capabilities listed above (as in, it is not subsumed by any of the other capabilities), we could skip it for now, and ask for it to be disabled when caching of rewritten HTML is enabled.

For both flush-early and dns-prefetch, trying a client-side approach for detecting the prefetch mechanism to use might be a cleaner approach, but this is not within our current scope.

Capabilities to be respected in first iteration

Considering the remaining capabilities (1,2,3,4,6,7,8) wrt how our cache would be fragmented, we can generate our cache key with two components:

a. device_type (covering point 2 above)

b. capability_list (covering 1,3,4,6,7,8 above).
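As a rough illustration of the two-component key, the sketch below joins the device type, a canonically ordered capability list, and the URL. The `|` separator, the component order, and the name `make_cache_key` are assumptions; only the abbreviations (ll, ii, dj, jw, ws) come from this document.

```python
# Canonical order for capability abbreviations, so that the same UA class
# always produces the same key regardless of detection order.
CAPABILITY_ORDER = ["ll", "ii", "dj", "jw", "ws"]

def make_cache_key(url, device_type, capabilities):
    """Combine device_type and capability_list with the HTML URL.

    The "|" separator and the component order are illustrative assumptions.
    """
    capability_list = ",".join(c for c in CAPABILITY_ORDER if c in capabilities)
    return f"{device_type}|{capability_list}|{url}"

key = make_cache_key("http://example.com/", "mobile", {"ii", "ll"})
```

Fixing the capability order matters: without it, "ll,ii" and "ii,ll" would become two separate cache entries for the same UA class.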

We see the following distribution of capability combinations across requests:

  1. lazyload (ll) : 16.5%
  2. lazyload + inline_images (ll,ii): 6.9%
  3. lazyload + inline_images + webp : 2.4%
  4. lazyload + inline_images + defer_js (ll,ii,dj) : 26.7%
  5. lazyload + inline_images + defer_js + webp: 5.7%
  6. lazyload + inline_images + defer_js + webp + webp_lossless (ll,ii,dj,jw,ws): 43.7%

To reduce the number of UA classes used in the cache key, and to focus on those that give us the maximum number of hits, we are going to ignore the "webp" capability on its own: UAs that support webp but not webp_lossless will not get their own cache keys. Since webp-only support seems to be largely getting replaced by webp_lossless support, we can afford to take cache-misses for these UAs and allow them to hit the backend server.

For the "capability_list" field, the capabilities that would be part of the cache-key, in increasing order of feature-richness-of-the-user-agent, would be:

  1. Non-user-agent-dependent optimizations only (empty string): 6.9% *
  2. lazyload (ll) : 16.5%
  3. lazyload + inline_images (ll,ii): 6.9%
  4. lazyload + inline_images + defer_js (ll,ii,dj) : 26.7%
  5. lazyload + inline_images + defer_js + webp + webp_lossless (ll,ii,dj,jw,ws): 43.7%

* Bots with no capabilities make up 6.9% of all requests.

About 94% of the non-bot requests (which make up 93.1% of all requests) and the bot requests (which make up 6.9% of all requests) will become cache hits if these 5 "capability_list" fields along with the 3 "device_type" fields are used in the cache key generation. In total, nearly 94.4% of all requests can be handled by these 15 combinations.
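The arithmetic behind these figures can be checked with a short calculation. This sketch assumes the listed percentages for the four non-empty capability classes are shares of non-bot requests; the variable names are illustrative.

```python
# Shares of the four non-empty capability classes kept in the cache key.
# Treating them as shares of non-bot requests is an assumption.
kept_classes = {
    "ll": 16.5,
    "ll,ii": 6.9,
    "ll,ii,dj": 26.7,
    "ll,ii,dj,jw,ws": 43.7,
}
bot_share = 6.9                    # percent of all requests
non_bot_share = 100.0 - bot_share  # percent of all requests

# Percent of non-bot requests that fall into a cached class.
non_bot_hit_rate = sum(kept_classes.values())
# Percent of all requests covered: cached non-bot classes plus bots.
overall_hit_rate = non_bot_hit_rate / 100.0 * non_bot_share + bot_share

print(f"non-bot hit rate: {non_bot_hit_rate:.1f}%")
print(f"overall hit rate: {overall_hit_rate:.1f}%")
```

The four classes sum to 93.8% of non-bot requests, which rounds to the 94% quoted. Using the unrounded 93.8% gives an overall figure of about 94.2%, while using the rounded 94% gives 0.94 × 93.1 + 6.9 ≈ 94.4%.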

Any request which does not match one of the above 5 (capability_list) * 3 (device_type) combinations will be forced to bypass the cache and get served out by the backend pagespeed server. So, the logic used for the cache key generation will have to compute the values for all the capabilities (including "webp") and then decide whether it matches one of the 15 combinations or not. The computation logic for the various capabilities will need to be kept in sync with the user_agent_matcher.cc logic or any other UA dependent logic that is introduced into pagespeed code.
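The match-or-bypass decision could look roughly like the following. This is a sketch under assumptions: the function name `should_bypass_cache` and the set-based representation are hypothetical, and the real capability detection would have to mirror user_agent_matcher.cc.

```python
# The 5 capability lists that get their own cache entries.
CACHED_CAPABILITY_LISTS = {
    frozenset(),                                 # non-UA-dependent only
    frozenset({"ll"}),
    frozenset({"ll", "ii"}),
    frozenset({"ll", "ii", "dj"}),
    frozenset({"ll", "ii", "dj", "jw", "ws"}),
}
DEVICE_TYPES = {"desktop", "tablet", "mobile"}

def should_bypass_cache(device_type, capabilities):
    """Return True when the UA class is not one of the 15 cached combinations."""
    if device_type not in DEVICE_TYPES:
        return True
    # All capabilities (including "jw", i.e. webp) are compared, so a UA with
    # webp but without webp_lossless matches none of the lists and bypasses.
    return frozenset(capabilities) not in CACHED_CAPABILITY_LISTS

webp_only = should_bypass_cache("mobile", {"ll", "ii", "dj", "jw"})
modern = should_bypass_cache("desktop", {"ll", "ii", "dj", "jw", "ws"})
```

Comparing the full capability set, rather than checking capabilities one by one, is what sends webp-only UAs to the backend instead of serving them a page built for a different class.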

Future refinements

  • We might want to put in some tracking to figure out which UA combinations are dominating a site, so that these can be prioritized in the cache-key-generation process. For instance, having "tablet" as a device_type might be unnecessary if it accounts for a very low percentage of current requests. I don't think this prioritization can be automatically incorporated into the cache-key-generation-logic of a running server. However, it can be surfaced in, say, the console, to allow the webmaster to improve the cache utilization.

  • Consider dropping the lazyload + inline_images (7%) segment or potentially any segment that has low traffic corresponding to it. We might want to keep this configurable (say, number of cache fragments acceptable to the webmaster) and keep fewer than 5 capability classes based on this configuration. (Will figure out more about this as we work with our cdn partner to get this tested and deployed).

  • The current approach does not consider (a) the enabled filters mentioned in the conf file and (b) whether a filter actually got applied on a given URL/page or not. Though (b) may not be deducible from the input request, (a) might still be something we want to consider factoring into the proxy_cache_key to reduce cache fragmentation.

  • The device_type and capability_list pieces of the cache-key could be propagated to the backend pagespeed server, via a special request header in the cache MISS/EXPIRED cases, so that the pagespeed server respects the capabilities mentioned in this header. So, even if the pagespeed-logic deviates from the caching-layer-logic wrt UA classification/support, we would be able to choose the more restrictive of the two classifications, so that the responses served out do not break pages for any UA. For example, if the capability_list in the header mentions image_inlining, but the UA logic within pagespeed has been updated to indicate that this particular UA no longer supports image_inlining (due to some bug etc.), we could serve out a page without image_inlining applied on it. Though this would mean that the optimization would be ineffective, it would no longer break pages.
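The "most restrictive of the two classifications" idea in the last point amounts to a set intersection. A minimal sketch, assuming the header carries a comma-separated list of the abbreviations used in this document; the function name `effective_capabilities` is hypothetical.

```python
def effective_capabilities(header_value, backend_capabilities):
    """Keep only capabilities that both the caching layer and pagespeed agree on.

    `header_value` is the comma-separated list sent by the caching layer
    (a trailing comma is tolerated); `backend_capabilities` is what the
    pagespeed UA logic computes for the same request.
    """
    from_header = {c.strip() for c in header_value.split(",") if c.strip()}
    return from_header & set(backend_capabilities)

# The caching layer believes the UA supports image inlining ("ii"), but the
# backend's updated UA logic no longer does, so "ii" is dropped.
result = effective_capabilities("ll,ii,", {"ll", "dj"})
```

Here only lazyload survives, so the response is less optimized than the cache key suggests but is safe for every UA that shares that key.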

Some rough implementation details

  1. Webp detection logic, ported to nginx (based on UA sniffing): https://github.com/igrigorik/webp-detect/blob/master/nginx.conf

(For Canary, we should prefer Accept: http://www.igvita.com/2013/05/01/deploying-webp-via-accept-content-negotiation/)

  2. Refer to http://wiki.nginx.org/HttpLuaModule#Introduction for including the logic from a separate file that can be updated with every release.

Configuring number of cache fragments

How do we allow webmasters to control the number of cache fragments allowed (the max is going to be capped at 15 as per my doc)? This number of cache fragments would determine the factor by which the cache size would grow when the downstream caching feature is enabled.

a) Jeff and I discussed, just before yesterday's cdn partner meeting, an approach where the webmaster uses a max_cache_variations_per_url number to decide whether the request will have its response cached or not. This decision logic will be in a snippet provided by us and it will be based on our prioritized list of UA-classes. So, someone who specifies 2 would see only the top 2 combinations (as per our recommendation) cached. These might be optimizations-supported-by-modern-Chrome and optimizations-supported-by-modern-IE-and-FF. Everything else would fall through to the backend pagespeed server. The webmaster could also configure everything else to use a cached non-UA-dependent rewritten version. We would want to add a request header, PS-CapabilityList, to tell the backend which capabilities the cached response is expected to support, so that the backend does not use different logic and add unsupported optimizations.

The configuration will be along the following lines:

# Sketch only: nginx's "if" has no "else" branch or "<=" comparison, so a
# deployable version would need map blocks or a Lua snippet for this logic.
set $ua_dependent_ps_capability_list "";
# Lower numbers indicate higher priority.
set $ua_dependent_ps_capability_list_priority 0;
# Sample regexp.
if ($http_user_agent ~* "Android|Chrome|Firefox") {
  # Image inlining is supported.
  set $ua_dependent_ps_capability_list "${ua_dependent_ps_capability_list}ii,";
}
..
# Add other regexp checks here.
# Use a predefined capability_list -> priority map to replace the below block.
if ($ua_dependent_ps_capability_list = "ii,") {
  set $ua_dependent_ps_capability_list_priority 4; # sample priority number.
} else ..
..
# The below variable is set by the webmaster and then the
# logic snippet given by us is pasted below it.
set $max_cache_variations_per_url 3;
if ($ua_dependent_ps_capability_list_priority <=
    $max_cache_variations_per_url) {
  # Request header passed on to the backend pagespeed server.
  proxy_set_header PS-CapabilityList $ua_dependent_ps_capability_list;
  proxy_cache_key "$ua_dependent_ps_capability_list$original_key";
  set $bypass 0; # this should be used in the proxy_cache_bypass line.
} else {
  # There are 2 options here:
  # 1. Add the capability list header and allow this request
  # to fall through to the backend server.
  proxy_set_header PS-CapabilityList $ua_dependent_ps_capability_list;
  set $bypass 1; # this should be used in the proxy_cache_bypass line.
  # 2. Don't add the capability list header, so that only
  # non-UA-dependent optimizations get applied.
  # Also allow lookup to use an empty ua_dependent_ps_capability_list
  # in proxy_cache_key, by uncommenting the below line:
  # proxy_cache_key $original_key;
}
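
The same decision can be written out in ordinary code to check the intended behaviour. This is only a sketch: the priority table below is a hypothetical ranking by traffic share, and the function and field names are assumptions, not part of any pagespeed or nginx API.

```python
# Hypothetical priority table (lower number = higher priority), ranked here
# by the traffic shares listed earlier in this document.
PRIORITY = {
    "ll,ii,dj,jw,ws": 1,
    "ll,ii,dj": 2,
    "ll": 3,
    "ll,ii": 4,
    "": 5,
}

def cache_decision(capability_list, original_key, max_cache_variations_per_url):
    """Decide whether a request may use the cache, and under which key."""
    priority = PRIORITY.get(capability_list)
    cacheable = priority is not None and priority <= max_cache_variations_per_url
    return {
        # Header value to forward to the backend in either case (option 1).
        "ps_capability_list": capability_list,
        "cache_key": capability_list + original_key if cacheable else None,
        "bypass": not cacheable,
    }

top = cache_decision("ll,ii,dj", "http://example.com/", 2)
low = cache_decision("ll", "http://example.com/", 2)
```

With max_cache_variations_per_url set to 2, only the two highest-priority classes get cache keys; every other class, including unknown capability lists, falls through to the backend.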

b) CDN partner folks wanted to be able to enable and disable filters to control fragmentation. I think this will be difficult for both us and them because the mapping from filters to UA-dependent-capabilities is not 1:1. For example, inline-preview-images and inline-images both depend on image-inlining being supported. However, disabling inline-preview-images does not necessarily mean that the image-inlining capability can be ignored in the cache fragmentation. Since the way in which the cache fragments are computed might also change over time, and disabling certain filters may not necessarily reduce the number of fragments, I think having a number for the max-cache-fragments would be the most natural and forward-compatible way to tailor the cache size increase.
