Design Doc: Beaconing dependent optimizations and downstream caching
Anupama Dutta, April 2013
Make beaconing-dependent optimizations (such as lazyload_images, prioritize_critical_css, etc.) work well with the downstream caching feature.
The draft design for the downstream caching feature is in this document. The current approach has several side effects with respect to beaconing-dependent optimizations. If such optimizations are enabled, the same instrumented page may be served to several users (because it is cached in the downstream cache), and we may receive several beacons of which only one is valid, because of the constraints imposed by nonces. This also means that it takes a long time to get enough beacons to give us confidence in the data we have collected.
Here is a diagram representing the final solution. Noted below are the changes needed for this solution to work.
- Add configuration in the downstream cache to randomly send the following sets of traffic to the backend with an extra PS-ShouldBeacon header. These requests will be treated as requests that need to be instrumented.
  - HitRebeacon% of the hit traffic. The recommended value for HitRebeacon% is <5%, and it will need to be configured in the downstream caching layer.
  - MissRebeacon% of the miss traffic. The recommended value for this is 25%.
- Every instrumented page should be served by the PageSpeed server with no-cache headers, so it will not be cached by the downstream caching layer. Beacon data received from these pages will be used to build pcache data, which will be used whenever the page expires in the downstream cache.
- A DownstreamCacheRebeaconingKey should be specified for coordinating between the downstream caching server and the PageSpeed server.
- Remove any incoming PS-ShouldBeacon headers to ensure that attackers don't use this header to force re-beaconing for their requests.
- Whenever a DownstreamCacheRebeaconingKey is specified, beaconing (instrumentation) will only be done on requests that carry the PS-ShouldBeacon header with the matching key. Note that the reinstrumentation-interval directive will be ignored when downstream caching is enabled.
- Popular pages will mostly get a sufficient number of re-beacon opportunities because of the HitRebeacon% logic. Pages with medium traffic may get fewer re-beacon opportunities if they rely on HitRebeacon% alone, because the popular pages take up most of the quota. However, the higher DownstreamCacheMissRebeaconPercentage will ensure that even these pages get instrumented intermittently, albeit at a slower rate.
- The DownstreamCacheMissRebeaconPercentage directive could be omitted if we were able to control the bypass-cache rate differently for the miss/expiry status. I don't think this is doable, which is why I am suggesting moving this decision to the PageSpeed server.
- For downstream caching setups that rely on the caching layer to decide how long a response is to be cached, and do not use the upstream Cache-Control headers for this, we should send an additional header, say PS-NotCacheable: true, which can then be respected in the downstream cache config.
- No purges will be issued at any time due to re-beaconed data.
- Add support for respecting a PS-ShouldBeacon header with the right value (DownstreamCacheRebeaconingKey): CL in review
- Set no-cache headers on all instrumented responses, irrespective of whether ModifyCachingHeaders is true or not.
- Reuse the RewriteOptions BeaconReinstrumentTimeSec instead of the local kMinBeaconIntervalSec so that it can be configured in tests etc.
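The header handshake above can be sketched as a small simulation. This is Python rather than actual cache-layer config, and the function names, dict-based header handling, and use of `random.uniform` for the percentage sampling are illustrative assumptions, not the real implementation:

```python
import random

REBEACONING_KEY = "secret-key"   # DownstreamCacheRebeaconingKey, shared by both layers
HIT_REBEACON_PERCENT = 1         # HitRebeacon%: fraction of hit traffic to re-instrument
MISS_REBEACON_PERCENT = 25       # MissRebeacon%: fraction of miss traffic to re-instrument

def downstream_cache_decide(request_headers, cache_status):
    """Downstream-cache-side logic: strip any spoofed PS-ShouldBeacon header,
    then randomly tag a percentage of hit/miss traffic for re-instrumentation."""
    # Always strip the incoming header so attackers can't force re-beaconing.
    request_headers.pop("PS-ShouldBeacon", None)
    percent = HIT_REBEACON_PERCENT if cache_status == "hit" else MISS_REBEACON_PERCENT
    if random.uniform(0, 100) < percent:
        request_headers["PS-ShouldBeacon"] = REBEACONING_KEY
    return request_headers

def pagespeed_should_instrument(request_headers):
    """PageSpeed-server-side logic: instrument only when the request carries the
    header with the matching key; the instrumented response is then sent no-cache."""
    return request_headers.get("PS-ShouldBeacon") == REBEACONING_KEY
```

Note that a forged header can never survive the cache layer: after `downstream_cache_decide`, the header is either absent or carries the correct key.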
Supporting data from internal logs for why we need DownstreamCacheMissRebeaconPercentage in addition to HitRebeacon%
The example site has 175K QPD (queries per day); 99+% of the distinct URLs have <1K QPD.
- 20 unique URLs, each with >1K QPD, make up 37% of the traffic = ~65K QPD.
- 35 unique URLs, each with <1K QPD, make up 20% of the traffic = ~35K QPD.
- Assuming peak-hour traffic (~12 hours) makes up 80% of the traffic, there would be 140K/12 ≈ 12K requests per hour.
For the high traffic requests:
- 37% of total traffic = ~4.4K requests per hour for the 20 high-traffic URLs = ~220 requests per hour for one page.
- A HitRebeacon% of 1% means ~2 beacon requests per hour, and ~22 beacons during the entire day on average.
For the medium-traffic requests:
- 20% of total traffic = ~2.4K requests per hour for the 35 medium-traffic URLs = ~70 requests per hour for one page, which is not insignificant at all. This means we can't ignore this segment.
- A HitRebeacon% of 1% means <1 beacon request per hour, and only about 7 beacons during the whole day, which is very little data for pages requested that frequently.
Assuming a downstream cache expiry time of 5 minutes, the 25% miss-re-beacon logic will cause 18 misses an hour for the high-traffic requests, of which ~4.5 will trigger beacons. So we will get ~54 more beacons per day for such requests (in addition to the existing 22). The hit rate that was originally 92% ((220 - 18) / 220) will now become ~90%.
For the medium-traffic requests, the 25% miss-re-beacon logic will cause 6 misses an hour, of which ~1.5 will trigger beacons. So we will get ~6 more beacons (in addition to the 7 we used to get). The hit rate that was originally 91% will now become ~89%.
To summarize, hit rates don't drop by more than ~2% if we use the miss-rebeacon logic, and this logic can yield almost twice the number of beacons for medium/low-traffic pages.
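The arithmetic above can be checked with a short script. The 18 and 6 misses/hour are taken as given from the text, and small rounding differences from the quoted figures are expected:

```python
total_qpd = 175_000
peak_fraction = 0.80
peak_hours = 12
per_hour = total_qpd * peak_fraction / peak_hours   # ~11.7K, rounded to 12K in the text

# High-traffic URLs: 20 URLs carry 37% of traffic.
high_per_hour_per_page = per_hour * 0.37 / 20       # ~216, i.e. ~220 requests/hour/page
hit_rebeacon = 0.01                                 # the 1% HitRebeacon% used in the example
high_beacons_per_hour = high_per_hour_per_page * hit_rebeacon   # ~2 beacons/hour

# Medium-traffic URLs: 35 URLs carry 20% of traffic.
med_per_hour_per_page = per_hour * 0.20 / 35        # ~67, i.e. ~70 requests/hour/page
med_beacons_per_hour = med_per_hour_per_page * hit_rebeacon     # <1 beacon/hour

# Miss-rebeacon logic, assuming the 18 and 6 misses/hour quoted in the text.
high_miss_beacons = 18 * 0.25                       # ~4.5 extra beacons/hour
med_miss_beacons = 6 * 0.25                         # ~1.5 extra beacons/hour

# Hit-rate impact for a high-traffic page.
hit_rate_before = (220 - 18) / 220                  # ~92%
hit_rate_after = (220 - 18 - high_miss_beacons) / 220   # ~90%
```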
If there are enough use cases where pages are cached for long durations, it may happen that low-traffic fragments (e.g., the non-webp fragment for a certain page) are re-beaconed often enough but are not purged often enough from the downstream cache, resulting in incompletely optimized pages being served for them. To overcome this problem, we might want to introduce the ability to purge whenever beacon data is received from an instrumented page. And, in order to purge the relevant fragment, we will need to store the RequestHeaders along with the nonce used for the page, so that the purge request can be reissued with the right RequestHeaders.
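A minimal sketch of this purge-on-beacon idea, with hypothetical names (the real pcache storage and purge mechanism would differ): the nonce issued with an instrumented page is remembered together with the RequestHeaders that selected the fragment, so a purge for exactly that fragment can be issued when the beacon arrives.

```python
# Maps the nonce issued with an instrumented page to the RequestHeaders that
# selected the fragment (e.g. Accept headers distinguishing webp vs. non-webp).
nonce_to_headers = {}

def remember_instrumented_request(nonce, request_headers):
    """Called when an instrumented page is served: remember the headers."""
    nonce_to_headers[nonce] = dict(request_headers)

def on_beacon_received(nonce, issue_purge):
    """When beacon data arrives for a valid nonce, reissue a purge with the
    original headers so the downstream cache drops the stale fragment."""
    headers = nonce_to_headers.pop(nonce, None)
    if headers is not None:
        issue_purge(headers)
        return True
    return False   # unknown or already-consumed nonce: ignore
```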
Issues that should be considered for fixing in the beaconing strategy (unrelated to downstream caching):
- We are currently fairly naive about our re-beaconing strategy. We should consider re-beaconing aggressively when the page changes, and then setting a much longer (or even infinite) re-beacon interval once we have enough page data.
- Maybe store the last time a page was requested in the pcache, and only beacon if the page has been requested within a certain time frame. This helps avoid injecting beacon JS when the results won't be used.