Authors: Craig E. Wills and Mikhail Mikhailov
This paper continues our work to monitor and better understand how resources change at servers and how these servers report metadata about those resources. It extends our previous work, which studied selected resources from popular web sites, to an actual trace of user requests. This approach allows us to study a set of resources that users are known to be retrieving.
The results show that more cached resources could be reused than is currently the case, because cache directives are often inaccurate or missing. For example, over 33% of HTML resources in the study do not change, but contain no last modification time or other cache directive in the response, so these resources cannot be cached and validated with the origin server. In addition, embedded images are often reused, even in pages that change frequently. This result points both to the need to cache such images and to the need to discard them when they are no longer included as part of any page.
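To make the distinction between validatable and non-validatable responses concrete, the following is a minimal sketch in Python. It is not the methodology used in the study; the function names and the three category labels are hypothetical, and the rules are a deliberately simplified reading of HTTP caching headers (Last-Modified, ETag, Cache-Control, Expires).

```python
def _lower_keys(headers):
    """Normalize header names, since HTTP header names are case-insensitive."""
    return {name.lower(): value for name, value in headers.items()}


def validation_options(headers):
    """Return the conditional request headers a cache could use to revalidate."""
    h = _lower_keys(headers)
    options = []
    if "last-modified" in h:
        options.append("If-Modified-Since")
    if "etag" in h:
        options.append("If-None-Match")
    return options


def has_freshness_info(headers):
    """True if the response carries an explicit lifetime (simplified check)."""
    h = _lower_keys(headers)
    cache_control = h.get("cache-control", "").lower()
    return "max-age" in cache_control or "expires" in h


def classify(headers):
    """Hypothetical three-way labeling of a response for illustration only."""
    if validation_options(headers):
        return "validatable"
    if has_freshness_info(headers):
        return "fresh-only"
    # No validator and no lifetime: the case described in the text above.
    return "no-cache-information"
```

Under these assumptions, a response that carries only a Content-Type header falls into the last category, which corresponds to the unchanged HTML resources that a cache has no basis for validating.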
The last result of this work is that the inclusion of a cookie as part of a request does not by itself make the response uncacheable. In most cases we obtained identical responses from two requests for the same URL with different cookies. These results imply that such responses can be cached and used for validation if other cache directives allow it. In cases where the responses are not the same, they often differ only in the ad image they contain.
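The comparison of two responses for the same URL can be sketched as follows. This is an illustrative assumption about how such a check might be done, not the procedure used in the study: bodies are compared by SHA-256 digest, and a crude regular expression (hypothetical, and far from a full HTML parser) blanks out `<img>` tags to detect bodies that differ only in the images they reference.

```python
import hashlib
import re

# Crude pattern for <img ...> tags; an assumption for illustration only.
IMG_TAG = re.compile(rb"<img\b[^>]*>", re.IGNORECASE)


def digest(body):
    """Return the SHA-256 hex digest of a response body given as bytes."""
    return hashlib.sha256(body).hexdigest()


def strip_images(body):
    """Replace every <img> tag with a fixed placeholder."""
    return IMG_TAG.sub(b"<img>", body)


def compare(body_a, body_b):
    """Label two response bodies fetched with different cookies."""
    if digest(body_a) == digest(body_b):
        return "identical"
    if digest(strip_images(body_a)) == digest(strip_images(body_b)):
        return "differs-only-in-images"
    return "different"
```

For example, two bodies that are byte-for-byte equal are labeled "identical", while two bodies whose only difference is the `src` attribute of an `<img>` tag are labeled "differs-only-in-images".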
Full Paper: postscript