I recently had to deal with a situation in which a web server, which served content primarily to mobile devices, was constantly running out of disk space.
The reason for this was that Apache was generating GBs of logs each day. The site associated with the server was a busy site, but it still seemed strange that the logs would grow to that magnitude.
A word about HTTP responses before we continue.
When a browser first requests a file, the web server fetches that file from the file system and delivers it over the network with an HTTP 200 response. By default, the browser then stores that file in its cache. If the browser needs the same file again, and considers the copy in its cache to be too old, it sends what is referred to as a Conditional GET request to the web server. This request includes a special HTTP header (If-Modified-Since) carrying a date associated with the cached copy, and asks the web server to send the file again only if it has been modified since that date.
The web server then checks the last modified date on the file, and if it is the same as before, the web server will not send the actual file, but will instead issue an HTTP 304 response. This tells the browser that the file has not been modified since it was last accessed, and that it is safe to load that file from its cache.
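The decision the server makes can be sketched in a few lines of Python. This is only an illustration of the 200-versus-304 logic, not Apache's actual implementation; the function name `respond` and the example date are made up for the sketch.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, send_body) for a GET of a single file.

    last_modified: timezone-aware datetime of the file's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    if if_modified_since is not None:
        try:
            cached_at = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            cached_at = None  # unparseable header: send the full file
        if cached_at is not None:
            if cached_at.tzinfo is None:
                # Assumption for this sketch: treat a zone-less date as UTC.
                cached_at = cached_at.replace(tzinfo=timezone.utc)
            if last_modified <= cached_at:
                return 304, False  # Not Modified: browser uses its cache
    return 200, True  # first request, or the file has changed


# Hypothetical file timestamp, formatted the way a browser would echo it back.
file_changed = datetime(2013, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(file_changed, usegmt=True)

first_visit = respond(file_changed)                 # no header yet
revalidation = respond(file_changed, header)        # file unchanged
after_edit = respond(file_changed + timedelta(days=1), header)
```

The first call stands for an initial visit, the second for a Conditional GET against an unchanged file, and the third for a Conditional GET after the file has been edited.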
Setting an Expires header of 14 days for files means that when those files are stored in the cache, the browser will only make a Conditional GET request for any such file once 14 days have elapsed since it was first accessed and stored in the cache. This is a trade-off between performance and control: your web server gets fewer requests, but there may be a delay in a user seeing an updated file.
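From the browser's side, the rule amounts to a simple age check. The sketch below assumes a fixed 14-day lifetime and uses hypothetical names (`EXPIRES_AFTER`, `needs_revalidation`); real browsers apply more rules than this.

```python
from datetime import datetime, timedelta, timezone

EXPIRES_AFTER = timedelta(days=14)  # the lifetime discussed above


def needs_revalidation(stored_at, now, lifetime=EXPIRES_AFTER):
    """True once a cached copy has reached the end of its lifetime."""
    return now - stored_at >= lifetime


# Hypothetical moment at which the file was first stored in the cache.
stored = datetime(2013, 1, 1, tzinfo=timezone.utc)

still_fresh = needs_revalidation(stored, stored + timedelta(days=3))
expired = needs_revalidation(stored, stored + timedelta(days=14))
```

A well-behaved cache would serve the file locally while `needs_revalidation` is false and only send a Conditional GET once it becomes true.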
In theory, this means that something like an image file should only be requested from the web server once every 14 days, which meant the behavior I was seeing in the logs was very strange indeed.
To get to the bottom of this I ran some Analog analysis on a week’s worth of logs. I targeted requests for a single image file. In the first pass, I looked for the number of HTTP 200 responses for that image, and in the second pass, I looked for the number of HTTP 304 responses for the same file. I then compared the profiles of the mobile browsers making those requests.
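The analysis above was done with Analog, but the same kind of tally can be approximated with a short script. The sketch below assumes Apache's combined log format; the sample lines, the file path, and the user-agent matching are all invented for illustration and are far cruder than a real log analyser.

```python
import re
from collections import Counter, defaultdict

# Request, status, size, referer, user agent (combined log format assumed).
LOG_LINE = re.compile(
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Very rough platform detection, good enough for a sketch only.
PLATFORM = re.compile(r"Android (\d+\.\d+)|iPhone OS (\d+)_")


def platform_of(agent):
    """Map a user-agent string to a coarse label such as 'Android 2.3'."""
    m = PLATFORM.search(agent)
    if m is None:
        return "other"
    if m.group(1):
        return "Android " + m.group(1)
    return "iOS " + m.group(2)


def tally(lines, path):
    """Count 200 and 304 responses for one file, grouped by platform."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("path") == path and m.group("status") in ("200", "304"):
            counts[platform_of(m.group("agent"))][m.group("status")] += 1
    return counts


# Made-up log lines, not taken from the server discussed in this post.
ANDROID = "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; Nexus S) AppleWebKit/533.1"
IPHONE = "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26"
sample = [
    '10.0.0.1 - - [01/Jan/2013:10:00:00 +0000] "GET /img/logo.png HTTP/1.1" 200 5120 "-" "%s"' % ANDROID,
    '10.0.0.1 - - [01/Jan/2013:10:05:00 +0000] "GET /img/logo.png HTTP/1.1" 304 - "-" "%s"' % ANDROID,
    '10.0.0.2 - - [01/Jan/2013:10:06:00 +0000] "GET /img/logo.png HTTP/1.1" 200 5120 "-" "%s"' % IPHONE,
    '10.0.0.2 - - [01/Jan/2013:10:07:00 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "%s"' % IPHONE,
]

result = tally(sample, "/img/logo.png")
for platform, statuses in sorted(result.items()):
    print(platform, statuses["200"], statuses["304"])
```

Running `tally` over a real access log (for example, the lines of an opened log file) would give per-platform counts of 200 and 304 responses for the chosen file.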
The results are given below:
|Status 200||%||Status 304||%|
*iOS6 = iPhone 5, iOS5 = iPhone 4, etc. Please don’t go out looking for the iPhone 6 in the shops!
The highlighted rows show the source of the problem.
Based on this data, it would appear that Android 2.3 browsers, and iOS 4 browsers (iPhone 3), have very limited caching capability.
Between them, they account for 23.57% of traffic on the site in the period in question, but 81.93% of Conditional GET requests. This would seem to suggest issues with the caching function in these browsers, which is most likely due to the cache space available to them reaching capacity.
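To put those two percentages side by side, here is a small piece of arithmetic. It assumes both figures are measured over the same week and that the share of traffic is a fair stand-in for each group's share of ordinary activity; the function name is hypothetical.

```python
def over_representation(traffic_share, request_share):
    """Ratio of a group's share of Conditional GETs to its share of traffic."""
    return request_share / traffic_share


# Figures quoted above for Android 2.3 and iOS 4 combined.
old_devices = over_representation(23.57, 81.93)
# Everything else on the site: the remainder of each percentage.
other_devices = over_representation(100 - 23.57, 100 - 81.93)

relative = old_devices / other_devices
print(round(old_devices, 2), round(other_devices, 2), round(relative, 1))
```

Under those assumptions, the older devices produce roughly three and a half times their "fair share" of Conditional GET requests, while all other devices together produce about a quarter of theirs, a gap of roughly fifteen to one.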
What seems to be happening is that either new files are not being written to the cache (these phones had limited disk space), causing the browser to constantly refer to an out-of-date Expires date on the files in the cache, or the cache is simply not functioning correctly, causing the browser to issue unnecessary Conditional GET requests.
The implications of this are pretty significant in terms of mobile web performance, which typically relies on much lower bandwidth capacity than the PC web.
Yes, it is true that Android 2.3 and iOS 4.0 are dropping out of the mix as new handsets come on the market, but given the number of HTTP requests they generate, even a small population of older devices will have an impact on server performance.
Compare the relative data for iOS4 and iOS5 in the table above. There are 4 times as many standard HTTP 200 responses for iOS5 (iPhone 4), indicating that there are 4 times as many iPhone 4s as iPhone 3s in use on the site, but when you factor in the HTTP 304 responses, the total number of actual HTTP requests issuing from iPhone 3s is greater!
I hope to run this analysis again in 6 months’ time. The results should make for interesting reading.