Using Apache mod_cache

On February 10, 2012, in Web Development, by Anuj Gakhar

Recently, I’ve been looking at some of the caching mechanisms that can give faster response times from the Apache web server. I know this is a pretty huge topic, and the general consensus appears to be to use one of the well-known tools like redis, memcached or varnish. However, before diving into one of these full-blown tools and adding an extra layer of complexity to my stack, I thought I would give Apache’s mod_cache a try. Directly quoting from Apache’s documentation :-

As of Apache HTTP server version 2.2 mod_cache and mod_file_cache are no longer marked experimental and are considered suitable for production use. These caching architectures provide a powerful means to accelerate HTTP handling, both as an origin webserver and as a proxy.

mod_cache and its provider modules mod_mem_cache and mod_disk_cache provide intelligent, HTTP-aware caching. The content itself is stored in the cache, and mod_cache aims to honour all of the various HTTP headers and options that control the cachability of content. It can handle both local and proxied content. mod_cache is aimed at both simple and complex caching configurations, where you are dealing with proxied content, dynamic local content or have a need to speed up access to local files which change with time.

mod_file_cache on the other hand presents a more basic, but sometimes useful, form of caching. Rather than maintain the complexity of actively ensuring the cachability of URLs, mod_file_cache offers file-handle and memory-mapping tricks to keep a cache of files as they were when Apache was last started. As such, mod_file_cache is aimed at improving the access time to local static files which do not change very often.

I decided to use the disk based cache (mod_disk_cache), so the first step, obviously, was to enable the modules: mod_cache itself plus its disk storage provider. On my Ubuntu server, this is what I did :-
[xml]sudo a2enmod cache
sudo a2enmod disk_cache[/xml]

Once caching was enabled, I added the following to one of my virtual host configurations :-
[xml]<IfModule mod_cache.c>
CacheRoot /var/cache/apache2
CacheEnable disk /
CacheDirLevels 2
CacheDirLength 1
CacheIgnoreNoLastMod On
</IfModule>[/xml]

Here is a short explanation of the directives above :-

  • CacheRoot – tells mod_disk_cache where on disk it should store the cache files.
  • CacheEnable – enables caching, with the named storage type, for URLs under the given path. "disk /" caches everything on this virtual host on disk.
  • CacheDirLevels – specifies how many levels of subdirectory there should be under CacheRoot.
  • CacheDirLength – specifies how many characters should be in each subdirectory name.
  • CacheIgnoreNoLastMod – this one was quite important in my implementation. My URLs did not correspond to real files on disk, as they were being served by dynamic routes, so the responses carried no Last-Modified date. Apache needs a Last-Modified date, an ETag, an Expires header or a Cache-Control max-age before it will cache a 200 response (see rule 7 below). Telling Apache to ignore the missing date is what actually started the caching for me; before that I could not get Apache to cache anything.
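The Python sketch below turns a cache key into a path under CacheRoot. It is meant to show what CacheDirLevels and CacheDirLength do, and it is only an approximation. It assumes the key is hashed with MD5 and encoded as 22 URL-safe base64 characters. mod_disk_cache uses its own encoding, so real file names will differ. The cache_path function and the example key are hypothetical; the point is how the hash is split into nested directories.

```python
import base64
import hashlib
import os


def cache_path(cache_root: str, key: str, dir_levels: int, dir_length: int) -> str:
    """Return an illustrative path under cache_root for a cache key.

    Assumption: MD5 + URL-safe base64 stands in for mod_disk_cache's own
    encoding, so only the directory-splitting idea is representative.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes
    # 16 bytes encode to 24 base64 characters; stripping the "==" padding leaves 22.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    parts = []
    for level in range(dir_levels):
        start = level * dir_length
        parts.append(encoded[start:start + dir_length])  # one directory per level
    # The remaining characters name the entry (the .header/.data files).
    parts.append(encoded[dir_levels * dir_length:])
    return os.path.join(cache_root, *parts)


# Hypothetical key, with CacheDirLevels 2 and CacheDirLength 1 as in the config above.
print(cache_path("/var/cache/apache2", "http://example.com:80/index.php?", 2, 1))
```

With CacheDirLevels 2 and CacheDirLength 1, each entry lands two single-character directories deep. A 64-character alphabet like the one assumed here gives at most 64 × 64 = 4096 leaf directories, which keeps any single directory from growing huge.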

This is all that’s required in order to set up caching. A response that has neither an Expires nor a Last-Modified header is cached for one hour (3600 seconds) by default, but you can override that with the CacheDefaultExpire directive. So, after all this, restart Apache and hit your website in the browser. You should see a series of new folders being generated in your CacheRoot location, with some .header/.data files under each folder. This means your caching is working.
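The Python sketch below is a rough model of how long an entry stays fresh. It follows the usual precedence:

- A Cache-Control max-age wins over Expires.
- Otherwise, a Last-Modified date feeds a heuristic: a fraction (CacheLastModifiedFactor) of the document's age, capped at CacheMaxExpire.
- Anything else falls back to CacheDefaultExpire.

The constants are assumed to be the Apache 2.2 defaults (3600 seconds, 0.1 and 86400 seconds). freshness_seconds is a hypothetical helper. Details such as s-maxage, CacheMinExpire and how CacheMaxExpire applies to explicit expiry dates are deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed Apache 2.2 defaults; change them to match your own configuration.
DEFAULT_EXPIRE = 3600       # CacheDefaultExpire, in seconds
LAST_MODIFIED_FACTOR = 0.1  # CacheLastModifiedFactor
MAX_EXPIRE = 86400          # CacheMaxExpire, in seconds


def freshness_seconds(date: datetime,
                      expires: Optional[datetime] = None,
                      last_modified: Optional[datetime] = None,
                      max_age: Optional[int] = None) -> int:
    """Simplified model of how long a cached response stays fresh.

    `date` is the response's Date header; the other arguments are the
    corresponding response headers, or None when the header is absent.
    """
    if max_age is not None:
        # Cache-Control: max-age takes precedence over Expires (RFC 2616, 14.9.3).
        return max_age
    if expires is not None:
        return max(0, int((expires - date).total_seconds()))
    if last_modified is not None and last_modified < date:
        # Heuristic: a fraction of the document's age, capped at MAX_EXPIRE.
        age = (date - last_modified).total_seconds()
        return int(min(age * LAST_MODIFIED_FACTOR, MAX_EXPIRE))
    # No usable expiry information, e.g. dynamic pages served with
    # CacheIgnoreNoLastMod On: fall back to the default.
    return DEFAULT_EXPIRE


now = datetime(2012, 2, 10, 12, 0, tzinfo=timezone.utc)
print(freshness_seconds(now))                                          # 3600
print(freshness_seconds(now, last_modified=now - timedelta(hours=5)))  # 1800
print(freshness_seconds(now, last_modified=now - timedelta(days=30)))  # 86400
```

Dynamic routes that send no Last-Modified header and rely on CacheIgnoreNoLastMod hit the last branch. That is why such pages stay cached for an hour unless CacheDefaultExpire is changed.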

Here are some rules/gotchas on what can be cached :-

  1. Caching must be enabled for this URL. See the CacheEnable and CacheDisable directives.
  2. The response must have an HTTP status code of 200, 203, 300, 301 or 410.
  3. The request must be an HTTP GET request.
  4. If the request contains an “Authorization:” header, the response will not be cached.
  5. If the response contains an “Authorization:” header, it must also contain an “s-maxage”, “must-revalidate” or “public” option in the “Cache-Control:” header.
  6. If the URL included a query string (e.g. from an HTML form GET method) it will not be cached unless the response specifies an explicit expiration by including an “Expires:” header or the max-age or s-maxage directive of the “Cache-Control:” header, as per RFC 2616 sections 13.9 and 13.2.1.
  7. If the response has a status of 200 (OK), the response must also include at least one of the “ETag”, “Last-Modified” or the “Expires” headers, or the max-age or s-maxage directive of the “Cache-Control:” header, unless the CacheIgnoreNoLastMod directive has been used to require otherwise.
  8. If the response includes the “private” option in a “Cache-Control:” header, it will not be stored unless the CacheStorePrivate directive has been used to require otherwise.
  9. Likewise, if the response includes the “no-store” option in a “Cache-Control:” header, it will not be stored unless the CacheStoreNoStore directive has been used.
  10. A response will not be stored if it includes a “Vary:” header containing the match-all “*”.
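The ten rules above can be transcribed almost literally into a predicate. The Python sketch below is a hypothetical helper, not part of Apache. Headers are plain dictionaries, and the three flags stand in for CacheIgnoreNoLastMod, CacheStorePrivate and CacheStoreNoStore. Rule 1 (CacheEnable/CacheDisable) is assumed to have been checked by the caller.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Rule 2: only these status codes may be cached.
CACHEABLE_STATUSES = {200, 203, 300, 301, 410}


@dataclass
class CacheSettings:
    ignore_no_last_mod: bool = False  # stands in for CacheIgnoreNoLastMod
    store_private: bool = False       # stands in for CacheStorePrivate
    store_no_store: bool = False      # stands in for CacheStoreNoStore


def cache_control(headers: Dict[str, str]) -> Set[str]:
    """Return the lower-cased directive names from a Cache-Control header."""
    names = set()
    for part in headers.get("cache-control", "").split(","):
        name = part.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def is_cacheable(method: str, url: str, status: int,
                 req_headers: Dict[str, str], resp_headers: Dict[str, str],
                 settings: CacheSettings) -> bool:
    """Apply rules 2-10 from the list above (rule 1 is left to the caller)."""
    req = {k.lower(): v for k, v in req_headers.items()}
    resp = {k.lower(): v for k, v in resp_headers.items()}
    cc = cache_control(resp)
    explicit_expiry = "expires" in resp or "max-age" in cc or "s-maxage" in cc

    if status not in CACHEABLE_STATUSES:                   # rule 2
        return False
    if method.upper() != "GET":                            # rule 3
        return False
    if "authorization" in req:                             # rule 4
        return False
    if "authorization" in resp:                            # rule 5
        if not cc & {"s-maxage", "must-revalidate", "public"}:
            return False
    if "?" in url and not explicit_expiry:                 # rule 6
        return False
    if status == 200 and not settings.ignore_no_last_mod:  # rule 7
        if not ("etag" in resp or "last-modified" in resp or explicit_expiry):
            return False
    if "private" in cc and not settings.store_private:     # rule 8
        return False
    if "no-store" in cc and not settings.store_no_store:   # rule 9
        return False
    vary = [v.strip() for v in resp.get("vary", "").split(",")]
    if "*" in vary:                                        # rule 10
        return False
    return True
```

Feeding it your own response headers is a quick way to see which rule stops a page from being cached. For example, a 200 response with no validators fails rule 7 until ignore_no_last_mod is set, which matches the behaviour described for CacheIgnoreNoLastMod above.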

This is my first ever attempt at using caching with Apache. I haven’t run into any gotchas or issues yet, and it seems to be working. Does anyone have any experience with using this in a production environment?

© 2011 Anuj Gakhar