# How To Optimize Your Site With HTTP Caching

I’ve been on a web tweaking kick lately: how to speed up your javascript, gzip files with your server, and now how to set up caching. The reason is simple: site performance is a feature.

For web sites, speed may be feature #1. Users hate waiting: we get frustrated by buffering videos and pages that pop together as images slowly load. It’s a jarring (aka bad) user experience. Time invested in site optimization is well worth it, so let’s dive in.

## What is Caching?

Caching is a great example of the ubiquitous time-space tradeoff in programming. You can save time by using space to store results.

In the case of websites, the browser can save a copy of images, stylesheets, javascript or the entire page. The next time the user needs that resource (such as a script or logo that appears on every page), the browser doesn’t have to download it again. Fewer downloads means a faster, happier site.

Here’s a quick refresher on how a web browser gets a page from the server:

1. Browser: Yo! You got index.html?
2. Server: (Looking it up…)
3. Server: Totally, dude! It’s right here!

(The actual HTTP protocol may have minor differences; see Live HTTP Headers for more details.)
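For the curious, here is a rough sketch of what that chat looks like on the wire. The host name `example.com` and both helper functions are made up for illustration; a real browser sends many more headers.

```python
def build_request(path: str, host: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request."""
    # Request line, one required header, then a blank line to end the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def parse_status_line(response_text: str):
    """Return (status code, reason phrase) from the first line of a response."""
    status_line = response_text.split("\r\n", 1)[0]
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


# "Yo! You got index.html?"
request = build_request("/index.html", "example.com")

# "Totally, dude! It's right here!" (a hand-written sample response)
response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(request)
print(parse_status_line(response))
```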

## Caching’s Ugly Secret: It Gets Stale

Caching seems fun and easy. The browser saves a copy of a file (like a logo image) and uses this cached (saved) copy on each page that needs the logo. This avoids having to download the image ever again and is perfect, right?

Wrongo. What happens when the company logo changes? Amazon.com becomes Nile.com? Google becomes Quadrillion?

We’ve got a problem. The shiny new logo needs to go with the shiny new site, caches be damned.

So even though the browser has the logo, it doesn’t know whether the image can be used. After all, the file may have changed on the server and there could be an updated version.

So why bother caching if we can’t be sure if the file is good? Luckily, there are a few ways to fix this problem.

## Caching Method 1: Last-Modified

One fix is for the server to tell the browser what version of the file it is sending. A server can return a Last-Modified date along with the file (let’s call it logo.png), like this:

Last-Modified: Fri, 16 Mar 2007 04:00:25 GMT

File Contents (could be an image, HTML, CSS, Javascript...)

Now the browser knows that the file it got (logo.png) was last modified on Mar 16 2007. The next time the browser needs logo.png, it can do a special check with the server:

1. Browser: Hey, give me logo.png, but only if it’s been modified since Mar 16, 2007.
2. Server: (Checking the modification date)
3. Server: Hey, you’re in luck! It was not modified since that date. You have the latest version.
4. Browser: Great! I’ll show the user the cached version.

Sending the short “Not Modified” message is a lot faster than needing to download the file again, especially for giant javascript or image files. Caching saves the day (err… the bandwidth).
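Here is a minimal sketch of the server's side of that conversation, assuming the file's modification time is already known. The function name `respond_last_modified` is invented for this example; real servers such as Apache do this check for you on static files.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_last_modified(file_mtime, if_modified_since=None):
    """Return (status, headers) for a request, honoring If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop microseconds first.
    file_mtime = file_mtime.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(file_mtime, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            if file_mtime <= since:
                return 304, headers  # Not Modified: no file contents are sent
        except (TypeError, ValueError):
            pass  # unusable date: ignore it and send the whole file
    return 200, headers  # full response; the file contents would follow


mtime = datetime(2007, 3, 16, 4, 0, 25, tzinfo=timezone.utc)

# First visit: the browser has nothing cached.
first_status, first_headers = respond_last_modified(mtime)

# Later visit: the browser echoes the date back in If-Modified-Since.
second_status, _ = respond_last_modified(mtime, first_headers["Last-Modified"])
print(first_status, first_headers["Last-Modified"])
print(second_status)
```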

## Caching Method 2: ETag

Comparing versions with the modification time generally works, but could lead to problems. What if the server’s clock was originally wrong and then got fixed? What if daylight saving time comes early and the server isn’t updated? The caches could be inaccurate.

ETags to the rescue. An ETag is a unique identifier given to every file. It’s like a hash or fingerprint: every file gets a unique fingerprint, and if you change the file (even by one byte), the fingerprint changes as well.

Instead of sending back the modification time, the server can send back the ETag (fingerprint):

ETag: "ead145f"

File Contents (could be an image, HTML, CSS, Javascript...)

The ETag can be any string which uniquely identifies the file. The next time the browser needs logo.png, it can have a conversation like this:

1. Browser: Can I get logo.png, if nothing matches tag “ead145f”?
2. Server: (Checking fingerprint on logo.png)
3. Server: You’re in luck! The version here is “ead145f”. It was not modified.
4. Browser: Score! I’ll show the user my cached version.

Just like Last-Modified, ETags solve the problem of comparing file versions, except that “If-None-Match” is a bit harder to work into a sentence than “If-Modified-Since”. But that’s my problem, not yours. ETags work great.
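A small sketch of the fingerprint idea, assuming we build the ETag from a hash of the file's bytes. That is only one possible scheme (servers are free to build ETags however they like), and `make_etag` and `respond_etag` are names invented here. Splitting the header on commas is a simplification that ignores weak validators.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Build a quoted ETag from a hash of the file's bytes."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond_etag(content: bytes, if_none_match=None):
    """Return (status, headers, body), honoring If-None-Match."""
    etag = make_etag(content)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return 304, headers, b""  # fingerprints match: Not Modified
    return 200, headers, content


old_logo = b"old logo bytes"
new_logo = b"new logo bytes"

status, headers, _ = respond_etag(old_logo)          # first download
cached_tag = headers["ETag"]
print(respond_etag(old_logo, cached_tag)[0])         # file unchanged
print(respond_etag(new_logo, cached_tag)[0])         # file changed on the server
```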

## Caching Method 3: Expires

Caching a file and checking with the server is nice, except for one thing: we are still checking with the server. It’s like analyzing your milk every time you make cereal to see whether it’s safe to drink. Sure, it’s better than buying a new gallon each time, but it’s not exactly wonderful.

And how do we handle this milk situation? With an expiration date!

If we know when the milk (logo.png) expires, we keep using it until that date (and maybe a few days longer, if you’re a college student). As soon as it expires, we contact the server for a fresh copy, with a new expiration date. The header looks like this:

Expires: Tue, 20 Mar 2007 04:00:25 GMT

File Contents (could be an image, HTML, CSS, Javascript...)

In the meantime, we avoid even talking to the server if we’re in the expiration period:

There isn’t a conversation here; the browser has a monologue.

1. Browser: Self, is it before the expiration date of Mar 20, 2007? (Assume it is).
2. Browser: Verily, I will show the user the cached version.

And that’s that. The web server didn’t have to do anything. The user sees the file instantly.
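The browser's monologue can be sketched in a few lines. This is an illustration only: `is_fresh` is a made-up helper, and real browsers weigh other headers too.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """True if the cached copy can be used without contacting the server."""
    try:
        return now < parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # an unreadable date is treated as already expired


expires = "Tue, 20 Mar 2007 04:00:25 GMT"
before = datetime(2007, 3, 18, 12, 0, 0, tzinfo=timezone.utc)
after = datetime(2007, 3, 21, 12, 0, 0, tzinfo=timezone.utc)

print(is_fresh(expires, before))  # still within the expiration period
print(is_fresh(expires, after))   # expired: time to ask the server again
```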

## Caching Method 4: Max-Age

Oh, we’re not done yet. Expires is great, but the server has to compute an explicit date for every response. The max-age directive (sent in the Cache-Control header, e.g. Cache-Control: max-age=604800) lets us say “This file expires 1 week from now”, which is simpler than setting an explicit date.

Max-age is measured in seconds. Here are a few quick second conversions:

• 1 day in seconds = 86400
• 1 week in seconds = 604800
• 1 month in seconds = 2629000 (approximate, for an average-length month)
• 1 year in seconds = 31536000 (effectively infinite on internet time)
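If you would rather not do the arithmetic by hand, a sketch like the one below turns a lifetime into both styles of header. The helper name `caching_headers` and the sample date are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(lifetime: timedelta, now: datetime) -> dict:
    """Express one cache lifetime as both max-age and Expires."""
    seconds = int(lifetime.total_seconds())
    return {
        "Cache-Control": f"max-age={seconds}",               # relative: seconds from now
        "Expires": format_datetime(now + lifetime, usegmt=True),  # absolute date
    }


now = datetime(2007, 3, 13, 4, 0, 25, tzinfo=timezone.utc)
week = caching_headers(timedelta(weeks=1), now)
print(week["Cache-Control"])
print(week["Expires"])
```

The max-age value stays the same no matter when the response is sent, while the Expires date has to be recomputed each time.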

## Bonus Header: Public and Private

The cache headers never cease. Sometimes a server needs to control when certain resources are cached.

• Cache-Control: public means the cached version can be saved by proxies and other intermediate servers, where everyone can see it.
• Cache-Control: private means the file is different for different users (such as their personal homepage). The user’s private browser can cache it, but not public proxies.
• Cache-Control: no-cache means the browser may keep a copy but must check with the server before each use; add no-store if the file should not be saved at all. This is useful for things like search results where the URL appears the same but the content may change.

However, be wary that some cache directives only work on newer HTTP/1.1 browsers. If you are doing special caching of authenticated pages then read more about caching.
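To make the directives concrete, here is a rough sketch that splits a Cache-Control value into its parts and reports who may keep a copy. Both function names are invented, and the logic is deliberately simplified: in HTTP, no-store forbids saving a copy, while no-cache allows saving but requires a check with the server before reuse.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def who_may_store(value: str) -> str:
    """Say which caches may keep a copy under the given header."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return "nobody"
    if "private" in directives:
        return "browser only"
    return "browser and shared proxies"


print(parse_cache_control("max-age=604800, public"))
print(who_may_store("max-age=604800, public"))
print(who_may_store("private, max-age=3600"))
print(who_may_store("no-store"))
```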

## Ok, I’m Sold: Enable Caching

First, make sure Apache has mod_headers and mod_expires enabled:

... list your current modules...
apachectl -t -D DUMP_MODULES

... enable headers and expires if not in the list above...
a2enmod headers
a2enmod expires


The general format for setting headers is

• File types to match
• Header / Expiration to set

A general tip: the less a resource changes (images, pdfs, etc.) the longer you should cache it. If it never changes (every version has a different URL) then cache it for as long as you can (e.g. a year)!

One technique: Have a loader file (index.html) which is not cached, but that knows the locations of the items which are cached permanently. The user will always get the loader file, but may have already cached the resources it points to.
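One way to get "every version has a different URL" is to put a fingerprint of the file's contents into its name. The sketch below assumes a naming scheme of `name.<hash>.ext`; that scheme and the helper `fingerprinted_url` are my own illustration, not something your server requires.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_url(path: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = fingerprinted_url("/img/logo.png", b"old logo bytes")
new_url = fingerprinted_url("/img/logo.png", b"new logo bytes")
print(old_url)
print(new_url)
```

The loader file (index.html) would point at whichever URL is current, so the image itself can be cached for a year without ever going stale.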

The following config settings are based on the ones at AskApache.

All the times are given in seconds (A0 = Access + 0 seconds).

ExpiresActive On
ExpiresDefault A0

# 1 YEAR - doesn't change often
<FilesMatch "\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$">
ExpiresDefault A31536000
</FilesMatch>

# 1 WEEK - possible to be changed, unlikely
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
ExpiresDefault A604800
</FilesMatch>

# 3 HOUR - core content, changes quickly
<FilesMatch "\.(txt|xml|js|css)$">
ExpiresDefault A10800
</FilesMatch>

Again, if you know certain content (like javascript) won’t be changing often, have “js” files expire after a week.

Using max-age headers:

# 1 YEAR
<FilesMatch "\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$">
Header set Cache-Control "max-age=31536000, public"
</FilesMatch>

# 1 WEEK
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
Header set Cache-Control "max-age=604800, public"
</FilesMatch>

# 3 HOUR
<FilesMatch "\.(txt|xml|js|css)$">
Header set Cache-Control "max-age=10800"
</FilesMatch>

# NEVER CACHE - notice the extra directives
<FilesMatch "\.(html|htm|php|cgi|pl)$">
Header set Cache-Control "max-age=0, private, no-store, no-cache, must-revalidate"
</FilesMatch>
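If you serve files from application code rather than Apache, the same rules can be sketched as a lookup table. Everything here is illustrative: the function name is invented, and the one-year and three-hour Cache-Control values are assumptions derived from the comments and Expires times in this section.

```python
import re

# (pattern, Cache-Control value), checked in order; modeled on the FilesMatch rules.
RULES = [
    (re.compile(r"\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$"),
     "max-age=31536000, public"),                       # 1 year (assumed value)
    (re.compile(r"\.(jpg|jpeg|png|gif|swf)$"),
     "max-age=604800, public"),                         # 1 week
    (re.compile(r"\.(txt|xml|js|css)$"),
     "max-age=10800"),                                  # 3 hours (assumed value)
    (re.compile(r"\.(html|htm|php|cgi|pl)$"),
     "max-age=0, private, no-store, no-cache, must-revalidate"),  # never cache
]


def cache_control_for(filename: str):
    """Return the Cache-Control value for a file name, or None if no rule matches."""
    for pattern, value in RULES:
        if pattern.search(filename):
            return value
    return None


for name in ("logo.png", "manual.pdf", "site.css", "index.html", "README"):
    print(name, "->", cache_control_for(name))
```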


## Final Step: Check Your Caching

To see whether your files are cached, do the following:

• Online: Examine your site in Redbot (You’ll see the headers returned, and a cache summary on the side)
• In Browser: Use Firebug or Live HTTP Headers to see the HTTP response (304 Not Modified, Cache-Control, etc.). In particular, I’ll load a page and use Live HTTP Headers to make sure no requests are being sent to load images, logos, and other cached files. If you press ctrl+refresh the browser will force a reload of all files.
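You can also check from a script. The sketch below uses Python's standard `urllib` to send a HEAD request and print only the caching headers; the function names and the commented-out URL are placeholders, so swap in a resource on your own site.

```python
from urllib.request import Request, urlopen

CACHE_HEADERS = ("Cache-Control", "Expires", "Last-Modified", "ETag")


def cache_summary(headers) -> dict:
    """Pick out the caching-related headers from a response's headers."""
    return {name: headers[name] for name in CACHE_HEADERS
            if headers.get(name) is not None}


def fetch_cache_summary(url: str) -> dict:
    """Send a HEAD request and report the caching headers (needs network access)."""
    with urlopen(Request(url, method="HEAD")) as response:
        return cache_summary(response.headers)


# Offline demonstration with hand-written headers:
sample = {"Content-Type": "image/png", "Cache-Control": "max-age=604800, public"}
print(cache_summary(sample))

# Against a live site (hypothetical URL):
# print(fetch_cache_summary("http://example.com/logo.png"))
```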

Read more about caching, or the HTTP header fields. Caching doesn’t help with the initial download (that’s what gzip is for), but it makes the overall site experience much better.

Remember: Creating unique URLs is the simplest way to caching heaven. Have fun streamlining your site!

Kalid Azad loves sharing Aha! moments. BetterExplained is dedicated to learning with intuition, not memorization, and is honored to serve 250k readers monthly.

1. me says:

There is no such thing as an “If-None-Exist” header in HTTP; I think you mean “If-None-Match.”

2. Kalid says:

Whoops, my mistake. It should be fixed now.

3. Dave says:
4. This is an awesome, awesome guide. Thanks!

5. Thanks Julius, I’m happy you like it

6. I just recently implemented caching on my blog. It makes a lot of difference!

BTW, What do you use for your diagrams?

7. Kalid says:

Hi Carlo, I’m glad it worked for you! I use PowerPoint 2007 to make the diagrams.

8. Martin Stangeby Lunde says:

This is a great article, well written.

9. Kalid says:

Hi Martin, thanks for the comment, glad you liked it.

10. it’s helped me to know cache well. thanks!

11. Kalid says:

12. Dave says:

You’re an angel!
All your articles will be cached up there in the Heavens for the enjoyment of the Gods

13. Kalid says:

Thanks Dave, appreciate the warm comments!

14. Ponnusamy says:

Thanks a lot. I was searching to know about browser cache and then I finally got this. Once again, thanks a lot for your explanation.

15. daniel says:

ALRIGHT HERE’S MY PROBLEM. Every time I try to go on youtube it says “400 bad request, your browser sent a request that this server can not understand”. Youtube used to work… but now it won’t. What should I do, and how do I fix it? Should I learn how to control my browser request?

16. Thanks, this is a great article! One question — is there any way of getting around the max-age=0 request header that Firefox sends upon an F5 refresh? It renders all my .htaccess goodness useless. Thanks!

17. @Karen: Good question, I didn’t realize Firefox did that on an F5. It seems the intent is to look like an entirely new visitor upon an F5 request (i.e. I don’t have any versions of any files), so I’m not sure if there’s a way around it.

18. Mark says:

Great summary of caching, very informative and readable.

Thx for the detailed info.
But where do I write the caching rules?
In a .htaccess file or in the header of the site or somewhere else?

Sorry, this is very new for me.

Regards, Michael

20. jess says:

yes, I would also like to know where to place this info. I’m on an IIS server, so I know the rules are a little different. I am just having trouble understanding where to place, say, an http cache expires header for “logo.jpg”

This is an awesome tutorial, regardless– THANK YOU!

-Jess

21. Tony says:

22. Nicholas says:

Hi, great tutorial but examples are missing.

Could you please put up an example on how to set up all five cases. You have listed only two, so I am wondering how ETag needs to be set up.

23. Thanks for the information.

I am still confused about how you would go about removing the default caching that goes on for a particular item . . . I have a calendar that I would like not to be cached, because it only causes problems . . .

24. Kalid says:

@Evan: You can specify that certain resources not be cached using Cache-Control: no-store (or no-cache, if a check with the server before each use is enough).

25. Jayjayjay says:

Excellent work! This doc saved my day after trying to figure these same things out from the HTTP-RFC.

26. Kalid says:

27. Tasneem says:

This is a great article . Short , precise and well explained !

28. There is a new site for testing out caching to make sure it is working. it is http://redbot.org.

29. federico says:

good tutorial, but…

why don’t you use cache?

30. ujjwal says:

well explained !

31. Very good explanation. I’m using all caching methods in my framework RhinOS. From PHP, I use the header call and this allow to RhinOS to determine the correct headers to sent in each content case. Josep Sanz.

32. Thomas Powell says:

If you unfortunately find yourself on IIS, doing this is kind of a bear on most versions. You have to go into the MMC and set each folder’s headers directly. Try CacheRight (http://cacheright.com), which can make this a lot easier, or of course you could install Apache and put a proxy up if that is allowed. BTW this article from Mark Nottingham, http://www.mnot.net/cache_docs/, has probably the best wide-reaching overview of caching across platforms, HTTP, server, etc. if this article interested you. That is where redbot comes from, I believe.

33. kalid says:

@Thomas: Thanks for the info! I don’t know much about configuring IIS.

34. I have this code on my site:

It doesn’t seem to be caching anything. Any help would be appreciated.

35. *** the code don’t seem to be showing ***


36. Ranjith says:

Good one!!!

37. kalid says:

@Ranjith: Thanks!

38. Nice article! I setup Varnish cache on my server and was amazed to find that it even served static files significantly faster than Apache. Http caching is definitely worth it!

39. kalid says:

@Daniel: Glad it worked — varnish is a good idea as well.

40. David says:

I love cache expires as the request doesn’t even hit the server. Thank you!

41. kalid says:

@David: Exactly! That’s the best type

42. D says:

Very good article. You may want to mention that in schemes where the browser never queries the server, that it could still display stale data. Even if the server tells the browser that something won’t expire for two weeks, sometimes servers are wrong and the resource changes before the expiration date it published but the browser won’t bother asking for it for the full two weeks.

43. kalid says:

@D: Great point — yes, if there’s a chance you need to expire early, it’s best to use unique URLs (i.e. just change the reference in your raw HTML).

44. Is there any solution which stores the cache between the web server and the user’s browser? Many users clear their cache on exit, so browser caching is not useful for them, but something like what I am suggesting could reduce the load multifold.

45. I have this code on my site

46. RE: Final Step: Check Your Caching
quoted web address: http://www.surfnetters.nl/cgi-bin/cacheability.py for cacheability query is now outdated. “Use The Engine is no longer maintained, and has been replaced by a new tool, RED:” http://redbot.org/
You need to copy the url address of a specific resource e.g. a png file to check the status of caching on your server.

47. kalid says:

Thanks Nick, I’ll update the link.

48. henk says:

server error 500 with ExpiresActive On

49. Anonymous says:

Very Good Article. Thanks for posting

50. Popat Kharat says:

Good One !!!

52. sam says:

Does the server automatically check whether the file has changed, or should we write separate code to compare the “If-Modified-Since” header with the file’s modified date?

We are using IIS; do we have to do something in IIS to compare the file changes and then return the 304?

53. Nandan Kulkarni says:

hey, thanks a lot for this very simple and elegant introduction on caching!

54. This time I’ve been using W3 Total Cache, but the results of my tests showed an F. What should I do?

Do I still have to use the above code?