Web caching is hard
Web caching is hard. Also, maybe I’m not that good under pressure? In any event, I made the following mistakes today while debugging a web site behind our nginx cache that bit the dust under heavy load:
Action: I ran
curl -I https://website.org/ and it hung.
Wrong assumption: Something is wrong with nginx. Why else would it just hang?
Reconsidered conclusion: The resource (the home page) is a MISS, so nginx has to retrieve it from the origin, but the origin is overloaded and timing out, so my request is timing out too. Maybe something is wrong with the nginx caching configuration, since the home page really should be a HIT… but that’s another problem.
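A minimal sketch of how to tell a MISS from a broken nginx without staring at a hung terminal. It assumes nginx exposes its cache status in a response header; the name X-Cache-Status is an assumption (a common convention for publishing $upstream_cache_status), so substitute whatever header your configuration actually adds. The cache_status helper is a hypothetical name of my own.

```shell
#!/bin/sh
# Print the value of the cache-status header from HTTP response headers
# read on stdin.
# Assumption: nginx publishes $upstream_cache_status as "X-Cache-Status".
# Pass a different header name as $1 if your config uses another one.
cache_status() {
    header="${1:-X-Cache-Status}"
    # Drop carriage returns, match the header name case-insensitively
    # (HTTP/2 responses use lowercase names), and print only the value.
    tr -d '\r' | awk -F': *' -v h="$header" 'tolower($1) == tolower(h) { print $2; exit }'
}

# Usage (give up after 10 seconds instead of hanging on a slow origin):
#   curl -sI --max-time 10 https://website.org/ | cache_status
```

With --max-time set, a timeout plus no header output points at the origin rather than at nginx itself.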
Action: I switched the configuration from our normal set of caching
directives to our aggressive set, reloaded nginx, and
curl -I https://website.org/ still hung.
Wrong assumption: aggressive caching isn’t working and I need a different configuration.
Reconsidered conclusion: The home page has still not been loaded from the origin, so every request for it will be a MISS, and will hang, until nginx is able to fill the cache with it. The configuration change might be the right one; we just need the origin to calm down before we will know.
Action: I restarted PHP on the origin to free up PHP processes so my home
page request could fill the cache … and still
curl -I https://website.org/ hung.
Wrong assumption: WTF! The world is ending!
Reconsidered conclusion: Regular traffic to other pages (not the home page) consumed all the available PHP processes on the origin before my request for the home page could complete, so nginx was still unable to fill the cache with the home page.
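Since a single request can lose the race for a PHP process, one option is to keep asking until a request finally completes and nginx can store the page. This is only a sketch: retry_until_ok is a hypothetical helper, and the attempt count, delay and timeout in the usage line are made-up values, not what we actually ran.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts run out.
# Arguments: max attempts, seconds to pause between attempts, then the command.
retry_until_ok() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Pause before the next attempt, but not after the last one.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Usage (hypothetical values): up to 10 full GET requests, 5 seconds apart,
# each allowed 60 seconds, until one completes without an HTTP error:
#   retry_until_ok 10 5 curl -sf -o /dev/null --max-time 60 https://website.org/
```

The pause between attempts matters: hammering an origin that is already out of PHP processes only adds to the pile.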
Action: Once we got things under control, I changed the caching level from
aggressive back down to normal. I ran
curl -I https://website.org/ and it was
HIT’ing. I concluded that we don’t need the aggressive cache after all. Got
some coffee, came back later and ran it again and it consistently showed MISS.
Wrong assumption: What?!? Did something change on the origin to stop the cache from working??
Reconsidered conclusion: The aggressive configuration cached the home page for 5 minutes. Even after I switched back to normal caching, the copy already in the cache was still valid, so it kept being served from the cache. After 5 minutes, that entry expired, and only then did the normal cache settings determine whether the request would be cached. In other words, you have to wait for the cache to expire (or bust the cache) before you can know whether the new cache settings are working.
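One way to bust the cache for a test, sketched under several assumptions: the nginx cache key includes the query string (the default proxy_cache_key does), the configuration does not bypass the cache for URLs with query strings, the origin ignores an unknown parameter, and the cache status is exposed in an X-Cache-Status header. The cache_bust_url helper and the cachebust parameter name are hypothetical.

```shell
#!/bin/sh
# Append a throwaway query parameter so the request maps to a cache key
# that has never been stored, which sidesteps any entry left over from
# the previous settings.
# Arguments: the URL, and optionally a token (defaults to timestamp-PID).
cache_bust_url() {
    url="$1"
    token="${2:-$(date +%s)-$$}"
    case "$url" in
        *\?*) printf '%s&cachebust=%s\n' "$url" "$token" ;;
        *)    printf '%s?cachebust=%s\n' "$url" "$token" ;;
    esac
}

# Usage: request the same fresh URL twice and compare the cache status.
#   url=$(cache_bust_url https://website.org/)
#   curl -sI --max-time 10 "$url" | grep -i '^x-cache-status'  # likely MISS
#   curl -sI --max-time 10 "$url" | grep -i '^x-cache-status'  # HIT only if the active settings cache it
```

This tests the current settings against a fresh key rather than the home page entry itself, so the home page can still be served from its old cached copy until that copy expires.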