Stylesheet issues and how they affect web caching (an edev work)

It would be nice to cut down the hits to the site... every time a noder with the ekw theme loads a page, the browser goes out again and makes a second request for another node. By making this static or otherwise cache-able, we could reduce those hits by 50% (or thereabouts)!

Go ahead... right click and 'View Page Source', or whatever it is on your browser.

See that? That's the style sheet for the ekw theme. It also happens to cause a number of problems with caching of web pages. Unfortunately, most solutions have their own bag of problems too.

Many caches (Squid and Apache, for example) and browsers use some logic to try to determine whether a page is cache-able. With default settings, Squid (and many browsers) refuse to cache anything with a '?' or 'cgi-bin' in the GET request. Period. The style sheet link above is exactly such a request: its URL contains a '?'.
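
The rule described above can be sketched roughly as follows. This is only an illustration in Python of the '?' and 'cgi-bin' check, not Squid's actual code; the function name looks_cacheable is made up for this example.

```python
def looks_cacheable(url: str) -> bool:
    """Return False for URLs that a default-configured cache would skip.

    Illustrative only: mirrors the rule described in the text (no '?' and
    no 'cgi-bin' in the requested URL), not any real cache's source code.
    """
    return "?" not in url and "cgi-bin" not in url


# Query-string form: refused. Path-information form: allowed.
print(looks_cacheable("http://everything2.com/index.pl?node_id=1192823"))  # False
print(looks_cacheable("http://everything2.com/node/id/raw/1100984"))       # True
```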

For most themes, the style sheet is the same for all users and could (in theory) be moved to a static file. However, the ekw theme has a dynamic style sheet that allows you to tweak various portions of it and thus makes it impossible to move this to a static file.

So, one way to make a page cache-able is to remove anything that mentions cgi-bin (not a problem here) or '?'. This can be done by passing the arguments as extra path information, href="/node/id/raw/1100984", instead of in the query string.

However, moving to a cache-able page causes other problems when the cache is shared by multiple E2 users. This has been seen before: when the URL is http://www.everything2.com/, users have occasionally reported getting someone else's front page (and Inbox!).

If the ekw style sheet were moved to such a URL, two individuals with the ekw theme behind the same cache would get each other's style sheet - not a good thing.

A possible solution to this would be to have the person's id tacked on to every request. Now the URLs are different: href="/node/id/raw/1100984/774422" (774422 is my home node node_id).
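
To make the difference concrete, here is a toy model in Python of a shared cache keyed by URL. It is a sketch under stated assumptions, not how Squid or E2 actually work: the SharedCache class, the stylesheet_url helper, the CSS bodies and the second user id (999999) are all invented for the example.

```python
from typing import Callable, Dict, Optional

STYLESHEET_NODE = 1100984  # style sheet node id from the example URL


def stylesheet_url(user_node: Optional[int] = None) -> str:
    """Build the path-info URL, optionally with the user's node id appended."""
    url = f"/node/id/raw/{STYLESHEET_NODE}"
    if user_node is not None:
        url += f"/{user_node}"
    return url


class SharedCache:
    """Toy shared cache: stores the first response seen for each URL."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def get(self, url: str, render: Callable[[], str]) -> str:
        if url not in self.store:
            self.store[url] = render()  # cache miss: ask the origin server
        return self.store[url]


cache = SharedCache()

# Same URL for everyone: the second user is served the first user's sheet.
first = cache.get(stylesheet_url(), lambda: "body { color: red }")
second = cache.get(stylesheet_url(), lambda: "body { color: blue }")
print(first == second)  # True

# Per-user URL: each user gets a separate cache entry.
first = cache.get(stylesheet_url(774422), lambda: "body { color: red }")
second = cache.get(stylesheet_url(999999), lambda: "body { color: blue }")  # made-up id
print(first == second)  # False
```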

Well, now the page is cache-able, but this doesn't make our problems go away. If the page is modified, the cached version still exists locally and I have to do a shift-reload (reload everything, ignoring the cache) to force it to update. This isn't such a bad idea for style sheets, which rarely change, but would this work with 'normal' nodes?

http://everything2.com/index.pl?node_id=1192823&lastnode_id=744222
becomes
http://everything2.com/node/id/1192823/lastnode/744222/me/744222
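
A rewrite like the one above could be sketched as follows. This is a hypothetical helper in Python, not code from the E2 engine; the function name to_path_url and its me parameter are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit


def to_path_url(url: str, me: int) -> str:
    """Turn a query-string node URL into the path-information form.

    'me' is the requesting user's node id, appended so that a shared
    cache keeps a separate copy for each user.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    path = f"/node/id/{query['node_id'][0]}"
    if "lastnode_id" in query:
        path += f"/lastnode/{query['lastnode_id'][0]}"
    path += f"/me/{me}"
    return f"{parts.scheme}://{parts.netloc}{path}"


old = "http://everything2.com/index.pl?node_id=1192823&lastnode_id=744222"
print(to_path_url(old, me=744222))
# http://everything2.com/node/id/1192823/lastnode/744222/me/744222
```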

Ok, it's a bit ungainly - and ugly. Does this really solve all our problems? The answer is no. This page is now cached, along with all the nodelets on the side. So now my Inbox is sitting in the cache, with its privacy issues again. Furthermore, if I request this page again, I get the same chatterbox and Inbox as before - even if the data is old. We need to add some more headers to every page.

Well, the Apache web cache refuses to cache anything without a Last-Modified header, so presumably we've added that. This header is dependent upon the state of all the nodelets - if the chatterbox changes, this header changes (though not for the style sheet - that only changes when you change your ekw settings). There is also the Expires time that should be set - how long should this page remain cache-able? Icky icky icky.
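
As a rough sketch of what computing those two headers might look like: the page's Last-Modified is the newest modification time among its nodelets, and Expires is some chosen lifetime past now. The function name cache_headers, the nodelet_mtimes list and the max_age value are all assumptions for illustration; nothing here is taken from the E2 code.

```python
import time
from email.utils import formatdate
from typing import Dict, Iterable


def cache_headers(nodelet_mtimes: Iterable[float], now: float, max_age: int) -> Dict[str, str]:
    """Build Last-Modified and Expires headers for a page.

    nodelet_mtimes: Unix timestamps of the last change to each nodelet.
    max_age: how many seconds the page may be reused (a policy choice).
    """
    last_modified = max(nodelet_mtimes)  # the page is as new as its newest part
    return {
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Expires": formatdate(now + max_age, usegmt=True),
    }


now = time.time()
# Hypothetical page: chatterbox changed 5 seconds ago, another nodelet an hour ago.
for name, value in cache_headers([now - 5, now - 3600], now, max_age=30).items():
    print(f"{name}: {value}")
```

With a chatterbox that changes every few seconds, Last-Modified is nearly always "just now" and any useful Expires value risks showing stale chatter, which is exactly the ickiness described above.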

In any project to address caching of pages, these issues should be looked at, along with the important issue of privacy. Ultimately, pages should only be cache-able locally on a machine - if they are cache-able at all.
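
One standard way to ask for "local machine only" caching is the HTTP/1.1 Cache-Control header, whose private directive tells shared caches not to store a response while still allowing the user's own browser cache to keep it. A minimal sketch in Python, assuming a hypothetical cache_control helper and a personalised flag that the server would have to work out for itself:

```python
def cache_control(personalised: bool, max_age: int) -> str:
    """Pick a Cache-Control value for a response.

    personalised: True if the page contains per-user content such as an
    Inbox nodelet (how to detect this is left open here).
    max_age: lifetime in seconds.
    """
    scope = "private" if personalised else "public"
    return f"{scope}, max-age={max_age}"


print("Cache-Control:", cache_control(personalised=True, max_age=30))
# Cache-Control: private, max-age=30
```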

For more information on web caches and interaction with cgi programs, see http://vancouver-webpages.com/CacheNow/detail.html