Stylesheet issues and how they affect web caching (an edev work)
It would be nice to cut down the hits to the site. Every time a noder with the ekw theme hits a page, the browser goes out again and makes a second request, for another node (the style sheet). By making this static or otherwise cache-able, we could reduce the hits by 50% (or thereabouts).
Go ahead... right-click and 'View Page Source', or whatever it is on your browser.
See that? That's the style sheet for the ekw theme. It also happens
to cause a number of problems with caching of web pages. Unfortunately,
most solutions have their own bag of problems too.
Many caches (Squid and Apache, for example) and browsers use some
logic to try to determine whether a page is cache-able. Squid (and
many browsers), in its default configuration, refuses to cache anything
with a '?' or 'cgi-bin' in the GET request. Period. The style sheet
URL in the page source is an example: it contains a '?'.
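
The rule of thumb described above can be sketched in a few lines. This is only an illustration of the heuristic, not Squid's actual code; the function name and the example URLs are made up for this sketch.

```python
def looks_cacheable(request_path: str) -> bool:
    """Mimic the default heuristic: refuse anything with '?' or 'cgi-bin'.

    This is a simplified stand-in for what a cache does, not real Squid logic.
    """
    return "?" not in request_path and "cgi-bin" not in request_path


# Hypothetical request paths, for illustration only.
query_style = "/index.pl?node_id=1100984"
path_style = "/node/id/raw/1100984"
cgi_style = "/cgi-bin/stylesheet"

for path in (query_style, path_style, cgi_style):
    print(path, looks_cacheable(path))
```

Only the path-style request passes the check; the other two would be fetched from the site every time.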
For most themes, the style sheet is the same for all users and could
(in theory) be moved to a static file. However, the ekw theme
has a dynamic style sheet that allows you to tweak various portions of
it and thus makes it impossible to move this to a static file.
So, one way to make a page cache-able is to remove things that mention
cgi-bin (not a problem here) and '?'. This can be done by using
the extra path information href="/node/id/raw/1100984" instead
of the query string.
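
As a rough sketch of what the server side has to do with such a URL: the extra path information is split into segments instead of being read from a query string. The segment layout (/node/id/<mode>/<node_id>) is inferred from the single example above, and the function and key names are hypothetical.

```python
def parse_node_path(path: str) -> dict:
    """Split a path-info style URL into its parts.

    Assumed layout (guessed from the example): /node/id/<mode>/<node_id>
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 4 or parts[0] != "node" or parts[1] != "id":
        raise ValueError(f"not a node path: {path!r}")
    return {"mode": parts[2], "node_id": int(parts[3])}


info = parse_node_path("/node/id/raw/1100984")
print(info["mode"], info["node_id"])
```

Nothing in the request now contains a '?', so the default cache rules no longer reject it outright.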
However, moving to a cache-able page causes other problems when
the cache is shared by multiple E2 users. This has been seen before:
when the URL is http://www.everything2.com/, users
have occasionally reported getting someone else's front page (and Inbox!).
If the ekw style sheet were moved to this scheme, two individuals with the
ekw theme would get each other's style sheet - not a good thing.
A possible solution to this would be to have the personid tacked on
to every request. Now the URLs are different:
href="/node/id/raw/1100984/774422" (774422 is my home node node_id).
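
To see why the extra id matters, here is a toy model of a shared cache that is keyed only by URL, the way a proxy would be. Everything here is a sketch: the class, the helper names, and the second user id (123456) are invented for illustration.

```python
class SharedCache:
    """Toy shared cache: one stored response per URL, whoever asks."""

    def __init__(self):
        self.store = {}

    def get(self, url, render):
        # Only call the site (render) when the URL has not been seen before.
        if url not in self.store:
            self.store[url] = render()
        return self.store[url]


def stylesheet_url(node_id, user_id=None):
    """Build the path-style URL, optionally with the user's id tacked on."""
    base = f"/node/id/raw/{node_id}"
    return base if user_id is None else f"{base}/{user_id}"


def render_for(user_id):
    """Stand-in for the site generating a per-user ekw style sheet."""
    return lambda: f"/* ekw style sheet for user {user_id} */"


cache = SharedCache()

# Same URL for both users: the second user is handed the first user's sheet.
first = cache.get(stylesheet_url(1100984), render_for(774422))
second = cache.get(stylesheet_url(1100984), render_for(123456))
collided = first == second

# Per-user URLs: each user gets their own cache entry.
mine = cache.get(stylesheet_url(1100984, 774422), render_for(774422))
theirs = cache.get(stylesheet_url(1100984, 123456), render_for(123456))
```

With the shared URL the two responses are identical (the collision); with the per-user URLs they differ.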
Well, now the page is cache-able, but this doesn't make our problems go away.
If the page is modified, the cached version still exists locally and I
have to do a shift-reload (reload everything, ignoring the cache) to force
it to update. This isn't so bad for style sheets, which rarely
change, but would this work with 'normal' nodes?
OK, it's a bit ungainly - and ugly. Does this really solve all our problems?
The answer is no. The page is now cached, along with all the nodelets
on the side. So my Inbox is now sitting in the cache, and the privacy
issues are back. Furthermore, if I request this page again, I get the
same chatterbox and Inbox as before - even if the data is old. We
need to add some more headers to every page.
Well, Apache's web cache refuses to cache anything without the Last-Modified
header, so presumably we've added that. This header is dependent upon
the state of all the nodelets - if the chatterbox changes, this header
changes (though not for the style sheet - that only changes when you
change your ekw settings). There is also the Expires time that should
be set - how long should this page remain cache-able? Icky icky icky.
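
A minimal sketch of how those two headers might be computed, assuming each nodelet can report when it last changed as a Unix timestamp. The function name, its parameters, and the idea of taking the newest nodelet time are assumptions for illustration, not how E2 actually does it.

```python
from email.utils import formatdate


def cache_headers(nodelet_mtimes, now, max_age_seconds):
    """Return Last-Modified and Expires headers for a page.

    nodelet_mtimes: Unix timestamps of the last change to each nodelet.
    The page counts as modified whenever any one nodelet has changed,
    so Last-Modified is the newest of those timestamps.
    """
    last_modified = max(nodelet_mtimes)
    return {
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Made-up timestamps: the chatterbox changed at t=60, everything else at t=0.
headers = cache_headers([0, 0, 60], now=60, max_age_seconds=30)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The hard part is the choice of max_age_seconds: a style sheet could tolerate a long lifetime, while a page carrying a chatterbox goes stale almost immediately.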
In any project to address caching of pages, these issues should be
looked at, along with the important issue of privacy. Ultimately,
pages should only be cache-able locally on a machine - if they are
cache-able at all.
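
One way to express "cache-able only locally" is the HTTP/1.1 Cache-Control header: the private directive tells shared caches not to store a response while still allowing the user's own browser to keep it. The sketch below assumes the site can tell whether a page carries per-user content; the function name and parameters are hypothetical.

```python
def cache_control_for(has_user_content: bool, max_age_seconds: int) -> str:
    """Pick a Cache-Control value for a page.

    Pages with per-user content (Inbox, chatterbox, a personal style sheet)
    are marked private so only the user's own browser may keep them.
    """
    scope = "private" if has_user_content else "public"
    return f"{scope}, max-age={max_age_seconds}"


# Lifetimes here are placeholders, not recommendations.
print(cache_control_for(True, 60))     # a page with nodelets
print(cache_control_for(False, 3600))  # a style sheet shared by everyone
```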
For more information on web caches and interaction with cgi programs, see http://vancouver-webpages.com/CacheNow/detail.html