Have you ever spent an hour or two or ten leisurely chatting away in the chatterbox and suddenly had an overpowering desire to keep a log of the conversation? Or have you remembered something someone said in the chatterbox several days ago, and wanted to find the exact quote, only it was long since lost in the ether?

Worry no more!

Thanks to the Chatterbox XML ticker, I've managed to toss together a collection of PHP scripts that archive every single word uttered in the chatterbox and put them in an indexed, fully searchable database for your perusal and enjoyment. This thing still has bugs to be worked out and features to be added, so be patient, but it works for the most part. You can find the E2 Chatterbox Archive at:

http://ascorbic.net/catbox/

Update (4/2/02): ascorbic is now hosting and developing the catbox archive, since I don't have time anymore. Yay for ascorbic!

Update (5/31/01): There is now a table listing interesting chatterbox statistics. Most borged user, most talkative, and even most foul-mouthed. Enjoy!

Update (5/28/01): The chatterbox archive now includes a popup chatterbox window that will let you chat in real-time (like the Java chatterbox, only without Java). This only works on IE and Mozilla and requires that you have a login cookie set at everything2.com, but it's a very convenient way to chat without having to keep refreshing the entire E2 window.


Technical Details

The webserver is Apache running on Windows 2000. Every 15 seconds, a PHP script grabs the contents of the chatterbox XML ticker and parses it, then dumps it into a MySQL database. After the database is updated with new chatter, the script generates a static HTML file named 'main.html' containing the most recent messages.
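The parse step can be sketched as follows, in Python rather than the archive's PHP. The XML shape below is an assumption for illustration, not the documented ticker format: a `CHATTER` root holding `message` elements with `msg_id`, `author` and `time` attributes. `parse_ticker` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed ticker shape for illustration; the real E2 ticker's element
# and attribute names may differ.
SAMPLE_TICKER = """<CHATTER>
  <message msg_id="101" author="wonko" time="20010528120000">hello, catbox</message>
  <message msg_id="102" author="ascorbic" time="20010528120005">hi there</message>
</CHATTER>"""


def parse_ticker(xml_text):
    """Turn one ticker fetch into (msg_id, author, time, text) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for msg in root.iter("message"):
        rows.append((
            int(msg.get("msg_id")),
            msg.get("author"),
            msg.get("time"),
            (msg.text or "").strip(),
        ))
    return rows


rows = parse_ticker(SAMPLE_TICKER)
for row in rows:
    print(row)
```

Each tuple maps onto one row of the messages table. In the setup described above, those rows would then go into MySQL before main.html is regenerated.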

Users can choose to use the popup E2 Catbox window as a replacement for the standard chatterbox interface here on Everything2. The popup catbox refreshes main.html every 15 seconds, keeping you constantly up-to-date on the latest happenings in the chatterbox. You can also send messages to the chatterbox from the popup window if you're using IE or Mozilla and have a cookie set at everything2.com.
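One simple way to make a static page reload itself is an HTML `meta` refresh. The original doesn't say how the popup refreshes, so this Python sketch assumes that approach. The hypothetical `render_main` helper takes (author, text) pairs and returns the markup to write to main.html.

```python
import html

REFRESH_SECONDS = 15  # the 15-second interval described above


def render_main(messages):
    """Build the markup for a self-refreshing page of recent chatter.

    `messages` is a list of (author, text) pairs, oldest first.
    """
    parts = [
        "<html><head>",
        f'<meta http-equiv="refresh" content="{REFRESH_SECONDS}">',
        "<title>E2 Catbox</title>",
        "</head><body>",
    ]
    for author, text in messages:
        # Escape user-supplied text so chatter can't inject markup.
        parts.append(f"<p><b>{html.escape(author)}</b>: {html.escape(text)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_main([("wonko", "hello, catbox"), ("ascorbic", "<grin>")])
print(page)
```

Serving a pre-built file keeps each refresh to a plain static request, with no database work per viewer.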

Messages are kept in the archive for exactly seven days. Messages older than seven days are deleted. This is not so much to save hard drive space (even 20,000 chatterbox messages don't take up a lot of space), but rather to keep the size of the tables to a manageable level and keep from bogging down the database more than necessary.
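In MySQL the cleanup could be a single `DELETE FROM messages WHERE posted < NOW() - INTERVAL 7 DAY`. The sketch below shows the same idea in Python, with sqlite3 standing in for MySQL. The table and column names (`messages`, `posted`) and the `purge_old` helper are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)


def purge_old(conn, now):
    """Delete messages posted before `now` minus seven days; return the count."""
    cutoff = (now - RETENTION).strftime("%Y-%m-%d %H:%M:%S")
    # ISO-style timestamps sort correctly as plain strings.
    cur = conn.execute("DELETE FROM messages WHERE posted < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (author TEXT, posted TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("wonko", "2001-05-20 12:00:00", "old news"),
        ("wonko", "2001-05-30 12:00:00", "fresh chatter"),
    ],
)
removed = purge_old(conn, now=datetime(2001, 5, 31, 12, 0, 0))
remaining = [row[0] for row in conn.execute("SELECT body FROM messages")]
print(removed, remaining)
```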

Every 10 minutes, a PHP script analyzes the archived messages for statistical information and generates a static HTML include file, which you can see on the main page of the archive. The analysis runs on a schedule rather than on demand as users load the page because it's very database-intensive: the script takes around 15-20 seconds to run, which is unacceptable when loading a web page.
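Here is a minimal sketch of that precomputation, in Python with sqlite3 standing in for MySQL. The `FOUL_WORDS` list, the table layout and the `compute_stats`/`render_fragment` names are hypothetical, because the real script's queries aren't documented.

```python
import html
import re
import sqlite3

# Hypothetical word list; the archive's real one is not documented.
FOUL_WORDS = {"darn", "heck"}
WORD_RE = re.compile(r"[a-z']+")


def compute_stats(conn):
    """Return (most_talkative, most_foul_mouthed) as (user, count) pairs."""
    talkative = conn.execute(
        "SELECT author, COUNT(*) AS n FROM messages "
        "GROUP BY author ORDER BY n DESC, author LIMIT 1"
    ).fetchone()
    foul = {}
    for author, body in conn.execute("SELECT author, body FROM messages"):
        hits = sum(1 for word in WORD_RE.findall(body.lower()) if word in FOUL_WORDS)
        if hits:
            foul[author] = foul.get(author, 0) + hits
    # Highest count wins; ties go to the alphabetically first user.
    foul_top = min(foul.items(), key=lambda kv: (-kv[1], kv[0])) if foul else None
    return talkative, foul_top


def render_fragment(stats):
    """Render the stats as an HTML fragment for the front page to include."""
    rows = []
    for label, pair in zip(("Most talkative", "Most foul-mouthed"), stats):
        if pair:
            user, count = pair
            rows.append(f"<tr><td>{label}</td><td>{html.escape(user)} ({count})</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("wonko", "hello"), ("wonko", "heck yes"), ("ascorbic", "darn it, darn it")],
)
stats = compute_stats(conn)
print(render_fragment(stats))
```

Running this on a schedule moves the 15-20 second cost off the request path. Each page view just includes the last fragment written.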

In my random travels, I have just now stumbled across a node from earlier this year by chkno called Chatterbox Logger, in which he published a shell script for logging chatterbox conversations. I hadn't known of it before, but it appears he hit on the idea before me. Credit where credit is due.

ascorbic.net/catbox

What is the chatterbox archive?

Simply put, it is a fully searchable database of the enlightening conversations that happen in the chatterbox. There is also a page of vaguely interesting statistics about the messages in the archive.

How does it work?

The archive is written in PHP, with a MySQL database as the backend. It's been completely rewritten since I took it over from wonko.

Once a minute, a cron job calls a PHP script that grabs the Universal Message XML ticker from E2. This contains the latest things said in the chatterbox. The ticker is parsed with PHP's DOM XML parser, and the messages are then inserted into the database.
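Because the ticker holds only the latest messages, each minute's fetch is likely to repeat some of the previous one, so the insert has to skip messages already archived. The following minimal sketch assumes each ticker message carries a unique ID used as the primary key. sqlite3's `INSERT OR IGNORE` stands in for MySQL's `INSERT IGNORE`, and `insert_new` is a hypothetical helper. A crontab schedule of `* * * * *` is what runs a job once a minute.

```python
import sqlite3


def insert_new(conn, rows):
    """Insert (msg_id, author, body) rows, skipping IDs already archived.

    Returns how many messages were genuinely new.
    """
    before = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO messages (msg_id, author, body) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (msg_id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)
first = insert_new(conn, [(1, "wonko", "hi"), (2, "ascorbic", "hello")])
# The next fetch repeats message 2 and adds message 3.
second = insert_new(conn, [(2, "ascorbic", "hello"), (3, "wonko", "bye")])
print(first, second)
```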

Every hour the statistics script is run. This runs a series of queries on the database to generate those fascinating stats. These are saved to a static file that is included on the front page.
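When a scheduled job rewrites a file that a web page includes, one detail is worth getting right. The job should write to a temporary file and then rename it into place, so that a page load never picks up a half-written file. The archive's actual approach isn't described. This Python sketch assumes a same-filesystem rename via `os.replace`, and `write_static` is a hypothetical name.

```python
import os
import tempfile


def write_static(path, content):
    """Replace `path` with `content` without exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the web server read it
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "stats.html")
    write_static(target, "<table>v1</table>")
    write_static(target, "<table>v2</table>")
    with open(target, encoding="utf-8") as fh:
        result = fh.read()
    leftovers = sorted(os.listdir(workdir))
print(result, leftovers)
```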

There is a full text index of the messages in the archive, so you can type natural language queries into the search box and all the matching messages appear, sorted by date. You can also search by the name of the user who spoke, or restrict it further by entering both.
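Below is a sketch of how the search form's two inputs might combine into one query. It is written in Python with the `%s` placeholders used by MySQL DB-API drivers such as PyMySQL. It assumes a `FULLTEXT` index on a `body` column, created with something like `ALTER TABLE messages ADD FULLTEXT (body)`; the table, column and function names are hypothetical. The function only builds the SQL and the parameter list, and executing the query would need a live MySQL connection.

```python
def build_search(text=None, user=None, limit=50):
    """Return (sql, params) for a search by text, by speaker, or both."""
    clauses, params = [], []
    if text:
        clauses.append("MATCH (body) AGAINST (%s IN NATURAL LANGUAGE MODE)")
        params.append(text)
    if user:
        clauses.append("author = %s")
        params.append(user)
    if not clauses:
        raise ValueError("need search text, a user name, or both")
    sql = (
        "SELECT author, posted, body FROM messages WHERE "
        + " AND ".join(clauses)
        + " ORDER BY posted DESC LIMIT %s"
    )
    params.append(limit)
    return sql, params


both_sql, both_params = build_search("pizza", user="wonko")
user_sql, user_params = build_search(user="wonko")
print(both_sql)
print(both_params)
```

Ordering by `posted` rather than by relevance gives the date-sorted results described above.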

If you join a conversation in the middle, and want to catch up, you can always see the latest messages in the archive at http://ascorbic.net/catbox/latest.

In order to make the XML acceptable to the parser, the messages are stored with E2 hard links (node names in square brackets) intact. They are converted to real links to E2 when they are displayed. You can see a variant of the function I use to do this here. Feel free to add it to your own site, for E2-style hard links and pipe links.
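This is not the function mentioned above. It is a minimal Python sketch of converting E2 hard links (`[node]`) and pipe links (`[node|label]`) into anchors. The `E2_BASE` URL pattern is an assumption about how node links should be formed, and `e2_links` is a hypothetical name. Text outside the links is HTML-escaped as it goes.

```python
import html
import re
from urllib.parse import quote

# Assumed URL pattern for a node; adjust if your links should look different.
E2_BASE = "https://everything2.com/title/"

# [node] or [node|label]; brackets can't nest inside a link.
LINK_RE = re.compile(r"\[([^\[\]|]+)(?:\|([^\[\]]+))?\]")


def e2_links(text):
    """Turn E2 hard links and pipe links into HTML anchors, escaping the rest."""
    out, pos = [], 0
    for m in LINK_RE.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        target = m.group(1).strip()
        label = (m.group(2) or m.group(1)).strip()
        href = E2_BASE + quote(target, safe="")
        out.append(f'<a href="{html.escape(href)}">{html.escape(label)}</a>')
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


linked = e2_links("see [Chatterbox Logger] and [wonko|the founder] & more")
print(linked)
```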

The witty little chatterbox topics are also now tracked, and appear at the top of each page in the archive. Topic changes are highlighted in the archive, and are also stored as messages in the database, with the special user name @topic. Search for that name and you can see all the topics. Magic!
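Storing topics as ordinary rows under `@topic` means the existing search picks them up for free. Here is a minimal sketch in Python with sqlite3. It assumes a topic is recorded only when it differs from the last one stored, and the table layout and `record_topic` name are hypothetical.

```python
import sqlite3

TOPIC_USER = "@topic"  # the special user name the archive uses for topics


def record_topic(conn, topic, posted):
    """Store `topic` as a message if it differs from the last one stored.

    Returns True when a new topic row was written.
    """
    last = conn.execute(
        "SELECT body FROM messages WHERE author = ? ORDER BY posted DESC LIMIT 1",
        (TOPIC_USER,),
    ).fetchone()
    if last is not None and last[0] == topic:
        return False  # topic unchanged since the last fetch
    conn.execute(
        "INSERT INTO messages (author, posted, body) VALUES (?, ?, ?)",
        (TOPIC_USER, posted, topic),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (author TEXT, posted TEXT, body TEXT)")
changes = [
    record_topic(conn, "Welcome to the catbox", "2002-04-02 10:00:00"),
    record_topic(conn, "Welcome to the catbox", "2002-04-02 10:01:00"),
    record_topic(conn, "Now with topics!", "2002-04-02 10:02:00"),
]
topics = [
    row[0]
    for row in conn.execute(
        "SELECT body FROM messages WHERE author = ? ORDER BY posted", (TOPIC_USER,)
    )
]
print(changes, topics)
```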

What is it running on?

Since I took over the running of the archive in April 2002, it has moved between a few machines. First it was on my company's server in Telehouse: a 900MHz Athlon, I think, running Red Hat 7.2. When I left that job at the beginning of June 2002, I put it on the only spare computer I had at the time, a rev-A Bondi Blue iBook running Mac OS X, and stuck it on the end of my cable modem. Running it from an old laptop over 802.11 was sketchy as hell, and my flatmates kept putting it to sleep or shutting it down, so I dug out an ancient IBM P90 with 16MB of very temperamental RAM and a 500MB hard disk, and put Debian on it. The hardware was useless, but it almost did the job and was the best that was available at the time.

When I moved, I gave up on that, and the archive went to a shared server provided by phpwebhosting.com, who seemed good enough. It was a dual 1.4GHz PIII with a gig of RAM and 64GB of SCSI RAID, running Red Hat. They became crappy, so I moved to a virtual private host, which gave me root again. It was pretty slow, mainly due to the lack of memory, and ran Debian stable. It used turck-mmcache to cache the pages and database results, but that stopped being maintained and wouldn't work with newer versions of Apache, so I moved it to memcached. The next step was to separate the old messages (i.e. those no longer on the first page) from the latest ones: old messages are stored as page-sized blocks using memcached (packaged for Debian by our very own jaybonci).

In September 2007 I dumped the virtual private server after they shut me down without warning for alleged late payment (they hadn't even sent me a bill). After paying the bill, I decided I'd had enough of them and moved back to my company's servers. It's now on our old-ish dual Xeon server, which sits in a colo facility at the top of the Merchant Venturers Building in Bristol. It's the first time in years that it's been on a machine I've actually seen.
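The page-sized caching of old messages mentioned above can be sketched as follows. This is a guess at the shape, not the archive's code. It assumes message IDs only ever increase and groups them into fixed blocks counted from the oldest message, so a completed block never changes. A dict-backed stand-in with `get`/`set` methods takes the place of a real memcached client. `BLOCK_SIZE`, `DictCache`, `block_of` and `get_block` are hypothetical.

```python
BLOCK_SIZE = 50  # hypothetical number of messages per block


class DictCache:
    """Stand-in for a memcached client, exposing only get and set."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def block_of(msg_id):
    """Block number for a message ID, counting from the oldest message."""
    return msg_id // BLOCK_SIZE


def get_block(cache, fetch_block, block_no, latest_block_no):
    """Return the messages in `block_no`, caching only completed blocks."""
    if block_no >= latest_block_no:
        # The newest block is still filling up, so always ask the database.
        return fetch_block(block_no)
    key = f"catbox:block:{block_no}"
    block = cache.get(key)
    if block is None:
        block = fetch_block(block_no)
        cache.set(key, block)
    return block


calls = []


def fake_fetch(block_no):
    """Pretend database query returning the message IDs in a block."""
    calls.append(block_no)
    start = block_no * BLOCK_SIZE
    return list(range(start, start + BLOCK_SIZE))


cache = DictCache()
latest = block_of(260)  # newest message ID 260 falls in block 5
get_block(cache, fake_fetch, 3, latest)
get_block(cache, fake_fetch, 3, latest)  # served from the cache
get_block(cache, fake_fetch, 5, latest)
get_block(cache, fake_fetch, 5, latest)  # live block: fetched again
print(calls)
```

Counting blocks from the oldest message, rather than paging back from the newest, is what keeps a cached block valid. With newest-first paging, every new message would shift the contents of every page.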

So, happy chatting. You are being logged (hopefully).
