Bit less loquacious and more to the point this month. Most of the buzz, of course, is about the hardware gremlins that decided to have a bit of fun with us.

Down but not out

As you doubtlessly noticed, we've had a lot of ups and downs, emphasis on the down, this month. Most people who want to know have already been informed about the basic nature of the problem but here it is for the permanent record:

We lost several pieces of equipment due to an overheating problem in the room in which our servers are housed at MSU. I would have expected stuff to shut down before heat damage was possible but that evidently did not happen. On the afternoon of Wednesday July 8, Kurt powered down the servers in order to replace a faulty battery unit. Two of the Really Critical servers were non-functional when they were powered back up.

We do not have much in terms of spare hardware but we did have the capacity to replace one machine. We did not have the capacity to deal with a double hit on short notice (even less so now). We also do not have the staff to handle major failures swiftly. Data recovery and backup restoration tend to take a handsome amount of time to run. It took us about 2.5 days to borrow a pair of machines and configure them into a working setup. The loaners came with Fedora 11 installed, which is different in significant ways from our previous Debian and current Ubuntu Linux setups (and which, quite frankly, we don't like working with; some of us had successfully avoided it for years). We were under orders to be non-destructive with the loaners, so we had somewhat less of a choice in our installed software and it took us that much longer to come up with a quasi-stable configuration.

On the 13th we apparently lost a third machine. Its status remains unknown as of the time of this writing. Because the code and content were being sourced off that machine, this took out everything else for an hour or so. Your homenode images might have reverted to older versions. You'll understand that fixing this is not a high priority so anyone in a hurry should re-upload their homenode picture. The moral of the story is that this is not our month and Single Points of Failure don't like us.

More will doubtlessly come in this month's root log. This, too, may be delayed as it's usually Swap who writes them and he's laid up with a bum hand. He did a lot of the work on the temporary servers, working in what almost became around-the-clock shifts with Nate and me, and knows what was done in detail. I saw Oolong with a shovel, too, and some of the edevites provided advice and know-how from the sidelines.

The IRC channel

Since the forum was idled, we've had an IRC channel which is used mostly for tech talk and which was drafted into service during the outage. This is #everything2 on irc.freenode.net (port 7000). I've retooled the 500 and 503 error messages to include a link to Freenode's web interface so people can join using a web browser. The Word Galaxy also has a link but you should not expect to see the Word Galaxy unless the situation is really bad. The IRC channel was a success in that we finally got a working link between the tech staff and the noding public during an outage. Or at least one that's more real-time and better publicised than LiveJournal. Anyone is welcome to pop in at any time but keep in mind that tech talk is at the top of the agenda.

Revoting

Revoting was to be implemented this month. It will be GP-neutral but not XP-neutral. This is another thing that's taking a back seat to our hardware and stability problems.

Acting director

Quite a few admins will be unavailable for some time later this month. This includes me and two of my designated deputies. I'm moving house starting the 15th. Seeing that grundoon and The Debutante also have plans, Oolong will be in charge and flying solo for a week or so. And someone said having three people in the "line of succession" was too many.

Anyway. Peace, love, and shellfish.