Yikes, this has been a bad week for the Cyborganic servers. Monday night they crashed and I had to go over there yesterday morning and see what was up. Xanadu, the mail server and web staging server, had hung on some SCSI disk errors. Yuck.
At first it wouldn't reboot, but after letting it sit for 10 minutes or so it came back up. Maybe it's a heat thing? Maybe. It does get pretty stuffy in that little room. There are 6 servers in that rack, and not very good ventilation. It's supposed to be a damn laundry room, dammit.
Erehwon, the live web server, had crashed too, but I think this was only because it has an NFS mount of a directory on Xanadu. It came back up with no problem, other than the usual annoying long fsck.
(Right about when this was happening, I found out later today, the guys from Slashdot were dealing with a strange outage of their own servers. There's an excellent blow-by-blow account of their trials and tribulations on their site now.)
Anyway, so the machines came back up, but then later, last night, they crashed again. Repeat process this morning. Luckily I only live about a block away from Jonathan's, where the servers are. Also swapped out a flaky hub while I was at it, which had been causing annoying outages for months.
Sure enough, this afternoon they went down again. I just want someone else to deal with this. People are calling me, emailing me, but there are like 8 other admins. Why can't it be someone else's turn? I admit that in a bout of selfish escapism I went and got a latte and read a book for half an hour before finally giving in and dealing with the servers again. I'm not being paid for this crap. Let them all wait. Anyway, this time I figured out which SCSI disk was the problem (the one that has /home on it, unfortunately), and I rebooted without mounting that one. Which means users don't have their home directories, or their staging servers, but at least they can get email, and named is running. And the live web server is running fine, except for some CGI scripts which use the NFS mount. guh. so complicated. But things are limping along at least. people have their email at least. that's the important thing. uh, except they have to use POP to get their email, no shell readers like pine. grrr.
now Mitra has the nerve to complain. his CGIs aren't working. he's another cybo admin. he's in Australia, though, so he has an excuse, i guess. no recent word from any of the more local admins. dammit.
Stefan, in New York, just reported that he was the one who bought the aforementioned dying disk, and he thinks it was less than a year ago! So maybe it's still under warranty! They should pay me for my time too....
ah, Unix... as my friend Mykle once said, "Unix is like a Rubik's Cube, you feel so smart for figuring it out, but you still aren't getting any work done." Actually, to be fair, disks fail no matter what the OS. But I just had to use that quote. :-)