
February 5, 2008

The Ongoing Saga of Flaky Hardware

Well, after being stable for a couple of weeks, my home machine started acting up again, disproving my theory that the drive trouble I was having was power-supply related.

The basic symptom is that, at seemingly random times, one of the two drives in my machine will "vanish" out from under the OS. If both physical drives are connected to the same SATA controller, the machine typically locks up at this point, as the SATA bus gets hard reset and neither drive comes online after the reset.

My home directory is NFS-mounted from my ReadyNAS, and the root file system is mirrored across the two disks. So as long as the drives are split across the two SATA controllers on my motherboard, this is merely annoying: Linux notices the drive vanishing and removes it from the mirror. Rebooting brings the drive back, and once I re-add it to the mirror sets, the mirror rebuilds correctly and I'm fine until the next time it happens.
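Assuming the mirrors are Linux software RAID (md) arrays, the kernel reports a dropped disk in `/proc/mdstat`: a healthy two-disk mirror shows `[2/2] [UU]`, and a degraded one shows something like `[2/1] [U_]`. The sketch below is a hypothetical helper (the name `degraded_arrays` and the sample text are illustrative, not from my actual setup) that scans that text and flags any array missing a member, which is one way to notice a vanished drive without waiting for a lockup.

```python
import re

# Array header lines look like "md1 : active raid1 sdb2[2](F) sda2[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Status lines contain "[total/active] [flags]", e.g. "[2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Illustrative /proc/mdstat contents with one degraded mirror (md1).
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[2](F) sda2[0]
      4192896 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: flags} for md arrays that are missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            # Remember which array the following status line belongs to.
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            total, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            # "_" marks a missing or failed member in the flag string.
            if active < total or "_" in flags:
                degraded[current] = flags
            current = None
    return degraded


if __name__ == "__main__":
    # On a real Linux box, pass open("/proc/mdstat").read() instead.
    print(degraded_arrays(SAMPLE))
```

Run from cron, a check like this would only tell me which mirror lost a disk; re-adding the disk (e.g. with `mdadm`) and the rebuild are still manual steps.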

I started keeping a more careful record of which drive "fails," and over the last week, the Maxtor has failed just over twice as often as the Seagate. Both drives give themselves clean bills of health when subjected to SMART diagnostics.

It seems highly unlikely that two drives from different manufacturers and different batches would start failing in the same mysterious way at the same time.

Since they're on different SATA controllers, there are only a couple of common points left: it's either some fundamental motherboard problem, or it's software.

I'm hoping for the latter, of course. My first experiment is to disable the I/O APIC.

I keep going back and forth about spending money on my personal machine. The guts of my machine are getting on in years, and several components have been through a couple of upgrade cycles already. It's a Socket 939 AMD board with AGP graphics (hence an AGP Nvidia card), and it uses relatively slow RAM by today's standards (DDR 400), so it's at the "do-over" point.

I don't use my home machine for all that much these days. I have a work notebook that has just as much CPU grunt as my desktop and better graphics performance, so when I very occasionally get the urge to play a game, I use the notebook (which has the added "advantage" of running Windows for work, so no dual-booting or rebooting is necessary).

Were I not having this instability, I wouldn't even be thinking about an upgrade. But I don't think I can go without a machine, and even though the real impact of the current failure mode is minimal, it's seriously getting on my nerves.

I could rebuild the machine - make it bigger, better, faster - for about $700 in parts.

Or I could put that money toward the DSLR I want.

Or that money could go to the roof we're going to have to put on the house.

Or the kitchen remodel.

I just hate spending money on computers, which may strike you as ironic, given what I do for a living.

Posted by dberger at February 5, 2008 10:09 AM