Highly available systems use component redundancy to simulate a very small Mean Time To Repair (MTTR). This often means giving the appearance of a system that does crash, but reboots _really quickly_. This is distinct from a fault tolerant system, which instead tries to simulate a very large Mean Time Between Failures (MTBF) and mask faults entirely.
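
To make the MTTR/MTBF distinction concrete, here is a minimal sketch using the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The figures are hypothetical and chosen only for illustration; they are not measurements of any real system.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: fails every 1,000 hours, takes 10 hours to repair.
baseline = availability(1_000.0, 10.0)

# Highly available approach: same failure rate, but recovery takes 0.01 hours
# (36 seconds), i.e. the system "reboots really quickly".
highly_available = availability(1_000.0, 0.01)

# Fault tolerant approach: same repair time, but failures are 1,000x rarer.
fault_tolerant = availability(1_000_000.0, 10.0)

for label, value in [
    ("baseline", baseline),
    ("highly available", highly_available),
    ("fault tolerant", fault_tolerant),
]:
    print(f"{label:>16}: {value:.6%}")
```

With these numbers the last two cases work out to the same availability (about 99.999%), because both reduce the MTTR/MTBF ratio by the same factor. What differs is how the downtime is experienced: many very short outages in the first case, very rare long ones in the second.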

High Availability is most often defined as the property of a system that is free of Single Points Of Failure (often shortened to SPOFs). Thus, the failure of any single component will result in a system slowdown, but not in a full loss of service. In practice, if an entire node in a High Availability cluster fails, there will be a brief loss of service (e.g., active TCP connections will be aborted), and a slowdown while the caches are being warmed.
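
The behaviour described above can be sketched as a toy in-memory simulation. The `Node` and `Cluster` classes are hypothetical names invented for this illustration, not any real clustering API, and the sketch does not model real network connections: the failed attempt against the dead node stands in for the aborted connection, and the cold cache on the surviving node stands in for the slowdown.

```python
class Node:
    """A hypothetical cluster node with its own local cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.cache: set[str] = set()

    def handle(self, key: str) -> tuple[str, str]:
        if not self.alive:
            # Stands in for an aborted connection to a dead node.
            raise ConnectionError(f"{self.name} is down")
        hit = key in self.cache
        self.cache.add(key)  # warm the cache for the next request
        return (self.name, "cache hit" if hit else "cache miss (slow)")


class Cluster:
    """Tries each redundant node in turn, so no single node is a SPOF."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes

    def request(self, key: str) -> tuple[str, str]:
        for node in self.nodes:
            try:
                return node.handle(key)
            except ConnectionError:
                continue  # fail over to the next node
        raise RuntimeError("full loss of service: no node is alive")


node_a, node_b = Node("a"), Node("b")
cluster = Cluster([node_a, node_b])

before_cold = cluster.request("page")   # served by a, cache is cold
before_warm = cluster.request("page")   # served by a, cache is warm
node_a.alive = False                    # an entire node fails
after_cold = cluster.request("page")    # served by b, cache is cold again
after_warm = cluster.request("page")    # served by b, cache is warm
```

The service keeps answering after `node_a` dies, but the first request handled by `node_b` is slow because its cache has not been warmed yet. Only when every node is down does the cluster report a full loss of service.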

In older texts, especially marketing material, you will see Fault Tolerant and Highly Available used interchangeably. Nowadays, Fault Tolerant is most commonly used to describe a system where the hardware lets any component fail without having any impact on the software.

Note that a highly available system may be more attractive than a fault tolerant system, as the former is usually also resistant to many forms of software fault.
