What is a single point of failure (SPOF)? Consider a computer with one hard disk. If that hard disk fails, you lose all your data. That's a SPOF. Now put two disks in your computer and mirror them (i.e. both disks contain the same data). If one disk fails, you won't lose your data, so your system is already more stable. However, what about your disk controller? If you only have one, then you've got another SPOF, so now you have to add a second disk controller. What about your power supply? You'll need multiple power supplies if you ever want to get close to zero downtime. What about your network connection? Better have multiple network cards in your server, and make sure the cards are connected to separate subnets so that a router failure on one subnet won't take out both cards. And so on.
Basically, a single point of failure is any component whose failure would, on its own, cause downtime and/or data loss.
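To make the definition concrete, here is a minimal Python sketch that flags every role filled by only one component. The inventory, the role names and the function name are all made up for illustration; they are not taken from any real tool.

```python
from collections import Counter

# Hypothetical inventory of (role, component) pairs for one server.
# The disks and network cards are already doubled up; the rest are not.
inventory = [
    ("disk", "disk0"),
    ("disk", "disk1"),
    ("disk_controller", "ctrl0"),
    ("power_supply", "psu0"),
    ("network_card", "nic0"),
    ("network_card", "nic1"),
]

def single_points_of_failure(components):
    """Return the roles that are filled by exactly one component."""
    counts = Counter(role for role, _ in components)
    return sorted(role for role, n in counts.items() if n == 1)

print(single_points_of_failure(inventory))
# ['disk_controller', 'power_supply']
```

Counting components is only a first pass. Two network cards plugged into the same subnet still depend on one router, so a real review also has to look at what the redundant parts share.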
As you can see, zero downtime is very difficult (and expensive) to achieve. I work in an environment where zero downtime is the goal. Our servers have multiple mirrored disks, multiple disk controllers, multiple power supplies, multiple network cards and multiple network routes. The buildings have multiple network connections entering from opposite ends. To top it all off, we have fail-over systems, so that if one of the main systems dies, all network traffic is automatically switched to the fail-over system with no impact on the client.
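The fail-over idea can be sketched in a few lines of Python. This is only a toy illustration, not how our systems are actually built: the system names, the health table and the `pick_target` function are assumptions invented for the example, and a real setup would rely on proper health checks and traffic switching at the network level.

```python
def pick_target(systems, is_healthy):
    """Return the first healthy system, checking in priority order."""
    for name in systems:
        if is_healthy(name):
            return name
    raise RuntimeError("no healthy system available")

# Hypothetical health table; in practice this would come from monitoring.
health = {"primary": True, "failover": True}
systems = ["primary", "failover"]

print(pick_target(systems, health.get))  # primary

# Simulate the main system dying: traffic now goes to the fail-over system.
health["primary"] = False
print(pick_target(systems, health.get))  # failover
```

Note that the code doing the switching is itself a component, so it needs redundancy too, or it becomes the next SPOF.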
Naturally, all this costs a stack of money, but when downtime can result in up to $50 million in lost transactions, this kind of redundancy is cheap in the long run.