Today, I’m going to start things off with a simple question: how fault tolerant are your servers?
Can they handle themselves if part of the system stops working?
Will they be able to weather a software glitch or hardware failure?
Those are more important questions than you might think.
If the systems you have in place for your website, application platform, or network aren’t properly hardened against failure, then everything’s likely to come crashing down the moment one component in the system stops working.
There’s even a term for this sort of thing – cascading failure.
Unfortunately, as we demonstrated in a piece on fault tolerance a few weeks back, building a system that’s well and truly fault-tolerant is a significant challenge – particularly when you’re managing a complex application infrastructure, such as the Netflix API. So what’s a beleaguered server administrator to do, then?
How can you ensure your servers – and the network on which they’re hosted – gracefully handle failure?