I don't remember where I first heard about the circuit breaker pattern, but I have seen it several times, in different forms and under different names. Over time I have come to believe that the approach originally described to me might not be the most appropriate one.
When I was reading this book I stumbled over it again. The book describes it the same way it was first explained to me: once you see a certain number of failures within a time period, open the circuit and quickly fail any further requests for a period of time. After that, let single requests through (while still failing all others) until you decide that the circuit breaker can be closed and all traffic let through again.
When talking about circuit breakers, the states are usually referred to as CLOSED (all traffic flows through), OPEN (all traffic is blocked and fails) and HALF-OPEN (after being OPEN, the circuit breaker lets a few requests through to evaluate whether it should go back to OPEN or CLOSED). I find these terms a little confusing and like to call them OK (CLOSED), FAILING (OPEN) and RECOVERING (HALF-OPEN).
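To make the three states concrete, here is a minimal sketch of the classic approach described above, using my preferred state names. The class name `CircuitBreaker`, the parameters `max_failures`, `window` and `cooldown`, and the rule that a single successful probe closes the circuit are all assumptions made for illustration, not taken from the book. The sketch is single-threaded, so "one request at a time while RECOVERING" holds trivially; a real implementation would need locking.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    OK = "CLOSED"            # all traffic flows through
    FAILING = "OPEN"         # all traffic is rejected immediately
    RECOVERING = "HALF-OPEN" # a probe request decides what happens next


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    # All parameter names and defaults are hypothetical.
    def __init__(self, max_failures=3, window=10.0, cooldown=5.0,
                 clock=time.monotonic):
        self.max_failures = max_failures  # failures tolerated per window
        self.window = window              # seconds over which failures count
        self.cooldown = cooldown          # seconds to stay FAILING
        self.clock = clock                # injectable so tests can fake time
        self.state = State.OK
        self.failures = deque()           # timestamps of recent failures
        self.opened_at = 0.0

    def call(self, func):
        now = self.clock()
        if self.state is State.FAILING:
            if now - self.opened_at < self.cooldown:
                raise CircuitOpenError("failing fast")
            self.state = State.RECOVERING  # cooldown over: allow a probe
        try:
            result = func()
        except Exception:
            self._on_failure(now)
            raise
        self._on_success()
        return result

    def _trip(self, now):
        self.state = State.FAILING
        self.opened_at = now
        self.failures.clear()

    def _on_failure(self, now):
        if self.state is State.RECOVERING:
            self._trip(now)  # the probe failed: back to FAILING
            return
        self.failures.append(now)
        # Drop failures that have fallen out of the time window.
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self._trip(now)

    def _on_success(self):
        if self.state is State.RECOVERING:
            # Assumption: one successful probe is enough to close again.
            self.state = State.OK
            self.failures.clear()
```

You would wrap each call to the protected resource as `breaker.call(lambda: fetch())`, where `fetch` stands for whatever function talks to the resource, and treat `CircuitOpenError` as the fast failure.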
It turns out, however, that there are a number of different algorithms you can use in your circuit breaker, each with its own pros and cons. When you choose which type of circuit breaker to use, you need to decide whether user experience or service resources matter most. You also need to consider which type of resource you are accessing through the circuit breaker.
For example, if your resource is partitioned, a large number of failures may all be against the same partition while every other partition works just fine. Do you still want to fail all requests then?
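One possible answer to that question is to keep a separate breaker per partition, so that a bad partition only blocks its own traffic. The sketch below assumes you can derive a partition key for each request. The names `PartitionedBreaker` and `ConsecutiveFailureBreaker` are hypothetical, and the inner breaker is deliberately tiny (it opens after a number of failures in a row and has no recovery step) to keep the focus on the per-partition bookkeeping.

```python
from collections import defaultdict


class ConsecutiveFailureBreaker:
    """Tiny illustrative breaker: open after `threshold` failures in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    @property
    def is_open(self):
        return self.consecutive_failures >= self.threshold

    def record(self, success):
        # Any success resets the streak; a failure extends it.
        if success:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1


class PartitionedBreaker:
    """Keeps one independent breaker per partition key."""

    def __init__(self, threshold=3):
        # A breaker is created lazily the first time a partition is seen.
        self._breakers = defaultdict(
            lambda: ConsecutiveFailureBreaker(threshold)
        )

    def allows(self, partition_key):
        return not self._breakers[partition_key].is_open

    def record(self, partition_key, success):
        self._breakers[partition_key].record(success)


breakers = PartitionedBreaker(threshold=2)
breakers.record("partition-a", success=False)
breakers.record("partition-a", success=False)
breakers.record("partition-b", success=True)
# Only partition-a is blocked now; partition-b still lets requests through.
```

The trade-off is that each partition needs enough traffic of its own to trip its breaker, so a failure that affects the whole resource is detected more slowly than with a single shared breaker.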
Future posts will cover the pros and cons of different circuit breakers based on consecutive failures, failures in a time window, failure rate, concurrency and rate. The series ends with a summary.