During a training I was teaching recently we were talking about retrospectives and different ways to make them interesting. Afterwards one of the students came forward and suggested something interesting.
When you are working on services that need to scale to millions of users you typically come to the conclusion that you will never be able to start a debugger on one of your live services. Instead you need instrumentation (also known as logging, tracing or diagnostics) to make sure you can figure out what went wrong. What I see happening a lot is that developers start logging the raw HTTP request to capture all data. And there are several problems with this approach...
So when a web service is getting too much traffic it starts returning the 503 status code. Well written services also return the Retry-After header hinting the client when it should come back again. Good behaving clients then respect that or will back-off by themselves to make sure the server is not getting too much traffic. However this is not enough if there are bad behaving clients in the mix. And how do you identify the bad behaving clients?