When you are working on services that need to scale to millions of users you typically come to the conclusion that you will never be able to start a debugger on one of your live services. Instead you need instrumentation (also known as logging, tracing or diagnostics) to make sure you can figure out what went wrong. What I see happening a lot is that developers start logging the raw HTTP request to capture all data. And there are several problems with this approach...
First of all, if you log every request as it comes in just in case it goes wrong, you are probably wasting a bunch of space in your instrumentation back-end. You're in luck since you can fix this by only logging failed requests.
The second problem is that the raw data probably contains something you don't want in your logs, such as user tokens that somebody could steal and use to pretend to be another user. Remember that some of the most common attackers come from within... You're in luck since you can fix this by removing or encrypting all sensitive data.
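To make the "removing sensitive data" fix concrete, here is a minimal sketch in Python. The function name `redact_headers` and the list of header names are assumptions made for illustration; a real service would need its own list and would also have to consider tokens hiding in URLs and bodies.

```python
# Header names treated as sensitive. This set is an assumption for
# illustration, not an exhaustive or authoritative list.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}


def redact_headers(headers):
    """Return a copy of the headers that is safer to write to a log."""
    safe = {}
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() in SENSITIVE_HEADERS:
            safe[name] = "[REDACTED]"
        else:
            safe[name] = value
    return safe
```

Note that this only works when the secrets live in well-known places. A token pasted into a query string or a JSON body slips straight through, which is part of why logging raw requests stays risky.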
Third problem is the slow client attack where an evil client sends you a request but very slowly; maybe one byte every few seconds. Since you insist on reading all data for your logs your service grinds to a halt. You're in luck since you can fix this by never reading a complete request if the client is too slow.
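One way to picture "never reading a complete request if the client is too slow" is to put a deadline on the whole body rather than on each individual read. The sketch below is only an illustration: `read_body`, `SlowClientError` and the `read_chunk` callable are hypothetical names, and the clock is passed in so the behavior can be exercised without waiting.

```python
import time


class SlowClientError(Exception):
    """Raised when a client takes too long to send its request body."""


def read_body(read_chunk, expected_length, max_seconds, clock=time.monotonic):
    """Read up to expected_length bytes, giving up once max_seconds has passed.

    read_chunk is any callable taking a byte count and returning bytes,
    for example the read method of a buffered stream.
    """
    deadline = clock() + max_seconds
    received = bytearray()
    while len(received) < expected_length:
        # Check the overall deadline before every read, so a client that
        # trickles one byte at a time cannot keep us busy forever.
        if clock() > deadline:
            raise SlowClientError(
                "gave up after %d of %d bytes" % (len(received), expected_length)
            )
        chunk = read_chunk(expected_length - len(received))
        if not chunk:
            break  # the client closed the connection early
        received.extend(chunk)
    return bytes(received)
```

The deadline is only checked between reads, so a real service would also want a timeout on each individual read; otherwise a single blocking read could stall past the deadline.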
Your fourth problem is that since you logged the whole request you probably don't bother logging much of the other context you might have, since you can always use the logged request to reproduce the problem. Hence it takes some smart jobs to analyze all those logged requests. You're in luck since you can fix this by hiring good developers who always log details about what went wrong.
Wait... If you have good developers who log context about what went wrong - why do you need the raw request? Your answer to that is probably "so we can figure out what went wrong for things we didn't log context about". Sorry, but I think you are wrong.
I think we can sort errors into two categories: things where we know what went wrong and things where we don't. In all cases where we know what went wrong, why would we ever want to log the raw request over logging exactly what we know: who made the request, what data we did not like and what went wrong?
And when we don't know what went wrong I'm pretty sure there is a client out there who is furious about all those failed requests. I'm certain they will be interested in working with you and will provide an example of a request that fails, and there is your request to reproduce that unknown problem!
So here is how I think: since logging raw requests is always a risk, I prefer to not even take that risk. Especially since that also encourages better instrumentation that actually logs the context of what went wrong. And the fallback for those things you didn't think about is always the clients. Because if there is some error happening that clients don't care about - why should you? You might if the error is causing a DoS attack, which is why you always want well-behaved clients to tell you who they are (and you log this) so you can track them down.
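As a rough sketch of what "logging exactly what we know" could look like, the Python below records who made the request, which piece of data was rejected and why. The names `describe_failure`, `log_failure`, the logger name and the field names are all hypothetical choices for this example.

```python
import json
import logging

# The logger name is an arbitrary choice for this sketch.
logger = logging.getLogger("service.errors")


def describe_failure(client_id, rejected_field, reason):
    """Build a small structured record: who, what data, what went wrong."""
    record = {
        "client_id": client_id,            # who made the request
        "rejected_field": rejected_field,  # what data we did not like
        "reason": reason,                  # what went wrong
    }
    # sort_keys keeps the output stable, which makes log lines easier to diff.
    return json.dumps(record, sort_keys=True)


def log_failure(client_id, rejected_field, reason):
    """Log the known context of a failure instead of the raw request."""
    logger.warning(describe_failure(client_id, rejected_field, reason))
```

The sketch logs the name of the rejected field rather than its value. Whether the value itself is safe to log depends on the field, so that decision is left to the caller.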
I always recommend that developers follow the HTTP guidelines and use the Authorization header for security information. That makes it easy for an HTTP logging mechanism to strip that information off before storing.
One of the primary guidelines of HTTP is that requests/responses should be self-descriptive and stateless. If you are able to follow this advice then "context" information is rarely needed to debug issues.
DoS attacks due to slow clients are always a concern, but they are a concern regardless of whether you are logging full requests or not. Your service is still going to need to read the full payload. Using commercial front end API gateways that have been hardened against these types of attacks is one way to mitigate the problem.
Unfortunately, the majority of errors that are going to be problematic are going to be ones that you didn't expect. If you did expect them, then you could have coded defensively against them. It is hard to explicitly collect context for problems that you don't know you are going to have.
@Darrel: I think you are missing my point. I do agree that if everybody did everything right there would never be a problem but that is rarely the case. I even think you point that out in your last sentence.
You are also right that in a perfect world the response will contain all information needed so logging a response should be both good enough and safe (as you have control over the response). But sometimes that's a bad idea if the "error" is security related.
I think however you are wrong that my service needs to read the full payload. That is only a property of (in my opinion) inferior frameworks for services. But yes, commercial API gateways can buy you out of that trouble.
Back to the point I'm trying to make: as people forget to do the right thing, try to minimize the danger they can inflict when doing so by advocating patterns that are less likely to end up causing you a lot of trouble.