When you work on services that need to scale to millions of users, you typically come to the conclusion that you will never be able to attach a debugger to one of your live services. Instead you need instrumentation (also known as logging, tracing, or diagnostics) to make sure you can figure out what went wrong. What I see happening a lot is that developers start logging the raw HTTP request to capture all the data. And there are several problems with this approach...
First of all, if you log all requests as they come in, just in case they go wrong, you are probably wasting a lot of space in your instrumentation back-end. You're in luck, since you can fix this by only logging failed requests.
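A minimal sketch of that idea, assuming a hypothetical `process` function standing in for the real business logic: only responses with an error status ever reach the instrumentation back-end.

```python
failed_requests = []  # stand-in for the instrumentation back-end

def process(request):
    # Hypothetical business logic: reject requests without an "id" field.
    if "id" not in request:
        return 400, "missing id"
    return 200, "ok"

def handle(request):
    status, body = process(request)
    if status >= 400:
        # Only failed requests are logged; successes cost no storage.
        failed_requests.append({"path": request["path"], "status": status})
    return status, body
```

Successful requests pass through without leaving a trace, so the log volume tracks your error rate rather than your traffic.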
The second problem is that the raw data probably contains things you don't want in your logs, such as user tokens that somebody could steal and then use to impersonate another user. Remember that some of the most common attackers come from within... You're in luck, since you can fix this by removing or encrypting all sensitive data.
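One sketch of the removal approach: strip well-known sensitive headers before anything is written to a log. The header names below are assumptions; your service may carry credentials in other places too.

```python
# Assumed set of header names that commonly carry credentials.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}

def redact(headers):
    """Return a copy of the headers that is safe to log."""
    return {
        name: ("<redacted>" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }
```

Redacting at the logging boundary means a developer cannot accidentally leak a token just by logging "the whole request".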
The third problem is the slow-client attack, where a malicious client sends you a request very slowly; maybe one byte every few seconds. Since you insist on reading all the data for your logs, your service grinds to a halt. You're in luck, since you can fix this by never reading the complete request if the client is too slow.
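A sketch of that defense, assuming a hypothetical `read_chunk` callable that returns the next chunk of the body (and an empty bytes object at end of stream): cap both the total size and the wall-clock time spent reading.

```python
import time

class SlowClientError(Exception):
    pass

def read_body(read_chunk, max_bytes=1_000_000, max_seconds=5.0):
    """Read a request body, giving up on clients that trickle bytes."""
    deadline = time.monotonic() + max_seconds
    chunks, total = [], 0
    while True:
        if time.monotonic() > deadline:
            raise SlowClientError("client too slow; aborting read")
        chunk = read_chunk()
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            raise SlowClientError("request body too large")
        chunks.append(chunk)
```

In practice your web server or framework usually offers these limits as configuration; the point is that the limits must exist before you decide to read everything for logging purposes.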
The fourth problem is that since you logged the whole request, you probably don't bother logging much of the other context you have, because you can always use the logged request to reproduce the problem. Hence it takes some smart people to dig through all those logged requests to analyze your logs. You're in luck, since you can fix this by hiring good developers who always log details about what went wrong.
Wait... If you have good developers who log context about what went wrong, why do you need the raw request? Your answer is probably "so we can figure out what went wrong for the things we didn't log context about". Sorry, but I think you are wrong.
I think we can divide errors into two categories: things where we know what went wrong and things where we don't. In all cases where we know what went wrong, why would we ever log the raw request instead of logging exactly what we know: who made the request, what data we didn't like, and what went wrong?
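That kind of targeted logging might look like the sketch below; the `log_bad_request` helper and its field names are assumptions, not a prescription.

```python
import json
import logging

log = logging.getLogger("api")

def log_bad_request(client_id, field, reason):
    """Log exactly what we know: who made the request, which piece of
    data we didn't like, and why; instead of dumping the raw request."""
    entry = {"client": client_id, "field": field, "error": reason}
    log.warning(json.dumps(entry))
    return entry
```

A structured entry like this is searchable and aggregatable in the back-end, which a blob of raw request bytes never is.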
And when we don't know what went wrong, I'm pretty sure there is a client out there who is furious about all those failed requests. I'm certain they will be interested in working with you and providing an example of a request that fails; and there is your reproduction of that unknown problem!
So here is how I think: since logging raw requests is always a risk, I prefer not to take that risk at all. Especially since avoiding it encourages better instrumentation that actually logs the context of what went wrong. And the fallback for the things you didn't think about is always the clients. Because if an error is happening that no client cares about, why should you? You might care if the error is part of a DoS attack, which is why you always want well-behaved clients to tell you who they are (and you log this) so you can track down the rest.