2014-03-27

Protecting your resource identifiers

In a lot of older web applications as well as newer RESTful services I've seen it fairly common to have identifiers such as user and order IDs just being an auto incremented primary key from the database. This approach has the problem of revealing your total number of users/orders in a certain time period. Information your competitors could use against you!

In order to protect your application from exposing your true number of users and orders it is very common to use unique identifiers (aka GUIDs). One downside with those are that they are long and hard for a human to remember or retype if needed. A pretty elegant solution is described in this article where he generates a table of random identifiers that he then just picks from when needed.

There are however a few problems with that approach in my opinion. First of all you need to remember to add more numbers to the table when needed. Since this is likely to be a rare event I think there is a big risk you forget to do it or the automatic job to do it fails. And the person who knows why you do this is gone when it is needed. Kind of what often happens when it is time to renew certificates...

I also think there is a problem in that the random identifier is likely going to be primary key on your data table. There is a risk that you make your primary key clustered and now any inserts using a random key will be relatively expensive compared to using an incremental key that just adds data to the end of your table. But this property could also be good in case you have partitioned your data and/or using a NoSQL type storage.

I think there is another interesting alternative here. If you are actually using a relational database instead of keeping a table of random keys to use, you could keep a table of random key to incremental key mappings. A simple way is a database with two columns; unique identifier and integer (assuming your incremental keys are integers). You could even let your userID = 42 and orderID=42 share the same unique identifier if you want... This way only your public APIs would deal with the random key while everything internally would use the incremental one. Another good approach here is that your key does not need to be a GUID nor a number. If you are creative your could let your "random key" actually be words or series of letters.

The lookup table approach is however a bad idea in a NoSQL situation since "joins" are typically expensive there. But then in a NoSQL scenario the table of random keys to use is also a bottle neck so for large distributed storage scenarios I would just stick with GUIDs anyway...

Do you have another good way to solve this problem?

No comments:

Post a Comment