When it comes to auto scaling I tend to see one of two approaches, and neither is actually good.
The most common approach is to watch some performance counter and, when it leaves a comfort zone, add or remove instances of your service. This is very simple, but it takes no historical data into account and won't react fast enough if you have a big spike (or dip) in demand.
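A minimal sketch of that reactive rule might look like this (the counter, thresholds, and one-at-a-time step size are illustrative assumptions, not a prescription):

```python
def reactive_scale(current_instances: int, cpu_percent: float,
                   low: float = 30.0, high: float = 70.0) -> int:
    """Classic threshold rule: step the instance count up or down
    when a performance counter leaves the comfort zone [low, high]."""
    if cpu_percent > high:
        return current_instances + 1  # scale out
    if cpu_percent < low and current_instances > 1:
        return current_instances - 1  # scale in, but keep at least one
    return current_instances  # inside the comfort zone: do nothing
```

Note the weakness described above: by the time `cpu_percent` crosses the threshold, a sharp spike is already hurting you, and adding one instance at a time may never catch up.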
The other approach I often see uses historical data from yesterday, last week, or even the last month, and tries to predict needs based on previous usage patterns. This is great for estimating future capacity, but it doesn't really help with auto scaling since you still have to adjust for spikes outside of your historical usage patterns anyway.
A better method is actually something in between. Instead of looking at all historical data and trying to figure out how many instances you need at any point in the future, you only need to figure out how many instances you need T minutes into the future. I would choose T such that it covers the time it takes to add a new instance 95% of the time. Now you just look at some recent data and calculate the equation for the line describing the most recent load. The tricky part is deciding whether the load is linear or curved, but over a short enough time period even a curved load can be approximated by a straight line. Once you have a line fitting your data, you can use it to extrapolate your needs into the near future. Naturally (since you are estimating needs) you should be conservative and make the estimate such that there is only a small probability of it being too small.
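Using only the standard library, the whole idea fits in one small function: a least-squares line through the recent samples, extrapolated T minutes ahead, padded by a few standard deviations of the fit residuals so the estimate is only rarely too small. The sampling interval (one measurement per minute), `capacity_per_instance`, and the `safety_z` padding factor are my assumptions for the sketch:

```python
import math
import statistics

def predict_instances(recent_load: list[float], t_minutes: int,
                      capacity_per_instance: float,
                      safety_z: float = 2.0) -> int:
    """Fit a straight line to recent load samples (one per minute,
    oldest first) and extrapolate T minutes past the newest sample."""
    n = len(recent_load)
    xs = range(n)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(recent_load)
    # Least-squares slope and intercept of load over time.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, recent_load))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Extrapolate the line T minutes into the future.
    predicted = intercept + slope * (n - 1 + t_minutes)
    # Be conservative: pad by a few standard deviations of the
    # residuals, so there is only a small probability of undershooting.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, recent_load)]
    margin = safety_z * statistics.pstdev(residuals)
    return max(1, math.ceil((predicted + margin) / capacity_per_instance))
```

For example, with load climbing 10 requests/s every minute (`[100, 110, 120, 130, 140]`), looking 3 minutes ahead at 50 requests/s per instance, the line predicts 170 requests/s and the function asks for 4 instances. On noisy data the residual margin kicks in and rounds the answer up further.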
This way you can be a little bit smarter: not just react to load, but actually be proactive in your scaling, without the cost of keeping and analyzing a lot of historical data.