Influence of outliers in product sales prediction (Part 2)

Recap of Part 1

In the previous article, we discussed, Prediction of sales from historical time series data.  How outlier detection IQR (Interquartile range) technique improve the accuracy of daily sales and monthly sales prediction values ( Check out Part I of this article ). here we are going to discuss how the same (IQR) technique influences the individual product sales prediction.

Understand monthly sales data for the product: “Music videos”

Statistics of daily sales of Product: music videos. mean value of daily sales amount is 60.5 From standard normal distribution some of daily sales amount up to $160! Screen Shot 2017-07-19 at 4.42.03 PM

z-scores

Standardization or z-scores is the most commonly used method. It converts all indicators to a common scale with an average of zero and standard deviation of one. Comparing daily sales quantity with sales amount of producing music videos, gives the idea of how the data is distributed in common scale. Screen Shot 2017-07-19 at 6.32.03 PM

Normal distribution

Screen Shot 2017-07-19 at 6.32.30 PM Here we can see the same ( above said ) comparison with their own scales respectively daily sales quantity and amount. most of the daily sales amount lies between $30 to $100. rarely it reaches $160 and above.

Outliers not visible in time series data

From the daily music video sales amount data, there is no standard growth or downfalls, only the irregular patterns are visible. Screen Shot 2017-07-19 at 4.36.36 PM Screen Shot 2017-07-19 at 4.34.02 PM

Detect outliers using IQR formula

Screen Shot 2017-07-19 at 5.58.58 PM

Detection of outliers combining sales quantity and amount

Screen Shot 2017-07-19 at 6.33.46 PM

Compare sales predictions

Product Name Daily sales Prediction

With Outliers (Actual data) After removing outliers

Music videos $62 $59

Entertainment Laptops $2905 $2850

Android Phones $1113 $1022

Music videos $490 $530

cross validation of  current prediction

Plotting daily cumulative sales value (blue line) and after removing outliers (red line) Screen Shot 2017-07-19 at 6.54.51 PM How does it change in daily sales prediction for the product Music videos. We can see the gap between normal prediction( blue line) and prediction after removing outlier data points (red line). Screen Shot 2017-07-19 at 7.39.04 PM

Conclusion

IQR technique helps to identify outlier data points, from single measure with time series and dimension attribute. furthermore, if we have multiple measure data sets, we need more advanced algorithm like One-class SVM, Robust covariance and Isolation forest. In further posts, we will discuss more.