New charts for item prices, help wanted

I have now created charts where you can observe how item prices have developed over time.

As you can see from this example, however, some of the charts are distorted by a few extreme values. I would like to know if someone can recommend a way to find a better maximum value for the charts than simply the highest value in the dataset. The method has to be implemented in PHP.

4 thoughts on “New charts for item prices, help wanted”

Hi Unc, from the info you gave, I think what you need to do to clean up that data is apply the standard deviation formula to the prices. That way the chart won't show extreme prices, only the ones that fall within a certain range of the item's average price. See the links below:

http://en.wikipedia.org/wiki/Standard_deviation

http://upload.wikimedia.org/wikipedia/commons/7/7e/Standard_deviation_illustration.gif

You would just need to see how it can be run in PHP to clean the data, but I think any programmer should be able to apply the formula to the price list you already provide without problems. Hope this helps. Also, great idea to have item prices and lists. Keep up the good work.
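To make the suggestion above a bit more concrete, here is a minimal sketch of how a chart maximum could be derived from the mean and the standard deviation. The function names `price_std_dev` and `chart_max`, the default of 2 standard deviations, and the sample prices are all assumptions made for illustration. They are not taken from the site's code, and the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: population standard deviation of a list of prices.
function price_std_dev(array $prices): float
{
    $n = count($prices);
    if ($n === 0) {
        return 0.0;
    }
    $mean = array_sum($prices) / $n;
    $sumSq = 0.0;
    foreach ($prices as $p) {
        $sumSq += ($p - $mean) ** 2; // squared distance from the mean
    }
    return sqrt($sumSq / $n);
}

// Hypothetical helper: chart maximum = mean + $k standard deviations,
// capped at the real maximum so the axis never exceeds the data.
function chart_max(array $prices, float $k = 2.0): float
{
    if (count($prices) === 0) {
        return 0.0;
    }
    $mean  = array_sum($prices) / count($prices);
    $limit = $mean + $k * price_std_dev($prices);
    return min(max($prices), $limit);
}

// Made-up sample data: ten ordinary prices and one extreme value.
$prices = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 500];
echo chart_max($prices), "\n"; // lower than 500, the raw maximum
```

Note that a single extreme value also inflates the mean and the standard deviation themselves, so the limit computed this way can still sit well above the typical prices.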

pepetrueno on 2011/08/12 at 12:24 said:

Also, a PHP implementation of the formula can be found here:
http://www.ajdesigner.com/php_code_statistics/variance_population.php

Wish I could help, but I’m not familiar with PHP. Only Java here.

Hi Unc,

Reading up on finding outliers, you mostly get the academic literature, which involves mathematically robust (but complicated) methods like finding integrals for the distribution to one side of each data point. Rather than do that, I took some of the code that pepetrueno linked to, and modified it to give an iterative approach. This is an off-the-cuff quick method that would probably make statisticians cringe. ;-)

The input is the array of prices; the output is the upper_bound for your graph (an integer).

http://pastebin.com/CUeMrQZ7

You can set the variable for how many standard deviations from the mean you want to include. I set it to 3 in the example, but I don’t have your data so that may need tweaking. Then you just iterate over the set and remove items that are too far above the mean.

Crucially, we then re-calculate everything and start again with a new mean and standard deviation. We iterate until we do a run where we didn’t remove anything (or the array is empty, which should never happen). In the worst case it runs once for every outlier, plus one final run that removes nothing. In the best case it runs once.
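For readers who cannot open the pastebin, the loop described above might look roughly like the following. This is a sketch written from the description only, not the pastebin code: the function name `iterative_upper_bound`, the parameter `$maxDeviations`, and the sample data are assumptions, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical function: returns an integer upper bound for the chart by
// repeatedly dropping values that lie too far above the mean.
function iterative_upper_bound(array $prices, float $maxDeviations = 3.0): int
{
    $values = array_values($prices);
    do {
        $n = count($values);
        if ($n === 0) {
            return 0; // empty input (should not happen during the loop)
        }

        // Recalculate mean and population standard deviation on every pass.
        $mean  = array_sum($values) / $n;
        $sumSq = 0.0;
        foreach ($values as $v) {
            $sumSq += ($v - $mean) ** 2;
        }
        $stdDev = sqrt($sumSq / $n);
        $cutoff = $mean + $maxDeviations * $stdDev;

        // Keep only the values that are not too far above the mean.
        $kept = array_values(array_filter($values, function ($v) use ($cutoff) {
            return $v <= $cutoff;
        }));
        $removed = $n - count($kept);
        $values  = $kept;
    } while ($removed > 0); // stop after a pass that removed nothing

    return (int) ceil(max($values));
}

// Made-up sample data with two extreme values.
$prices = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 500, 5000];
echo iterative_upper_bound($prices), "\n"; // prints 13
```

One thing to keep in mind when tweaking the threshold: in a set of n values, no value can lie more than sqrt(n - 1) population standard deviations above the mean, so with 10 or fewer data points a threshold of 3 would never remove anything.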

It’s iterative and not recursive, so it shouldn’t do anything wacky with stack or recursion depth. Hope it’s suitable!
