Skip to main content

Normalizzazione dati

A standard approach is to scale the inputs to have mean 0 and a variance of 1. Also linear decorrelation/whitening/pca helps a lot.


1- Min-max normalization retains the original distribution of scores except for a scaling factor and transforms all the scores into a common range [0, 1]. However, this method is not 

robust (i.e., the method is highly sensitive to outliers.
2- Standardization (Z-score normalization) The most commonly used technique, which is calculated using the arithmetic mean and standard deviation of the given data. However, both mean and standard deviation are sensitive to outliers, and this technique does not guarantee a common numerical range for the normalized scores. Moreover, if the input scores are not Gaussian distributed, this technique does not retain the input distribution at the output.
3- Median and MAD: The median and median absolute deviation (MAD) are insensitive to outliers and the points in the extreme tails of the distribution. therefore it is robust. However, this technique does not retain the input distribution and does not transform the scores into a common numerical range.
4- tanh-estimators: The tanh-estimators introduced by Hampel et al. are robust and highly efficient. The normalization is given by
 where μGH and σGH are the mean and standard deviation estimates, respectively, of the genuine score distribution as given by Hampel estimators.
Therefore I recommend tanh-estimators.
Evernote consente di ricordare tutto e di organizzarti senza sforzo. Scarica Evernote.

Comments

Popular posts from this blog