When analyzing data with a regression model, it is important to avoid overfitting.


Overfitting occurs when a statistical model is too complex for the data used to fit it. This can happen when the model has too many degrees of freedom (adjustable parameters) relative to the number of observations.


Overfitting results in a model that performs well on the training data but poorly on subsequent data sets. The appearance of good performance is deceptive, since the model is fitting noise in the original data rather than the underlying relationship.
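The gap between training and test performance can be illustrated with a small sketch. The Python example below is an illustration under stated assumptions, not a prescribed method: the straight-line "truth", the fixed alternating perturbation standing in for noise, and the helper names (true_line, perturb, mse) are all made up for this example. It fits a degree-1 and a degree-9 polynomial to ten points with NumPy's polyfit and compares the error on the training points with the error on held-out points.

```python
import numpy as np


def true_line(x):
    # Hypothetical "underlying reality": a straight line (assumption).
    return 2.0 * x


def perturb(n, amplitude=0.5):
    # Deterministic stand-in for noise: +a, -a, +a, ... (assumption,
    # chosen so the example gives the same numbers on every run).
    return amplitude * (-1.0) ** np.arange(n)


def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on the points (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# Ten training points on [0, 1].
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_line(x_train) + perturb(x_train.size)

# Held-out points midway between neighbouring training points.
x_test = (x_train[:-1] + x_train[1:]) / 2.0
y_test = true_line(x_test) + perturb(x_test.size)

results = {}
for degree in (1, 9):
    # Degree 9 has ten coefficients, one per training point.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (
        mse(coeffs, x_train, y_train),
        mse(coeffs, x_test, y_test),
    )

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree}: train MSE = {train_err:.3g}, "
          f"test MSE = {test_err:.3g}")
```

Under these assumptions the degree-9 polynomial passes through every training point, so its training error is essentially zero, while its error on the held-out points is far larger than that of the straight line.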


Overfitting can be detected and limited if the analyst checks for it explicitly. Methods used to avoid overfitting include:

(1) pruning

(2) cross-validation

(3) regularization

(4) early stopping

(5) model comparison

(6) Bayesian priors on parameters

(7) training with a small amount of extra noise in the data

(8) penalization

(9) shrinkage
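Two of the methods listed above, cross-validation and regularization (penalization, which produces shrinkage), are often used together. The Python sketch below shows one way this might look, assuming a ridge (squared-coefficient) penalty and 5-fold cross-validation. The data, the candidate penalty values, and the helper names (poly_features, ridge_fit, cv_error) are hypothetical and chosen only for illustration; the other methods in the list are not covered.

```python
import numpy as np


def poly_features(x, degree):
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)


def ridge_fit(X, y, lam):
    # Closed-form ridge solution; the intercept (column 0) is not penalized.
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)


def cv_error(X, y, lam, folds):
    # Average mean squared error on each held-out fold.
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[held_out] @ beta - y[held_out]) ** 2))
    return float(np.mean(errors))


# Made-up data: a smooth curve plus random noise (assumption).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=x.size)

# Deliberately flexible model: a degree-9 polynomial.
X = poly_features(x, degree=9)

# Five disjoint folds that together cover every observation once.
folds = np.array_split(rng.permutation(x.size), 5)

# Candidate penalty strengths (arbitrary illustrative values).
lambdas = [1e-4, 1e-2, 1.0, 100.0]

# Cross-validation: score each penalty on data not used for fitting.
scores = {lam: cv_error(X, y, lam, folds) for lam in lambdas}
best_lam = min(scores, key=scores.get)

# Shrinkage: a larger penalty pulls the non-intercept coefficients
# toward zero.
norms = [float(np.linalg.norm(ridge_fit(X, y, lam)[1:])) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(f"lambda = {lam:g}: CV error = {scores[lam]:.4f}, "
          f"coefficient norm = {norm:.4f}")
print("penalty chosen by cross-validation:", best_lam)
```

The penalty is chosen by held-out error rather than training error because training error alone always favors the weakest penalty, which is the behavior overfitting exploits.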

