Overfitting - Underfitting - Regularization

Description

Note on Overfitting - Underfitting - Regularization, created by Sheyma Den on 28/11/2018.

Resource summary

A lower loss on the validation set means a better model.

Overfitting: the model's accuracy is high on the training set (which is often the case), but it does not generalize well to unseen data; the model performs poorly on the test data, and we want it to do well on unseen data. Overfitting can be spotted on the validation set: when the validation accuracy peaks after some amount of training and then starts decreasing, the model is overfitting the training set.

Underfitting: occurs when there is still room for improvement in the model. Reasons for underfitting:
- The model is not powerful enough.
- The model is over-regularized.
- The model has not been trained long enough, so it has not learned the relevant patterns in the training data.

Note that training the model for too long leads to overfitting. We need to train for a number of epochs that balances overfitting and underfitting.

How to avoid overfitting?
- Use more training data.
- Early stopping during training (need more info); a sketch using a Keras callback follows this note.
- Data augmentation?
- Batch normalization?
- Reduce the capacity (size) of the model, i.e. reduce the number of learnable parameters. In neural networks, the learnable parameters are set by the number of layers and the number of units in each layer. Intuition about reducing capacity: a model with many parameters has more memorization capacity, so it learns a dictionary-like mapping between training samples and their targets, which has no generalization power and is useless on unseen data. A model with limited memorization capacity has to learn a compressed representation with more predictive power. Note that a model that is too small has difficulty fitting the training data. To find the balance between too much and too little capacity, we need to experiment and do some tuning: always start with a network with few layers and units, then increase.
- Add dropout: randomly drop (set to 0) a number of output features of a layer during training. The dropout rate is the fraction of the features that are set to 0. At test time no units are dropped; instead, the layer's output values are scaled down by a factor equal to the dropout rate, to balance the fact that more units are active at test time.

Regularization = a technique that places constraints on the quantity and type of information the model can store.

Weight regularization: in a neural network, given a training set, there are multiple models with different sets of weights that could explain the data. Simple models are less likely to overfit than complex models. A simple model is one where the distribution of weight values has less entropy, or one with fewer parameters. So to avoid overfitting, we can constrain the complexity of the model by forcing its weights to take only small values, which makes the distribution of weights more "regular". This is called weight regularization, and it is done by adding to the model's loss function a cost associated with having large weights. There are two types of weight regularization: L1 regularization and L2 regularization. A sketch combining weight regularization and dropout follows.
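A minimal sketch of the last two ideas above (L2 weight regularization and dropout) using tf.keras; this is an illustration, not code from the original note. The layer sizes, the 0.001 regularization factor, the 0.5 dropout rate and the input width are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

num_features = 20  # hypothetical input width; replace with the real feature count

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_features,)),
    # kernel_regularizer=l2(0.001) adds 0.001 * sum(w**2) over this layer's
    # weights to the loss, penalizing large weight values.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    # Dropout(0.5) randomly sets 50% of the previous layer's outputs to 0
    # during training; at inference time all units are kept.
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Swapping `regularizers.l2` for `regularizers.l1` gives L1 weight regularization (a penalty proportional to the sum of absolute weight values instead of squared values).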

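Early stopping, mentioned in the list above, can be sketched with Keras's built-in EarlyStopping callback. This continues from the model built in the previous sketch; the data arrays are random stand-ins and the patience value is an assumption.

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

# Random stand-in data so the sketch runs end to end; replace with real data.
x_train = np.random.rand(1000, num_features).astype("float32")
y_train = np.random.randint(0, 2, size=(1000, 1))
x_val = np.random.rand(200, num_features).astype("float32")
y_val = np.random.randint(0, 2, size=(200, 1))

early_stop = EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=5,                  # stop after 5 epochs with no improvement
    restore_best_weights=True,   # roll back to the best epoch's weights
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,                  # upper bound; training usually stops earlier
    callbacks=[early_stop],
)
```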
