Neural Networks - Data Analysis

Description

A Data Science mind map on Neural Networks - Data Analysis, created by Aleksandar Kovacevic on 17-09-2017.

Resource Summary

Neural Networks - Data Analysis
  1. Tuning
    1. High Bias
      1. Data is modelled too roughly (underfitting)
        1. Bigger network
        2. Train longer
        3. NN architecture search
    2. High Variance
      1. Data is modelled too closely (overfitting)
        1. More data
        2. Regularization (see the sketch after this section)
          1. Weight Decay
            1. L2 regularization: add (lambda / (2m)) * ||W||_F^2 to the cost
            2. L1 regularization: add (lambda / (2m)) * sum|w| to the cost
            3. The regularization term must also be added to the gradient dW before the update W = W - alpha * dW
            4. Intuition for the parameter lambda: as lambda grows, the weights shrink, which makes the network behave more linearly
          2. Dropout
            1. Randomly drop certain neurons from the network during training
          3. Data Augmentation
            1. Add more training data by distorting existing examples
          4. Early Stopping
            1. Stop training at the point where the dev error is at its minimum
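A minimal NumPy sketch of the two regularization techniques above, assuming the layer weights are kept in a dict and activations in a matrix A; the function names and keep_prob value are illustrative, not from the original map.

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lambd, m):
    """Weight decay: add the L2 penalty (lambda / (2m)) * sum_l ||W_l||_F^2 to the cost."""
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights.values())
    return cross_entropy_cost + l2_penalty

def l2_regularized_grad(dW, W, lambd, m):
    """The same penalty appears in the gradient: dW gains (lambda / m) * W before W = W - alpha * dW."""
    return dW + (lambd / m) * W

def inverted_dropout(A, keep_prob=0.8, rng=np.random.default_rng(0)):
    """Dropout: randomly zero units of an activation matrix A, rescaling so its expected value is unchanged."""
    mask = rng.random(A.shape) < keep_prob
    return (A * mask) / keep_prob
```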
  2. Optimization Problem
    1. Unnormalized data slows down training
      1. Normalize the inputs to mean = 0 and std = 1 (see the sketch after this section)
    2. Vanishing / exploding gradients
      1. In deep networks, activations and gradients can become too large or too small as they pass through the layers
    3. Gradient Checking
      1. Compare the analytic gradient against the change in the cost when each parameter is increased and decreased by a small value epsilon
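A small sketch of both checks, assuming X holds one example per column and that cost_fn / analytic_grad come from the model under test; all names here are illustrative.

```python
import numpy as np

def normalize_inputs(X):
    """Shift and scale each feature (row of X) to mean 0 and std 1 so gradient descent converges faster."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True) + 1e-8   # guard against zero variance
    return (X - mu) / sigma

def gradient_check(cost_fn, theta, analytic_grad, epsilon=1e-7):
    """Two-sided numerical gradient: nudge each parameter up and down by epsilon and compare with backprop."""
    numeric_grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus.flat[i] += epsilon
        minus.flat[i] -= epsilon
        numeric_grad.flat[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * epsilon)
    return np.linalg.norm(numeric_grad - analytic_grad) / (
        np.linalg.norm(numeric_grad) + np.linalg.norm(analytic_grad))  # ~1e-7 or less looks healthy
```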
  3. Optimization Algorithms
    1. Mini-batch gradient descent
      1. Split the input and output data (X, Y) into small slices (mini-batches) and compute the cost and gradients on one batch at a time (see the sketch after this list)
      2. Choosing the batch size
        1. Small set (m <= 2000) -> plain batch gradient descent
        2. Larger set -> batch size of 64, 128, 256 or 512
        3. Make sure a mini-batch fits in CPU/GPU memory
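A sketch of the splitting step, assuming examples are stored as columns of X and Y; the function name is illustrative.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the examples (columns of X and Y together) and slice them into mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X, Y = X[:, perm], Y[:, perm]
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size])
            for k in range(0, m, batch_size)]
```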
    2. Exponentially weighted averages
      1. The average is recomputed at each step with v(t) = beta * v(t-1) + (1 - beta) * theta(t)
      2. Bias Correction
        1. Corrects the start-up values of the exponentially weighted average using v(t) = v(t) / (1 - beta^t) (see the sketch after this list)
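A tiny sketch of the running average with bias correction; beta = 0.9 follows the common default quoted later for Adam.

```python
def ewa_with_bias_correction(values, beta=0.9):
    """Exponentially weighted average of a sequence, with the start-up bias corrected at every step."""
    v, out = 0.0, []
    for t, theta in enumerate(values, start=1):
        v = beta * v + (1 - beta) * theta
        out.append(v / (1 - beta ** t))    # v_corrected(t) = v(t) / (1 - beta^t)
    return out
```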
    3. Gradient Descent with Momentum
      1. Aim: accelerate the component of gradient descent that points towards the solution so it converges faster. It uses the same formula as exponentially weighted averages, applied to the gradients instead of theta.
    4. RMSprop
      1. Aim: slow down the oscillating (vertical) component of gradient descent and speed up the horizontal component.
    5. Adam
      1. Combination of RMSprop and gradient descent with momentum (see the sketch after this list)
      2. Hyperparameter choice: alpha needs to be tuned; beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8
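A sketch of one Adam step for a single weight matrix, using the default hyperparameters listed above; vdW and sdW are the momentum and RMSprop accumulators and t is the step counter (names are illustrative).

```python
import numpy as np

def adam_update(W, dW, vdW, sdW, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: momentum (beta1) plus RMSprop (beta2), both bias-corrected."""
    vdW = beta1 * vdW + (1 - beta1) * dW          # exponentially weighted average of gradients
    sdW = beta2 * sdW + (1 - beta2) * dW ** 2     # exponentially weighted average of squared gradients
    v_hat = vdW / (1 - beta1 ** t)                # bias correction
    s_hat = sdW / (1 - beta2 ** t)
    W = W - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return W, vdW, sdW
```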
    6. Learning rate decay
      1. A method to lower the learning rate as training gets closer to the minimum.
      2. Many schedules exist; the best-known is alpha = alpha_0 / (1 + decay_rate * epoch_num) (see the sketch after this list)
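A one-liner for the schedule above; alpha0 = 0.2 and decay_rate = 1.0 in the comment are example values, not from the original map.

```python
def decayed_learning_rate(alpha0, decay_rate, epoch_num):
    """Inverse-time decay: the learning rate shrinks as the epoch number grows."""
    return alpha0 / (1 + decay_rate * epoch_num)

# decayed_learning_rate(0.2, 1.0, epoch) gives 0.2, 0.1, 0.067, 0.05, ... for epoch = 0, 1, 2, 3, ...
```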
  4. Tuning the algorithm's hyperparameters
    1. Priorities (colour-coded in the original map): darkest = most important, lightest = least important; white hyperparameters are left fixed.
    2. Try random values; don't use a grid.
    3. Coarse-to-fine: zoom the search in on the region where the best values were found.
    4. Choose the sampling scale for the randomness, e.g. sample alpha on a logarithmic scale (see the sketch after this list).
    5. Batch Normalization
      1. The idea of normalizing the input of each layer of the neural network (Z, not A).
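A sketch of log-scale random sampling for the learning rate; the search range 1e-4 to 1 is an assumption for illustration.

```python
import numpy as np

def sample_learning_rate(low=1e-4, high=1.0, rng=np.random.default_rng(0)):
    """Sample alpha uniformly on a log scale between low and high, rather than from a uniform grid."""
    r = rng.uniform(np.log10(low), np.log10(high))
    return 10.0 ** r

candidates = [sample_learning_rate() for _ in range(25)]   # random search; refine the range coarse-to-fine
```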
  5. Weights/parameters initialization
    1. Zeros? NO!

       Notes:

       • Zeros make all neurons of the neural network act the same and behave linearly, which defeats the purpose of having a neural network.
       1. Bad - fails to break symmetry -> the gradient does not decrease
    2. Random Init

       Notes:

       • Initializing the weights to very large random values does not work well; initializing with small random values does better.
       1. Good - breaks symmetry
       2. Bad - large weights -> exploding gradients
    3. He Init - the best! (see the sketch after this list)

       Notes:

       • Scale each layer's weights by sqrt(2. / layers_dims[l-1])
       1. Good - ensures faster learning
       2. Works well with ReLU activations
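A sketch of He initialization following the note above, assuming layers_dims lists the layer sizes including the input layer; the function name is illustrative.

```python
import numpy as np

def initialize_parameters_he(layers_dims, seed=3):
    """He initialization: Gaussian weights scaled by sqrt(2 / n_prev) and zero biases, suited to ReLU layers."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layers_dims)):
        params["W" + str(l)] = rng.standard_normal((layers_dims[l], layers_dims[l - 1])) \
                               * np.sqrt(2. / layers_dims[l - 1])
        params["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return params
```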
  6. Dataset Split
    1. Data > 1M examples: 98% train, 1% dev, 1% test
    2. Small data: 60% train, 20% dev, 20% test
    3. The train set may come from a different distribution than the dev/test sets (see the sketch after this list)
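A sketch of the split, assuming examples are columns of X and Y; the fractions are passed in so either regime above can be used.

```python
import numpy as np

def split_dataset(X, Y, train_frac=0.98, dev_frac=0.01, seed=0):
    """Shuffle examples (columns) and split into train/dev/test, e.g. 0.98/0.01 for >1M examples or 0.60/0.20 for small data."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X, Y = X[:, perm], Y[:, perm]
    n_train, n_dev = int(train_frac * m), int(dev_frac * m)
    train = (X[:, :n_train], Y[:, :n_train])
    dev = (X[:, n_train:n_train + n_dev], Y[:, n_train:n_train + n_dev])
    test = (X[:, n_train + n_dev:], Y[:, n_train + n_dev:])
    return train, dev, test
```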
