Machine Learning

Description

A study of machine learning
Mind Map by Eduardo Guerrero, updated more than 1 year ago
Created by Eduardo Guerrero over 5 years ago

Resource summary

Machine Learning

Annotations:

  • The ability of a machine to learn automatically
  1. Supervised

    Annotations:

    • Supervised learning is where the data is labeled and the program learns to predict the output from the input data.
    1. Regression

      Annotations:

      • In regression problems, we are trying to predict a continuous-valued output. Examples are: What is the housing price in New York? What is the value of cryptocurrencies?
      • Loss: We can think about loss as the squared distance from the point to the line. We use the squared distance (instead of just the distance) so that points above and below the line both contribute to the total loss in the same way.
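The squared-distance loss described above can be sketched in Python; the data points and line parameters here are made up purely for illustration:

```python
# Total squared loss: sum of squared vertical distances from each
# (x, y) point to a candidate line y = m * x + b.
def squared_loss(points, m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in points)

points = [(1, 2), (2, 3), (3, 5)]
print(squared_loss(points, m=1.5, b=0.5))  # 0.25
```

Because each residual is squared, a point 0.5 below the line adds the same 0.25 to the loss as a point 0.5 above it.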
      1. Classification

        Annotations:

        • In classification problems, we are trying to predict a discrete number of values. Examples are: Is this a picture of a human or a picture of an AI? Is this email spam?
        • Classification is used to predict a discrete label. The outputs fall under a finite set of possible outcomes. Many situations have only two possible outcomes. This is called binary classification (True / False, 1 or 0)
        1. Normalization

          Annotations:

          • The scale of the data points can affect the algorithm, because features measured on different scales contribute unevenly.
          1. Min-Max Normalization

            Annotations:

            • Min-max normalization is one of the most common ways to normalize data. For every feature, the minimum value of that feature gets transformed into a 0, the maximum value gets transformed into a 1, and every other value gets transformed into a decimal between 0 and 1. For example, if the minimum value of a feature was 20, and the maximum value was 40, then 30 would be transformed to about 0.5 since it is halfway between 20 and 40. The formula is as follows: (value - min)/(max-min)
            • Min-max normalization has one fairly significant downside: it does not handle outliers very well. For example, if you have 99 values between 0 and 40, and one value is 100, then the 99 values will all be transformed to a value between 0 and 0.4.
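A minimal sketch of the min-max formula above, including the outlier effect just described (the numbers are invented for illustration):

```python
def min_max_normalize(values):
    # (value - min) / (max - min): minimum maps to 0, maximum to 1,
    # everything else to a decimal in between.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([20, 30, 40]))      # [0.0, 0.5, 1.0]
# A single outlier squashes the rest of the range:
print(min_max_normalize([0, 10, 40, 100]))  # [0.0, 0.1, 0.4, 1.0]
```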
            1. Z-Score Normalization

              Annotations:

              • Z-score normalization is a strategy of normalizing data that avoids this outlier issue. The formula for Z-score normalization is: (value - μ) / σ, where μ is the mean value of the feature and σ is the standard deviation of the feature.
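The z-score formula can be sketched directly (using the population standard deviation; the sample values are made up):

```python
def z_score_normalize(values):
    # (value - mean) / standard deviation, computed over the feature.
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mu) / sigma for v in values]

print(z_score_normalize([10, 20, 30]))  # ~ [-1.2247, 0.0, 1.2247]
```

Unlike min-max, the result is not bounded to [0, 1]; an outlier simply gets a large z-score without compressing the rest of the data.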
            2. K-Nearest Neighbors
            3. Multi-Label Classification

              Annotations:

              • Multi-label classification is when there are multiple possible outcomes. It is useful for customer segmentation, image categorization, and sentiment analysis for understanding text. To perform these classifications, we use models like Naive Bayes, K-Nearest Neighbors, and SVMs.
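As a sketch of the K-Nearest Neighbors idea mentioned above: classify a query point by a majority vote among its k closest training points. The toy points and labels are invented for illustration:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of ((feature, ...), label) pairs.
    # Sort by squared Euclidean distance to the query, then vote among the k nearest.
    by_dist = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), "spam"), ((1, 2), "spam"),
         ((8, 8), "ham"), ((9, 8), "ham"), ((8, 9), "ham")]
print(knn_predict(train, (2, 1), k=3))  # spam
```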
            4. Unsupervised

              Annotations:

              • Unsupervised Learning is a type of machine learning where the program learns the inherent structure of the data based on unlabeled examples.
              1. Clustering
              2. The ML Process

                Annotations:

                • 1. Formulating a question 2. Finding and understanding the data 3. Cleaning the data and feature engineering 4. Choosing a model 5. Tuning and evaluating 6. Using the model and presenting the results
                1. Testing our Models

                  Annotations:

                  • In order to test the effectiveness of the algorithm, we’ll split the data into a training set, a validation set, and a test set.
                    1. Training Set

                    Annotations:

                    • The training set is the data that the algorithm will learn from. Learning looks different depending on which algorithm you are using. For example, when using Linear Regression, the points in the training set are used to draw the line of best fit. In K-Nearest Neighbors, the points in the training set are the points that could be the neighbors.
                    1. Validation Set

                      Annotations:

                      • After training using the training set, the points in the validation set are used to compute the accuracy or error of the classifier. The key insight here is that we know the true labels of every point in the validation set, but we’re temporarily going to pretend like we don’t. We can use every point in the validation set as input to our classifier. We’ll then receive a classification for that point. We can now peek at the true label of the validation point and see whether we got it right or not. If we do this for every point in the validation set, we can compute the validation error!
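The validation procedure described above can be sketched as follows; the classifier and labeled points are hypothetical, purely for illustration:

```python
def validation_accuracy(classifier, validation_set):
    # We know each true label but "pretend" we don't:
    # classify every point, then peek at the true label and compare.
    correct = sum(1 for features, true_label in validation_set
                  if classifier(features) == true_label)
    return correct / len(validation_set)

classify = lambda x: "positive" if x > 0 else "negative"
validation = [(1, "positive"), (-2, "negative"),
              (3, "negative"), (5, "positive")]
print(validation_accuracy(classify, validation))  # 0.75, i.e. validation error 0.25
```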
                        1. How to Split

                        Annotations:

                        • In general, putting 80% of your data in the training set, and 20% of your data in the validation set is a good place to start.
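An 80/20 split can be sketched with the standard library; the seed and data are arbitrary:

```python
import random

def train_validation_split(data, train_frac=0.8, seed=0):
    # Shuffle a copy of the data, then cut at the 80% mark.
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, validation = train_validation_split(data)
print(len(train), len(validation))  # 80 20
```

Shuffling before cutting matters: if the data is ordered (e.g. by label), a straight slice would give the validation set a very different distribution from the training set.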
                        1. Coefficients

                          Annotations:

                          • Coefficients are most helpful in determining which independent variable carries more weight. For example, a coefficient of -1.345 will impact the rent more than a coefficient of 0.238, with the former impacting prices negatively and the latter positively.
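To illustrate how coefficients weight a prediction, here is a hypothetical linear rent model; the feature names, coefficient values, and intercept are all invented:

```python
# Hypothetical linear model: rent = intercept + sum(coefficient * feature value).
coefficients = {"age_of_building": -1.345, "num_bedrooms": 0.238}
intercept = 500.0

def predict_rent(features):
    return intercept + sum(coefficients[name] * value
                           for name, value in features.items())

base = predict_rent({"age_of_building": 10, "num_bedrooms": 2})
older = predict_rent({"age_of_building": 11, "num_bedrooms": 2})
print(older - base)  # ~ -1.345: each extra year of age lowers the predicted rent
```

The sign of a coefficient gives the direction of the effect, and (for comparably scaled features) its magnitude gives the strength, which is one reason normalization matters.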
                          1. Correlations

                            Annotations:

                            • A negative linear relationship means that as X values increase, Y values will decrease. Similarly, a positive linear relationship means that as X values increase, Y values will also increase.
                            1. Evaluating the Model's Accuracy

                              Annotations:

                              • Now let's say we add another x variable, the building's age, to our model. By adding this third relevant x variable, the R² is expected to go up. Let's say the new R² is 0.95. This means that square feet, number of bedrooms, and age of the building together explain 95% of the variation in the rent. The best possible R² is 1.00 (and it can be negative because the model can be arbitrarily worse). Usually, an R² of 0.70 is considered good.
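The R² statistic quoted above can be computed directly; a minimal sketch with made-up true and predicted values:

```python
def r_squared(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

print(r_squared([3, 5, 7], [2.8, 5.2, 7.0]))
```

A model that always predicts the mean gets R² = 0, and a worse-than-mean model goes negative, which is why R² has no lower bound.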
