| Question | Answer |
| --- | --- |
| Real Effects | Real relationship between attributes and response variables |
| Random Effects | Random patterns in the data that look like real effects |
| Difference between Real and Random Effects | Real effects are the same in all data sets; random effects differ from one data set to the next |
| Training Data Set | Larger set of data to fit (build) the model |
| Validation Data Set | Smaller set of data to measure the model's effectiveness and choose a model |
| Test Data Set | Set of data used to estimate the performance of the chosen model |
| Splitting Data | Random assignment, or rotation (e.g. a 5-data-point rotation sequence) |
| Cross Validation | Avoids the problem that some important data points appear only in the validation or test data set |
| K-Fold Cross Validation | One of the Cross Validation methods |
| How does K-fold CV work | Split the data set into K parts; each part serves once as the validation set while the other K-1 parts are used for training |
| K-Fold CV, which model to pick | Do not average the coefficients across the splits; train the model again using all the data |
| Clustering | Grouping Data Points |
| Euclidean distance | aka straight-line distance; the 2-norm |
| Rectilinear distance | aka Manhattan distance; the 1-norm |
| Minkowski distance | aka p-norm distance; the general formula, of which the other distances listed here are special or limiting cases |
| Infinity norm distance | The largest absolute value in a set of numbers, here the largest absolute coordinate difference, e.g. a machine that goes and puts a box on top of a pile of boxes |
| K-Means Clustering | |
| Why is K-means clustering a heuristic? | It is fast and good, but not guaranteed to find the absolute best solution; it usually gets very close to the best solution |
| Expectation-Maximization algorithm | K-means clustering is an example of EM. EM alternates two steps: (1) expectation step (assign points to clusters); (2) maximization step (recompute cluster centers) |
| Comparing K-means clusterings for different values of K | Elbow diagram |
| Difference between Classification vs Clustering | Classification: we know what the correct classification is; Clustering: we DO NOT KNOW what the correct classification is |
| Supervised Learning | Correct answer (response) is known |
| Unsupervised Learning | Correct answer (response) is unknown |
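
The distance cards above (Euclidean, rectilinear, Minkowski, infinity norm) can be tied together in one small function. This is a minimal sketch, not taken from the cards: the name `minkowski_distance` and the example points are assumptions made for illustration. It shows the 1-norm and 2-norm as special cases of the p-norm, and the infinity norm as its limiting case.

```python
import math


def minkowski_distance(x, y, p):
    """Return the p-norm (Minkowski) distance between two equal-length points.

    p=1 gives the rectilinear (Manhattan) distance, p=2 the Euclidean
    distance, and p=math.inf the infinity-norm distance.
    (Hypothetical helper, written for illustration only.)
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    if p < 1:
        raise ValueError("p must be at least 1")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit of the p-norm as p grows: the largest absolute difference.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)


# Illustrative points (assumed, not from the cards).
a, b = (0, 0), (3, 4)
rectilinear = minkowski_distance(a, b, 1)      # |3| + |4|
euclidean = minkowski_distance(a, b, 2)        # sqrt(3^2 + 4^2)
infinity = minkowski_distance(a, b, math.inf)  # max(|3|, |4|)
```

For the same pair of points the distance shrinks as p grows: the rectilinear distance is the largest and the infinity-norm distance the smallest.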