Validation

Description

Introduction to validation
Xuehua Bu
Flashcards by Xuehua Bu, updated more than 1 year ago
Created by Xuehua Bu about 5 years ago

Resource summary

Question: Answer
Real Effects: Real relationships between the attributes and the response variable
Random Effects: Random patterns in the data that look like real effects
Difference between Real and Random Effects: Real effects are the same in all data sets; random effects are different in each data set
Training Data Set: The larger set of data, used to fit (build) the model
Validation Data Set: A smaller set of data, used to measure each model's effectiveness and choose a model
Test Data Set: Used to estimate the performance of the chosen model
Splitting Data: Random, or rotation (e.g. a 5-data-point rotation sequence)
Cross Validation: Avoids the problem that some important data appears only in the validation or test data set
K-Fold Cross Validation: One of the cross validation methods
How does K-fold CV work: Split the data set into K parts; each part acts once as the validation set while the other K-1 parts are used for training
K-Fold CV, which model to pick: Do not average the coefficients across the splits; train the model again using all the data
Clustering: Grouping data points
Euclidean distance: aka straight-line distance; 2-norm
Rectilinear distance: aka Manhattan distance; 1-norm
Minkowski distance: aka p-norm distance; the general formula that includes the other distance formulas as special cases
Infinity norm distance: The largest absolute value in a set of numbers (here, the coordinate differences), e.g. a machine that goes and puts a box on top of a pile of boxes
K-Means Clustering
Why K-means clustering is a heuristic: Fast and good, but not guaranteed to find the absolute best solution; it usually gets really close to the best solution
Expectation-Maximization algorithm: K-means clustering is an example of EM. EM alternates two steps: 1. expectation step (assign each point to its nearest cluster center) 2. maximization step (recompute each cluster center as the mean of its assigned points)
Comparing K-means clusterings with different K: Elbow diagram
Difference between Classification and Clustering: In classification we know what the correct classification is; in clustering we do NOT know what the correct classification is
Supervised Learning: The correct answer (response) is known
Unsupervised Learning: The correct answer (response) is unknown
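
The K-fold cross validation cards can be illustrated with a short sketch. Everything here is an assumption made for illustration: the helper names (`k_fold_indices`, `fit_mean`, `cross_val_mse`), the toy "model" that just predicts the training mean, and the sample numbers. It only shows the mechanics: each fold is the validation set once, and the final model is refit on all the data rather than averaged across folds.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(ys):
    """Toy 'model' (an assumption for this sketch): predict the training mean."""
    return mean(ys)

def cross_val_mse(ys, k):
    """Average validation MSE over k folds; each fold is held out exactly once."""
    scores = []
    for fold in k_fold_indices(len(ys), k):
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_mean(train)  # fit on the other k-1 folds
        scores.append(mean([(ys[i] - model) ** 2 for i in fold]))
    return mean(scores)

ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
cv_score = cross_val_mse(ys, k=5)  # used to compare candidate models
final_model = fit_mean(ys)         # refit on ALL the data; no averaging of per-fold fits
```

The cross-validation score is only used to choose between candidate models; the per-fold fits themselves are discarded.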
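
The distance cards can be checked with one small function. This is a minimal sketch; the function name `minkowski` and the sample points are illustrative assumptions. It shows that the 1-norm and 2-norm are special cases of the p-norm, and that the infinity norm is the largest absolute coordinate difference.

```python
def minkowski(a, b, p):
    """p-norm distance between equal-length points a and b (p >= 1)."""
    if p == float("inf"):
        # Infinity norm: the largest absolute coordinate difference.
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
manhattan = minkowski(a, b, 1)             # rectilinear, 1-norm: 7.0
euclidean = minkowski(a, b, 2)             # straight-line, 2-norm: 5.0
chebyshev = minkowski(a, b, float("inf"))  # infinity norm: 4
```

As p grows, the p-norm distance approaches the infinity norm distance, which is why the latter is treated as the limiting case.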
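
The K-means cards describe an alternating heuristic, which might be sketched as follows. This is not a reference implementation: the names (`sq_dist`, `kmeans`, `total_sq_dist`), the random choice of starting centers, and the sample points are all assumptions for illustration. The total squared distance computed at the end is the quantity usually plotted against K in an elbow diagram.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = random.Random(seed).sample(points, k)  # assumed init: k random data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def total_sq_dist(centers, clusters):
    """Sum of squared distances from each point to its cluster center."""
    return sum(sq_dist(p, c) for c, cl in zip(centers, clusters) for p in cl)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
costs = {}
for k in (1, 2, 3):
    centers, clusters = kmeans(points, k)
    costs[k] = total_sq_dist(centers, clusters)  # one point on the elbow diagram
```

Because the result depends on the starting centers, a common practice is to run the heuristic from several starts and keep the lowest-cost clustering.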