Data Science

Description

Diploma Data Science Flashcards on Data Science, created by Sarah Sandison on 07/06/2019.
Flashcards by Sarah Sandison, updated more than 1 year ago
Created by Sarah Sandison almost 5 years ago

Resource summary

Question Answer
Clustering: What is clustering? Clustering is a data mining technique used to organise objects into groups that have similar characteristics.
Clustering: What is the minimum number of variables needed to perform Clustering? One
Clustering: What is the goal of clustering a set of vectors? To divide them into groups of vectors that are near each other
Clustering: Name as many different types of clustering as you can - K-Means - DBSCAN - Graph Community Detection
Clustering: The most commonly used measure of similarity is _____ distance or its square Euclidean
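As a small illustration of the distance named on this card, the sketch below computes the Euclidean distance and its square for two numeric vectors. The function names `euclidean` and `squared_euclidean` are made up for this example and are not part of the deck.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: the square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)
print(squared_euclidean(p, q))  # 25.0
print(euclidean(p, q))          # 5.0
```

The squared form is often preferred inside clustering loops because it ranks neighbours in the same order while skipping the square root.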
Clustering: What is K-Means? K-Means is an algorithm that partitions objects into K groups by assigning each object to its nearest cluster centroid (mean)
Clustering (K-Means): K-Means only works for data in what format? Numerical
Clustering (K-Means): What 3 data attributes cause inaccurate results in K-Means? 1. Data that has outliers 2. Data whose clusters have a non-convex (not roughly circular) shape 3. Data whose clusters differ greatly in size or density
Clustering (K-Means): K-Means is an iterative algorithm. Which two steps are repeatedly carried out in its inner loop? 1. Cluster assignment, where the parameters c are updated by assigning each point to its closest centroid 2. Move the cluster centroids, where each centroid μk is updated to the mean of the points assigned to it
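The two inner-loop steps (assigning each point to its closest centroid, and moving each centroid to the mean of its points) can be sketched in plain Python as follows. This is a minimal sketch, assuming the initial centroids are supplied by the caller and that an empty cluster simply keeps its old centroid; the function names and toy data are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points, centroids):
    """Assignment step: index of the closest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def move_centroids(points, assignments, centroids):
    """Update step: each centroid becomes the mean of its assigned points."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, assignments) if c == k]
        if not members:
            updated.append(old)  # assumption: an empty cluster keeps its centroid
            continue
        updated.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return updated


def k_means(points, initial_centroids, max_iter=100):
    """Repeat both steps until the assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        new_assignments = assign_clusters(points, centroids)
        if new_assignments == assignments:
            break
        assignments = new_assignments
        centroids = move_centroids(points, assignments, centroids)
    return assignments, centroids


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
assignments, centroids = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(assignments)  # [0, 0, 1, 1]
print(centroids)    # [(1.0, 1.5), (8.5, 8.0)]
```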
Clustering (K-Means): K-Means is what kind of clustering method? Non-hierarchical
Clustering (K-Means): What is the Within-Cluster Sum of Squares? For each cluster, it is the sum of the squared distances of points in that cluster to their center, summed over the clusters
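The quantity on this card can be computed directly from the points, their cluster assignments, and the cluster centers. The sketch below uses a hypothetical helper named `wcss` and a tiny hand-made dataset.

```python
def wcss(points, assignments, centroids):
    """Sum, over all points, of the squared distance to the point's own cluster center."""
    total = 0.0
    for point, k in zip(points, assignments):
        total += sum((x - c) ** 2 for x, c in zip(point, centroids[k]))
    return total


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
assignments = [0, 0, 1, 1]
centroids = [(1.0, 1.5), (8.5, 8.0)]
print(wcss(points, assignments, centroids))  # 1.0
```

Each of the four points lies 0.5 away from its center, so each contributes 0.25 and the total is 1.0.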
Clustering: ? Groups or clusters are suggested by the data and are not defined a priori.
Clustering: ? Objects in each cluster are similar to each other and unlike objects in other clusters.
Association Rules Association rule mining is a data mining technique used to discover frequent patterns and associations among items in an (often transactional) dataset. It is sometimes called market-basket analysis.
Association Rules: Itemset A group of items that occur together.
Association Rules: Support Each itemset has a support level, which is the percentage of transactions in the dataset that contain it.
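Support can be computed by counting the transactions that contain every item of the itemset. The sketch below assumes transactions are stored as Python sets; the `support` helper and the shopping-basket data are invented for illustration, and the result is returned as a fraction between 0 and 1 rather than a percentage.

```python
transactions = [
    {"hotdog", "bun", "ketchup"},
    {"hotdog", "bun"},
    {"hotdog", "ketchup", "cola"},
    {"bun", "cola"},
    {"hotdog", "bun", "ketchup", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


print(support({"hotdog"}, transactions))         # 0.8
print(support({"hotdog", "bun"}, transactions))  # 0.6
```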
Association Rules: Association Rule The statement of itemsets that occur together. {Hotdog} => {Bun, Ketchup}
Association Rules: Confidence Confidence is an indication of how often the rule has been found to be true. Confidence = Support(LHS and RHS together) / Support(LHS)
Association Rules: Lift Lift is a measure of how many times more often X and Y occur together than expected if they are statistically independent of each other. Lift = Support(LHS and RHS together) / (Support(LHS) * Support(RHS))
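The confidence and lift ratios can be worked through on a toy dataset for the rule {Hotdog} => {Bun, Ketchup}. This is a sketch under the assumption that the support of a rule means the support of LHS and RHS taken together; the helper names and the data are hypothetical.

```python
transactions = [
    {"hotdog", "bun", "ketchup"},
    {"hotdog", "bun"},
    {"hotdog", "ketchup", "cola"},
    {"bun", "cola"},
    {"hotdog", "bun", "ketchup", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Support of LHS and RHS together, divided by the support of LHS."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Joint support divided by the product of the individual supports."""
    joint = support(set(lhs) | set(rhs), transactions)
    return joint / (support(lhs, transactions) * support(rhs, transactions))


lhs, rhs = {"hotdog"}, {"bun", "ketchup"}
rule_confidence = confidence(lhs, rhs, transactions)
rule_lift = lift(lhs, rhs, transactions)
print(rule_confidence, rule_lift)
```

On this data the joint support is 2/5, the LHS support is 4/5 and the RHS support is 2/5, so the confidence is 0.5 and the lift is about 1.25. A lift above 1 suggests the two sides occur together more often than independence would predict.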
Association Rules: List the different Association Rule algorithms. - Apriori - ECLAT - FP-Growth - AprioriTID - AprioriHybrid
Association Rules: ? For a given number of items n, there is a larger number of possible rules than possible itemsets.
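The statement on this card can be checked by brute force for small n. The sketch below enumerates every non-empty itemset and every rule LHS => RHS whose two sides are non-empty and disjoint; that definition of a rule is an assumption, as are the function names.

```python
from itertools import combinations


def nonempty_itemsets(items):
    """Every non-empty subset of the given items."""
    items = list(items)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def all_rules(items):
    """Every rule LHS => RHS with non-empty, disjoint LHS and RHS."""
    rules = []
    for itemset in nonempty_itemsets(items):
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for lhs in nonempty_itemsets(itemset):
            if lhs != itemset:
                rules.append((lhs, itemset - lhs))
    return rules


for n in (2, 3, 4):
    items = range(n)
    print(n, len(nonempty_itemsets(items)), len(all_rules(items)))
```

For n = 3 this gives 7 itemsets and 12 rules, and for n = 4 it gives 15 itemsets and 50 rules, matching the closed forms 2^n - 1 and 3^n - 2^(n+1) + 1. Note that for n = 2 there are 3 itemsets but only 2 rules, so the statement holds from n = 3 upward.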
Association Rules: Which measure of association needs to not be too small in order to ensure that the rule has not just occurred by chance? n
Association Rules: The definition of a frequent itemset is... An itemset whose support is greater than the support threshold
Association Rules: What is a sparse matrix? A matrix with mostly 0 entries
Association Rules: Is the support of the rule A → B the same as or different from the support of the rule B → A? Same
Association Rules: In general, an association rule A → B tells us that... If A occurs, then B is likely to occur too
Association Rules: ? If an itemset is infrequent, then all of its supersets are infrequent
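This property is what lets Apriori-style algorithms discard candidates early: a candidate k-itemset can be dropped as soon as one of its (k-1)-item subsets is known to be infrequent. The following is a minimal sketch of that pruning check only, not a full Apriori implementation; the function name and the example itemsets are hypothetical.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """True if any subset one item smaller than the candidate is not frequent."""
    size = len(candidate) - 1
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, size)
    )


# Assumed to be the frequent 2-itemsets found in an earlier pass.
frequent_pairs = {
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"b", "d"}),
}

keep = has_infrequent_subset({"a", "b", "c"}, frequent_pairs)   # all pairs frequent
prune = has_infrequent_subset({"a", "b", "d"}, frequent_pairs)  # {"a", "d"} is missing
print(keep, prune)  # False True
```

Only {a, b, c} would go on to have its support counted; {a, b, d} cannot be frequent because its subset {a, d} is not.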
Visualisation: What are the 3 main Python libraries for visualisations? 1. Matplotlib 2. Seaborn 3. Bokeh
Visualisation: What Charts can you use for Comparison of Data? - Bar - Column - Pie (although DF don't like it) - Scatter Plot - Line
Visualisation: What Charts can you use to show Composition of Data? - Pie - Stacked Bar - Stacked Column - Area - Waterfall
Visualisation: What Charts can you use to show Distribution of Data? - Scatter Plot - Line - Column - Bar
Visualisation: What Charts can you use to show Trends within Data? - Line - Dual-Axis Line - Column
Visualisation: What Charts can you use to show the relationship between value sets? - Scatter Plot - Bubble - Line
Regression: What are the limits for statistically significant correlation?