# Stats

Mind Map by nb43, updated more than 1 year ago
 Created by nb43 almost 7 years ago

### Description

Stats Mind Map on Stats, created by nb43 on 04/22/2013.

## Resource summary

Stats
1 Mixed Factorial ANOVA
1.1 Experiment designs: between and within subjects

Annotations:

• Between subjects: different participants in each condition; looks at the differences between groups. Within subjects: the same participants in each condition; looks at the differences between treatments. The dependent variable is measured in exactly the same way in each design.
1.1.1 Problems for between

Annotations:

• Participant variables. A large group of participants is required, which can be impractical. Biases can lead to false conclusions: assignment, observer-expectancy, subject-expectancy. It is possible to assess the baseline measure.
1.1.2 Problems for within

Annotations:

• Practice effects - lack of naivety - the more you do the task, the better you get Longer testing sessions when many conditions.
1.1.3 Factorial Designs

Annotations:

• One dependent variable and two or more independent variables. Used when we suspect that more than one IV is contributing to a DV. Allows exploration of complicated relationships between IVs and a DV.
1.1.3.1 Main effect: how each IV individually affects the DV - overall trend
1.1.3.2 Interactions: how IV factors combine to affect the DV
1.1.3.3 Between Subjects factorial ANOVA
1.1.3.4 Within Subjects factorial ANOVA
1.1.3.5 Mixed factorial ANOVA

Annotations:

• Efficient use of participant numbers and of individual participants' time - reduces the cons of the other designs. One of the most common types of design. Mixed factorial ANOVA assumptions and formulae are the same as for factorial ANOVA.
1.1.3.5.1 mix of between and within factors

Annotations:

• at least one between subjects factor and one within subjects factor
1.1.3.5.1.1 Increasing between subjects factors rapidly makes high cost studies non-viable
1.1.3.5.1.2 Main effect and Interaction formula

Annotations:

• F values MS values SS values F(between df, within df) =  F value, p = p value
1.1.3.5.1.2.1 Within subjects
1.1.3.5.1.2.2 Between Subjects
1.1.3.5.1.2.3 F(between df, within df) = f value, p = p value
1.1.3.5.1.2.4 F values, MS values, SS values
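
How the SS, MS and F values fit together can be sketched as below: each mean square is a sum of squares divided by its df, and F is the effect mean square divided by the error mean square. The function names and the numbers are made up for illustration; in the reporting format, the first df belongs to the effect and the second to the error term.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (MS_effect, MS_error, F) from sums of squares and their df."""
    ms_effect = ss_effect / df_effect  # mean square = SS / df
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error

def report_f(df_effect, df_error, f_value, p_value):
    """Format a result as F(df, df) = F value, p = p value."""
    return f"F({df_effect}, {df_error}) = {f_value:.2f}, p = {p_value:.3f}"

# Made-up sums of squares for one effect and its error term.
ms_effect, ms_error, f_value = f_ratio(ss_effect=40.0, df_effect=2, ss_error=90.0, df_error=18)
print(ms_effect, ms_error, f_value)  # 20.0 5.0 4.0
```

The p value passed to `report_f` would come from the software output or an F table; it is not computed here.
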
1.1.3.5.2 Assumptions

Annotations:

• Interval/ratio data. Normal distribution - check with a histogram. Homogeneity of variance - between subjects - Levene's test. Sphericity of covariance - within subjects - Mauchly's test. There are no non-parametric alternatives if these are violated.
1.1.3.5.2.1 1. interval/ ratio data
1.1.3.5.2.2 2. normal distribution
1.1.3.5.2.3 3. Homogeneity of variance (between, Levene's)

Annotations:

• want it to be non-significant
1.1.3.5.2.4 4. Sphericity of covariance (within, Mauchly's)

Annotations:

• want it to be non-significant
1.1.3.5.2.5 no non-parametric alternatives if violated
1.1.3.5.3 TWO RULES
1.1.3.5.3.1 use between subjects formulae for between subjects effects and within subjects for within subjects effects
1.1.3.5.3.2 if there is a conflict e.g. in interactions, use within subjects
1.1.3.5.4 N = total number of scores
1.1.3.5.5 n = number of scores within the condition
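
Main effects, the interaction, N and n can be illustrated with cell means from a small mixed design. The sketch below assumes a 2 x 2 design with a between-subjects factor (group) and a within-subjects factor (time); the factor names, level names and scores are all hypothetical, and it only describes the means rather than running the ANOVA itself.

```python
from statistics import mean

# Made-up scores: 3 participants per group, each measured pre and post.
scores = {
    ("control", "pre"): [4, 5, 6],
    ("control", "post"): [5, 6, 7],
    ("treatment", "pre"): [4, 5, 6],
    ("treatment", "post"): [8, 9, 10],
}

def marginal_mean(data, position, level):
    """Mean of all scores whose factor at `position` (0 = group, 1 = time) is `level`."""
    pooled = [v for cell, vals in data.items() if cell[position] == level for v in vals]
    return mean(pooled)

def interaction_contrast(data):
    """Pre-to-post change in the treatment group minus the change in the control group."""
    m = {cell: mean(vals) for cell, vals in data.items()}
    treatment_change = m[("treatment", "post")] - m[("treatment", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treatment_change - control_change

N = sum(len(vals) for vals in scores.values())  # total number of scores
n = len(scores[("control", "pre")])             # number of scores in one condition
print(N, n)                                     # 12 3
# Main effect of group: compare the two group means, averaged over time.
print(marginal_mean(scores, 0, "control"), marginal_mean(scores, 0, "treatment"))  # 5.5 7
# Main effect of time: compare the two time means, averaged over group.
print(marginal_mean(scores, 1, "pre"), marginal_mean(scores, 1, "post"))           # 5 7.5
# Interaction: does the effect of time differ between the groups?
print(interaction_contrast(scores))                                                # 3
```

A contrast of zero would mean the two groups change by the same amount over time, that is, no interaction in these means.
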
2 Correlation
2.1 Tests of Association

Annotations:

• Tests of the relationship between two variables, usually performed on continuous variables. They test whether there is shared variance between any given pair of variables. We are looking for an association between the samples, not a difference (as in an independent samples t-test).
2.1.1 Pearson's (parametric); Spearman's (nonparametric)

Annotations:

• Also point-biserial correlation: one continuous variable and one categorical variable with 2 levels. And simple linear regression, and multiple linear regression.
2.1.1.1 Pearson's Correlation Assumptions (parametric)
2.1.1.1.1 1. linear relationship between variables

Annotations:

• A linear relationship means that a given change in x is associated with the same size of change in y at any point along the scale. If the scatterplot shows a clear nonlinear relationship, do not run a Pearson's correlation. Data with a curving, nonlinear relationship aren't suitable for a Pearson's correlation.
2.1.1.1.2 2. variables measure interval/ ratio data which are normally distributed

Annotations:

• as the mean and s.d. only accurately describe the average and dispersion of the data when the data are normally distributed. If the frequency distributions show a non-normal distribution, do not run a Pearson's correlation.
2.1.1.1.3 3. Data should be free of statistical outliers

Annotations:

• because outliers have disproportionate influence on the correlation statistic or correlation coefficient (r). There is a misrepresentation of data if outliers are included. Either exclude them or use a Spearman's correlation (nonparametric) if they are more systematic.
2.1.1.2 Spearman's Correlation Assumptions (nonparametric)
2.1.1.2.1 1. monotonic relationship between variables

Annotations:

• either a positive, negative or curved relationship. Not a bell curve.
2.1.1.2.1.1 relationship that goes in one direction
2.1.1.2.2 2. works on ordinal/ interval/ ratio data - no need to worry about the distribution
2.1.1.2.3 3. outliers can be included in Spearman's analysed data

Annotations:

• they do not exert as much influence, this is because Spearman's correlations do not use means or s.d.s but use ranks.
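
The point about ranks can be sketched in a few lines. This is a minimal illustration, assuming Spearman's coefficient is computed as a Pearson correlation on the ranked scores (with tied scores sharing an average rank); the function names `ranks`, `pearson_r` and `spearman_rho` and the sample data are made up for the example.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks instead of the raw scores."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: perfectly monotonic, but the last y score is an extreme outlier.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 100]
print(spearman_rho(x, y))  # 1.0, because the rank order is identical
print(pearson_r(x, y))     # below 1, pulled down by the outlier
```

Because only the rank order enters the calculation, the outlier of 100 counts the same as any other "largest" score.
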
2.1.2 tell us whether variables covary with other variables
2.1.3 Pearson's correlation formula

Annotations:

• a. For each case, subtract the mean from the score on the X variable; repeat for the mean and score on the Y variable; multiply these two values, then add together the products for all cases. b. For each case, subtract the mean from the score on the X variable; square this difference; add together the squared values for all cases, and then find the square root. Repeat for the Y variable and multiply the two square roots. Divide the result of step a by this value.
2.1.3.1 Df = no. of pairs - 2
2.1.3.2 r(df) = r value, p = p value
2.1.3.3 r = correlation coefficient
2.1.3.3.1 indication of the strength of the relationship
2.1.3.4 r2 = coefficient of determination
2.1.3.4.1 measure of the strength of the relationship, describes the amount of variance explained
2.1.3.4.2 effect size
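
Written out symbolically, the verbal steps a and b of the Pearson's formula correspond to the expression below. The subscript i indexing the cases and n for the number of pairs are notation introduced here for illustration.

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\;\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}},
\qquad df = n - 2
```

The numerator is step a, the denominator is step b, and squaring the result gives the coefficient of determination.
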
2.2 Scatterplots

Annotations:

• typically show relationships between pairs of variables. Each point represents one pair of observations at each measurement point
2.2.1 Bottom left to top right = positive
2.2.2 Top left to bottom right = negative
2.2.3 the spread gives an indication of the strength of the relationship
2.2.3.1 Direction and Strength
2.2.3.1.1 If there is low or no spread between the data points then there is a very strong correlation between the variables

Annotations:

• If there is a reasonable spread, then there is a strong correlation between the variables.
2.2.3.1.1.1 r value = 1/ -1

Annotations:

• direct diagonal line. when there is a greater spread, the points deviate from 1/-1.
2.2.3.1.2 If there is a high spread then there is low or no correlation
2.3 Interpreting correlations; facts about correlation coefficients
2.3.1 range from -1 to 1.
2.3.2 no units
2.3.3 they are the same for x and y as for y and x
2.3.4 positive values: as one variable increases so does the other
2.3.5 negative values: as one variable increases, the other decreases
2.3.6 positive relationship - as one value decreases, so does the other
2.3.7 the more spread out data are, the more values will deviate from 1
2.3.8 how close a value is to -1 or 1 indicates how close the two variables are to being perfectly linearly related
2.4 R values
2.4.1 Estimating r values

Annotations:

• 1. plot your scatterplot and divide it accordingly to the mean x and y values in order to estimate your values. 2. count up number of points in each quadrant. A positive correlation will populate the +ve quadrants more than the -ve quadrants and vice versa.
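
The quadrant-counting estimate can be sketched as follows. This is only an illustration of the idea, assuming points that fall exactly on a mean line are counted separately; the function name `quadrant_counts` and the data are hypothetical.

```python
from statistics import mean

def quadrant_counts(x, y):
    """Count points in the positive and negative quadrants around the two means."""
    mx, my = mean(x), mean(y)
    counts = {"positive": 0, "negative": 0, "on_mean_line": 0}
    for a, b in zip(x, y):
        product = (a - mx) * (b - my)
        if product > 0:
            counts["positive"] += 1      # top-right or bottom-left quadrant
        elif product < 0:
            counts["negative"] += 1      # top-left or bottom-right quadrant
        else:
            counts["on_mean_line"] += 1  # sits exactly on a mean line
    return counts

# Made-up scores with a broadly positive trend.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(quadrant_counts(x, y))
# {'positive': 3, 'negative': 0, 'on_mean_line': 2}
```

More points in the positive quadrants than in the negative ones suggests a positive r, and the reverse suggests a negative r; the counts give the direction, not the exact value.
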
2.4.2 Calculating r values - determining whether two values are associated.
2.4.2.1 1. Plot the raw values against one another

Annotations:

• scaling problems - different means and SDs. We don't care about the means etc, only the relationships. If all the values are along the bottom, we must try to look at the data in a way that accounts for the differing means and SDs of each axis - therefore do a z score.
2.4.2.2 2. Converting to z scores gives you values which have a mean of 0 and an SD of 1.

Annotations:

• z score = (score-mean)/ SD No scaling or unit problems. Converting raw scores into z scores allows direct comparisons between scores even if they are measured on different scales, and thus enables a comparison of the relative probabilities of each. Z scores are referred to as standard scores because measurement scales are converted into a standardised format (mean = 0, SD =1)
2.4.2.3 3. r = the adjusted average of the product for each standardised x-y coordinate pair

Annotations:

• Top-right and bottom-left quadrants produce positive values +/- r. Calculate the area between the points; you would do this for every single value you are looking for a relationship between. Outliers would artificially inflate the correlation value (r). Bigger area = larger correlation value - further away from the mean.
2.4.2.3.1 the closer to the diagonal a point is, the more it contributes to the r value.
2.4.2.3.1.1 The further away from both means a point is, the more it contributes to r.
2.4.2.3.2 r = Σ(zX × zY) / (N - 1)

Annotations:

• where zX = (X - x̄) / Sx
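
The three steps for calculating r can be sketched in code. This is a minimal version, assuming the SD in the z score is the sample SD (dividing by N - 1), which is what makes the sum of products divided by N - 1 come out between -1 and 1; the function names and data are made up for the example.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise scores: z = (score - mean) / SD, using the sample SD (N - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r_from_z(x, y):
    """r = sum of (zX * zY) over all pairs, divided by N - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Made-up scores on very different scales; standardising removes the scale.
hours = [1, 2, 3, 4, 5]
marks = [20, 40, 60, 80, 100]
print(round(pearson_r_from_z(hours, marks), 6))  # 1.0
```

Because both variables are standardised first, the result does not change if either variable is measured in different units.
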
2.5 Limitations
2.5.1 Correlation does not equal causation

Annotations:

• there can be a causal link, but correlation analyses do not allow us to conclude this. To demonstrate causation, the experiment would have to be controlled.
3 Regression
3.1 what is regression?
3.1.1 a family of inferential statistics
3.1.2 Test of association
3.1.3 Help make predictions about data
3.1.4 used when causal relationships are likely
3.2 Correlation does not tell you how much to intervene
3.3 line of best fit
3.3.1 formula of the line gives the exact answer
3.4 Predictions
3.4.1 it is possible to make predictions about how predictor variables will affect outcome variables
3.4.2 regression gives an indication of the:
3.4.2.1 unstandardised relationship
3.4.2.2 between outcome (y-axis) and predictor (x-axis) variables
3.4.2.3 using calculations of the intercept and gradient
3.4.2.3.1 expressed in the form Y = a + bX
3.4.2.3.1.1 a = intercept/ constant
3.4.2.3.1.2 b = gradient/ coefficient
3.4.2.3.1.3 in order to determine a, you need to calculate b first
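
The order of calculation for Y = a + bX can be sketched as below. This assumes the usual least-squares line of best fit, where the gradient is the sum of cross-products of deviations divided by the sum of squared X deviations; `fit_line`, `predict` and the data are hypothetical names and values for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Return (a, b) for Y = a + bX using least squares."""
    mx, my = mean(x), mean(y)
    # Gradient first: how much Y changes per unit change in X.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    # The intercept needs b: the line must pass through the point (mean X, mean Y).
    a = my - b * mx
    return a, b

def predict(a, b, x_value):
    """Predicted outcome score for a given predictor score."""
    return a + b * x_value

# Made-up data lying exactly on Y = 1 + 2X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)              # 1.0 2.0
print(predict(a, b, 5))  # 11.0
```

The intercept is found from the gradient because a = ȳ - b x̄, which is why b has to be calculated first.
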
3.5 Assumptions
3.5.1 1. the data are linearly related
3.5.2 2. Homoscedasticity of data
3.5.2.1 residuals
3.5.2.1.1 residuals are the difference between the actual outcome score and the predicted outcome score
3.5.2.1.2 need same degree of variation across all predictor variable scores
3.5.2.1.3 if data are heteroscedastic, a regression isn't the appropriate analysis
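
Residuals and a rough check of their spread can be sketched as follows. This is not a formal test of homoscedasticity, only a crude numerical stand-in for looking at a residual plot: it compares the SD of the residuals in the lower and upper halves of the predictor scores. The function names, the split into halves and the data are all assumptions made for the example.

```python
from statistics import pstdev

def residuals(x, y, a, b):
    """Actual outcome score minus the score predicted by Y = a + bX."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def residual_spread_by_half(x, y, a, b):
    """SD of residuals for the lower and the upper half of predictor scores."""
    pairs = sorted(zip(x, residuals(x, y, a, b)))  # order cases by predictor score
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[half:]]
    return pstdev(lower), pstdev(upper)

# Made-up data around the line Y = 0 + 1X whose scatter grows with X.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.1, 3.0, 6.0, 5.0]
low_sd, high_sd = residual_spread_by_half(x, y, a=0.0, b=1.0)
print(low_sd < high_sd)  # True: the spread is larger at high predictor scores
```

Very different spreads in the two halves would hint at heteroscedasticity; similar spreads are consistent with the same degree of variation across predictor scores.
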
3.6 simple regression
3.6.1 predicting one outcome variable from one predictor variable
3.6.2 Y = a + bX
3.6.3 SPSS output
3.6.3.1 1. descriptive statistics
3.6.3.2 2. correlation coefficient
3.6.3.3 3. variables entered and removed
3.6.3.3.1 variable entered = predictor variable
3.6.3.3.2 dependent variable = outcome variable
3.6.3.4 4. model summary (R values)
3.6.3.5 5. Check assumptions - graph tests of homoscedasticity
3.6.3.5.1 3 charts at the bottom
3.6.3.5.1.1 frequency plot of standardised residuals
3.6.3.5.1.1.1 histogram of residual values
3.6.3.5.1.1.2 want normal distribution
3.6.3.5.1.1.3 bars should approx fit the normal curve
3.6.3.5.1.1.4 good indication of homoscedasticity
3.6.3.5.1.2 normal plot of regression standardised residual
3.6.3.5.1.2.1 points should follow the diagonal line
3.6.3.5.1.3 scatterplot of regression standardised residual and regression standardised predicted value
3.6.3.5.1.3.1 DV = change
3.6.3.5.1.3.2 plots standardised predicted y values (x axis) against their corresponding residuals
3.6.3.5.1.3.3 want to see a diffused cloud - no distinct patterns
3.6.3.6 Determining whether the regression model is statistically valid - 3 R values
3.6.3.6.1 R = pooled correlation
3.6.3.6.2 R2 = amount of variance in the data that is explained by the model (%)
3.6.3.6.2.1 most important value
3.6.3.6.3 adjusted R2 = R2 corrected for the number of predictors, so that variance explained by chance alone is discounted
3.6.3.6.4 ANOVA table
3.6.3.6.4.1 test of whether the regression model is better than using the mean outcome value (y) for all cases
3.6.3.6.4.2 is the model significantly better at predicting the outcome than the mean alone?
3.6.3.6.4.3 report R2, then the ANOVA result
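
How the R values and the ANOVA F value relate to each other can be sketched from the sums of squares. This assumes the predictions come from a least-squares regression with an intercept, so that the model sum of squares equals the total minus the residual sum of squares; `model_summary` and the data are hypothetical, and this is not a reproduction of SPSS's own routine.

```python
def model_summary(y, predicted, n_predictors):
    """R2, adjusted R2 and the ANOVA F value for a fitted regression model."""
    n = len(y)
    y_mean = sum(y) / n
    # Error made when predicting the mean outcome for every case.
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    # Error left over when the regression model is used instead.
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_model = ss_total - ss_residual
    df_model = n_predictors
    df_residual = n - n_predictors - 1
    r2 = ss_model / ss_total
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / df_residual
    f_value = (ss_model / df_model) / (ss_residual / df_residual)
    return {"r2": r2, "adjusted_r2": adjusted_r2, "f": f_value,
            "df_model": df_model, "df_residual": df_residual}

# Made-up outcome scores and the predictions of the least-squares line
# Y = 2.2 + 0.6X for X = 1..5.
y = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
summary = model_summary(y, predicted, n_predictors=1)
print(round(summary["r2"], 3), round(summary["adjusted_r2"], 3), round(summary["f"], 3))
# 0.6 0.467 4.5
```

The F value would then be reported with its two df values and a p value taken from the software output or an F table.
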
3.6.3.7 Reporting Results
3.6.3.7.1 1. Check descriptives and correlations
3.6.3.7.2 2. Check that predictor and outcome variables show a linear relationship (scatterplot)
3.6.3.7.3 3. Check that the homoscedasticity assumption is not violated
3.6.3.7.4 Report the R2 in the text, and the ANOVA results
3.6.3.7.4.1 R2 = , F( , )= , p <
3.6.3.7.5 Report the coefficients in a table
3.7 Multiple Regression
3.7.1 Predicting one outcome variable from more than one predictor variable
3.7.2 Formula: Y = a + b1X1 + b2X2 +b3X3
3.7.3 many participants are needed
3.7.4 Methods
3.7.4.1 predictors can be entered in many different orders
3.7.4.2 Simultaneous
3.7.4.2.1 all predictors are entered at the same time
3.7.4.2.2 use for exploratory analysis
3.7.4.3 Hierarchical
3.7.4.3.1 predictors are entered in a pre-defined order
3.7.4.3.2 used when regressions are informed by well-defined theory
3.7.4.4 Stepwise
3.7.4.4.1 predictors are entered in an order driven by how well they correlate with the outcome
3.7.4.4.2 not used often as unstable
3.7.5 SPSS output
3.7.5.1 1. Descriptive Statistics
3.7.5.2 2. Correlations
3.7.5.3 3. Assumptions - visual tests for homoscedasticity
3.7.5.4 4. Model
3.7.5.4.1 summary
3.7.5.4.1.1 how good the model is, R2
3.7.5.4.2 ANOVA significance
3.7.6 Reporting Results
3.7.6.1 1. Check descriptives and correlations
3.7.6.2 2. Difficult to check for linear relationships
3.7.6.3 3. Check that homoscedasticity assumption is not violated
3.7.6.4 4. Report the R2 value
3.7.6.4.1 R2 = , F(df, df) = , p =
3.7.6.5 5. Report the coefficients in a table
3.7.7 multicollinearity occurs when variables are highly correlated with each other. This is undesired.
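
A simple screen for multicollinearity is to correlate every pair of predictors before running the regression. The sketch below assumes a cut-off of 0.8 for "highly correlated", which is an arbitrary illustrative threshold and not one given in these notes; the function names and predictor data are also made up.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def highly_correlated_pairs(predictors, threshold=0.8):
    """Return (name, name, r) for each pair of predictors with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:  # threshold is an assumed, adjustable cut-off
            flagged.append((name_a, name_b, r))
    return flagged

# Made-up predictors: "minutes" is just "hours" in different units.
predictors = {
    "hours": [1, 2, 3, 4, 5],
    "minutes": [60, 120, 180, 240, 300],
    "age": [30, 25, 40, 22, 35],
}
for name_a, name_b, r in highly_correlated_pairs(predictors):
    print(name_a, name_b, round(r, 3))  # hours minutes 1.0
```

A flagged pair suggests the two predictors carry largely the same information, so one of them could be dropped or the two combined.
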
3.8 Summary
3.8.1 Regression analyses allow us to make predictions about outcome variables using predictor variables
3.8.2 All regressions assume homoscedasticity
3.8.3 Simple (bivariate) regression uses one predictor variable. Multiple regression uses more than one.
3.8.4 To report regressions:
3.8.4.1 i) report R2 and the ANOVA in the text
3.8.4.2 ii) report the coefficients in a table
4 Correlation is used to examine the relationship between variables
4.1 Regression is used to make predictions about scores on one variable based on knowledge of the values of others
