Empirische Methoden 2

Question	Answer
STATISTICS IS...	a branch of mathematics which is concerned with the collection, classification and analysis of numerical facts.
TWO AREAS OR SUBDIVISIONS OF STATISTICS	1. DESCRIPTIVE STATISTICS 2. INFERENTIAL STATISTICS
DESCRIPTIVE STATISTICS	makes a group of numbers from a research study understandable. The numbers are summarized through values such as the median or the mean.
INFERENTIAL STATISTICS	it infers the properties of a population from the analysis of a data sample. It is valuable when it is not possible to examine each member of the entire population.
NOMINAL DATA	are items that do not have an order and can be differentiated only by a naming system. The items may have numbers assigned to them, but the numbers do not have a meaning. (gender, race, marital status)
ORDINAL DATA	items are set into some kind of order by their position or scale, according to a magnitude. (marks, positions in a race)
METRIC DATA	data have a meaning as a measurement and we can define a distance between the values. (height, temperature, weight)
DISCRETE VARIABLE	represents items that can be counted and values that can be listed out (gender, school marks). It can only take certain values.
CONTINUOUS VARIABLE	is measured along a continuous scale which can be divided into fractions. It allows for infinitely fine subdivision and can take any value on the scale (age, height, length of a screw).
INTERVAL SCALE	An interval scale of measurement has increments of equal size but no true zero point. Example: temperature in degrees Celsius, where zero does not mean the absence of temperature.
RATIO SCALE	Ratio scales have all the properties of the previous scales, plus a meaningful zero point. Examples: weight, height, speed.
RELATIVE FREQUENCY	how often something happens divided by the total number of outcomes
CUMULATIVE RELATIVE FREQUENCY	is the running sum of the relative frequencies: the relative frequency of a value plus the relative frequencies of all values tabulated before it.
THE 5 QUALITIES OF A GREAT VISUALIZATION OF DATA (BE FIT)	BEAUTIFUL - attractive for the audience ENLIGHTENING - will change our minds FUNCTIONAL - accurate depiction of data INSIGHTFUL - reveals evidence TRUTHFUL - honest data
RULES FOR A GRAPHIC REPRESENTATION OF DATA	1. Title and description in both axes (x and y) 2. Zero point at the crossing point of both axes 3. no 3D graphics 4. respect to other in the variables and features
MEAN	Provides a measure for the central location of the data. It is the sum of the values divided by the number of values.
MEDIAN	is the value in the middle when the data are arranged in ascending order. It is the middle value when the number of values is odd and the average of the 2 middle values when the number of values is even.
MODE	is the value that occurs with the greatest frequency.
THE BOXPLOT	it is a graph of numerical data based on the 5-number summary. It includes the minimum value, the 25th percentile, the median, the 75th percentile and the maximum value. It gives information on shape, center and variability of a data set.
VARIABILITY	shows how much scores differ from one another. The mean is used as the reference point for the variability. To measure the variability, the variance and the standard deviation are used.
VARIANCE	is a measure of variability based on the squared differences between the value of each observation (Xi) and the mean. Symbol: s².
STANDARD DEVIATION	The standard deviation is defined as the positive square root of the variance. Thus, it is measured in the same units as the original data. It measures how concentrated the data are around the mean.
CHARACTERISTICS OF THE STANDARD DEVIATION	- it can never be negative - the smallest value is zero and means no deviation (all numbers are equal) - it has the same units as the original data - the more concentrated the values, the smaller the SD. Symbol: s.
COEFFICIENT OF VARIATION	It indicates how large the standard deviation is in relation to the mean. The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution.
CORRELATION COEFFICIENT (metric data)	measures the strength and direction of the linear relationship between 2 numerical variables x and y. The coefficient r is always between -1 and 1.
CHI SQUARE (nominal data)	is the sum, over all categories, of the squared difference between observed and expected frequencies divided by the expected frequency. For a contingency table, each expected frequency is the row total multiplied by the column total, divided by the total number of scores.
CRAMER'S V	Cramer's V measures how strong the association between two nominal variables is. It is derived from chi square and varies between 0 and 1. Close to 0, it shows little relationship between the variables. Close to 1, it indicates a strong association.
COVARIANCE	is a measure of the joint variability of two random variables. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret. The normalized version of the covariance, the correlation coefficient, shows by its magnitude the strength of the linear relation.
DIRECT CORRELATION VS. INDIRECT CORRELATION	the two variables change in the same direction / the two variables change in opposite directions
REGRESSION	in the case of 2 numerical variables, you can come up with a line that enables you to predict Y from X, if the correlation (r) is from moderate to strong.
COEFFICIENT OF DETERMINATION (r²)	is the square of the correlation coefficient and ranges from 0 to 1. It indicates the proportion of the variability in Y that is explained by the regression on X, so it shows how useful the line is for predicting outcomes. An r² of 1.0 indicates that the regression line perfectly fits the data.
PROBABILITY	is to assign values to events in order to evaluate how likely or probable those events are.
NORMAL DISTRIBUTION	a variable X has a normal distribution if its values fall into a smooth, continuous curve with a bell-shaped pattern.
CHARACTERISTICS OF A NORMAL DISTRIBUTION	1. symmetric shape (each half is the mirror of the other) 2. its distribution has a bump in the middle, with tails going down. 3. the mean, mode and median are the same and lie in the middle of the distribution 4. the total area under the curve is 1 5. it is defined by 2 parameters: the mean (μ) and the standard deviation (σ)
CALCULATE WITH NORMAL DISTRIBUTIONS	1. READ FROM THE TABLE 2. THROUGH THE COMPLEMENTARY PROBABILITY 3. THROUGH SYMMETRY 4. THROUGH SYMMETRY AND THE COMPLEMENTARY PROBABILITY
STANDARD NORMAL DISTRIBUTION	A random variable that has a normal distribution with a mean of zero and a standard deviation of one. It has the same appearance as other normal distributions, but with the special properties μ = 0 and σ = 1.
CENTRAL LIMIT THEOREM	If a variable X doesn't have a normal distribution, the sampling distribution of the sample mean is still approximately normal, as long as the sample size is large enough (rule of thumb: n of at least 30). The CLT is only needed when the distribution of X is not normal!
CONFIDENCE INTERVALS	A confidence interval is a statistic plus or minus a margin of error (MOE), so your result becomes an interval. Example: kids who like football: 40% ± 3.5 percentage points. Lower end of the interval: statistic - MOE. Upper end: statistic + MOE. The width of the confidence interval is two times the MOE: from the lower end of the interval to the upper end of the interval.
MARGIN OF ERROR	it is the number that you add to and subtract from the sample statistic to indicate that you are giving a range of possible values for the population parameter. (±)
HOW TO TEST A HYPOTHESIS?	Every hypothesis test contains a set of 2 opposing statements about a population parameter (H0 and H1)
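NULL HYPOTHESIS (H0)	states that the population parameter is equal to a claimed value, i.e. that there is no effect or no difference. If H0 holds, there is no statistical significance.
ALTERNATIVE HYPOTHESIS (H1)	states that the population parameter differs from the value claimed in H0 (for example, that the population means are different). It is the conclusion drawn when H0 is rejected. There is statistical significance.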
TYPE 1 ERROR (P = ALPHA)	means rejecting H0 when you shouldn't. It is asserting something that is absent.
TYPE 2 ERROR	means not rejecting H0 when you should have. Occurs when H0 is false, but erroneously fails to be rejected.
CORRECT JUDGEMENTS WHEN TESTING A HYPOTHESIS	First, you can accept the null hypothesis when the null hypothesis is actually true. Secondly, you can reject the null hypothesis when the null hypothesis is actually false.
PROCEDURE FOR A STATISTICAL TEST (5 STEPS)	1. Provide a statement of the H0 and the H1 2. Set the level of risk associated with the H0 (significance level, e.g. 0.05) 3. Determine the critical value (e.g. 1.96 for a two-sided z-test at the 0.05 level) 4. Compute the test statistic value 5. Compare the obtained value with the critical value.
ANSWER WHEN YOU ACCEPT H0	→ The test result falls within the interval. → It does not lie within the critical zone. → H0 will therefore be accepted, since there is weak evidence against it. → Our data is therefore not significant.
ANSWER WHEN YOU DON'T ACCEPT H0	→ The obtained value is within the critical interval. → H0 will therefore be rejected, since there is strong evidence against it. → Our data is therefore significant.
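
The descriptive and association measures on the cards above (mean, median, mode, variance, standard deviation, coefficient of variation, covariance, correlation, r², chi square, Cramer's V) can be sketched in a few lines of Python. This is only an illustrative sketch: the data values, the 2x2 table and the helper names `covariance`, `correlation`, `chi_square` and `cramers_v` are invented for this example and do not come from the cards. The sample versions of the formulas (division by n - 1) are assumed.

```python
import math
import statistics

# Hypothetical sample data, invented for illustration only.
x = [2, 4, 4, 5, 7, 8]
y = [1, 3, 4, 4, 6, 9]

mean_x = statistics.mean(x)        # sum of the values / number of values
median_x = statistics.median(x)    # even n: average of the 2 middle values
mode_x = statistics.mode(x)        # value with the greatest frequency
var_x = statistics.variance(x)     # sample variance s^2 (divides by n - 1)
sd_x = statistics.stdev(x)         # positive square root of the variance
cv_x = sd_x / mean_x               # standard deviation relative to the mean


def covariance(a, b):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    products = ((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sum(products) / (len(a) - 1)


def correlation(a, b):
    """Correlation coefficient r: the covariance normalized by both SDs."""
    return covariance(a, b) / (statistics.stdev(a) * statistics.stdev(b))


def chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


def cramers_v(table):
    """Cramer's V: chi square rescaled to the range 0..1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))  # smaller of row and column count
    return math.sqrt(chi_square(table) / (n * (k - 1)))


r = correlation(x, y)
r_squared = r ** 2                 # coefficient of determination

# Hypothetical 2x2 table of observed frequencies (two nominal variables).
table = [[30, 10], [10, 30]]
print(mean_x, median_x, mode_x, round(sd_x, 3), round(r, 3))
print(chi_square(table), cramers_v(table))
```

The sign of `covariance(x, y)` gives the direction of the relationship, while `r` puts the strength on a fixed scale from -1 to 1, which is why the cards call the raw covariance hard to interpret.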

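The inferential cards (calculating with the normal distribution, confidence intervals, and the 5-step test procedure) can also be tried out with Python's standard library. The following is a minimal sketch under stated assumptions: the sample size of 750 for the football example, the numbers passed to the test, and the helper name `z_test` are hypothetical, and a two-sided z-test with a known population standard deviation is assumed because it matches the critical value 1.96 mentioned on the cards.

```python
import math
from statistics import NormalDist

z = NormalDist()                   # standard normal: mu = 0, sigma = 1

# Calculating with the normal distribution:
p_below = z.cdf(1.0)               # P(Z <= 1), the value a z-table lists
p_above = 1 - z.cdf(1.0)           # complementary probability: P(Z > 1)
p_below_neg = z.cdf(-1.0)          # by symmetry, P(Z <= -1) equals P(Z > 1)

# Confidence interval for a proportion.
# Assumption: 40% of a hypothetical sample of n = 750 kids like football.
p_hat, n = 0.40, 750
z_crit = z.inv_cdf(0.975)          # about 1.96 for a 95% confidence level
moe = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
ci = (p_hat - moe, p_hat + moe)    # width of the interval is 2 * MOE


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 (hypothetical helper).

    Returns the test statistic, the critical value, and whether H0 is
    rejected at significance level alpha.
    """
    critical = NormalDist().inv_cdf(1 - alpha / 2)            # step 3
    statistic = (sample_mean - mu0) / (sigma / math.sqrt(n))  # step 4
    return statistic, critical, abs(statistic) > critical     # step 5


# Hypothetical numbers: H0 says mu = 100, the sample of 100 has mean 103.
statistic, critical, reject_h0 = z_test(103, 100, sigma=15, n=100)
print(round(moe, 4), round(statistic, 2), round(critical, 2), reject_h0)
```

With these assumed numbers the margin of error comes out at roughly 3.5 percentage points, and the test statistic lies in the critical zone, so H0 would be rejected.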
	Created by ESTEFANIA OLIVA about 7 years ago