
Note by Rohit Gurjar, created more than 1 year ago

Probability & Statistics


	Created by Rohit Gurjar almost 7 years ago


2018-12-24T22:12:55Z

Probability & Statistics

Introduction

This is a scatter plot from the Iris dataset, with petal length on the x-axis and petal width on the y-axis.

The setosa flowers are well separated, but suppose we pick a query point xq that lies in the region where versicolor and virginica overlap. Predicting the class of xq is not easy there, so we use a probabilistic score to predict it.

Population & Sample

Population - Set of all the events or observations

Sample - Sample is the subset of the population which we have taken to make an estimate.

Here, height is the property being measured, and a statistic (for example, the mean height) is computed on it.

Gaussian (Normal) Distribution and its PDF

PDF: Probability Density Function

CDF: Cumulative Distribution Function

For a normal distribution, half of the CDF curve lies to the left of the mean and half to the right: the CDF equals 0.5 at the mean.

Here, a smaller variance means the values are concentrated closer to the mean (the x = 0 line in the plot).
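A minimal sketch of the two remarks above, using Python's standard-library `NormalDist`. The two standard deviations (0.5 and 2.0) are illustrative assumptions, not values from the note.

```python
from statistics import NormalDist

# Two normal distributions with the same mean (0) but different spread.
narrow = NormalDist(mu=0.0, sigma=0.5)  # small variance (assumed value)
wide = NormalDist(mu=0.0, sigma=2.0)    # large variance (assumed value)

# The CDF at the mean is 0.5 for any normal distribution.
cdf_at_mean = (narrow.cdf(0.0), wide.cdf(0.0))

# Probability mass inside [-1, 1]: larger when the variance is smaller.
mass_narrow = narrow.cdf(1.0) - narrow.cdf(-1.0)
mass_wide = wide.cdf(1.0) - wide.cdf(-1.0)

print(cdf_at_mean)
print(round(mass_narrow, 3), round(mass_wide, 3))
```

With the smaller variance, far more of the probability mass sits close to x = 0, which is why the PDF is taller and the CDF steeper there.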

Symmetric Distribution, Skewness, Kurtosis

Skewness: A measure of how asymmetric a distribution is compared with a symmetric distribution.
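As a rough illustration, skewness can be computed as the third standardized moment. The helper name `sample_skewness` and both small datasets are made up for this sketch.

```python
import statistics


def sample_skewness(values):
    """Third standardized moment, m3 / m2**1.5 (population form)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]         # mirror image around 3
right_skewed = [1, 1, 2, 2, 3, 10]  # long tail to the right

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
print(skew_symmetric, skew_right)
```

A symmetric sample gives a skewness of zero, while a long right tail gives a positive value.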

Standard Normal Variate

'Z' here represents the standardized random variable, Z = (X - μ) / σ.

Why are we using Standard Normal Variate?

Because after standardizing the data, Z has mean 0 and standard deviation 1, so about 34.1% of the data lies between 0 and 1.
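A small sketch of standardization, assuming a made-up list of heights. It checks that the standardized values have mean 0 and standard deviation 1, and reads the 34.1% figure off the standard normal CDF.

```python
from statistics import NormalDist, fmean, pstdev

heights = [150.0, 160.0, 165.0, 170.0, 175.0, 180.0, 190.0]  # made-up data

# Standardize: subtract the mean, divide by the standard deviation.
mu = fmean(heights)
sigma = pstdev(heights)
z_scores = [(h - mu) / sigma for h in heights]

# Share of a standard normal distribution between 0 and 1, and within +/- 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)
share_0_to_1 = std_normal.cdf(1.0) - std_normal.cdf(0.0)
share_within_1 = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(round(share_0_to_1, 3), round(share_within_1, 3))
```

The 34.1% figure applies when the underlying variable is normally distributed; standardizing only rescales the data and does not make it normal.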

Kernel Density Estimation

1. KDE is one way to smooth a histogram. Smoothed functions are easier to represent in functional form, i.e., as f(x), as compared to non-smooth functions. Additionally, smooth functions represented as f(x) are easier to integrate. We need integration as the CDF is an integral over the PDF. Smooth functions represented in functional form are easier to manipulate to build more complex operators on the data.

2. KDE is used to obtain a PDF of an r.v. Histograms and PDFs inform us of the density of data for each possible value an r.v. takes. We have explained PDFs and their interpretations in detail in the above videos in EDA, especially in this one: https://www.appliedaicourse.com/course/applied-ai-course-online/lessons/histogram-and-introduction-to-pdfprobability-density-function-1/

If we decrease the window size and increase the number of bins, the graph of the PDF becomes jagged.

This is kernel density estimation with Gaussian kernels.

Yes, the variance of these Gaussian kernels is a parameter which we can tune based on our need for a smoother or more jagged PDF. If we keep the variance very small, we get a very jagged PDF, and if we keep it too large, we get a useless, flat PDF. So, it is a tradeoff. In practice, most plotting libraries use rules of thumb or various algorithms to determine a reasonable variance.

For example, in SciPy the bandwidth parameter (bw_method) determines the variance of each kernel. There are several methods to choose it, as described in the function reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html
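The tradeoff can be sketched with `scipy.stats.gaussian_kde`. The synthetic data, the grid, the three bandwidth factors, and the `count_peaks` helper are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)  # synthetic sample
grid = np.linspace(-4.0, 4.0, 801)


def count_peaks(density):
    """Number of strict local maxima in a sampled curve (a rough jaggedness score)."""
    interior = density[1:-1]
    return int(np.sum((interior > density[:-2]) & (interior > density[2:])))


# A scalar bw_method is a factor that scales the kernel width.
kde_narrow = gaussian_kde(data, bw_method=0.05)  # very small variance: jagged
kde_default = gaussian_kde(data)                 # default rule of thumb (Scott)
kde_wide = gaussian_kde(data, bw_method=2.0)     # very large variance: flat

peaks_narrow = count_peaks(kde_narrow(grid))
peaks_default = count_peaks(kde_default(grid))
peaks_wide = count_peaks(kde_wide(grid))
total_mass = kde_default.integrate_box_1d(-50.0, 50.0)

print(peaks_narrow, peaks_default, peaks_wide, round(total_mass, 6))
```

Counting local maxima is only a crude proxy for jaggedness. In practice, plotting the three curves over a histogram of the data makes the difference clearer.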

Sampling Distribution & Central Limit Theorem

The central limit theorem states that if you have a population with mean μ and finite standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal, with mean μ and standard deviation σ/√n. This holds regardless of whether the source population is normal or skewed, provided the sample size is sufficiently large (a common rule of thumb is n >= 30). If the population is normal, then the result holds even for samples smaller than 30.

Distribution here implies the PDF (Probability Density Function).

In practice, you do not get to see all the values of a population. Let us say you want to estimate the mean height of all humans. It is impossible to have the height of each of the billions of humans in your dataset. What we obtain is a sample of heights of a subset of humans. That is why sample means are useful and important. Even for a dataset, you do not see the feature values for every data point in the universe. What you see as a training dataset is only a sample of points.
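A small simulation sketch of the central limit theorem. The skewed population (an exponential distribution with mean 1 and standard deviation 1), the sample size, and the number of samples are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)

n = 30              # size of each sample
num_samples = 2000  # number of sample means collected

# Population: Exponential(rate=1), which is skewed, with mean 1 and std 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)
expected_std = 1.0 / n ** 0.5  # sigma / sqrt(n)

print(round(mean_of_means, 3), round(std_of_means, 3), round(expected_std, 3))
```

The sample means cluster around the population mean with a spread close to σ/√n, even though the population itself is far from normal.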

Q-Q Plot - How to test if a random variable is normally distributed or not?

AD - Anderson-Darling test (a statistical test of whether a sample follows a Gaussian distribution; often regarded as one of the more powerful normality tests)

A Q-Q plot is generally used to check whether a random variable is normally (Gaussian) distributed or not.

X, Y - random variables
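A sketch of how the points of a Q-Q plot against the normal distribution can be computed without a plotting library. The helper names `qq_points` and `straightness`, the plotting positions (i + 0.5) / n, and both synthetic samples are assumptions for this example.

```python
import random
from statistics import NormalDist, fmean


def qq_points(sample):
    """Return (theoretical standard-normal quantiles, sorted sample values)."""
    n = len(sample)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, sorted(sample)


def straightness(xs, ys):
    """Pearson correlation of the Q-Q points; close to 1 means a straight line."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

r_normal = straightness(*qq_points(normal_sample))
r_skewed = straightness(*qq_points(skewed_sample))
print(round(r_normal, 4), round(r_skewed, 4))
```

If the sample is normally distributed, the points lie close to a straight line; a skewed sample bends away from it. The plot itself is usually judged by eye, so the correlation here is only a convenient numeric summary.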

Uniform Distribution

#outcomes = number of possible outcomes; in the above example they are all equally probable.

Bernoulli and Binomial Distribution

A Bernoulli distribution is a distribution with exactly two outcomes. A Binomial distribution describes the number of successes in n independent Bernoulli trials.

Example - coin toss
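A minimal sketch of the coin-toss example. The helper names `bernoulli_trial` and `binomial_pmf` and the choice of 10 tosses are assumptions for illustration.

```python
import math
import random


def bernoulli_trial(p, rng):
    """One trial with two outcomes: 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


rng = random.Random(0)
tosses = [bernoulli_trial(0.5, rng) for _ in range(10)]  # 10 fair coin tosses
heads = sum(tosses)                                      # one Binomial(10, 0.5) draw

prob_five_heads = binomial_pmf(5, 10, 0.5)
total_probability = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(heads, prob_five_heads, total_probability)
```

The number of heads in 10 tosses is one draw from a Binomial(10, 0.5) distribution, and the probability of exactly 5 heads is 252/1024.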

Log Normal Distribution

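X is log-normally distributed if log(X) is normally distributed. So, to check whether X is log-normal, we take log(X) and test whether it is normally/Gaussian distributed or not. After that, everything can be interpreted mathematically using the Gaussian results.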
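A quick sketch of the relationship between a log-normal variable and its logarithm. The parameters (mu = 0, sigma = 0.5) and the sample size are assumptions.

```python
import math
import random
import statistics

random.seed(1)

# X is log-normal, so log(X) should look normal with mean 0 and std 0.5.
x = [random.lognormvariate(0.0, 0.5) for _ in range(5000)]
log_x = [math.log(v) for v in x]

mean_x, median_x = statistics.fmean(x), statistics.median(x)
mean_log, median_log = statistics.fmean(log_x), statistics.median(log_x)
std_log = statistics.stdev(log_x)

print(round(mean_x, 3), round(median_x, 3))      # mean above median: right skew
print(round(mean_log, 3), round(median_log, 3))  # nearly equal: symmetric
print(round(std_log, 3))
```

Comparing mean and median is only a rough symmetry check; a Q-Q plot of log(X) against the normal distribution is the more thorough test.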

Power Law Distribution

The Pareto distribution is a probability distribution that follows a power law; if X follows a power law distribution, we say X is Pareto distributed.

Power Transform (Box-Cox transform)

Here, we use a power transform (Box-Cox) to transform a power law (Pareto) distributed random variable into an approximately Gaussian one.
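A sketch using `scipy.stats.boxcox`, which fits the transform parameter lambda by maximum likelihood when none is given. The Pareto parameters and the sample size are assumptions. The result is only more Gaussian-like, not guaranteed to be Gaussian.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# NumPy's pareto() draws from a Lomax distribution; adding 1 gives a
# classical Pareto with minimum value 1 and shape alpha = 3 (assumed values).
x = rng.pareto(3.0, size=2000) + 1.0

# Box-Cox needs strictly positive data; lambda is fitted from the data.
transformed, fitted_lambda = stats.boxcox(x)

skew_before = stats.skew(x)
skew_after = stats.skew(transformed)
print(round(float(fitted_lambda), 3), round(float(skew_before), 3), round(float(skew_after), 3))
```

Checking the transformed values with a Q-Q plot afterwards shows how close to Gaussian the result actually is.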

Co-variance

Covariance is a way to measure the relationship between two random variables.

Pearson Correlation Coefficient(PCC)

Another way of measuring the relationship between two random variables.

So, the problem with covariance is that it does not account for the variability (scale) of the variables. PCC fixes this by dividing the covariance by sigma(x) and sigma(y), i.e. the standard deviations.

For example, if y increases as x increases, the covariance is positive, but its magnitude depends on the units of x and y, so it does not tell us how strong the relationship is.

PCC = 0 (rho = 0) means there is no linear relationship between X and Y; a non-linear relationship may still exist.
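A small sketch contrasting covariance and PCC. The height and weight values are made up. Changing the unit of height from metres to centimetres changes the covariance by a factor of 100 but leaves the PCC unchanged. The last example shows that rho = 0 rules out a linear relationship only: y = x^2 depends entirely on x, yet its PCC is zero.

```python
from statistics import fmean, pstdev


def covariance(xs, ys):
    """Population covariance: mean of the products of deviations."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def pearson(xs, ys):
    """PCC: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (pstdev(xs) * pstdev(ys))


heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]   # made-up data
weights_kg = [52.0, 58.0, 67.0, 75.0, 80.0]  # made-up data
heights_cm = [100 * h for h in heights_m]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
rho_m = pearson(heights_m, weights_kg)
rho_cm = pearson(heights_cm, weights_kg)

# A purely non-linear relationship with zero correlation.
xs = [-2, -1, 0, 1, 2]
rho_square = pearson(xs, [x ** 2 for x in xs])

print(cov_m, cov_cm, rho_m, rho_cm, rho_square)
```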


Spearman Rank Correlation Coefficient

PCC captures linear relationships only.

Spearman rank correlation also handles monotonic relationships that are not linear (for example, monotonically non-decreasing ones), but it still does not capture more complex, non-monotonic relationships.
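Spearman rank correlation can be sketched as the PCC of the ranks. The helper names and the data are assumptions, and this simple `ranks` function does not handle ties.

```python
import math
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank 1 for the smallest value, 2 for the next, and so on (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the PCC computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # monotonically increasing, but not linear

rho_pearson = pearson(x, y)
rho_spearman = spearman(x, y)
print(round(rho_pearson, 3), round(rho_spearman, 3))
```

Because y always increases with x, the ranks agree perfectly and the Spearman coefficient is 1, while the PCC is noticeably lower.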

Correlation vs Causation

If X and Y are correlated, it does not follow that X causes Y.

Confidence interval (C.I) Introduction

Point Estimate - A single value, such as the sample mean of some randomly drawn values, used as an estimate of the population mean. It may or may not be equal to the population mean.

So, we use another approach that also conveys the uncertainty of the estimate; it is called a Confidence Interval.

Here, as n increases, xbar gets closer to mu.

Computing confidence interval given the underlying distribution

Here, we assume X follows the Normal or Gaussian distribution.

C.I for mean of a normal random variable

CLT - Central limit theorem
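A minimal sketch of a confidence interval for the mean, assuming the population standard deviation sigma is known, so that xbar is approximately normal with standard deviation sigma / sqrt(n). The function name, the sample, and sigma = 5 are assumptions for illustration.

```python
import math
from statistics import NormalDist, fmean


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """C.I. for the mean with known population std: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    xbar = fmean(sample)
    margin = z * sigma / math.sqrt(len(sample))
    return xbar - margin, xbar + margin


heights_cm = [168, 172, 165, 170, 175, 169, 171, 166, 174, 170]  # made-up data
lower, upper = mean_confidence_interval(heights_cm, sigma=5.0)
print(round(lower, 2), round(upper, 2))
```

When sigma is not known and has to be estimated from a small sample, the normal quantile is usually replaced by a Student's t quantile.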

Confidence interval using bootstrapping (very important for C.I.)

Here, we don't know what kind of distribution we have. We take n as a reasonable value (not too small), preferably above 20.

As n increases, we are able to estimate the C.I. of the median better.
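A sketch of a percentile bootstrap interval for the median. The function name, the data, the number of resamples, and the seed are assumptions. Each resample draws n values from the sample with replacement.

```python
import random
import statistics


def bootstrap_median_ci(sample, num_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap C.I. for the median, with no distributional assumption."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(sample, k=len(sample)))
        for _ in range(num_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = medians[round(tail * num_resamples)]
    upper = medians[round((1.0 - tail) * num_resamples) - 1]
    return lower, upper


# Made-up sample of n = 25 observations.
sample = [12, 15, 11, 19, 22, 14, 13, 30, 17, 16, 18, 21, 10,
          25, 14, 16, 20, 13, 15, 27, 12, 18, 19, 23, 16]

lower, upper = bootstrap_median_ci(sample)
sample_median = statistics.median(sample)
print(lower, sample_median, upper)
```

The same recipe works for other statistics, such as the variance or a percentile, by swapping the function applied to each resample.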

Hypothesis testing methodology, Null-hypothesis, p-value

Resampling and permutation test

We decide whether to reject the null hypothesis based on the p-value. An interesting way to compute the p-value is permutation testing.
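A sketch of a permutation test for the difference in means between two groups. The function name, the two groups, and the number of permutations are assumptions. The null hypothesis is that group labels do not matter, so the labels are shuffled repeatedly and the shuffled differences are compared with the observed one.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, num_permutations=5000, seed=0):
    """Two-sided p-value for the difference in means under random relabelling."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # random reassignment of the group labels
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_permutations


group_a = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8, 5.3, 5.1]  # made-up measurements
group_b = [6.0, 6.3, 5.9, 6.4, 6.1, 6.2, 5.8, 6.5]  # made-up measurements

p_different = permutation_p_value(group_a, group_b)
p_same = permutation_p_value(group_a, group_a)
print(p_different, p_same)
```

A small p-value means a difference as large as the observed one rarely arises from random relabelling, which counts as evidence against the null hypothesis.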


Power Law Distribution

The Pareto distribution is a probability distribution that follows a power law; if X follows a power law distribution, we say X is Pareto distributed.

Dirac Delta Function - It is zero everywhere except at one given value (x = 1 here), where it attains its peak.

As alpha tends to infinity, the distribution approaches the Dirac delta function.

So, the log-normal distribution rises to a peak and then falls off, whereas the Pareto distribution starts at its peak and then falls off gradually.

To check whether X and Y have a power law relation, we can first compute log X and log Y and then draw a Q-Q plot.
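As a complementary check (a different technique from the Q-Q plot, named here as an assumption of this sketch): if y = c * x^k, then log y = log c + k * log x, so the points (log x, log y) fall on a straight line with slope k. The helper name `loglog_fit` and the noise-free data are made up for illustration.

```python
import math
from statistics import fmean


def loglog_fit(xs, ys):
    """Least-squares line through (log x, log y); returns (slope, intercept)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mx, my = fmean(log_x), fmean(log_y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_x, log_y))
    sxx = sum((a - mx) ** 2 for a in log_x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


xs = [1, 2, 4, 8, 16, 32]
ys = [3.0 * x ** -1.5 for x in xs]  # exact power law: c = 3, k = -1.5

slope, intercept = loglog_fit(xs, ys)
print(round(slope, 6), round(math.exp(intercept), 6))
```

With real, noisy data the points will not lie exactly on a line, so the fitted slope is only an estimate of the exponent.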