Data Science Day 3: Chi-square Test
Learning Objectives
1. Define the Chi-square distribution
2. Explain the three Chi-square test application scenarios
The Chi-square distribution is the distribution of a sum of squared independent standard normal deviates. A random variable V that follows a Chi-square distribution with m degrees of freedom can be written as:
V= X1^2+X2^2+...+Xm^2
where X1, X2, ..., Xm are m independent random variables, each having the standard normal distribution. The more degrees of freedom, the more closely the Chi-square distribution approaches a normal distribution.
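The definition above can be checked by simulation. This is a minimal sketch using only Python's standard library (the function names are hypothetical): it draws many values of V = X1^2 + ... + Xm^2 and compares the sample mean and variance with the known theoretical values m and 2m, and the sample skewness with sqrt(8/m), which shrinks toward 0 (the skewness of a normal distribution) as m grows.

```python
import random
import statistics


def sample_chi_square(m: int, rng: random.Random) -> float:
    """One draw of V = X1^2 + ... + Xm^2, with X1..Xm independent N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))


def simulate(m: int, n_draws: int = 20_000, seed: int = 0) -> list[float]:
    """Many independent draws of V (seeded so runs are reproducible)."""
    rng = random.Random(seed)
    return [sample_chi_square(m, rng) for _ in range(n_draws)]


def sample_skewness(xs: list[float]) -> float:
    """Sample skewness: mean of the cubed standardized values (0 for a normal)."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([((x - mu) / sd) ** 3 for x in xs])


if __name__ == "__main__":
    for m in (1, 5, 30):
        draws = simulate(m)
        print(f"m={m:2d}  mean={statistics.fmean(draws):6.2f} (theory {m})  "
              f"var={statistics.variance(draws):6.2f} (theory {2 * m})  "
              f"skew={sample_skewness(draws):5.2f} (theory {(8 / m) ** 0.5:.2f})")
```

The printed skewness drops as m increases, which is the "approaches a normal distribution" behavior described above.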
The Chi-square distribution has three basic properties:
Not symmetric: skewed to the right
No negative values
Total area under the curve = 1
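A minimal sketch that checks these three properties numerically, assuming the standard closed-form density f(x; k) = x^(k/2 - 1) e^(-x/2) / (2^(k/2) Γ(k/2)) for x > 0 and 0 for x < 0. The helper names are hypothetical. It confirms the density is zero for negative x and integrates to 1, and it prints the mode (k - 2) below the mean (k), a sign of right skew, together with the theoretical skewness sqrt(8/k).

```python
import math


def chi2_pdf(x: float, k: int) -> float:
    """Chi-square density with k degrees of freedom (standard closed form)."""
    if x < 0:
        return 0.0  # no negative values: the density is zero left of 0
    if x == 0:
        # At 0 the density is infinite for k = 1, 1/2 for k = 2, and 0 for k > 2
        return math.inf if k == 1 else (0.5 if k == 2 else 0.0)
    half_k = k / 2
    # Work on the log scale to avoid overflow in 2^(k/2) and Gamma(k/2)
    log_pdf = (half_k - 1) * math.log(x) - x / 2 - half_k * math.log(2) - math.lgamma(half_k)
    return math.exp(log_pdf)


def area_under_curve(k: int, upper: float = 200.0, step: float = 0.01) -> float:
    """Midpoint-rule approximation of the integral of the density over [0, upper]."""
    n = round(upper / step)
    return sum(chi2_pdf((i + 0.5) * step, k) for i in range(n)) * step


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        print(f"k={k:2d}  area={area_under_curve(k):.4f}  mode={k - 2}  mean={k}  "
              f"skewness={math.sqrt(8 / k):.3f}")
```

k = 1 is left out of the area check because its density is unbounded at 0, which a simple midpoint rule handles poorly.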
Three primary Chi-square test applications:
1. Test of Independence of two categorical variables:
Whether two categorical variables are associated, or whether they are independently distributed within one sample space.
Null hypothesis: The two categorical variables are independent.
Note: There are two categorical variables from one sample space.
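A minimal sketch of the test of independence on a contingency table, using only the standard library. It assumes the usual Pearson construction: expected count = row total × column total / n, statistic = Σ (O - E)² / E, degrees of freedom = (rows - 1)(columns - 1). The p-value helper `chi2_sf` (hypothetical name) evaluates the chi-square upper tail through the series for the regularized incomplete gamma function. The counts are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(V > x) for V ~ chi-square(df), via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_{n>=0} z^n / ((a+1)(a+2)...(a+n))
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def independence_test(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical counts: rows = groups A and B, columns = categories X, Y, Z
    observed = [[20, 30, 50],
                [30, 30, 40]]
    stat, df, p = independence_test(observed)
    print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p:.3f}")
```

With these made-up counts the statistic is 28/9 ≈ 3.111 on 2 degrees of freedom (p ≈ 0.211), so H0 would not be rejected at the 5% level. In practice `scipy.stats.chi2_contingency` performs the same computation; note that it applies Yates' continuity correction by default when df = 1.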
2*. Goodness-of-Fit Test (Pearson):
Whether the sample categorical data are consistent with a hypothesized distribution.
Null hypothesis: The sample data are consistent with the specified distribution.
Note: There is one categorical variable from one sample space.
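A minimal sketch of Pearson's goodness-of-fit test, assuming the hypothesized distribution is given as one proportion per category and no parameters were estimated from the data (so df = k - 1). The die-roll counts and the function name are hypothetical, and the decision compares the statistic with the standard 5% critical values of the Chi-square distribution.

```python
# Upper 5% critical values of the chi-square distribution (alpha = 0.05), df = 1..5
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit(observed: list[int], expected_props: list[float]) -> tuple[float, int]:
    """Pearson goodness-of-fit statistic and degrees of freedom.

    df = k - 1 assumes no distribution parameters were estimated from the data.
    """
    if len(observed) != len(expected_props):
        raise ValueError("need one expected proportion per category")
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in expected_props]
    # Common rule of thumb: every expected count should be at least 5
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


if __name__ == "__main__":
    # Hypothetical data: a six-sided die rolled 120 times; H0: the die is fair
    rolls = [25, 17, 15, 23, 24, 16]
    stat, df = goodness_of_fit(rolls, [1 / 6] * 6)
    decision = "reject H0" if stat > CRITICAL_05[df] else "fail to reject H0"
    print(f"chi-square = {stat:.2f}, df = {df}, critical value = {CRITICAL_05[df]}: {decision}")
```

Each face is expected 20 times, so the statistic is (25 + 9 + 25 + 9 + 16 + 16) / 20 = 5.0, well below 11.070: these counts are consistent with a fair die.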
3. Test of Homogeneity:
Whether the frequency counts of a categorical variable follow the same distribution in different sample spaces.
Null hypothesis: The proportion of each category of the variable is the same in all sample spaces.
Note: There is one categorical variable from two or more different sample spaces.
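A minimal sketch of the test of homogeneity, assuming the usual construction: under H0 all samples share the same category proportions, which are estimated by pooling the samples, so each expected count is the sample size times the pooled proportion. The clinic names, outcome categories, and counts are hypothetical.

```python
# Upper 5% critical values of the chi-square distribution (alpha = 0.05), df = 1..5
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def homogeneity_test(samples: dict[str, list[int]]) -> tuple[float, int]:
    """Chi-square test of homogeneity for one categorical variable across samples."""
    counts = list(samples.values())
    n_categories = len(counts[0])
    if any(len(c) != n_categories for c in counts):
        raise ValueError("every sample needs a count for every category")
    # Pooled proportions: the common distribution assumed under H0
    pooled = [sum(c[j] for c in counts) for j in range(n_categories)]
    grand_total = sum(pooled)
    pooled_props = [p / grand_total for p in pooled]
    stat = 0.0
    for c in counts:
        n_sample = sum(c)
        for observed, prop in zip(c, pooled_props):
            expected = n_sample * prop  # expected count in this sample under H0
            stat += (observed - expected) ** 2 / expected
    df = (len(counts) - 1) * (n_categories - 1)
    return stat, df


if __name__ == "__main__":
    # Hypothetical outcome counts (improved, unchanged, worse) from two clinics
    samples = {"clinic A": [30, 30, 20], "clinic B": [30, 60, 30]}
    stat, df = homogeneity_test(samples)
    decision = "reject H0" if stat > CRITICAL_05[df] else "fail to reject H0"
    print(f"chi-square = {stat:.3f}, df = {df}: {decision}")
```

Here the statistic is 25/6 ≈ 4.17 on 2 degrees of freedom, below 5.991. Because sample size × pooled proportion equals row total × column total / n, the arithmetic matches the test of independence; what differs is the design (separate samples of fixed size versus one sample classified on two variables) and therefore the hypothesis being tested.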
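* In clinical trials, the log-rank test used in survival analysis is also a Chi-square test: it compares observed with expected event counts, and its statistic is referred to a Chi-square distribution.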
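A minimal sketch of the two-group log-rank statistic, assuming the standard construction: at each distinct event time, compare group A's observed events with the number expected if both groups shared the same hazard, sum (O - E) over event times, and divide its square by the summed hypergeometric variance. Under H0 the result is approximately Chi-square with 1 degree of freedom, so the p-value is erfc(sqrt(x/2)). The function name and the follow-up data are hypothetical.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events: 1 = event observed, 0 = censored.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    o_minus_e = 0.0  # running sum of (observed - expected) events in group A
    variance = 0.0   # running sum of hypergeometric variances
    for t in event_times:
        # At risk at t: everyone whose time is >= t (including anyone censored at t)
        n_a = sum(1 for tt, _, g in data if g == 0 and tt >= t)
        n_b = sum(1 for tt, _, g in data if g == 1 and tt >= t)
        n = n_a + n_b
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d_a = sum(1 for tt, e, g in data if g == 0 and tt == t and e)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("statistic undefined: zero variance")
    stat = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical follow-up times in months; 1 = event, 0 = censored
    stat, p = log_rank_test([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1],
                            [4, 5, 8, 9, 11, 14], [1, 1, 1, 1, 1, 0])
    print(f"log-rank chi-square = {stat:.3f}, p-value = {p:.3f}")
```

Swapping the two groups flips the sign of O - E but not its square, so the statistic does not depend on which group is labeled A.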
We will show the application examples next time!
Thanks very much to Renee Wu, Ali Motamedi~
Happy learning!