The chi-square goodness-of-fit test is a non-parametric test that is performed on a single categorical variable.
Non-parametric means the data is not required to follow a normal distribution.
This test is used to determine if the observed data is consistent with the expected data.
An example makes this test clearer:
Let's say there is a country. The growth in that country's GDP (gross domestic product) over successive 10-year periods is given in table 1.
As per the data given, this country is showing progress in terms of percentage rise in GDP over each ten-year span.
Now there is one more table (table 2), which shows data for 50 cities of that country (which has 150 cities overall), with GDP in billions of dollars in row 2:
The table above is our sample table.
Now, let’s see our problem statement: “Has the GDP of the country changed over the 10-year periods?”
If we drill down into the problem statement, the main question here is: as per our observed data in table 1,
and as per our subset of that data for 50 cities of that country in table 2, “Has GDP shown progress over all these years?”
Our observed GDP for the 50 cities is 20 billion dollars, 15 billion dollars and 17 billion dollars for the three 10-year spans, respectively. This is what we observe and what is given to us.
What is expected is what we calculate from table 1.
There are 50 cities, and from 1981-1990 the rise in GDP is 30%, so the expected value for each time span can be written as follows:
|Period|1981-1990|1991-2000|2001-2010|
|Expected|50*0.30 = 15|50*0.35 = 17.5|50*0.45 = 22.5|
As per the table above, we expect that for 1981-1990 the GDP will be 15 billion dollars.
For 1991-2000, GDP will be 17.5 billion dollars.
For 2001-2010, GDP will be 22.5 billion dollars.
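The expected values above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the growth rates are simply the ones quoted in this example (30%, 35% and 45%).

```python
# Expected values per period, computed as in the example:
# number of sampled cities multiplied by the GDP growth rate for that period.
n_cities = 50  # size of the sample (50 of the country's 150 cities)

# Growth rates quoted in the text for each ten-year span.
growth_rates = {
    "1981-1990": 0.30,
    "1991-2000": 0.35,
    "2001-2010": 0.45,
}

expected = {period: n_cities * rate for period, rate in growth_rates.items()}

for period, value in expected.items():
    print(f"{period}: expected = {value:.1f}")
```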
So now we will check whether the observed data is consistent with the expected data with the help of the chi-square test.
There are certain steps involved in this test that one must follow:
1. Defining Null Hypothesis and Alternative Hypothesis.
A) Null Hypothesis: Our observed data meets the expected data.
B) Alternative Hypothesis: Observed data does not meet the expected data.
2. Stating level of significance (α).
Some companies define this as 0.05, which means we accept a 5% chance of rejecting the null hypothesis when it is actually true (a 95% confidence level).
For now let us take α=0.05
3. Defining degrees of freedom (df).
df is the number of data groups we have minus 1.
Here we have 3 groups, so df = 3-1
df = 2 in our case
4. Defining Critical Value
The critical value (C.V.) can be looked up in a chi-square table.
For α=0.05 and df=2 our critical value is 5.99
So, if the result of our chi-square test is greater than 5.99, we reject the Null Hypothesis in favor of the Alternative Hypothesis, and if the result is less than 5.99, we fail to reject the Null Hypothesis.
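If no chi-square table is at hand, the critical value for this particular case can be computed directly. The sketch below relies on a property that holds only for df = 2: the upper-tail probability of the chi-square distribution is then exp(-x/2), so the critical value is -2·ln(α). For other degrees of freedom a table or a statistics library would be needed. The function name is an illustrative choice, not part of any library.

```python
import math


def chi2_critical_value_df2(alpha: float) -> float:
    """Critical value of the chi-square distribution for df = 2 only.

    For df = 2 the upper-tail probability is exp(-x / 2), so solving
    exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2.0 * math.log(alpha)


critical_value = chi2_critical_value_df2(0.05)
print(round(critical_value, 2))  # 5.99
```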
5. Calculating Chi-Square (X^2)
The formula to calculate Chi-square is X^2 = Σ [(Observed value – Expected value)^2 / Expected value], where the sum runs over all the groups.
So, on the above table we apply the Chi-square formula.
Chi-Square = (20-15)^2/15 + (15-17.5)^2/17.5 + (17-22.5)^2/22.5
Chi-Square = 1.67 + 0.36 + 1.34 = 3.37
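The same arithmetic can be checked with a short Python sketch. The list names are illustrative, and the values are the observed and expected figures from this example. The observed values sum to 52 and the expected values to 55, so the sketch follows the hand calculation rather than assuming a library routine would accept these inputs.

```python
observed = [20.0, 15.0, 17.0]  # observed GDP per period (billion dollars)
expected = [15.0, 17.5, 22.5]  # expected GDP per period (billion dollars)

# Each group's contribution: (observed - expected)^2 / expected
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

print([round(t, 2) for t in terms])  # [1.67, 0.36, 1.34]
print(round(chi_square, 2))          # 3.37
```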
Now, our Chi-Square value is less than our critical value defined in step 4.
So, we can say that we fail to reject our null hypothesis, i.e., the observed data is consistent with the expected data, and the GDP of the country is showing progress over the years in line with table 1.
In other words, the GDP that we observed for the sample of 50 cities does not differ significantly from the GDP that we expected for those cities.
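The comparison in steps 4 and 5 can be written as a small helper. This is a sketch only: the function name and the wording of the returned strings are assumptions made for illustration, and the inputs are the rounded values from this example.

```python
def chi_square_decision(chi_square: float, critical_value: float) -> str:
    """Return the decision for a right-tailed chi-square test."""
    if chi_square > critical_value:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Values from the example: chi-square = 3.37, critical value = 5.99
decision = chi_square_decision(chi_square=3.37, critical_value=5.99)
print(decision)  # fail to reject the null hypothesis
```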