14.2 Two-Samples: Hypothesis Testing on the Difference in Proportions

We often need to compare two proportions to see whether they differ significantly. For example, suppose you run a randomized controlled study on 40 people, with half assigned to a treatment and the other half to a placebo. In the treatment group, 18 of 20 got better, while 15 of 20 in the control group also got better. Are these two proportions significantly different from each other? Is the treatment effective?¹³

We are interested in testing the hypotheses:

\[H_0: p_1=p_2\]

\[H_1: p_1 \neq p_2\]

If the null hypothesis $H_0: p_1=p_2$ is true, using the fact that $p_1=p_2=p,$ the random variable

\[ Z=\dfrac{(\hat{p_1}-\hat{p_2})}{\sqrt{\hat{p}(1-\hat{p}) \left( \dfrac{1}{n_1} +\dfrac{1}{n_2}\right) } }\]

has approximately a standard normal distribution, $N(0,1)$.

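Here $\hat{p_1}=x_1/n_1$ and $\hat{p_2}=x_2/n_2$ are the sample proportions, and the estimator of the common $p$, or pooled sample proportion, is:

\[\hat{p}=\dfrac{x_1+x_2}{n_1+n_2}\]

or, equivalently, in terms of the sample proportions:

\[\hat{p}=\dfrac{\hat{p_1}n_1+\hat{p_2}n_2}{n_1+n_2}\]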
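As a rough illustration of the formulas above, the following Python sketch computes the pooled proportion and the $Z$ statistic for the treatment and placebo counts from the opening example (18 of 20 versus 15 of 20). The function name `two_proportion_z` is a hypothetical helper introduced here, not part of any library.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Return (pooled proportion, Z statistic) under H0: p1 = p2.

    Hypothetical helper for illustration; x1 and x2 are success counts,
    n1 and n2 are the sample sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion p
    p_pooled = (x1 + x2) / (n1 + n2)
    # Standard error of (p1_hat - p2_hat) when the null hypothesis holds
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return p_pooled, (p1_hat - p2_hat) / se


# Opening example: 18/20 improved with treatment, 15/20 with placebo
p_pooled, z = two_proportion_z(18, 20, 15, 20)
print(f"pooled proportion = {p_pooled:.3f}, Z = {z:.3f}")
```

With these counts the pooled proportion is $33/40 = 0.825$ and $Z$ comes out at roughly $1.25$. Note that with only 20 observations per group the normal approximation is rough, so this should be read as an illustration of the arithmetic rather than a definitive analysis.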

14.2.1 The Hypotheses and $p$-value

The null hypothesis is our statement of no effect. In this case our null hypothesis is that there is no difference between the two population proportions. We can write this as $H_0: p_1 = p_2$.¹⁴

The alternative hypothesis is one of three possibilities, depending upon the specifics of what we are testing for:

$H_1$: $p_1$ is greater than $p_2$.
- This is a one-tailed or one-sided test.
- Equivalent to: $H_1: p_1 - p_2 > 0$
- $p$-value is the proportion of the normal distribution that is greater than Z.
$H_1$: $p_1$ is less than $p_2$.
- This is also a one-sided test.
- Equivalent to: $H_1: p_1 - p_2 < 0$
- $p$-value is the proportion of the normal distribution that is less than Z.
$H_1$: $p_1$ is not equal to $p_2$.
- This is a two-tailed or two-sided test.
- Equivalent to: $H_1: p_1 - p_2 \neq 0$
- $p$-value is the proportion of the normal distribution that lies farther from zero than $|Z|$ in either tail, that is, twice the area to the right of $|Z|$, the absolute value of $Z$.
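The three cases above can be sketched in Python using only the standard library. The function `p_value` and its `alternative` labels (`"greater"`, `"less"`, `"two-sided"`) are hypothetical names chosen for this illustration. The two-sided case counts both tails, so it doubles the upper-tail area beyond $|Z|$.

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """p-value of an observed Z under the standard normal distribution.

    Hypothetical helper; `alternative` names the form of H1.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # H1: p1 - p2 > 0
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # H1: p1 - p2 < 0
        return std_normal.cdf(z)
    if alternative == "two-sided":    # H1: p1 - p2 != 0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# The familiar critical value 1.96 leaves about 5% in the two tails combined
print(round(p_value(1.96), 3))
print(round(p_value(1.96, "greater"), 3))
```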

14.2.2 Decision rule

Now we make a decision on whether to reject the null hypothesis (and thereby accept the alternative), or to fail to reject the null hypothesis. We make this decision by comparing our p-value to the level of significance $\alpha$.

If the $p$-value is less than or equal to $\alpha$, then we reject the null hypothesis. This means that we have a statistically significant result and that we are going to accept the alternative hypothesis.

If the $p$-value is greater than $\alpha$, then we fail to reject the null hypothesis. This does not prove that the null hypothesis is true. Instead it means that we did not obtain convincing enough evidence to reject the null hypothesis.
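The decision rule is a single comparison, which a minimal Python sketch makes explicit. The function name `decide` and the returned strings are hypothetical, chosen only for this illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is <= alpha."""
    if p_value <= alpha:
        return "reject H0"
    # Failing to reject is not proof that H0 is true
    return "fail to reject H0"


print(decide(0.03, alpha=0.05))  # p-value below alpha
print(decide(0.03, alpha=0.01))  # same p-value, stricter alpha
```

The same $p$-value can lead to different decisions depending on $\alpha$, which is why the significance level should be fixed before looking at the data.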

14.2.3 Example 01

Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The company states that the drug is equally effective for men and women. To test this claim, they choose a simple random sample of $100$ women and $200$ men from a population of $100{,}000$ volunteers.

At the end of the study, $38\%$ of the women caught a cold; and $51\%$ of the men caught a cold. Based on these findings, can we reject the company’s claim that the drug is equally effective for men and women? Use a $0.05$ level of significance.

From here

Data

\[ \hat{p_1}=0.38; \,\,\, \hat{p_2}=0.51; \,\,\, n_1=100 \,\,\, n_2=200 \]

Hypothesis

\[H_0: p_1 = p_2\] \[H_1: p_1 \neq p_2\]

Test statistic

\[\hat{p}=\dfrac{0.38 \times 100 + 0.51 \times 200}{100+200}=0.467\]

\[ Z=\dfrac{(0.38-0.51)}{\sqrt{0.467 \times (1-0.467) \times \left( \dfrac{1}{100} +\dfrac{1}{200}\right) } } = \dfrac{-0.13}{0.0611} \approx -2.13\]

p-value

For this two-sided test, the $p$-value is the probability of observing a $Z$ at least as far from zero as $-2.13$ in either direction: $P(Z < -2.13) = 0.0166$ and $P(Z > 2.13) = 0.0166$.
Thus, the $p$-value $= 0.0166 + 0.0166 = 0.0332$.
Since the $p$-value is less than $0.05$, we have enough evidence to reject $H_0$.

Conclusion

Since the $p$-value ($0.033$) is less than the significance level ($\alpha =0.05$), we reject the null hypothesis: the data do not support the company’s claim that the drug is equally effective for men and women.
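The hand calculation in this example can be checked with a short Python sketch that uses only the standard library. The variable names are chosen for this illustration. Because the code carries unrounded intermediate values, its results agree with the figures above only up to rounding.

```python
from math import sqrt
from statistics import NormalDist

# Data from the example: women are sample 1, men are sample 2
n1, n2 = 100, 200
p1_hat, p2_hat = 0.38, 0.51
alpha = 0.05

# Pooled sample proportion
p_pooled = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)

# Test statistic under H0: p1 = p2
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

# Two-sided p-value: area in both tails beyond |z|
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"pooled = {p_pooled:.3f}, Z = {z:.2f}, p-value = {p_two_sided:.3f}")
print("reject H0" if p_two_sided <= alpha else "fail to reject H0")
```

Running this gives a $Z$ of about $-2.13$ and a two-sided $p$-value of roughly $0.033$, so the decision at $\alpha = 0.05$ is to reject $H_0$.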

Your turn

Suppose the previous example is stated a little differently. Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The company states that the drug is more effective for women than for men. To test this claim, they choose a simple random sample of 100 women and 200 men from a population of 100,000 volunteers.

At the end of the study, $38\%$ of the women caught a cold; and $51\%$ of the men caught a cold. Based on these findings, can we conclude that the drug is more effective for women than for men? Use a 0.01 level of significance.

Econometrics I | Class Notes