Reflections of a Data Scientist: (R) Hypothesis Tests of Proportions

The past two articles covered several topics related to proportions. This article continues that series by discussing hypothesis tests as they apply to proportion data.

In testing data, we are faced with choosing between two separate hypotheses.

Those hypotheses are:

The Null Hypothesis - A statement that is assumed to be true until the data provide sufficient evidence against it. This is the assertion that you will be seeking to disprove through the application of statistical methods.

The Alternative Hypothesis - This statement is the antithesis of the Null Hypothesis.

A rough example of stated hypotheses might resemble something like:

The Null Hypothesis - It is raining outside.

The Alternative Hypothesis - It is not raining outside.

This brings us to error types, which arise whenever a hypothesis test is used to make a decision. There are two types of hypothesis testing errors.

The error types are:

Type I - Mistakenly rejecting a true Null Hypothesis.

Type II - Mistakenly failing to reject a false Null Hypothesis.
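To make the two error types concrete, the following is a minimal simulation sketch. All of the values in it (a null proportion of 0.5, a true proportion of 0.6, a sample size of 200, a 5% significance level, and 10,000 repetitions) are assumptions chosen purely for illustration and do not come from the example problems in this article.

```r
# Illustrative simulation of Type I and Type II error rates.
# All numbers below are hypothetical and chosen only for demonstration.
set.seed(1)

n     <- 200     # sample size per simulated study
p0    <- 0.5     # proportion claimed by the null hypothesis
p_alt <- 0.6     # a true proportion under which the null is false
alpha <- 0.05    # significance level
reps  <- 10000   # number of simulated studies

# One-sided (upper tail) z-test decision: TRUE means "reject H0"
rejects_null <- function(x) {
  z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
  z > qnorm(1 - alpha)
}

# Type I error: the null is true (p = p0), but we reject it anyway
type1_rate <- mean(rejects_null(rbinom(reps, size = n, prob = p0)))

# Type II error: the null is false (p = p_alt), but we fail to reject it
type2_rate <- mean(!rejects_null(rbinom(reps, size = n, prob = p_alt)))

type1_rate
type2_rate
```

Under these assumptions, the simulated Type I error rate should land near the chosen significance level, while the Type II error rate depends on how far the true proportion sits from the null value.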

So let's try applying this knowledge to a few example problems.

###################

# A local university claims that only 20% of its incoming freshman class attend new student orientation. A statistician who is employed by the university believes that the real percentage is higher. He plans to ask 100 new students if they plan on attending the orientation. He will inform the admissions department if 30 or more students plan on attending. What is the probability of committing a Type I Error? #

H0:p = .20 (Null Hypothesis)
Ha:p > .20 (Alternative Hypothesis)

sqrt(.2 * .8/ 100)

[1] 0.04

(.3 - .2)/0.04

[1] 2.5

pnorm(2.5,lower.tail=FALSE)

[1] 0.006209665

Probability of Type I Error (rejecting the null hypothesis when the true proportion really is .20): 0.62 percent.
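The calculation for the orientation problem relies on the normal approximation to the binomial distribution. As a rough cross-check, the same probability can also be computed directly from the binomial distribution with `pbinom()`. The variable names below are my own, and the sketch assumes the decision rule "report if 30 or more of 100 students say yes" exactly as stated in the problem.

```r
# Type I error for the rule "reject H0 if 30 or more of 100 students attend"
n      <- 100    # students surveyed
p0     <- 0.20   # proportion under the null hypothesis
cutoff <- 30     # smallest count that triggers a rejection

# Normal approximation (the approach used in this article)
se           <- sqrt(p0 * (1 - p0) / n)
z            <- (cutoff / n - p0) / se
type1_normal <- pnorm(z, lower.tail = FALSE)

# Exact binomial probability: P(X >= 30) = P(X > 29) when p = 0.20
type1_exact <- pbinom(cutoff - 1, size = n, prob = p0, lower.tail = FALSE)

type1_normal
type1_exact
```

The two figures will not agree exactly, since the normal curve is only an approximation of a discrete distribution, but both should indicate that a Type I error is unlikely under this rule.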

###################

# A radio manufacturer claims that 85% of the radios assembled from the latest batch are defective. A quality assurance representative believes that the number is lower and wishes to test at a 5% significance level. What is the conclusion if 90 of 125 radios are defective? #

H0:p = .85 (Null Hypothesis)
Ha:p < .85 (Alternative Hypothesis)

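# Standard error under the null hypothesis, using the sample size n = 125 #
sqrt(.15 * .85/ 125)

[1] 0.03193744

# p = #
90/125

[1] 0.72

(.72 - .85)/0.03193744

[1] -4.070458

# Ha is p < .85, so the p-value is the lower tail #
pnorm(-4.070458,lower.tail=TRUE)

The resulting p-value is approximately 0.00002.

Since 0.00002 < .05, there is sufficient evidence to reject the null hypothesis at the 5% significance level. The data support the quality assurance representative's belief that fewer than 85% of the radios are defective.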
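A one-sample test such as the radio problem can also be cross-checked with R's built-in `prop.test()` function. The sketch below assumes the figures stated in the problem (90 defective radios out of 125, a null proportion of .85, and a lower-tailed alternative) and sets `correct = FALSE` to switch off the continuity correction so that the result lines up with a plain z-test.

```r
# Cross-check of a one-sample, lower-tailed proportion test with prop.test()
x  <- 90     # defective radios observed
n  <- 125    # radios inspected
p0 <- 0.85   # proportion under the null hypothesis

# correct = FALSE disables the continuity correction
res <- prop.test(x = x, n = n, p = p0, alternative = "less", correct = FALSE)

# Manual z-test with the same inputs, for comparison
z        <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
p_manual <- pnorm(z, lower.tail = TRUE)

res$p.value   # p-value reported by prop.test()
p_manual      # p-value from the manual z calculation
```

With the continuity correction disabled, the two p-values should agree, which makes `prop.test()` a convenient way to catch arithmetic slips in a hand calculation.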

###################

# A research group conducting a nutritional survey, decides to interview 500 households to test the hypothesis that 30% of the households eat dinner after 8:00 PM. Should the hypothesis be rejected at the 5% significance level if 200 families respond that they do indeed dine after 8:00 PM? #

H0:p = .30 (Null Hypothesis)
Ha:p != .30 (Alternative Hypothesis)
Alpha = .05

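# Standard error under the null hypothesis, using the sample size n = 500 #
sqrt(.30 * .70/ 500)

[1] 0.0204939

#p = #
200/500

[1] 0.4

(.4 - .3)/0.0204939

[1] 4.879501

pnorm(4.879501, lower.tail = FALSE) * 2

The resulting two-sided p-value is approximately 0.000001.

Since 0.000001 < .05, there is sufficient evidence to reject the null hypothesis at the 5% significance level.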
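The steps repeated in these problems (standard error, z-score, tail probability) can be wrapped in a small helper. The function name `prop_z_test` and its arguments are hypothetical and not part of any package. It is only a sketch of the one-sample z-test for a proportion, applied here to the survey figures from the dinner problem (200 of 500 households, null proportion .30).

```r
# Hypothetical helper: one-sample z-test for a proportion
prop_z_test <- function(x, n, p0,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)

  p_hat <- x / n                       # sample proportion
  se    <- sqrt(p0 * (1 - p0) / n)     # standard error under H0
  z     <- (p_hat - p0) / se           # test statistic

  # Pick the tail(s) that match the alternative hypothesis
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE),
    less      = pnorm(z, lower.tail = TRUE),
    greater   = pnorm(z, lower.tail = FALSE)
  )

  list(p_hat = p_hat, se = se, z = z, p_value = p_value)
}

# Two-sided test for the dinner survey
result <- prop_z_test(x = 200, n = 500, p0 = 0.30)
result$z
result$p_value
```

Passing the sample size as an argument, rather than typing it into each formula, helps ensure that the standard error always reflects the sample actually collected.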

###################

# A local university has enrolled 10% of all eligible students from an adjacent neighborhood. The office of administration plans a survey of 2000 households. If fewer than 7% indicate that they are interested in potential enrollment, it will be concluded that the market share has dropped. What is the probability of a Type I Error? #

H0:p = .10 (Null Hypothesis)
Ha:p < .10 (Alternative Hypothesis)

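# Standard error under the null hypothesis, using the sample size n = 2000 #
sqrt(.10 * .90/ 2000)

[1] 0.006708204

(.07 - .10)/0.006708204

[1] -4.472136

pnorm(-4.472136, lower.tail = TRUE)

The resulting probability is approximately 0.000004.

Probability of Type I Error: approximately 0.0004 percent.

# What is the probability of a Type II Error if the true enrollment rate is 8%? In other words, what is the probability that we will fail to reject the 10% null hypothesis? #

# Standard error under the true proportion of .08 #
sqrt(.08 * .92/ 2000)

[1] 0.0060663

(.07 - .08)/0.0060663

[1] -1.648451

# We fail to reject when the sample proportion is .07 or higher, which is the upper tail #
pnorm(-1.648451, lower.tail = FALSE)

The resulting probability is approximately 0.95.

Probability of Type II Error: approximately 95 percent.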
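For a decision rule based on a fixed cutoff, both error probabilities come from the same quantity: the chance that the sample proportion falls below the cutoff for a given true proportion. The sketch below uses the figures stated in the enrollment problem (2000 households, a 7% cutoff, a 10% null proportion). The function name `reject_prob` is hypothetical, and the sketch assumes the standard error is computed from whichever true proportion is being considered.

```r
# Error probabilities for the rule "reject H0 if the sample proportion < cutoff"
n      <- 2000   # households surveyed
p0     <- 0.10   # proportion under the null hypothesis
cutoff <- 0.07   # decision threshold for the sample proportion

# Probability of rejecting H0 when the true proportion is p_true
# (normal approximation, standard error based on p_true)
reject_prob <- function(p_true) {
  se <- sqrt(p_true * (1 - p_true) / n)
  pnorm((cutoff - p_true) / se, lower.tail = TRUE)
}

# Type I error: rejecting H0 when the true proportion really is 10%
type1 <- reject_prob(p0)

# Type II error: failing to reject H0 for several false-null scenarios
p_true_values <- c(0.08, 0.07, 0.06)
type2 <- 1 - sapply(p_true_values, reject_prob)
names(type2) <- p_true_values

type1
type2
```

A cutoff this far below the null value makes a Type I error very unlikely, at the cost of a high Type II error whenever the true proportion is only slightly below 10%.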

In the next article, we will continue to investigate the hypothesis testing procedure. Stay tuned, Data Heads!

Thursday, October 5, 2017