Reflections of a Data Scientist: (R) The Confidence Interval Estimate of Proportions

In this article, we again focus on sample data. Specifically, today's exercises demonstrate how to build a confidence interval estimate for a proportion: a range that, at a stated level of confidence, contains the true population proportion.
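For reference, both examples below rest on the standard normal-approximation interval for a proportion. The notation is an assumption added here and does not appear in the original post: \hat{p} is the sample proportion, n is the sample size, and z_{\alpha/2} is the standard normal critical value for a confidence level of 1 - \alpha.

```latex
\hat{p} \;\pm\; z_{\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
```

In R, the critical value z_{\alpha/2} corresponds to qnorm(alpha/2, lower.tail=FALSE), which is why a 99% interval uses .005 and a 95% interval uses .025.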

Example 1:

If 60% of a sample of 120 individuals leaving a diner claim to have spent over $12 for lunch, determine a 99% confidence interval estimate for the proportion of patrons who spent over $12.

se <- sqrt(.4 * .6 / 120)
# standard error of the proportion; 120 is the sample size #
se

[1] 0.04472136

margin <- qnorm(.005, lower.tail=FALSE) * se
# .005 because a two-sided 99% interval leaves 0.5% in each tail #

.60 + c(-margin, margin)

[1] 0.4848054 0.7151946

Conclusion: We are 99% confident that the proportion of diner patrons spending over $12 for lunch is between 0.4848054 (48.48%) and 0.7151946 (71.52%).
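The steps in Example 1 can be wrapped into a small reusable function. This is only a sketch: the name prop_ci and its arguments are hypothetical and do not come from the original post, and it assumes the normal approximation is reasonable for the sample at hand.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion.
# p_hat = sample proportion, n = sample size, conf = confidence level.
prop_ci <- function(p_hat, n, conf = 0.95) {
  stopifnot(p_hat >= 0, p_hat <= 1, n > 0, conf > 0, conf < 1)
  alpha  <- 1 - conf
  se     <- sqrt(p_hat * (1 - p_hat) / n)          # standard error
  margin <- qnorm(alpha / 2, lower.tail = FALSE) * se  # two-sided margin of error
  c(lower = p_hat - margin, upper = p_hat + margin)
}

# Example 1: 60% of 120 diners, 99% confidence
prop_ci(0.60, 120, conf = 0.99)
```

Passing the sample size as an argument keeps it from being hard-coded, which makes it harder to divide by the wrong n by accident.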

Example 2:

In a random sample of light bulbs produced by a factory, 20 out of 300 were found to have shattered during the shipping process. Establish a 95% confidence interval estimate for the proportion of light bulbs damaged during shipping.

p <- 20/300
p

[1] 0.06666667

se <- sqrt(p * (1 - p) / 300)
# standard error of the proportion; 300 is the sample size #
se

[1] 0.01440165

margin <- qnorm(.025, lower.tail=FALSE) * se
# .025 because a two-sided 95% interval leaves 2.5% in each tail #

p + c(-margin, margin)

[1] 0.03843996 0.09489337

Therefore, we can be 95% confident that the proportion of light bulbs damaged during the shipping process is between 0.03843996 (3.84%) and 0.09489337 (9.49%).

Furthermore, if we wished, we could apply these proportions to a total shipment to estimate the number of damaged bulbs.

If 1000 light bulbs are shipped, we can be 95% confident that between roughly 38 and 95 light bulbs are damaged within the shipment.
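Scaling the interval to a shipment is just a multiplication of both endpoints by the shipment size. The sketch below recomputes the interval from the raw counts so it stands alone; the variable names are illustrative and not from the original post.

```r
# Recompute the 95% interval for Example 2 from the raw counts
damaged <- 20
n       <- 300
p_hat   <- damaged / n
margin  <- qnorm(0.025, lower.tail = FALSE) * sqrt(p_hat * (1 - p_hat) / n)
ci      <- p_hat + c(-margin, margin)

# Apply the interval to a shipment of 1000 bulbs
shipment         <- 1000
expected_damaged <- shipment * ci

# Round to one decimal for reporting as bulb counts
round(expected_damaged, 1)
```

Because the endpoints are proportions, the same two lines work for any shipment size; only the value of shipment changes.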

In the next article, we will discuss hypothesis tests. Until then, stay tuned, Data Monkeys!

Wednesday, September 20, 2017
