Reflections of a Data Scientist: (R) Sample Size and Margin of Error

If you are in the business of creating surveys, there are two important statistical concepts that you should be somewhat familiar with.

The first concept is known as margin of error. Margin of error can be expressed as a value or as a percentage. It describes how far the true value may plausibly differ from the estimate produced by a statistical inference. For example, if a survey was taken of 100 individuals, and 60 of those individuals answered positively to a survey prompt, with a margin of error value of 5, the true value being inferred from the survey could differ from 60 by up to 5 in either direction. This means that the true value could range anywhere from 55 to 65.

Let's explore this concept in the following examples:

A pollster wants to infer, from surveying a population, the likelihood of a certain candidate winning a local election. The population of the voting public is 1000 individuals, and the pollster surveys 500 members of the voting body. From his survey, he has determined that 60% of those individuals are voting for candidate A. Assuming a confidence level of .95, and a population percentage of 50%, what will the margin of error be for this survey?
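Written out, the quantity that the R code below computes appears to be the normal-approximation margin of error for a proportion, with a finite population correction. The symbols N, n, and p match the comments in the code; z is a label introduced here (not in the original code) for the standard normal quantile at the chosen confidence level, roughly 1.96 for 95%.

```latex
\[
\mathrm{MOE}
  = z \cdot \frac{\sqrt{p\,(1-p)}}{\sqrt{\dfrac{(N-1)\,n}{N-n}}}
  = z \,\sqrt{\frac{p\,(1-p)}{n}}\;\sqrt{\frac{N-n}{N-1}}
\]
```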

####################################################

popsize <- 1000 # N #
samplesize <- 500 # n #
confidencelevel <- .95
populationpercentage <- .50 # p #

####################################################

con2 <- (1 - confidencelevel) / 2

MOE <- qnorm(con2, lower.tail = FALSE) * sqrt(populationpercentage * (1 - populationpercentage)) / sqrt((popsize - 1) * samplesize/(popsize - samplesize))

MOE

[1] 0.03100526

The margin of error is 3.10%.

Therefore, the pollster can conclude, with 95% confidence, that about 60% of the population will vote for candidate A, with a margin of error of 3.10%. (The true value could fluctuate between 56.90% and 63.10%.)
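The calculation above can be wrapped in a small function so that it is easier to reuse. This is only a sketch: `margin_of_error` is a hypothetical helper name, not something from a package, and it simply repeats the formula used in this post. It also shows what happens if the observed 60% is used for the population percentage instead of the more conservative 50%.

```r
# Hypothetical helper (an illustration, not a package function):
# margin of error for a proportion with a finite population correction,
# using the same formula as the calculation in this post.
margin_of_error <- function(popsize, samplesize, confidencelevel = 0.95,
                            populationpercentage = 0.5) {
  # Two-sided critical value from the standard normal distribution
  z <- qnorm((1 - confidencelevel) / 2, lower.tail = FALSE)
  z * sqrt(populationpercentage * (1 - populationpercentage)) /
    sqrt((popsize - 1) * samplesize / (popsize - samplesize))
}

# Conservative choice, p = 0.50 (reproduces the 3.10% result)
moe_conservative <- margin_of_error(popsize = 1000, samplesize = 500)

# Assumption for illustration: plug in the observed 60% instead of 50%
moe_observed <- margin_of_error(popsize = 1000, samplesize = 500,
                                populationpercentage = 0.60)

# Interval around the survey estimate of 60%, shown in percent
estimate <- 0.60
interval <- c(lower = estimate - moe_conservative,
              upper = estimate + moe_conservative)
print(round(100 * interval, 2))
```

Because p * (1 - p) is largest at p = 0.50, the 50% assumption gives the widest margin of error, so `moe_observed` comes out slightly smaller than `moe_conservative`.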

The next concept that we will review is sample size. This is essentially how many people need to take your survey for its results to meet a chosen confidence level and margin of error. In this case, we will be solving for that value, and we will be assuming a population percentage of 50%.

The same pollster wants to survey another group of individuals to determine which type of milk they would most prefer to drink. The pollster already knows his population size (1200), and has decided on a confidence level (95%) and a margin of error (5%). To meet these predetermined criteria, how many individuals should the pollster survey?
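In symbols, the R code below appears to compute an unadjusted sample size and then shrink it for the finite population. N, p, and MOE match the comments in the code; z (the standard normal quantile for the confidence level) and n_0 (the unadjusted sample size) are labels introduced here for readability.

```latex
\[
n_0 = \frac{z^{2}\,p\,(1-p)}{\mathrm{MOE}^{2}},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0}{N}}
\]
```

In the code, `a` corresponds to n_0 and `b` to the denominator 1 + n_0 / N.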

##################################

popsize <- 1200 # N #
confidencelevel <- .95
marginoferror <- .05 # MOE #
populationpercentage <- .5 # p #

##################################

con2 <- (1 - confidencelevel) / 2

a <- qnorm(con2, lower.tail = FALSE)^2 * populationpercentage * (1 - populationpercentage) / (marginoferror)^2

b <- 1 + qnorm(con2, lower.tail = FALSE)^2 * populationpercentage * (1 - populationpercentage) / ((marginoferror)^2 * popsize)

a/b

[1] 290.9928

Rounding up to the next whole person, the pollster should survey 291 individuals.
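As a rough sketch, the same steps can be collected into a reusable function. The name `sample_size` is hypothetical (it is not from a package), and rounding up with `ceiling()` is an assumption made here, on the reasoning that a fraction of a respondent cannot be surveyed.

```r
# Hypothetical helper (an illustration, not a package function):
# required sample size for a proportion, adjusted for a finite population.
sample_size <- function(popsize, marginoferror, confidencelevel = 0.95,
                        populationpercentage = 0.5) {
  # Two-sided critical value from the standard normal distribution
  z <- qnorm((1 - confidencelevel) / 2, lower.tail = FALSE)
  # Unadjusted sample size (the value called `a` in this post)
  n0 <- z^2 * populationpercentage * (1 - populationpercentage) / marginoferror^2
  # Finite population adjustment (dividing by `b`), rounded up
  ceiling(n0 / (1 + n0 / popsize))
}

n <- sample_size(popsize = 1200, marginoferror = 0.05)
print(n)

# Sanity check: feed n back into the margin of error formula from the
# first example (N = 1200, p = 0.5, 95% confidence).
z <- qnorm(0.025, lower.tail = FALSE)
moe_check <- z * sqrt(0.5 * 0.5) / sqrt((1200 - 1) * n / (1200 - n))
print(moe_check)
```

The check should land very close to 0.05 rather than exactly on it, because the margin of error formula in this post uses N - 1 in its correction while the sample size formula uses N.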

Below are some links that provide simplified online calculators, just in case you are not using R. Also provided, within those links, are more detailed definitions of the concepts illustrated above.

https://www.surveysystem.com/sscalc.htm

https://www.surveymonkey.com/mp/margin-of-error-calculator/

https://www.surveymonkey.com/mp/sample-size-calculator/

Sunday, October 15, 2017
