Reflections of a Data Scientist: (R) The Normal Distribution

Now that you understand how to identify a normal distribution, we can utilize R to perform calculations that are specific to this distribution type.

Once a variable is known to be normally distributed, we can calculate the probability that an observed value falls between two given values.

I am assuming that you have some understanding of normal distributions in addition to what was discussed in the prior entry.

Example:

You are currently employed as a statistician in a factory that produces flashlights. The senior statistician informs you that the factory's premium brand of flashlights has a battery life that is normally distributed. The mean battery life for this brand is 20 hours, with a standard deviation of 5 hours.

What is the probability that a randomly selected flashlight from the production line will last between 20 and 25 hours?

pnorm(q=25, mean=20, sd=5, lower.tail=TRUE)

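# Output = 0.8413447 (probability of lasting less than 25 hours) #

# Half of the distribution lies below the mean of 20, so subtract .50 #

0.8413447 - .50

# 0.3413447, or a 34.134% probability #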
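The same answer can be reached without reasoning about the .50 by hand, since the probability of landing between two values is the difference between two lower-tail probabilities. The following is a minimal sketch of that idea; the variable name p_between is my own and is only used for illustration.

```r
# P(20 < X < 25) = P(X < 25) - P(X < 20) for X ~ Normal(mean = 20, sd = 5)
# p_between is an illustrative name, not something from the original example
p_between <- pnorm(25, mean = 20, sd = 5) - pnorm(20, mean = 20, sd = 5)

print(p_between)  # approximately 0.3413447
```

Because 20 is the mean, pnorm(20, mean = 20, sd = 5) is exactly .50, which is why subtracting .50 works in this case.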

If a flashlight's battery dies after exactly 8 hours of use, how many standard deviations away from the mean is this value?

# (x - mean) / standard deviation #

(8 - 20) / 5

# -2.4, or 2.4 standard deviations below the mean #
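To show why this standardized value (the z-score) is useful, here is a short sketch that computes it and then confirms that pnorm() returns the same lower-tail probability whether we pass the raw value with its mean and standard deviation, or the z-score with the defaults (mean 0, standard deviation 1). The variable names are assumptions chosen for readability.

```r
# Illustrative names; the values come from the flashlight example
mean_life <- 20
sd_life   <- 5
x         <- 8

# z-score: distance from the mean in units of standard deviations
z <- (x - mean_life) / sd_life

# Lower-tail probability computed two equivalent ways
p_raw <- pnorm(x, mean = mean_life, sd = sd_life)  # raw value, stated parameters
p_std <- pnorm(z)                                  # z-score, standard normal

print(z)
print(c(p_raw, p_std))
```

Both probabilities should agree, and both are small, since a battery lasting 8 hours or less sits far out in the lower tail.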

What is the probability that a randomly selected flashlight from the production line will last between 18 and 24 hours?

pnorm(q=18, mean=20, sd=5, lower.tail=FALSE)

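# Output = 0.6554217 (probability of lasting more than 18 hours) #

0.6554217 - .50

# Output = 0.1554217 (probability of lasting between 18 and 20 hours) #

pnorm(q=24, mean=20, sd=5, lower.tail=TRUE)

# Output = 0.7881446 (probability of lasting less than 24 hours) #

0.7881446 - .50

# Output = 0.2881446 (probability of lasting between 20 and 24 hours) #

0.2881446 + 0.1554217

# 0.4435663, or a 44.357% probability #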
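As a sanity check on this kind of calculation, we can simulate a large number of battery lives and count the share that falls inside the interval. This is only a rough sketch: the seed, the sample size, and the variable names are arbitrary choices of mine, and the simulated proportion will only approximate the exact probability.

```r
# Monte Carlo check of P(18 < X < 24) for X ~ Normal(mean = 20, sd = 5)
set.seed(2017)  # arbitrary seed so the run is repeatable
n    <- 1e6     # arbitrary number of simulated flashlights
sims <- rnorm(n, mean = 20, sd = 5)

# Proportion of simulated battery lives between 18 and 24 hours
p_sim <- mean(sims > 18 & sims < 24)

print(p_sim)
```

With a sample this large, the result should land close to the 0.4435663 calculated with pnorm().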

What is the probability that a randomly selected flashlight from the production line will last between 22 and 26 hours?

pnorm(q=22, mean=20, sd=5, lower.tail=FALSE)

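# Output = 0.3445783 (probability of lasting more than 22 hours) #

.50 - 0.3445783

# Output = 0.1554217 (probability of lasting between 20 and 22 hours) #

pnorm(q=26, mean=20, sd=5, lower.tail=TRUE)

# Output = 0.8849303 (probability of lasting less than 26 hours) #

0.8849303 - 0.1554217 - .50

# 0.2295086, or a 22.951% probability #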
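Since every interval question in this example follows the same pattern, the steps can be wrapped in a small function. The function below, prob_between(), is a hypothetical helper that I am introducing for illustration; it is not part of base R.

```r
# Hypothetical helper (not a base R function):
# returns P(lower < X < upper) for a normal distribution
prob_between <- function(lower, upper, mean = 0, sd = 1) {
  pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd)
}

# The three flashlight questions from this entry
p_20_25 <- prob_between(20, 25, mean = 20, sd = 5)  # approximately 0.3413447
p_18_24 <- prob_between(18, 24, mean = 20, sd = 5)  # approximately 0.4435663
p_22_26 <- prob_between(22, 26, mean = 20, sd = 5)  # approximately 0.2295086

print(c(p_20_25, p_18_24, p_22_26))
```

This approach does not depend on which side of the mean each boundary falls, so there is no need to decide when to add or subtract .50.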

Later, while you are eating lunch at the same factory, the senior statistician appears again. During this encounter, he decides to test your statistical abilities by asking you a series of questions.

These questions are:

Given a normal distribution with a mean of 55, what is the standard deviation if 45% of the values are above 70?

qnorm(.45, lower.tail=FALSE)

# Output = 0.1256613 (the z-score that has 45% of the values above it) #

# (70 - 55) / Standard Deviation = 0.1256613 #

70 - 55

# Output = 15 #

15 / .1256613

# Standard Deviation = 119.3685 #
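The calculation above can be checked by plugging the result back into pnorm(). The following is a minimal sketch under the same assumptions as the question (mean of 55, 45% of values above 70); the variable names are mine.

```r
# Known quantities from the question (names are illustrative)
mu         <- 55
x          <- 70
upper_tail <- 0.45

# z-score with 45% of the standard normal distribution above it
z <- qnorm(upper_tail, lower.tail = FALSE)

# Rearranging z = (x - mu) / sigma to solve for sigma
sigma <- (x - mu) / z

# Verification: the share of values above 70 should come back as 0.45
check <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)

print(sigma)  # approximately 119.37
print(check)
```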

Given a normal distribution with a standard deviation of 15, what is the mean if 25% of the values are below 45?

qnorm(.25, lower.tail = FALSE)

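# Output = 0.6744898 (the z-score that has 25% of the values above it) #

# By symmetry, 45 sits 0.6744898 standard deviations below the mean #

45 + (0.6744898 * 15)

# Mean = 55.11735 #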
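An equivalent route is to take the lower-tail z-score directly, which comes out negative because 45 lies below the mean, and then rearrange the z-score formula. This is a sketch using variable names of my own choosing, with a verification step at the end.

```r
# Known quantities from the question (names are illustrative)
sigma      <- 15
x          <- 45
lower_tail <- 0.25

# z-score with 25% of the standard normal distribution below it (negative)
z <- qnorm(lower_tail)

# Rearranging z = (x - mu) / sigma to solve for mu
mu <- x - z * sigma

# Verification: the share of values below 45 should come back as 0.25
check <- pnorm(x, mean = mu, sd = sigma)

print(mu)  # approximately 55.11735
print(check)
```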

Given a normal distribution with 60% of the values above 100, and 90% of the values above 80, what are the mean and the standard deviation?

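qnorm(.60, lower.tail=FALSE)

# Output = -0.2533471 #

qnorm(.90, lower.tail=FALSE)

# Output = -1.281552 #

# More than half of the values lie above both 100 and 80, so both points sit below the mean and both z-scores are negative #

# (100 - Mean)/Standard Deviation = -0.2533471 #
# (80 - Mean)/Standard Deviation = -1.281552 #

# 100 - Mean = -0.2533471 * Standard Deviation #
# 80 - Mean = -1.281552 * Standard Deviation #

These two equations can be solved algebraically for the mean and the standard deviation. Subtracting the second equation from the first gives 20 = 1.0282 * Standard Deviation, so the standard deviation is approximately 19.4514, and the mean is approximately 104.928.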
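For readers who would rather let R handle the algebra, here is a sketch that restates the problem from scratch. Because 60% of the values lie above 100 and 90% lie above 80, both points sit below the mean, so their z-scores (taken with lower.tail=FALSE) are negative. Writing each condition as Mean + z * Standard Deviation = value gives a two-equation linear system that solve() can handle. The variable names are my own.

```r
# z-scores for the two conditions (both negative, since both points lie below the mean)
z_100 <- qnorm(0.60, lower.tail = FALSE)  # 60% of values above 100
z_80  <- qnorm(0.90, lower.tail = FALSE)  # 90% of values above 80

# Linear system:  mu + z_100 * sigma = 100
#                 mu + z_80  * sigma = 80
A <- matrix(c(1, z_100,
              1, z_80), nrow = 2, byrow = TRUE)
b <- c(100, 80)

solution <- solve(A, b)
mu    <- solution[1]  # approximately 104.93
sigma <- solution[2]  # approximately 19.45

# Verification: both upper-tail shares should match the question
check_100 <- pnorm(100, mean = mu, sd = sigma, lower.tail = FALSE)
check_80  <- pnorm(80,  mean = mu, sd = sigma, lower.tail = FALSE)

print(c(mean = mu, sd = sigma))
print(c(check_100, check_80))
```

A standard deviation must be positive, so a negative result at this step is a sign that the z-scores were taken from the wrong tail.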

That is all for this entry, which is the 50th article that I have written for this blog. Two things to remember about normal distributions: there is no perfect test for normality, and the probability of any single exact value occurring within a continuous normal distribution is zero. All that we can find is the probability that a value falls within a given interval.
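The point about single values can be illustrated with a short sketch using the flashlight parameters (mean of 20, standard deviation of 5). The variable names are mine. Note that dnorm() returns the height of the density curve, which is not a probability.

```r
# Height of the density curve at exactly 20 hours (not a probability)
d_20 <- dnorm(20, mean = 20, sd = 5)

# Probability of lasting exactly 20 hours: an interval of zero width
p_exact <- pnorm(20, mean = 20, sd = 5) - pnorm(20, mean = 20, sd = 5)

# Probability of lasting between 19.5 and 20.5 hours: a narrow but real interval
p_narrow <- pnorm(20.5, mean = 20, sd = 5) - pnorm(19.5, mean = 20, sd = 5)

print(d_20)      # approximately 0.0798
print(p_exact)   # 0
print(p_narrow)  # approximately 0.0797
```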

Stay tuned until next time, Data Heads.

Monday, August 28, 2017

(R) The Normal Distribution - Pt. II
