Reflections of a Data Scientist: (R) F-Test

You may remember the F-Test from the previous article on multiple linear regression. In this entry, we will further delve into the concept of the F-Test.

The F-Test is a statistical method for comparing two population variances. Its most recognized use is as one component of the ANOVA method, which will be discussed in a later article.

Essentially, the F-Test provides a test statistic, a critical value, and a reference distribution (the F-distribution). With these values derived, a hypothesis test can be stated, and from it, the two variances can be compared.
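Written out for a right-tailed test, the three pieces look roughly as follows. The notation here is introduced for illustration: $s_1^2$ and $s_2^2$ are the sample variances, $n_1$ and $n_2$ the sample sizes, and $\alpha$ the significance level.

```latex
F = \frac{s_1^2}{s_2^2}, \qquad
F \sim F_{n_1 - 1,\, n_2 - 1} \quad \text{under } H_0: \sigma_1^2 = \sigma_2^2

\text{Reject } H_0 \text{ if } F > F_{1-\alpha;\, n_1 - 1,\, n_2 - 1},
\qquad p = P\left(F_{n_1 - 1,\, n_2 - 1} \ge F\right)
```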

Some things to keep in mind before moving forward:

1. The F-Test assumes that the samples provided originate from normally distributed populations.

2. The F-Test attempts to discover whether two samples originate from populations with equal variances.

For example, suppose we were comparing the following two samples:

samp1 <- c(-0.73544189, 0.36905647, 0.69982679, -0.91131589, -1.84019291, -1.02226811, -1.85088278, 2.24406451, 0.63377787, -0.80777949, 0.60145711, 0.43853971, -1.76386879, 0.32665597, 0.32333871, 0.90197004, 0.29803556, 0.47333427, 0.23710263, -1.48582332, -0.45548478, 0.36490345, -0.08390139, -0.46540965, -1.66657385)

samp2 <- c(0.67033912, -1.23197505, -0.18679478, 1.06563032, 0.08998155, 0.22634414, 0.06541938, -0.22454059, -1.00731073, -1.43042950, -0.62312404, -0.22700636, -0.71908729, -0.36873910, 0.15653935, -0.19328338, 0.56259671, 0.31443699, 1.02898245, 1.18903593, -0.14576090, 0.68375259, -0.15348007, 1.58654607, 0.01616986)
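Because assumption 1 (normality) matters for the F-Test, it can be worth an informal check before running the test. The sketch below uses base R's `shapiro.test()` (the Shapiro-Wilk test), which the article itself does not use; treat it as an optional, added step. A small p-value would cast doubt on the normality assumption, while a large one does not prove normality.

```r
# Samples from the article
samp1 <- c(-0.73544189, 0.36905647, 0.69982679, -0.91131589, -1.84019291, -1.02226811, -1.85088278, 2.24406451, 0.63377787, -0.80777949, 0.60145711, 0.43853971, -1.76386879, 0.32665597, 0.32333871, 0.90197004, 0.29803556, 0.47333427, 0.23710263, -1.48582332, -0.45548478, 0.36490345, -0.08390139, -0.46540965, -1.66657385)
samp2 <- c(0.67033912, -1.23197505, -0.18679478, 1.06563032, 0.08998155, 0.22634414, 0.06541938, -0.22454059, -1.00731073, -1.43042950, -0.62312404, -0.22700636, -0.71908729, -0.36873910, 0.15653935, -0.19328338, 0.56259671, 0.31443699, 1.02898245, 1.18903593, -0.14576090, 0.68375259, -0.15348007, 1.58654607, 0.01616986)

# Shapiro-Wilk test for each sample (H0: the data come from a normal distribution)
sw1 <- shapiro.test(samp1)
sw2 <- shapiro.test(samp2)

# Print both results; compare each p-value against the chosen alpha
sw1
sw2
```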

For a right-tailed test, we would state the following hypotheses:

H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 > \sigma_2^2$

# Null Hypothesis = Variances are equal. #

# Alternative Hypothesis = The variance of the first population is greater than the variance of the second population. #

With both samples imported into R, we can now utilize the following code to perform the F-Test:

(We will assume an alpha of .05):

var.test(samp1, samp2, alternative = "greater", conf.level = .95)

Which produces the output:

F test to compare two variances

data: samp1 and samp2
F = 1.9112, num df = 24, denom df = 24, p-value = 0.05975
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
0.9634237 Inf
sample estimates:
ratio of variances
1.911201
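To see where the F statistic, degrees of freedom and p-value come from, they can be reproduced by hand. This is a minimal sketch using base R's `var()`, `length()` and `pf()`; the variable names are illustrative and not part of `var.test()`.

```r
# Samples from the article
samp1 <- c(-0.73544189, 0.36905647, 0.69982679, -0.91131589, -1.84019291, -1.02226811, -1.85088278, 2.24406451, 0.63377787, -0.80777949, 0.60145711, 0.43853971, -1.76386879, 0.32665597, 0.32333871, 0.90197004, 0.29803556, 0.47333427, 0.23710263, -1.48582332, -0.45548478, 0.36490345, -0.08390139, -0.46540965, -1.66657385)
samp2 <- c(0.67033912, -1.23197505, -0.18679478, 1.06563032, 0.08998155, 0.22634414, 0.06541938, -0.22454059, -1.00731073, -1.43042950, -0.62312404, -0.22700636, -0.71908729, -0.36873910, 0.15653935, -0.19328338, 0.56259671, 0.31443699, 1.02898245, 1.18903593, -0.14576090, 0.68375259, -0.15348007, 1.58654607, 0.01616986)

# Test statistic: ratio of the sample variances (var() divides by n - 1)
F_stat <- var(samp1) / var(samp2)

# Degrees of freedom: each sample size minus one
df_num <- length(samp1) - 1
df_den <- length(samp2) - 1

# Right-tailed p-value: probability of an F value at least this large under H0
p_value <- pf(F_stat, df1 = df_num, df2 = df_den, lower.tail = FALSE)

c(F = F_stat, num_df = df_num, denom_df = df_den, p_value = p_value)
```

The printed vector should agree with the F, num df, denom df and p-value lines of the `var.test()` output.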

Let us review each aspect of this output:

“F =” is the F-Test test statistic.

“num df =” is the numerator degrees of freedom (the size of sample 1 minus one: 25 - 1 = 24).

“denom df =” is the denominator degrees of freedom (the size of sample 2 minus one: 25 - 1 = 24).

“p-value =” is the probability, assuming the null hypothesis is true, of obtaining an F statistic at least as large as the one observed.

“95 percent confidence interval:” is a one-sided 95% confidence interval for the ratio of the two population variances, $\sigma_1^2/\sigma_2^2$. Because it contains 1, it is consistent with equal variances.

“ratio of variances” is the variance of sample 1 divided by the variance of sample 2.
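For `alternative = "greater"`, the interval is one-sided: its lower bound is the variance ratio divided by the upper quantile of the F-distribution, and its upper bound is infinite. The sketch below rebuilds it from the values reported in the output above (hard-coded here as an assumption, rather than recomputed from the data).

```r
# Values taken from the var.test() output above
ratio_est  <- 1.911201  # var(samp1) / var(samp2)
conf_level <- 0.95
df_num     <- 24
df_den     <- 24

# One-sided interval for sigma1^2 / sigma2^2: [estimate / F quantile, Inf)
lower <- ratio_est / qf(conf_level, df1 = df_num, df2 = df_den)
ci <- c(lower, Inf)
ci
```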

Since the p-value is greater than our alpha value (0.05975 > .05), we fail to reject the null hypothesis: at the 95% confidence level, we cannot conclude that our samples were taken from populations with differing variances.

Additionally, we can confirm this conclusion by comparing our F-Test statistic of 1.9112 to the critical F value that corresponds to the appropriate degrees of freedom and alpha value. To find this value, we would typically consult a table in the back of a statistics textbook. However, R simplifies the process with the qf() function, which returns quantiles of the F-distribution.

Utilizing the code:

qf(.95, df1=24, df2=24) # Alpha .05, Numerator Degrees of Freedom = 24, Denominator Degrees of Freedom = 24 #

Produces the output:

[1] 1.98376

Again, because 1.9112 < 1.98376, we cannot conclude that our samples were taken from populations with differing variances.
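The p-value approach and the critical-value approach are two views of the same decision, so they always agree for a given alpha. A minimal sketch, assuming the F statistic reported by `var.test()` above is hard-coded rather than recomputed:

```r
F_stat <- 1.911201  # F statistic reported by var.test() above
alpha  <- 0.05
df_num <- 24
df_den <- 24

# Critical-value approach: reject H0 if F exceeds the upper-alpha quantile
f_crit <- qf(1 - alpha, df1 = df_num, df2 = df_den)
reject_by_crit <- F_stat > f_crit

# p-value approach: reject H0 if P(F >= F_stat) is below alpha
p_value <- pf(F_stat, df1 = df_num, df2 = df_den, lower.tail = FALSE)
reject_by_p <- p_value < alpha

data.frame(f_crit, p_value, reject_by_crit, reject_by_p)
```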

If we were to graph this test and distribution, the illustration would resemble:

If you would like to create your own F-distribution graphs, sans the mark-ups, you could use the following code:

curve(df(x, df1=24, df2=24), from=0, to=5) # Modify the degrees of freedom only #
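If you want mark-ups as well, one option is to shade the rejection region and draw lines at the critical value and at the observed statistic. This is a sketch using base graphics (`curve()`, `polygon()`, `abline()`, `legend()`); the exact annotations in the original figure are not known, and the F statistic is hard-coded from the output above.

```r
df_num <- 24
df_den <- 24
alpha  <- 0.05
F_stat <- 1.911201  # from the var.test() output
f_crit <- qf(1 - alpha, df_num, df_den)

# Base F(24, 24) density
curve(df(x, df1 = df_num, df2 = df_den), from = 0, to = 5,
      xlab = "F", ylab = "density", main = "Right-tailed F-Test, df = (24, 24)")

# Shade the rejection region to the right of the critical value
xs <- seq(f_crit, 5, length.out = 200)
polygon(c(f_crit, xs, 5), c(0, df(xs, df_num, df_den), 0),
        col = "grey80", border = NA)

# Redraw the density line on top of the shading
curve(df(x, df1 = df_num, df2 = df_den), from = 0, to = 5, add = TRUE)

abline(v = f_crit, lty = 2)        # critical value
abline(v = F_stat, col = "red")    # observed F statistic
legend("topright", legend = c("critical value", "observed F"),
       lty = c(2, 1), col = c("black", "red"))
```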

Below is an illustration of several F-distributions with varying degrees of freedom:
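A similar comparison can be drawn by overlaying several densities with `curve(..., add = TRUE)`. The degrees-of-freedom pairs below are illustrative choices, not values taken from the article or its source image.

```r
# Illustrative (df1, df2) pairs
df_pairs <- list(c(2, 1), c(5, 2), c(10, 10), c(24, 24), c(100, 100))
cols <- seq_along(df_pairs)

# Empty canvas, then one density curve per pair
plot(NULL, xlim = c(0, 5), ylim = c(0, 2.5), xlab = "x", ylab = "density")
for (i in seq_along(df_pairs)) {
  curve(df(x, df1 = df_pairs[[i]][1], df2 = df_pairs[[i]][2]),
        from = 0, to = 5, add = TRUE, col = cols[i])
}

labels <- sapply(df_pairs, function(p) paste0("df1 = ", p[1], ", df2 = ", p[2]))
legend("topright", legend = labels, col = cols, lty = 1)
```

As the degrees of freedom grow, the density concentrates around 1.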

I hope that you found this article useful. In the next post, we will begin to discuss the concept of ANOVA.

* A helpful article pertaining to the F-Test statistic: http://atomic.phys.uni-sofia.bg/local/nist-e-handbook/e-handbook/eda/section3/eda359.htm

** Source for F-Distribution Image: https://en.wikipedia.org/wiki/F-distribution

Monday, November 13, 2017