In this article, we will be discussing the odds ratio, relative risk, and Cohen's Kappa statistic, and how each relates to data contained within contingency tables. Additionally, I will be sharing some R code which I have written to simplify the process of generating these figures.

__Example:__ We will address the typical scenario which involves smoking as it relates to cancer diagnosis.

We will create three data vectors, each holding one value per individual. The first vector will contain an identification variable for each individual. The second will contain a binary variable which distinguishes smokers from non-smokers. The third will indicate which individuals are afflicted with cancer.

**# Create Data Vectors #**

id <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

smoker <- c(0,0,0,1,1,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0)

cancer <- c(0,0,1,1,0,0,1,1,1,1,1,1,0,0,1,1,1,0,0,0)

smokerscancer <- data.frame(id, smoker, cancer)

This data frame must now be organized into a contingency table before it can be analyzed.

This can be achieved through the following code:

**# Organize Data into Contingency Table #**

x <- smokerscancer$smoker

y <- smokerscancer$cancer

cases <- data.frame(x,y)

# The 'a'/'b' prefixes force the TRUE level to sort into the first row and column #

cases$y <- ifelse(cases$y == 1, 'aTrue', 'bFalse')

cases$x <- ifelse(cases$x == 1, 'aTrue', 'bFalse')

cases <- cases[complete.cases(cases), ]

ctable <- table(cases)

**# Apply Chi-Squared Test of Independence #**

chisq.test(ctable)

Which produces the following output:

*Pearson's Chi-squared test with Yates' continuity correction*

data: ctable

X-squared = 3.2323, df = 1, p-value = 0.0722

For more information on how to interpret Chi-squared output, please consult this article.
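As a quick supplement (a minimal sketch that rebuilds the 2×2 table from the counts shown above, rather than reusing the `ctable` object), the test object returned by chisq.test() also exposes the expected counts and p-value directly:

```r
# Rebuild the example contingency table: rows = smoker, columns = cancer #
ctable <- matrix(c(8, 3, 2, 7), nrow = 2,
                 dimnames = list(x = c("aTrue", "bFalse"),
                                 y = c("aTrue", "bFalse")))

# Yates' continuity correction is applied by default for 2x2 tables #
res <- chisq.test(ctable)

res$expected   # counts expected under independence (5.5, 5.5, 4.5, 4.5)
res$statistic  # X-squared = 3.2323
res$p.value    # 0.0722
```

The expected counts show where the observed table departs from independence, which is often more informative than the p-value alone.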

The following code produces additional output, along with the original contingency table:

**# Produce Odds Ratio / Relative Risk Outputs #**

totala <- ctable[1,1] + ctable[1,2]

totalb <- ctable[2,1] + ctable[2,2]

percent1 <- (ctable[1,1] / totala) * 100

percent2 <- (ctable[1,2] / totala) * 100

percent3 <- (ctable[2,1] / totalb) * 100

percent4 <- (ctable[2,2] / totalb) * 100

totalc <- ctable[1,2] + ctable[2,2]

totald <- ctable[1,1] + ctable[2,1]

totale <- totalc + totald

rowtotala <-totald/(totalc + totald) * 100

rowtotalb <-totalc/(totalc + totald) * 100

rowtotalc <- (rowtotala + rowtotalb)

keycol <- c("Count", "(%)", "Count", "(%)", "SumCount", "(%)")

col1 <- c(ctable[1,1], percent1, ctable[2,1], percent3, totald, rowtotala)

col2 <- c(ctable[1,2], percent2, ctable[2,2], percent4, totalc, rowtotalb)

rowtotals <- c((ctable[1,1] + ctable[1,2]), (percent1 + percent2), (ctable[2,1] + ctable[2,2]), (percent3 + percent4), (totalc + totald), rowtotalc)

data <- data.frame(keycol, col1, col2, rowtotals)

vala <- percent3 / percent1

valb <- percent4 / percent2

valab <- valb/vala

num <- (ctable[1,1]) / (ctable[1,1] + ctable[1,2])

den <- (ctable[2,1]) / (ctable[2,1] + ctable[2,2])

relrisk <- num/den

**ctable**

data

valab # (Odds Ratio) col1 (%/%) / col2 (%/%) #

relrisk # Relative Risk = TRUE col1 (%/%) #

Which produces the following output:

y

x aTrue bFalse

aTrue 8 2

bFalse 3 7

keycol col1 col2 rowtotals

1 Count 8 2 10

2 (%) 80 20 100

3 Count 3 7 10

4 (%) 30 70 100

5 SumCount 11 9 20

6 (%) 55 45 100

valab # (Odds Ratio) col1 (%/%) / col2 (%/%) #

[1] 9.333333

relrisk # Relative Risk = TRUE col1 (%/%) #

[1] 2.666667

But how are the results within this table interpreted?

If your data is structured in the manner of the contingency table within this example, we can make the following statements.

The risk ratio indicates that the risk of the outcome variable (cancer) within the category of smokers increased by 167%* relative to the group of non-smokers. Put differently, the outcome was 2.667 times as likely to occur within the smoker group.

The odds ratio indicates that the odds of cancer for an individual who smokes are 9.333 times the odds for an individual who does not smoke.
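As a cross-check (a minimal sketch using only the four cell counts from the contingency table above), both figures can be computed directly from the standard formulas:

```r
# Cell counts: a/b = smokers with/without cancer, c/d = non-smokers with/without cancer #
a <- 8; b <- 2; c <- 3; d <- 7

# Odds ratio via the cross-product of the 2x2 table #
oddsratio <- (a * d) / (b * c)

# Risk ratio: risk among smokers divided by risk among non-smokers #
riskratio <- (a / (a + b)) / (c / (c + d))

oddsratio  # 9.333333
riskratio  # 2.666667
```

Packages such as epitools offer convenience wrappers for these measures, but the arithmetic above is all that is involved.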

*\* = Absolute value of (1 – RR), multiplied by 100.*

__Cohen’s Kappa__

Cohen’s Kappa, like the odds ratio and the risk ratio, is utilized to analyze the relationships of data within a contingency table. However, while the previously mentioned methods are universally applicable, Cohen’s Kappa can only be applied to contingency tables which measure categorical ratings.

__Example:__ Two film critics are rating a series of 20 movies. Each critic can either vote “Yes”, as an indication that he enjoyed the film, or “No”, as an indication that he did not enjoy the film. The results were recorded in a binary format.

**# Create Vectors #**

movieid <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

criticoneratings <- c(0,0,0,1,1,0,0,1,1,1,1,1,0,1,1,1,0,0,0,0)

critictworatings <- c(0,0,1,1,0,0,1,1,1,1,1,1,0,0,1,1,1,0,0,0)

movieratings <- data.frame(movieid, criticoneratings, critictworatings)

With this information, we wish to measure the level of agreement between the two raters. One of the main ways in which this technique differs from traditional probabilistic methodologies is that it accounts for chance agreement.

Now that we have our data values correctly input, we can proceed with the analysis.

**# Organize Data into Contingency Table #**

x <- movieratings$criticoneratings

y <- movieratings$critictworatings

cases <- data.frame(x,y)

cases$y <- c(ifelse(cases$y == 1, 'aTrue', 'bFalse'))

cases$x <- c(ifelse(cases$x == 1, 'aTrue', 'bFalse'))

cases <- cases[complete.cases(cases), ]

ctable <- table(cases)

# With the “psych” package downloaded and enabled #

cohen.kappa(ctable,alpha=.05)

This produces the output:

*Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)*

Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries

lower estimate upper

unweighted kappa 0.12 **0.5** 0.88

weighted kappa 0.12 0.5 0.88

Number of subjects = 20

I have **bolded** the kappa statistic (.5) contained within the output.

Opinion as to how to interpret the kappa statistic varies. Written below are the recorded opinions of professional analysts pertaining to such:

*“Landis and Koch characterized values < 0 as indicating no agreement, and 0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1 as almost perfect agreement. Fleiss's equally arbitrary guidelines characterize kappa over 0.75 as excellent, 0.40 to 0.75 as fair to good, and below 0.40 as poor.”*

*– https://en.wikipedia.org/wiki/Cohen’s_kappa*

If we were to apply this model in a more granular manner, in order to examine its inner workings, the following code could be applied:

**# Manual Computation of Cohen's Kappa #**

# Define Cell Entries #

a <- ctable[1,1]

b <- ctable[1,2]

c <- ctable[2,1]

d <- ctable[2,2]

# Observed Proportionate Agreement (p0) #

p0 <- (a + d) / (a + b + c + d)

# Expected probability that both would say yes at random (pYes) #

pyes <- ((a + b) / (a + b + c + d)) * ((a + c) / (a + b + c + d))

# Expected probability that both would say no at random (pNo) #

pno <- ((c + d) / (a + b + c + d)) * ((b + d) / (a + b + c + d))

# Overall random agreement probability: the probability that they agreed on either Yes or No (pe) #

pe <- pyes + pno

# Cohen's Kappa #

K <- (p0 - pe) / (1 - pe)

That’s all for now, Data Heads! Stay tuned for more awesome articles!
