Reflections of a Data Scientist: (R) Hierarchical Cluster

Previously, we discussed how to perform hierarchical cluster analysis within the SPSS platform. In this entry, we will discuss the same method of analysis; however, we will be utilizing the R platform to perform it.

If you are unfamiliar with this particular methodology, or if you wish to re-familiarize yourself, please consult the following article: Hierarchical Cluster (SPSS).

Example:

For this demonstration, we will be using the same data set that was used to demonstrate the process within the SPSS platform. This data set can be found within this site’s GitHub Repository.

# Load the data into the R platform #

# Be sure to change the ‘filepathway’ so that it matches the file location on your #
# computer #

HFrame <- read.table("C:\\filepathway\\hcluster.csv", fill = TRUE, header = TRUE, sep = "," )

# The variables to analyze (Cont_Var1, Cont_Var2, Cont_Var3) are selected below as #
# columns 4 through 6 of the data frame. We must utilize the dist() function to #
# create a matrix containing the distance between each pair of observations across #
# these variables. By default, dist() calculates Euclidean distance. #

clusters0 <- dist(HFrame[, 4:6])
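To make the output of dist() more concrete, here is a minimal sketch built on a made-up three-row data frame. The name `toy` and its values are hypothetical and are not taken from hcluster.csv; only the column names mirror the example above.

```r
# Hypothetical toy data; NOT the hcluster.csv data set
toy <- data.frame(
  Cont_Var1 = c(0, 3, 0),
  Cont_Var2 = c(0, 4, 0),
  Cont_Var3 = c(0, 0, 5)
)

# dist() compares rows (observations); Euclidean distance is the default
toy_dist <- dist(toy)

# Convert the result to a full symmetric matrix for easier reading
toy_matrix <- as.matrix(toy_dist)
print(toy_matrix)
```

Each cell holds the distance between two observations, so the diagonal is zero. If the variables are measured on very different scales, it may be worth standardizing them with scale() before calling dist(), since the largest-scaled variable would otherwise dominate the distances.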

# Now we are ready to prepare our model. "hclust()" is the function which we will #
# utilize to enable model generation. There are other agglomeration methods which #
# can be specified to generate different model variations. However, in the case of #
# our example, we will be utilizing the “average” method, as it produces a model #
# which best resembles the equivalent SPSS output. #

clusters1 <- hclust(clusters0, method = "average" , members = NULL)
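If you are curious how the other agglomeration methods compare, one possible check is the cophenetic correlation, which measures how closely each tree preserves the original distances. The sketch below is an illustration only: it uses USArrests, a data set that ships with R, as a stand-in for HFrame[, 4:6], and the selection of methods is an assumption rather than part of the original analysis.

```r
# USArrests ships with R; it stands in for HFrame[, 4:6] here
d <- dist(USArrests)

# A few of the agglomeration methods accepted by hclust()
methods <- c("average", "complete", "single", "ward.D2")

# Cophenetic correlation: how well each tree preserves the original distances
coph <- sapply(methods, function(m) {
  hc <- hclust(d, method = m)
  cor(d, cophenetic(hc))
})

print(round(coph, 3))
```

A higher value suggests that the dendrogram is a more faithful summary of the distance matrix, although this is only one of several criteria for choosing a method.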

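# Next, we will download and enable the package: “ggdendro”. This package allows #
# for the production of enhanced visualizations as it pertains to hierarchical #
# model illustration. #

# install.packages() only needs to be run once per machine #

install.packages("ggdendro")

library(ggdendro)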

# Rotate the plot and remove default theme #

ggdendrogram(clusters1, rotate = TRUE, theme_dendro = FALSE)

# The above code should produce an output illustration which resembles an SPSS #
# graphic. #

# As was also the case within the SPSS example demonstration, we will choose 5 #
# clusters into which to classify our data observations. #

# k = 5 designates the number of clusters to utilize for cluster categorization #

clusters2 <- cutree(clusters1, k = 5)
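Before moving on, it can be useful to check how many observations landed in each cluster. The following is a minimal sketch, again using the built-in USArrests data as a stand-in for the example data; the object names are hypothetical.

```r
# Build a tree on stand-in data (USArrests ships with R)
hc <- hclust(dist(USArrests), method = "average")

# Cut the tree into 5 clusters; the result is one integer label per observation
groups <- cutree(hc, k = 5)

# Count the observations assigned to each cluster
sizes <- table(groups)
print(sizes)

# cutree() also accepts a vector of k values, returning one column per k
multi <- cutree(hc, k = 2:5)
print(head(multi))
```

Comparing the columns of `multi` shows how clusters split as k increases, which may help when deciding whether 5 is a sensible choice for your own data.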

# Finally, we will download and enable the package: “dplyr” in order to utilize the #
# mutate() function. #

install.packages("dplyr")

library(dplyr)

# This function allows us to create a new data frame which contains the original #
# observations along with a new column, "cluster", identifying the cluster to which #
# each observation was assigned. #

finalcluster <- mutate(HFrame, cluster = clusters2)
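Once the cluster labels are attached, a natural follow-up is to summarize each cluster. The sketch below is one way this could be done in base R, without dplyr. It uses USArrests as a stand-in for HFrame, and the names `labelled` and `cluster_means` are hypothetical.

```r
# Build the tree and cut it into 5 clusters (stand-in data: USArrests)
hc <- hclust(dist(USArrests), method = "average")

# Base R equivalent of mutate(HFrame, cluster = clusters2)
labelled <- USArrests
labelled$cluster <- cutree(hc, k = 5)

# Mean of every variable within each cluster
cluster_means <- aggregate(. ~ cluster, data = labelled, FUN = mean)
print(cluster_means)
```

The resulting table has one row per cluster, which can make it easier to describe what distinguishes the clusters from one another.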

# The final data frame will resemble the following illustration #

Tuesday, November 6, 2018