Reflections of a Data Scientist: (R) K-Means Cluster

Continuing with the premise of the prior article, we will again explore a previously discussed methodology, one which was last demonstrated within the SPSS platform.

If you are unfamiliar with this particular methodology, or if you wish to re-familiarize yourself, please consult the following article: K-Means Cluster (SPSS).

Example:

For this demonstration, we will be utilizing the same data set which was previously used to demonstrate the analytical process within the SPSS platform. This data set can be found within this site’s GitHub Repository.

# Load the data into the R platform #

# Be sure to change the ‘filepathway’ so that it matches the file location on your #
# computer #

KMeans <- read.table("C:\\filepathway\\kmeans.csv", fill = TRUE, header = TRUE, sep = "," )

# We’re going to assume that the variables: ‘ZCont_Var1’, ‘ZCont_Var2’, ‘ZCont_Var3’ #
# are not included within the initial data frame. #

# Therefore, we must scale the variables: ‘Cont_Var1’, ‘Cont_Var2’, ‘Cont_Var3’ prior to #
# performing analysis. #

ScaledKMeans <- scale(KMeans[4:6])
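
As a quick sanity check on the scaling step, the sketch below applies scale() to a small made-up data frame and confirms that each scaled column ends up with a mean of 0 and a standard deviation of 1. The data frame ‘Demo’ and its values are hypothetical stand-ins and are not taken from kmeans.csv.

```r
# Hypothetical stand-in for columns 4 through 6 of the real data set
set.seed(1)
Demo <- data.frame(Cont_Var1 = rnorm(20, mean = 50, sd = 10),
                   Cont_Var2 = rnorm(20, mean = 5, sd = 2),
                   Cont_Var3 = rnorm(20, mean = 1000, sd = 300))

# scale() centers each column on its mean and divides it by its standard deviation
ScaledDemo <- scale(Demo)

# Column means should be (numerically) zero, and standard deviations should be one
round(colMeans(ScaledDemo), 10)
apply(ScaledDemo, 2, sd)
```

Note that scale() returns a numeric matrix rather than a data frame, which is a form that kmeans() accepts directly.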

# In this example, we are going to create a two cluster model. Also, we will be utilizing #
# an ‘nstart’ value of 10. This figure represents the number of random starting #
# configurations which will be attempted; the function retains the configuration #
# with the lowest total within-cluster sum of squares. #

ScaledKMeansCluster <- kmeans(ScaledKMeans, 2, nstart = 10)
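
To make the role of ‘nstart’ more concrete, here is a minimal sketch run on simulated data. The objects ‘DemoData’, ‘DemoScaled’ and ‘DemoCluster’ are hypothetical and exist only for illustration. The sketch also sets ‘iter.max’, which is the argument that limits the number of iterations within each attempt, and calls set.seed() so that the random starts can be reproduced.

```r
# Fix the random number generator so the random starts are reproducible
set.seed(42)

# Two hypothetical, well-separated groups of 20 points each
DemoData <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                  matrix(rnorm(40, mean = 5), ncol = 2))
DemoScaled <- scale(DemoData)

# nstart = 10 tries ten random sets of starting centers and keeps the best fit;
# iter.max caps the number of iterations allowed within each of those attempts
DemoCluster <- kmeans(DemoScaled, centers = 2, nstart = 10, iter.max = 10)

DemoCluster$size          # number of observations in each cluster
DemoCluster$centers       # cluster centers, expressed in scaled units
DemoCluster$tot.withinss  # total within-cluster sum of squares of the retained fit
```

Because the cluster labels (1 and 2) are arbitrary, two otherwise identical runs may swap them; the sizes and centers are the more reliable things to compare.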

# Once the model has been created, we will add the cluster assignments to the original #
# data frame so that each observation is labeled with the cluster to which it belongs. #

KMeans$ClusterID <- as.factor(ScaledKMeansCluster$cluster)
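
Once the cluster labels are attached, it can help to summarize them before graphing. The sketch below, which again uses a hypothetical data frame named ‘Demo’ rather than the real data set, counts the observations in each cluster and computes the per-cluster means of the original (unscaled) variables.

```r
set.seed(7)

# Hypothetical data: two groups of 15 observations on two variables
Demo <- data.frame(Cont_Var1 = c(rnorm(15, mean = 10), rnorm(15, mean = 20)),
                   Cont_Var2 = c(rnorm(15, mean = 3), rnorm(15, mean = 8)))

# Cluster on the scaled values, then attach the labels to the original data frame
DemoCluster <- kmeans(scale(Demo), centers = 2, nstart = 10)
Demo$ClusterID <- as.factor(DemoCluster$cluster)

# Number of observations assigned to each cluster
table(Demo$ClusterID)

# Mean of each original variable within each cluster
aggregate(cbind(Cont_Var1, Cont_Var2) ~ ClusterID, data = Demo, FUN = mean)
```

Reporting the means in the original units tends to be easier to interpret than the scaled cluster centers stored in the model object.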

# Finally, we will graph the results of the analysis. The variables which will represent #
# the axes of the graph are: Zcont_Var1 and Zcont_Var2. Column names are #
# case-sensitive, so they must match the output of names(KMeans) exactly. #

# To achieve the desired result, we must download and enable the package: “ggplot2” #
# If the package is not yet installed, first run: install.packages("ggplot2") #

library(ggplot2)

ggplot(KMeans, aes(x = Zcont_Var1, y = Zcont_Var2, color = ClusterID)) + geom_point()

Console Output:

[Figure: scatter plot of Zcont_Var1 against Zcont_Var2, with points colored by ClusterID.]


Tuesday, November 6, 2018
