Reflections of a Data Scientist: Hierarchical Cluster (SPSS)

Continuing the recent trend of articles on cluster analysis, today’s article discusses hierarchical cluster analysis.

Example:

(We will again be utilizing a modified version of the previous data set)

From the “Analyze” menu, select “Cluster”, then select “Hierarchical Cluster”.

To create the analysis, I have chosen the variables “Cont_Var1”, “Cont_Var2” and “Cont_Var3” to be utilized within our model.

After selecting the option “Statistics”, the following sub-menu should appear:

There is no need to change any of the options within this sub-menu.

From the “Plots” sub-menu, I have selected “Dendrogram”, as the dendrogram is the plot most closely associated with this model type. I have de-selected “Icicle”, as it does not provide any additional pertinent data.

From the “Method” sub-menu, I have enabled the standardization of the model variables. This is controlled by the “Standardize” drop-down menu, which presents a few options. I have chosen the option “Range -1 to 1”.
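For readers who want to see the arithmetic, my understanding of the SPSS documentation is that “Range -1 to 1” divides each value of a variable by that variable's range (maximum minus minimum). The Python sketch below illustrates the idea under that assumption. The function name and the sample values are hypothetical and are not taken from our data set.

```python
def standardize_by_range(column):
    """Divide every value by the column's range (max - min)."""
    value_range = max(column) - min(column)
    if value_range == 0:
        # A constant column has no spread; this sketch leaves it unchanged.
        return list(column)
    return [value / value_range for value in column]


# Hypothetical values standing in for one continuous variable
cont_var = [-4.0, 0.0, 2.0, 4.0]
scaled = standardize_by_range(cont_var)
print(scaled)  # [-0.5, 0.0, 0.25, 0.5]
```

Scaling each variable this way keeps a variable measured in large units from dominating the distance calculation.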

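“Cluster Method” and “Interval” can remain at their defaults, though other combinations can be utilized as well. Together, these two options specify the underlying algorithm: “Cluster Method” determines how the distance between two clusters is calculated, and “Interval” determines how the distance between two observations is measured.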
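To give a sense of what these two settings control, here is a minimal Python sketch of agglomerative clustering. It assumes average (between-groups) linkage and squared Euclidean distance, which I believe are the SPSS defaults. The function names and the five sample points are made up for illustration, and this is not SPSS's actual implementation.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage(cluster_a, cluster_b, points):
    """Mean distance over every cross-cluster pair of cases."""
    total = sum(squared_euclidean(points[i], points[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest linkage distance
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        distance = average_linkage(clusters[a], clusters[b], points)
        history.append((clusters[a], clusters[b], distance))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history


# Hypothetical observations (two variables each, for brevity)
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
history = agglomerate(points)
for members_a, members_b, distance in history:
    print(members_a, members_b, distance)
```

In this toy data the last point is far from everything else, so it is the final case to be merged. Swapping `average_linkage` or `squared_euclidean` for another function is the code equivalent of choosing a different “Cluster Method” or “Interval” option.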

The “Save” option presents the above sub-menu. “Single solution” adds a column to the original data set that contains the cluster membership of each observation, based on the number of clusters specified by the user. “Range of solutions” adds multiple columns, one for each cluster count within the specified range, each containing the cluster membership for that count.
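The difference between the two “Save” choices can be sketched in Python. The idea is that a hierarchical model is a sequence of merges, and a solution with k clusters is what remains after replaying all but the last k - 1 merges. The merge history below is hypothetical, and the way clusters are numbered here is my own choice and may not match the numbering SPSS assigns.

```python
def membership(n_cases, merges, n_clusters):
    """Replay merges until n_clusters groups remain.

    Returns a 1-based cluster number for each case.
    """
    groups = [{i} for i in range(n_cases)]
    for a, b in merges[: n_cases - n_clusters]:
        group_a = next(g for g in groups if a in g)
        group_b = next(g for g in groups if b in g)
        groups.remove(group_a)
        groups.remove(group_b)
        groups.append(group_a | group_b)
    # Number the clusters by the lowest case index they contain
    groups.sort(key=min)
    labels = [0] * n_cases
    for number, group in enumerate(groups, start=1):
        for case in group:
            labels[case] = number
    return labels


# Hypothetical merge history for 5 cases, in merge order.
# Each pair names one case from each of the two clusters being joined.
merges = [(0, 1), (2, 3), (0, 2), (0, 4)]

# "Single solution": one membership column for one cluster count
single = membership(5, merges, 2)

# "Range of solutions": one membership column per cluster count in the range
range_of_solutions = {k: membership(5, merges, k) for k in range(2, 5)}

print(single)
print(range_of_solutions)
```

Because every solution comes from the same merge sequence, asking for a range of solutions does not refit anything. It only cuts the same tree at different heights.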

After all options have been specified, we can continue with the generation of our analysis by clicking “OK”.

This produces the following output:

The “Case Processing Summary” details the number of observations analyzed, and the number of observations missing from the model. You can ignore the “Agglomeration Schedule”.

Next, we will inspect the “Dendrogram”. By examining the clusters and the order in which observations are linked, we can decide how many clusters to include within our model. We would then recreate the model, this time selecting “Single solution” from the “Save” sub-menu and specifying the number of clusters needed. The greater the number of clusters selected, the narrower and more specific each cluster will be.

After inspecting the “Dendrogram”, I have decided to create a model with 5 clusters.

The reason for my decision is illustrated in the above graphic, which marks the clusters that will be included within the new model. (NOTE: Case 7 is so distant from the other cases that it qualifies as a cluster of its own).

Finally, we return to our data set and, while maintaining all previous options, select “Save”. In this sub-menu, we select “Single solution” and set “Number of clusters” to 5.

Running the model again will generate a new column within our table which specifies the cluster membership of each observation.

That’s all for hierarchical cluster analysis. Stay tuned until next time, Data Heads!

Monday, February 5, 2018
