Reflections of a Data Scientist: (R) P-P Plot (SPSS)

Sharing many similarities with the Q-Q Plot, the P-P (probability-probability) Plot is a less frequently used graphical method for comparing the cumulative probabilities of the data points in a single variable against the cumulative probabilities expected under a normal distribution.
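As a rough sketch of what the P-P Plot actually draws, suppose the sample is sorted so that x_(1) ≤ … ≤ x_(n), Φ is the standard normal cumulative distribution function, and the normal distribution's mean and standard deviation are estimated by the sample mean x̄ and sample standard deviation s. The plotting positions p_i below follow the convention of R's ppoints() function (a = 3/8 when n ≤ 10, otherwise a = 1/2). This is an assumption; SPSS and qqplotr may use a slightly different formula.

```latex
\[
p_i = \frac{i - a}{n + 1 - 2a}, \qquad
\hat{F}_i = \Phi\!\left(\frac{x_{(i)} - \bar{x}}{s}\right), \qquad i = 1, \dots, n
\]
\[
\text{P-P Plot: } \left(p_i,\ \hat{F}_i\right), \qquad
\text{Q-Q Plot: } \left(\Phi^{-1}(p_i),\ x_{(i)}\right)
\]
```

If the data are normal, each \hat{F}_i should be close to its p_i, which is why the points of a P-P Plot should lie near the line y = x.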

Example:

Let’s begin with our sample data set:

To begin our analysis, select “Analyze” from the topmost menu, then “Descriptive Statistics”, followed by “P-P Plot”.

This should cause the following menu to appear:

Using the center arrow, designate “Y” as a variable, then click “OK”. This should generate the following output:

Ideally, if the sample data is normally distributed, the plotted points should follow the solid diagonal reference line as closely as possible. While the “Q-Q Plot” plots the quantiles of a normal distribution against the quantiles of the sample data, the “P-P Plot” plots the cumulative probabilities of the sample data against the cumulative probabilities of a normal distribution.
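To make the distinction concrete, the sketch below computes, for the sample used later in this post, the coordinates each plot would draw. The helper name pp_qq_coords() is hypothetical, and it assumes ppoints() plotting positions and a normal distribution fitted with the sample mean and standard deviation; SPSS may estimate these slightly differently.

```r
# Hypothetical helper (not part of SPSS or qqplotr): returns the coordinates
# a normal P-P Plot and a normal Q-Q Plot would draw for a numeric sample.
# Assumes ppoints() plotting positions and a normal distribution whose
# mean and sd are estimated from the sample itself.
pp_qq_coords <- function(x) {
  x <- sort(x)                        # order statistics x_(1) <= ... <= x_(n)
  p <- ppoints(length(x))             # expected cumulative probabilities
  data.frame(
    pp_x = p,                         # P-P x-axis: plotting positions
    pp_y = pnorm(x, mean(x), sd(x)),  # P-P y-axis: fitted normal CDF at each point
    qq_x = qnorm(p),                  # Q-Q x-axis: standard normal quantiles
    qq_y = x                          # Q-Q y-axis: the sorted data themselves
  )
}

y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)
coords <- pp_qq_coords(y)
print(round(coords, 3))
```

Both P-P axes live on the 0 to 1 probability scale, while the Q-Q axes stay in quantile units, which is why the Q-Q Plot keeps the original measurement scale of “Y” on one axis.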

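In almost all cases, you should prefer the Q-Q Plot. It is the more familiar of the two, which avoids confusion, and because cumulative probabilities change slowly near 0 and 1, the P-P Plot tends to mask departures from normality in the tails, which the Q-Q Plot displays more clearly.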
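For comparison, a minimal Q-Q Plot of the same sample needs only base R. This sketch uses qqnorm() and qqline() from R's built-in stats package; it is not part of the SPSS workflow above.

```r
# A minimal base-R normal Q-Q Plot of the sample data.
y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)

# qqnorm() draws the plot and invisibly returns the plotted coordinates:
# qq$x holds theoretical standard normal quantiles, qq$y the data values.
qq <- qqnorm(y, main = "Normal Q-Q Plot",
             xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")

# Reference line passing through the first and third quartiles
qqline(y)
```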

To generate comparable output within R, we can use the following code:

Example (R):

# With the packages ‘ggplot2’ and ‘qqplotr’ downloaded, enable them #

library(ggplot2)
library(qqplotr)

# Create vector #

y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)

# Transform vector into data frame #

y <- data.frame(norm = y)

# Create graphical output #

gg <- ggplot(data = y, mapping = aes(sample = norm)) +
  stat_pp_band() +
  stat_pp_line() +
  stat_pp_point() +
  labs(x = "Probability Points", y = "Cumulative Probability")
gg
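If installing qqplotr is not an option, a rough equivalent can be drawn with base R alone. This is a sketch under stated assumptions: plotting positions come from ppoints(), the normal mean and standard deviation are estimated from the sample, and no confidence band is drawn, so the points may differ slightly from the qqplotr and SPSS versions.

```r
# A minimal base-R normal P-P Plot with no add-on packages.
y <- c(70, 80, 73, 77, 60, 93, 85, 72, 90, 85)

x_sorted <- sort(y)
prob_pts <- ppoints(length(x_sorted))                   # expected cumulative probabilities
cum_prob <- pnorm(x_sorted, mean = mean(y), sd = sd(y)) # fitted normal CDF at each observation

plot(prob_pts, cum_prob,
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Probability Points", ylab = "Cumulative Probability",
     main = "Normal P-P Plot")
abline(a = 0, b = 1)  # points close to this diagonal suggest approximate normality

# Largest vertical distance from the diagonal, a rough numeric summary of fit
max_gap <- max(abs(cum_prob - prob_pts))
print(max_gap)
```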

This should produce the following visual output:

That’s all for now. See you soon for more interesting content, Data Heads!

Friday, April 20, 2018