Sunday, July 18, 2021

Pivot Tables (MS-Excel)

You didn’t honestly believe that I would continue to write articles without mentioning every analyst’s favorite Excel technique, did you?

Example / Demonstration:

For this demonstration, we are going to be utilizing the, “Removing Duplicate Entries (MS-Excel).csv” data file. This file can found within GitHub data repo, upload data: July 12, 2018. If you are too lazy to navigate over the repo site, the raw .csv data can be found down below:

VARA,VARB,VARC,VARD
Mike,1,Red,Spade
Mike,2,Blue,Club
Mike,1,Red,Spade
Troy,2,Green,Diamond
Troy,1,Red,Heart
Archie,2,Orange,Heart
Archie,2,Yellow,Diamond
Archie,2,Orange,Heart
Archie,1,Red,Spade
Archie,1,Blue,Spade
Archie,2,Red,Club
Archie,2,Red,Club
Jack,1,Red,Diamond
Jack,2,Blue,Diamond
Jack,2,Blue,Diamond
Rob,1,Green,Club
Rob,2,Orange,Spade
Brad,1,Red,Heart
Susan,2,Blue,Heart
Susan,2,Yellow,Club
Susan,1,Pink,Heart
Seth,2,Grey,Heart
Seth,1,Green,Club
Joanna,2,Pink,Club
Joanna,1,Green,Spade
Joanna,1,Green,Spade
Bertha,2,Grey,Diamond
Bertha,1,Grey,Diamond
Liz,1,Green,Spade


Let’s get started!

First, we’ll take a nice look at the data as it exists within MS-Excel:


Now we’ll pivot to excellence!

The easiest way to start building pivot tables, is to utilize the “Recommended PivotTables” option button located within the “Insert” menu, listed within Excel’s ribbon menu.


This should bring up the menu below:


Go ahead and select all row entries, across all variable columns.

Once this has been completed, click “OK”.

This should generate the following menu:


Let’s break down each recommendation.

“Sum of VARB by VARD” – This table is summing the total of the numerical values contained within VARB, as they correspond with VARD entries.

“Count of VARA by VARD” – This table is counting the total number of occurrences of categorical values within variable column VARD.

“Sum of VARB by VARC” – This table is summing the total of numerical values contained within VARB, as they correspond with VARC entries.


“Count of VARA by VARC” – This table is counting the total number of occurrences of categorical values within variable column VARA.

“Sum of VARB by VARA” – This table is summing the total of the numerical values contained within VARB, as they correspond with VARA entries.

Now, there may come a time in which none of the above options match exactly what you are looking for. In this case, you will want to utilize the “PivotTable” option button, located within the “Insert” menu, listed within Excel’s ribbon menu.


Go ahead and select all row entries, across all variable columns.

Change the option button to “New Worksheet”, instead of “Existing Worksheet”.

Once this has been complete, click “OK”.

Once this has been accomplished, you’ll be graced with a new menu, on a new Excel sheet (same workbook).

I won’t go into every single output option that you have available, but I will list a few you may want to try yourself. Each output variation can be created by dragging and dropping the variables listed within the topmost box, in varying order, into the boxes below: 


If VARA and VARC are both added to Rows, you will view the categorical occurrences of variable entries from VARC, with VARA acting as the unique ID.

Order matters in each pivot table variable designation place.

So, if we reverse the position of VARA and VARC, and instead list VARC first, followed by VARA, then we will a table which lists the categorical occurrences of VARA, with VARC acting as a unique ID.

If we include VARA and VARC as rows (in that order), and set the values variable to Sum of VARB, then the output should more so resemble an accounting sheet, with the sum of each numerical value corresponding with VARA, categorized by VARC, is summed (VARB). 


If we instead wanted the count, as opposed to the sum, we could click on the drop down arrow located next to “Count of VARB”, which presents the following options:


From the options listed, we well select “Value Field Settings”.

This presents the following menu, from which we will select “Count”.


The result of following the previously listed steps is illustrated below:


The Pivot Table creation menu also allows for further customization through the addition of column variables.

In the case of our example, we will make the following modifications to our table output:


VARC will now be designated as a column variable, VARA will be a row variable, and the count of VARB will be out values variable.  

The result of these modifications is shown below:


Our output format now contains a table which contains the count of each occurrence of each color (VARC), as each color corresponds with each individual listed (VARA) within the original data set. 

In conclusion, the pivot table option within MS-Excel, offers a variety of different display outputs which can be utilized to display statistical summary data.

The most important skill to develop as it pertains to this feature, is the ability to ascertain when a pivot table is necessary for your data project needs.

So with that, we will end this article.

I will see you next time, Data Head.

-RD

Wednesday, July 14, 2021

Getting to Know the Greeks

In today’s article, we are going to go a bit off the beaten path and discuss, The Greek Alphabet!


You might be wondering, why the sudden change of subject content…?

In order to truly master of the craft of data science, you will be required to stretch your mind in creative ways. The Greek Alphabet is utilized throughout the fields of statistics, mathematics, finance, computer science, astronomy and other western intellectual pursuits. For this reason, it really ought to be taught in elementary schools. However, to my knowledge, in most cases, it is not.

The Romans borrowed heavily from Greek Civilization, and contemporary western civilization borrowed heavily from the Romans. Therefore, to truly be a person of culture, you should learn the Greek Alphabet, and really, as much as you possibly can about Ancient Greek Culture. This includes the legends, heroes, and philosophers. We might be getting more into this in other articles, but for today, we will be sticking to the alphabet.

The Greek Alphabet

The best way to learn the Greek alphabet is to be Greek (j/k, but not really). In all other cases, application is probably the best way to commit various Greek letters, as symbols, to memory.

I would recommend drawing each letter in order, uppercase, and lowercase, and saying the name of the letter as it is written.

Let’s try this together!

Α α (Alpha) (Pronounced: AL-FUH) - Utilized in statistics as the symbol which connotates significance level. In finance, it is the percentage return of an investment above or below a predetermined index.

B β (Beta) (Pronounced: BAY-TUH) - In statistics, this symbol is utilized to represent type II errors. In finance, it is utilized to determine asset volatility.

Γ γ (Gamma) (Pronounced: GAM-UH) - In physics, this symbol is utilized to represent particle decay (Gamma Decay). There also exists Alpha Decay, and Beta Decay. The type of decay situationally differs depending on the circumstances.

Δ δ (Delta) (Pronounced: DEL-TUH) - This is currently the most common strain of the novel coronavirus (7/2021). In the field of chemistry, uppercase Delta is utilized to symbolize heat being added to a reaction.

Ε ε (Epsilon) (Pronounced: EP-SIL-ON) - “Machine Epsilon” is utilized in computer science as a way of dealing with floating point values and their assessment within logical statements.

Ζ ζ (Zeta) (Pronounced: ZAY-TUH) - The most common utilization assignment which I have witnessed for this letter, is its designation as the variable which represents the Reimann Zeta Function (number theory).

Η η (Eta) (Pronounced: EE -TUH) - I’ve mostly seen this letter designated as variable for the Dedekind eta function (number theory).

Θ θ (Theta) (Pronounced: THAY-TUH) - Theta is utilized as the symbol to represent a pentaquark, a transitive subatomic particle.

Ι ι (Iota) (Pronounced: EYE-OL-TA) - I’ve never seen this symbol utilized for anything outside of astronomical designations. Maybe if you make it big in science, you could give Iota the love that it so deserves.

Κ k (Kappa) (Pronounced: CAP-UH) - Kappa is the chosen variable designation for Einstein’s gravitational constant.

Λ λ (Lambda) (Pronounced: LAMB-DUH) - A potential emergent novel coronavirus variant (7/2021). Lowercase Lambda is also utilized throughout the Poisson Distribution function.

Μ μ (Mu) (Pronounced: MEW) - Lowercase Mu is utilized to symbolize the mean of a population (statistics). In particle physics, it can also be applied to represent the elementary particle: Muon.

Ν ν (Nu) (Pronounced: NEW) - As a symbol, this letter represents degrees of freedom (statistics).

Ξ ξ (Xi) (Pronounced: SEE) - In mathematics, uppercase Xi can be utilized to represent the Reimann Xi Function.

Ο ο (Omnicron) (Pronounced: OMNI-CRON) - A symbol which does not get very much love, or use, unlike its subsequent neighbor…

Π π (Pi) (Pronounced: PIE) - In mathematics, lowercase Pi often represents the mathematical real transcendental constant ≈ 3.1415…etc.

Ρ ρ (Rho) (Pronounced: ROW) - In the Black-Scholes model, Rho represent the rate of change of a portfolio with respect to interest rates

Σ σ (Sigma) (Pronounced: SIG-MA) - Lower case Sigma represents the standard deviation of a population (statistics). Upper case sigma represents a sum function (mathematics).

Τ τ (Tau) (Pronounced: TAIL) - Lower case Tau represents an elementary particle within the field of particle physics

Υ υ (Upsilon) (Pronounced: EEP-SIL-ON) - Does not really get very much use…

Φ φ (Phi) (Pronounced: FAI) - Lowercase Phi is utilized to represent the Golden Ratio.

Χ χ (Chi) (Pronounced: KAI) - Lower case Chi is utilized as a variable throughout the Chi-Square distribution function.

Ψ ψ (Psi) (Pronounced: PSY) - Lower case Psi is used to represent the (generalized) positional states of a qubit within a quantum computer.

Ω ω (Omega) (Pronounced: OHMEGA) - Utilized for just about everything.

Αυτα για τωρα. Θα σε δω την επόμενη φορά!

-RD