Reflections of a Data Scientist: (R) Markov Chains

Per Wikipedia, “A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state of the attained in the previous event”.

Explained in a less broad manner, a Markov chain could be described as a way of assessing probabilistic systems by assessing fluidity as it applies to both a single variable, and the other variables contained within a system.

For example, in the case of weather systems, a day which is cloudy may subsequently be followed by a day which is also cloudy, a day without clouds, or a rainy day. However, the probability of each subsequent event will undoubtedly be impacted by the composition of the current state.

Another example of the applied methodology is assessment of market share. If company A offers a product which potentially retains 60% of its current consumers annually, but also has the potential to lose 40% of that consumer base to company B on an annual basis, and company B potentially retains 80% of its current annually, but also has the potential to lose 20% of that consumer base to company A, what is the impact of the phenomenon described on an annual basis?

Let’s explore both examples:

First, we’ll create a model which can predict weather.

We’ll assume that the following probabilities appropriately describe the autumn forecasts for weather in Winnipeg.

Cloudy Clear Snowy Rainy

Cloudy 33% 17% 25% 25%

Clear 25% 50% 12% 13%

Snowy 19% 15% 33% 33%

Rainy 20% 20% 10% 50%

To further understand this probability matrix, assume that currently the day’s forecast in Winnipeg is “Cloudy”. This would typically indicate that the following day would have weather which is either “Cloudy” (33%), “Clear” (17%), “Snowy” (25%), or “Rainy” (25%).

Now, we’ll run the information through the R-Studio platform:

EXAMPLE A – Weather Model

# With the libraries ‘markovchain’ and ‘diagram’ downloaded and enabled #

# Create a Transition Matrix #

trans_mat <- matrix(c(.33, .17, .25, .25, .25, .50, .12, .13, .19, .15, .33, .33, .20, .20, .10, .50),nrow = 4, byrow = TRUE)

stateNames <- c("Cloudy","Clear", "Snowy", "Rainy")

row.names(trans_mat) <- stateNames

colnames(trans_mat) <- stateNames

# Check input #

trans_mat

# Console Output #

Cloudy Clear Snowy Rainy
Cloudy 0.33 0.17 0.25 0.25
Clear 0.25 0.50 0.12 0.13
Snowy 0.19 0.15 0.33 0.33
Rainy 0.20 0.20 0.10 0.50

# Create a Discrete Time Markov Chain #

disc_trans <- new("markovchain",transitionMatrix=trans_mat, states=c("Cloudy","Clear", "Snowy", "Rainy"), name="Weather")

# Check input #

disc_trans

# Console Output #

Weather
A 4 - dimensional discrete Markov Chain defined by the following states:
Cloudy, Clear, Snowy, Rainy
The transition matrix (by rows) is defined as follows:
Cloudy Clear Snowy Rainy
Cloudy 0.33 0.17 0.25 0.25
Clear 0.25 0.50 0.12 0.13
Snowy 0.19 0.15 0.33 0.33
Rainy 0.20 0.20 0.10 0.50

# Illustrate the Matrix Transitions #

plotmat(trans_mat,pos = NULL,

lwd = 1, box.lwd = 2,

cex.txt = 0.8,

box.size = 0.1,

box.type = "circle",

box.prop = 0.5,

box.col = "light yellow",

arr.length=.1,

arr.width=.1,

self.cex = .4,

self.shifty = -.01,

self.shiftx = .13,

main = "")

This produces the output graphic:

(As it pertains to the graphic- something important to note is the direction of the arrows. The arrow direction in the graphic is inverted. Therefore, I would only use the graphic as an auxiliary for personal reference.)

# We will assume that the current forecast is cloudy by creating the vector below #

Current_state<-c(1, 0, 0, 0)

# Now we will utilize the following code to predict the weather for tomorrow #

steps<-1

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #

Cloudy Clear Snowy Rainy
[1,] 0.33 0.17 0.25 0.25

This output indicates that tomorrow will have a 33% chance of being cloudy, a 17% chance of being clear, a 25% chance of being snowy, and a 25% chance of being rainy.

# Let’s predict the weather for the following day #

steps<-2

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #

Cloudy Clear Snowy Rainy
[1,] 0.2428372 0.2621651 0.1839856 0.311012

With this information, we can assume that generally there is a 24% chance of rain, a 26% chance of the day being clear, an 18% of the day being snowy, and a 31% chance of the day being rainy.

It would be helpful if the rounded figures summed to 1. But I think that you probably understand the example regardless.

EXAMPLE A – Market Share

Let’s re-visit our market share example:

Company A offers a product which potentially retains 60% of its current consumers annually, but also has the potential to lose 40% of that consumer base to company B on an annual basis, and company B potentially retains 80% of its current annually, but also has the potential to lose 20% of that consumer base to company A, what is the impact of the phenomenon described on an annual basis?

Let’s make a few assumptions.

First, we will assume that the projection given above is accurate.

Next, we’ll assume that the total customer base as it pertains to the product is 60,000,000.

Finally, we’ll assume that the Company A possesses 20% of this market, and Company B possesses 80% of this market. 12,000,000 individuals and 48,000,000 respectively.

# With the libraries ‘markovchain’ and ‘diagram’ downloaded and enabled #

# Create a Transition Matrix #

trans_mat <- matrix(c(0.6,0.4,0.8,0.2),nrow = 2, byrow = TRUE)

stateNames <- c("Company A","Company B")

row.names(trans_mat) <- stateNames

colnames(trans_mat) <- stateNames

# Check input #

trans_mat

# Console Output #

Company A Company B
Company A 0.6 0.4
Company B 0.8 0.2

# Create a Discrete Time Markov Chain #

disc_trans <- new("markovchain",transitionMatrix=trans_mat, states=c("Company A","Company B"), name="Market Share")

disc_trans

# Check input #

disc_trans

# Console Output #

Market Share
A 2 - dimensional discrete Markov Chain defined by the following states:
Company A, Company B

The transition matrix (by rows) is defined as follows:
Company A Company B
Company A 0.6 0.4
Company B 0.8 0.2

# Illustrate the Matrix Transitions #

plotmat(trans_mat,pos = NULL,

lwd = 1, box.lwd = 2,

cex.txt = 0.8,

box.size = 0.1,

box.type = "circle",

box.prop = 0.5,

box.col = "light yellow",

arr.length=.1,

arr.width=.1,

self.cex = .4,

self.shifty = -.01,

self.shiftx = .13,

main = "")

This produces the output graphic:

(Again, as it pertains to the graphic- something important to note is the direction of the arrows. The arrow direction in the graphic is inverted. Therefore, I would only use the graphic as an auxiliary for personal reference.)

# We will assume that the market share is as follows #

# This reflects the information provided in the example description above #

Current_state<- c(0.20,0.80)

# Now we will utilize the following code to predict the market share for the next year #

steps<-1

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #

Company A Company B
[1,] 0.76 0.24

As illustrated, one year out, Company A now controls 76% of the market share (45,600,000)*, and Company B controls 24% of the market share (14,400,000).

* Assuming that original market share does not increase or decline in overall individuals. The calculation for the figures is: 60,000,000 * .76 and 60,000,000 * .24.

Similar to our previous example, we can also project the current trend for multiple consecutive time periods.

# The following code to predicts the market share for the following two years #

steps<-2

finalState<-Current_state*disc_trans^steps

finalState

# Console Output #

Company A Company B
[1,] 0.648 0.352

Steady state in the case of this example, will predict the potential equilibrium which will be reached if the trends continue ad infinitum.

# Steady state Matrix #

steadyStates(disc_trans)

# Console Output #

Company A Company B
[1,] 0.6666667 0.3333333

Company A in this scenario now controls approximately 66.66% of the market share, and Company B controls 33.33% of the market share.

Reflections of a Data Scientist

Monday, August 31, 2020

(R) Markov Chains

No comments:

Post a Comment