Reflections of a Data Scientist: (R) Is Wall Street Random?

Anyone who has spent serious time within the investment world has probably, at some point, encountered the book A Random Walk Down Wall Street. While the book does contain some interesting historical anecdotes, and debunks numerous methods of stock picking quackery, the title itself refers to the following theory:

Burton G. Malkiel, an economics professor at Princeton University and writer of A Random Walk Down Wall Street, performed a test where his students were given a hypothetical stock that was initially worth fifty dollars. The closing stock price for each day was determined by a coin flip. If the result was heads, the price would close a half point higher, but if the result was tails, it would close a half point lower. Thus, each time, the price had a fifty-fifty chance of closing higher or lower than the previous day. Cycles or trends were determined from the tests. Malkiel then took the results in chart and graph form to a chartist, a person who "seeks to predict future movements by seeking to interpret past patterns on the assumption that 'history tends to repeat itself'." The chartist told Malkiel that they needed to immediately buy the stock. Since the coin flips were random, the fictitious stock had no overall trend. Malkiel argued that this indicates that the market and stocks could be just as random as flipping a coin.

Source: https://en.wikipedia.org/wiki/Random_walk_hypothesis
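
The coin-flip experiment described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the seed, the number of trading days (250), and the variable names are my own choices and do not come from Malkiel's classroom exercise.

```r
# Hypothetical re-creation of the classroom coin-flip experiment.
# The seed and the number of trading days are assumptions for illustration.
set.seed(42)
n_days <- 250

# Each flip moves the close half a point up (heads) or half a point down (tails)
flips <- sample(c(0.5, -0.5), n_days, replace = TRUE)

# The stock starts at fifty dollars; each close is the prior close plus the flip
coin_flip_close <- 50 + cumsum(flips)

# Inspect the first few simulated closing prices
head(coin_flip_close)
```

Calling plot(coin_flip_close, type = "l") on the result draws the sort of chart that a chartist could be handed.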

It would seem that, within the field of contemporary finance, some critics believe that Malkiel's book is predicated upon a flawed theory. In this article, we will perform our own analysis in order to determine which side is correct in their assumption. This isn’t in any way meant to provide further evidence to either side of the age-old Managed Fund vs. Index Fund argument. Instead, we will take the evidence which we have as it pertains to this proposed theory, and see if it withstands a thorough statistical assessment.

Random Walking

To begin our foray into proving / disproving “The Random Walk Hypothesis”, let’s take a random walk through RStudio.

First, we’ll set a number of sample observations (n = 101). Then, we’ll perform the same experiment that Malkiel performed with his students. We’ll do this by randomly drawing from two values (-1, 1), with 1 equating to a step upward, and -1 equating to a step downward.

However, we’ll make a few slight modifications to our experiment. As prices for an equity index cannot (theoretically) go negative, or in most cases, reach zero only to rebound, I’ve added a few caveats to our simulation.

We will be creating a random walk. However, in every instance in which our random walk would take us below a zero threshold, an absolute value of the outcome will instead be returned. For example, in the case of a typical random walk, the values: (1,-1,1,-1,-1,-1), provide the corresponding cumulative elements of: (1,0,1,0,-1,-2), as each element is being summed against the previous sum of the prior elements.

Example:

Random Walk Generated Values = {1,-1,1,-1,-1,-1}

0 + 1 = 1

1 – 1 = 0

0 + 1 = 1

1 – 1 = 0

0 – 1 = -1

-1 – 1 = -2

Thus, our cumulative elements are: {1, 0, 1, 0, -1, -2}

In our variation of the simulation, instead of returning negative values, we will only return the absolute values of the cumulative elements.

Example:

|1| = 1

|0| = 0

|1| = 1

|0| = 0

|-1| = 1

|-2| = 2

Another modification that we will be making is that every element of 0 will be changed to the value of 1.

Example:

1 → 1

0 → 1

1 → 1

0 → 1

1 → 1

2 → 2
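
The worked example can be checked step by step in R before moving on to the full simulation. This is a minimal sketch using the six example values from above; the intermediate variable names are my own.

```r
# The six example steps from the walkthrough above
steps <- c(1, -1, 1, -1, -1, -1)

# A typical random walk: the running sum of the steps
cumulative <- cumsum(steps)                        # 1 0 1 0 -1 -2

# First modification: return absolute values instead of negatives
reflected <- abs(cumulative)                       # 1 0 1 0 1 2

# Second modification: every element of 0 becomes 1
adjusted <- ifelse(reflected == 0, 1, reflected)   # 1 1 1 1 1 2

adjusted
```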

# Set the random seed to 7 so that this example’s outcome can be reproduced #

set.seed(7)

# 101 random elements are needed for our example #

n <- 101

# Create a random walk with caveats (absolute values returned for negative numbers) #

Random_Walk <- abs(cumsum(sample(c(-1, 1), n, TRUE)))

# Further modify the random walk values (values of 0 will be modified to 1) #

Random_Walk <- ifelse(Random_Walk == 0, 1, Random_Walk)

# We’ll be attempting to simulate Dow Index returns from 1923 – 2023 #

year <- 1923:2023

# Graph the Random_Walk simulation #

Random_Walk_Plot <- data.frame(year, Random_Walk)

plot(Random_Walk_Plot, type = "l", xlim = c(1923, 2023), ylim = c(0, 30),

col = "blue", xlab = "Year", ylab = "Random Walk")

This produces the following graphic:

Which in some ways resembles:

Let’s overlay a resized version of the random walk graphic against the Dow Jones Industrial Average’s annualized returns:

Obviously, this proves nothing. It only demonstrates that, by selectively choosing a randomly generated pattern, one can draw aesthetic comparisons to an existing pattern.

Instead of taking the Random Walk Hypothesis at face value, or postulating ill-informed criticisms, let’s attack this hypothesis like good data scientists. First, we’ll forget about amateurish comparative assessment, and run a few tests in order to reach a well-researched conclusion.

To test whether or not the market is random, we’ll need real world (US) market data. Of the available indexes, I decided to choose the Dow Jones Industrial Average. The reasons supporting this decision are as follows:

1. The S&P 500 Stock Composite Index was not created until March of 1957. Therefore, not as many data points are available as compared to the Dow Jones Industrial Average.

2. The NASDAQ Composite Index was not created until February of 1971. It also lacks the broad market exposure which is found within the Dow Jones Industrial Average.

The data gathered to perform the following analysis, which also provided the Dow Jones Annual Average graphic above, originated from the source below.

Source: https://www.macrotrends.net/1319/dow-jones-100-year-historical-chart

# Dow Jones Industrial Average Years Vector #

Dow_Year <- c(2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983, 1982, 1981, 1980, 1979, 1978, 1977, 1976, 1975, 1974, 1973, 1972, 1971, 1970, 1969, 1968, 1967, 1966, 1965, 1964, 1963, 1962, 1961, 1960, 1959, 1958, 1957, 1956, 1955, 1954, 1953, 1952, 1951, 1950, 1949, 1948, 1947, 1946, 1945, 1944, 1943, 1942, 1941, 1940, 1939, 1938, 1937, 1936, 1935, 1934, 1933, 1932, 1931, 1930, 1929, 1928, 1927, 1926, 1925, 1924, 1923)

# Dow Jones Industrial Average Annual Closing Price Vector #

Dow_Close <- c(33301.87, 33147.25, 36338.3, 30606.48, 28538.44, 23327.46, 24719.22, 19762.6, 17425.03, 17823.07, 16576.66, 13104.14, 12217.56, 11577.51, 10428.05, 8776.39, 13264.82, 12463.15, 10717.5, 10783.01, 10453.92, 8341.63, 10021.57, 10787.99, 11497.12, 9181.43, 7908.3, 6448.27, 5117.12, 3834.44, 3754.09, 3301.11, 3168.83, 2633.66, 2753.2, 2168.57, 1938.83, 1895.95, 1546.67,
1211.57, 1258.64, 1046.54, 875, 963.99, 838.74, 805.01, 831.17, 1004.65, 852.41, 616.24, 850.86, 1020.02, 890.2, 838.92, 800.36, 943.75, 905.11, 785.69, 969.26, 874.13, 762.95, 652.1, 731.14, 615.89,
679.36, 583.65, 435.69, 499.47, 488.4, 404.39, 280.9, 291.9, 269.23, 235.41, 200.13, 177.3, 181.16, 177.2, 192.91, 152.32, 135.89, 119.4, 110.96, 131.13, 150.24, 154.76, 120.85, 179.9, 144.13, 104.04, 99.9, 59.93, 77.9, 164.58, 248.48, 300, 200.7, 157.2, 151.08, 120.51, 95.52)

# Combine both vectors into a singular data frame #

Dow_Data_Frame <- data.frame(Dow_Year, Dow_Close)

# Preview the data frame #

Dow_Data_Frame

This produces the output (first eight rows shown):

> Dow_Data_Frame
    Dow_Year Dow_Close
1       2023  33301.87
2       2022  33147.25
3       2021  36338.30
4       2020  30606.48
5       2019  28538.44
6       2018  23327.46
7       2017  24719.22
8       2016  19762.60
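
Note that the vectors above list the most recent year first, and that they hold closing prices rather than returns. Should you want to look at returns in chronological order, the following is one way it might be done. It is a minimal sketch which uses only the five most recent rows from the data above; the names recent, chronological, and annual_return are my own and appear nowhere else in this article.

```r
# Five most recent rows from the data above, listed newest first
recent <- data.frame(
  Dow_Year  = c(2023, 2022, 2021, 2020, 2019),
  Dow_Close = c(33301.87, 33147.25, 36338.30, 30606.48, 28538.44)
)

# Sort oldest to newest so that differences run forward in time
chronological <- recent[order(recent$Dow_Year), ]

# Simple annual return: change in close divided by the prior year's close
annual_return <- diff(chronological$Dow_Close) / head(chronological$Dow_Close, -1)

# Pair each return with the year in which it was realized
data.frame(Year = chronological$Dow_Year[-1], Return = round(annual_return, 4))
```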

Though we aren’t testing for this hypothesis directly through the application of a singular methodology, our hypothesis for this general experiment would resemble something like:

H0 (null): The Dow Jones Industrial Average Index’s annual returns are NOT random.

Ha (alternative): The Dow Jones Industrial Average Index’s annual returns are random.

The primary test that we will perform is the Phillips-Perron Unit Root Test. This particular method assesses time series data for order of integration. In simplified terms, order of integration is the minimum number of differences required to obtain a covariance-stationary series. In the case of the Phillips-Perron Unit Root Test, we will be utilizing the underlying order of integration methodology to assess for random walk potential.
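
To make the idea of order of integration more concrete, here is a minimal sketch on simulated data. The seed, the sample size, and the variable names are assumptions chosen for illustration. A pure random walk is built by summing independent shocks, and a single difference recovers those shocks, which is what it means for a series to be integrated of order 1. The sketch then runs PP.test(), whose null hypothesis is that the series has a unit root, on both the levels and the differences. No particular p-values are promised here, as they depend on the simulated draw.

```r
# Arbitrary seed and sample size, assumed for illustration only
set.seed(123)
shocks <- rnorm(200)            # independent random shocks
walk <- cumsum(shocks)          # a pure random walk (integrated of order 1)

# A single difference recovers the underlying shocks
first_difference <- diff(walk)

# PP.test() tests the null hypothesis that the series has a unit root
levels_result <- PP.test(walk)
difference_result <- PP.test(first_difference)

# Compare the p-values for the levels and for the differenced series
c(levels = levels_result$p.value, differenced = difference_result$p.value)
```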

# PP.test() is included in base R’s ‘stats’ package; the package: ‘tseries’ must be downloaded and enabled in order to utilize the adf.test() function used further below #

library(tseries)

# Phillips-Perron Unit Root Test - A methodology utilized to test data for random walk potential #

# Null - The time series has a unit root (integrated of order 1) #

# Alternative - The time series is stationary (no unit root) #

PP.test(Dow_Data_Frame$Dow_Close)

This produces the output:

Phillips-Perron Unit Root Test

data: Dow_Data_Frame$Dow_Close
Dickey-Fuller = -3.6678, Truncation lag parameter = 4, p-value = 0.03055

The secondary analysis which we will perform on our time series data is the Augmented Dickey-Fuller Unit Root Test. This test assesses data for stationarity.

# Augmented Dickey-Fuller Unit Root Test - A methodology utilized to test data for stationarity #

# Null - Data is NOT stationary #

# Alternative - Data IS stationary #

adf.test(Dow_Data_Frame$Dow_Close)

This produces the output:

Augmented Dickey-Fuller Test

data: Dow_Data_Frame$Dow_Close
Dickey-Fuller = -5.0152, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(Dow_Data_Frame$Dow_Close) :
p-value smaller than printed p-value

Assuming an alpha value of .05, we would reject the null hypothesis in both instances. Thus, at the 5% significance level, we would conclude that the annual Dow Jones Industrial Average closing prices are not integrated (no unit root was detected) and are stationary. Under the hypotheses as framed above, the combination of these results would indicate that the Dow Jones Industrial Average returns are random. A word of caution on that reading: both tests were run on closing prices rather than returns, and a unit root is itself the defining feature of a textbook random walk, so this conclusion rests on the looser sense of “random” used in our hypotheses.

I believe that the Dickey-Fuller test provides much more interesting insight. Unless a type I error was committed, we would eventually expect to witness either a Seneca Cliff event or a parabolic downturn within the market, given a long enough time frame. WOULD, EXPECT, EVENTUALLY, and UNLESS are the key terms here. (Don’t time the market or trade on the basis of a singular statistical methodology).

Some of you might be wondering why The Wald-Wolfowitz Test was not utilized. As a reminder, this particular method is only applicable to factor series data. However, if we were to inappropriately apply it in this instance, it would resemble the following:

# THIS IS NOT APPROPRIATE FOR OUR EXAMPLE #

# THIS CODE IS FOR DEMONSTRATION PURPOSES ONLY #

# The package: ‘trend’ must be downloaded and enabled in order to utilize the ww.test() function #

library(trend)

# Wald Wolfowitz Test #

# Null - Each element in the sequence is (independently) drawn from the same distribution (random) #

# Alternative - Each element in the sequence is not (independently) drawn from the same distribution (not-random) #

Dow_Close_Factor <- as.factor(Dow_Data_Frame$Dow_Close)

ww.test(Dow_Close_Factor)
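
For a sense of the kind of data on which a runs test does make sense, below is a hand-rolled sketch of the classic two-category runs test using the large-sample normal approximation. To be clear, this is not the implementation found within the ‘trend’ package. The function name runs_test and the example up / down sequence are hypothetical and exist only for illustration.

```r
# Hypothetical helper: two-category runs test (normal approximation)
runs_test <- function(x) {
  x <- as.integer(as.factor(x))          # code the two categories as 1 and 2
  stopifnot(length(unique(x)) == 2)      # exactly two categories are required
  n1 <- sum(x == 1)
  n2 <- sum(x == 2)
  n <- n1 + n2
  runs <- 1 + sum(diff(x) != 0)          # a new run starts at each change
  expected_runs <- 2 * n1 * n2 / n + 1
  variance_runs <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (runs - expected_runs) / sqrt(variance_runs)
  p_value <- 2 * pnorm(-abs(z))          # two-sided p-value
  list(runs = runs, expected = expected_runs, z = z, p.value = p_value)
}

# Made-up sequence of up and down moves
up_down <- factor(c("up", "up", "down", "down", "up",
                    "down", "up", "up", "down", "down"))

result <- runs_test(up_down)
result
```

Far fewer or far more runs than expected would count as evidence against a randomly ordered sequence. The normal approximation is intended for reasonably long sequences, so the ten element example here only demonstrates the mechanics.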

So that is it for today’s article, Data Heads. It would seem that Malkiel is vindicated, at least as it pertains to the methodologies which we applied within this particular entry. I’ll be back again soon with more data content.

-RD

Saturday, June 17, 2023
