Monday, August 28, 2023

Are US Indexes Overweight?

Ever since the COVID Pandemic hit the mainstream news cycle in 2020, I’ve noticed that many of the stocks which I follow seem to dwindle, while the major indexes continue to accelerate upward. I decided to do a bit of research on this subject, and the following is what I discovered along the way.

The Case of the NASDAQ Composite

The NASDAQ Composite is comprised of 3,279 separate equities, each possessing a different weight as it pertains to its contribution to the index’s overall price. The top 5 equities by weight are:


In sum, these 5 equities, and their corresponding valuations, contribute approximately 34.07% of the NASDAQ Composite's overall price.

Over the past five years, the NASDAQ has experienced a hefty appreciation of 67.43%.


However, of this 67.43% upward move, how much of the price shift can be attributed to the top 5 weighted equities within the index?


If every other component within the NASDAQ traded completely flat throughout the duration of the previous 5 years, the NASDAQ Composite would have increased in value by approximately 25.51%. As the index appreciated by 67.43%, we can conclude that the remaining components, in aggregate, contributed substantially to the index’s overall appreciation (67.43 > 25.51).

Rating: HEFTYCHONK


If 5 companies make up 34.07% of the index’s weight, and that 0.15% of the index’s components (5 of 3,279) has lifted the index by approximately 25.51%, while the index itself has appreciated by 67.43%, we can evaluate this furry boi as a Heftychonk.
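For those who would like to reproduce this sort of arithmetic themselves, below is a minimal R sketch of the “everything else trades flat” calculation. The weights and five-year returns are hypothetical placeholders rather than the actual NASDAQ figures, and the sketch treats each component’s weight at the start of the period as a reasonable approximation:

# Approximate index appreciation attributable to the top-weighted components, #
# assuming every other component traded flat (all figures below are hypothetical) #

top_weights <- c(0.12, 0.10, 0.06, 0.04, 0.02) # index weight of each of the top 5 components

top_returns <- c(0.90, 0.70, 0.60, 0.40, 0.30) # five-year price return of each component

index_gain_from_top_5 <- sum(top_weights * top_returns)

index_gain_from_top_5 # ~0.236, i.e. roughly a 23.6% index gain attributable to these 5 names alone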

The Case of the Dow Jones Industrial Average

The Dow Jones Industrial Average is comprised of 30 separate equities, each possessing a different weight as it pertains to its contribution to the index’s overall price. The top 10 equities by weight are:


In sum, the top 5 of these equities, and their corresponding valuations, contribute approximately 33.38% of the Dow Jones Industrial Average’s overall price.

Over the past five years, the Dow Jones Industrial Average has experienced a healthy appreciation of 32.40%.


However, of this 32.40% upward move, how much of the price shift can be attributed to the top 5 weighted equities?


If every other component within the Dow Jones Industrial Average traded completely flat throughout the duration of the previous 5 years, the index would have increased in value by approximately 30.97%. As the index appreciated by 32.40%, we can conclude that the remaining components, in aggregate, contributed only slightly to the index’s overall appreciation (32.40 > 30.97).

Rating: MEGACHONKER


If 5 companies make up 33.38% of the index’s weight, and that 16.67% of the index’s components (5 of 30) has lifted the index by approximately 30.97%, while the index itself has appreciated by 32.40%, we can evaluate this plump feline as being a MeGaChOnKeR.

The Case of the S&P

The S&P 500 is comprised of 503 separate equities, each possessing a different weight as it pertains to its contribution to the index’s overall price. The top 10 equities by weight are as follows:


In sum, these 10 equities, and their corresponding valuations, contribute approximately 30.29% of the S&P 500’s overall price.

Over the past five years, the S&P has experienced a robust appreciation of 52.92%.


However, of this 52.92% upward move, how much of the price shift can be attributed to the top 10 weighted equities within the index?


If every other component within the S&P 500 traded completely flat throughout the duration of the previous 5 years, the index would have increased in value by approximately 57.35%. As the index appreciated by only 52.92%, we can conclude that the remaining components, in aggregate, actually dragged down the index’s overall appreciation (52.92 < 57.35).

Rating: OH LAWD HE COMIN


If 10 companies make up 30.29% of the index’s weight, and that 1.98% of the index’s components (10 of 503) has lifted the index by approximately 57.35%, while the index itself has appreciated by only 52.92%, we can evaluate this rotund hunk of a cat with the phrase, “OH LAWD HE COMIN”!

Conclusion

While every index has been disproportionately impacted by the performance of a few equities, it would appear that the NASDAQ, the index which is least top heavy and possesses the greatest number of companies, has appreciated far more in value than its contemporaries.

Maybe the NASDAQ, though often more volatile than other indexes, benefits from two conflicting attributes: the top 5 stocks within the index anchor its overall price by the sheer magnitude of their market caps, while the much smaller corporate equities are directionally pressured by their proximity to those larger components.

Like almost every other medium of existence, it would seem that titans emerge after a certain period of growth. To succeed in such circumstances one must either cast their lot with familiar titans, or learn to swim amongst them.

-RD

Sunday, July 23, 2023

Odd Man Out: The Problem with Serpentine Drafts

In this article, we’re going to continue the trend of discussing topics related to Fantasy Sports. Specifically, we’ll examine an innate problem which I see within the serpentine draft format. I feel that this topic is particularly appropriate for this time of year, as football fans are gearing up for their own fantasy league drafts.

If you’re unfamiliar with the serpentine draft format, it is best described as:

A serpentine draft, sometimes referred to as a "Snake" draft, is a type in which the draft order is reversed every round (e.g., 1..12, 12..1, 1..12, 12..1, etc.). For example, if you have the first pick in the draft, you will pick first in round one, and then last in round two.

Source: https://help.fandraft.com/support/solutions/articles/61000278703-draft-types-and-formats

I’ve created a few examples of this draft type below. The innate issue which I see within this draft format pertains to the differentiation between the projected point value of each draft selection, as determined by a team’s draft position. The more teams present within a league, the greater the point disparity between teams.

Assuming that each team executed an optimized drafting strategy, we would expect the outcome to resemble something like the illustration below.

Each number within a cell represents the best player value available to each team, each round. The green cells contain starting player values, and the grey cells contain back-up player values.


As you can observe from the summed values below each outcome column, each team possesses a one-point advantage over the team which selected subsequently, and a one-point disadvantage against the team which selected previously. The greatest differentiation occurs between the team which made the first selection within the draft order and the team which made the last selection within the draft order: 11 points (1026 – 1015).

As previously mentioned, the fewer the teams within a league, the fewer the number of selection rounds. As a result, there is less of a disparity between the teams which pick earlier within the order and the teams which pick later within the order.

Below is the optimal outcome of a league comprised of ten teams.


While the single point differentiation persists between consecutive teams within the draft order, the differentiation between the first selector, and the last selector, has been reduced to: 9 (856 – 847).

This trend continues across ever smaller league sizes: 7 (1024 – 1017).


In each instance, assuming optimal drafting, we should expect the total point differentiation between the first draft participant and the last draft participant to equal N – 1, where N is the total number of draft participants within the league.
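To see the effect numerically, here is a minimal R sketch of an optimally drafted serpentine league. It assumes that the best available player is worth exactly one point less with each successive overall pick, and that the league drafts an odd number of rounds; under those assumptions, the spread between the first and last teams comes out to exactly N – 1 points (the totals themselves are arbitrary).

# Simulate an optimally drafted serpentine (snake) draft #
# Assumption: player values decline by exactly one point per overall pick #

simulate_serpentine <- function(n_teams, n_rounds, top_value = 200) {
  player_values <- seq(top_value, by = -1, length.out = n_teams * n_rounds)
  totals <- numeric(n_teams)
  pick <- 1
  for (r in 1:n_rounds) {
    round_order <- if (r %% 2 == 1) 1:n_teams else n_teams:1
    for (team in round_order) {
      totals[team] <- totals[team] + player_values[pick]
      pick <- pick + 1
    }
  }
  totals
}

# With an even number of rounds the snake pairs cancel out under this value curve; #
# the final odd round is what leaves the N - 1 gap #

team_totals <- simulate_serpentine(n_teams = 12, n_rounds = 15)

team_totals

max(team_totals) - min(team_totals) # equals 11, i.e. N - 1 for a 12 team league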

All things being equal, if each team is managed optimally, we should expect the team which drafted first to finish first within each league. Second place would belong to the team which drafted second, third place to the team which drafted third, and so on.

If all players are equally at risk of being injured on each fantasy team, then this occurrence does little to upset the overall ranking of teams by draft order. It must be remembered that teams which drafted earlier within the order will also possess better replacement players as compared to their competitors. Therefore, when injuries do occur, later drafting teams will be disproportionately impacted.

I would imagine that as AI integration begins to seep into all aspects of existence, the opportunity for each team owner to draft with consistent optimization will further stratify the inherent edge attributed to serpentine draft position. As it stands currently, there is still an opportunity for lower draft order teams to compete if one or more of their higher order competitors blunder a selection.

In any case, I hope that what I have written in this article helped to describe what I like to refer to as the “Odd Man Out” phenomenon. I hope to see you again soon, with more of the statistical content which you crave.

-RD

Monday, July 17, 2023

(R) Daily Fantasy Sports Line-up Optimizer (Basketball)

I’ve been mulling over whether or not I should give away this secret sauce on my site, and I’ve come to the conclusion that anyone who seriously contends within the Daily Fantasy medium is probably already aware of this strategy.

Today, through the magic of R software, I will demonstrate how to utilize code to optimize your daily fantasy sports line-up. This particular example will be specific to the Yahoo daily fantasy sports platform, and to the sport of basketball.

I also want to give credit, where credit is due.

The code presented below is a heavily modified variation of code initially created by: Patrick Clark.

The original code source can be found here: http://patrickclark.info/Lineup_Optimizer.html

Example:

First, you’ll need to access Yahoo’s Daily Fantasy page. I’ve created an NBA Free QuickMatch, which is a 1 vs. 1 contest against an opponent where no money changes hands.



This page will look a bit different during the regular season, as the NBA playoffs are currently underway. That aside, our next step is to download all of the current player data. This can be achieved by clicking on the “i” bubble icon.



Next, click on the “Export players list” link. This will download the previously mentioned player data.

The player data should resemble the (.csv) image below:



Prior to proceeding to the subsequent step, we need to do a bit of manual data clean up.

Any player who is injured or not starting was removed from the data set. I also concatenated the First Name and Last Name fields, and placed that concatenation within the ID variable. Next, I removed all variables except for the following: ID (newly modified), Position, Salary, and FPPG (Fantasy Points Per Game).
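If you would rather script this cleanup than edit the spreadsheet by hand, something along the lines of the sketch below should work. Note that the file path and the column names (First Name, Last Name, Injury Status) are assumptions on my part and will likely need to be adjusted to match the export file which you actually receive from Yahoo:

# A minimal cleanup sketch - the file path and column names are assumptions and may require adjustment #

library(dplyr)

library(readr)

Raw_Players <- read_csv("C:/Users/Your_Downloaded_Players_List.csv")

PlayerPool <- Raw_Players %>%
  filter(is.na(`Injury Status`) | `Injury Status` == "") %>% # drop players flagged with an injury (assumed column name) #
  mutate(ID = paste(`First Name`, `Last Name`)) %>%          # concatenate the name fields into the ID variable #
  select(ID, Position, Salary, FPPG)                         # keep only the variables needed by the optimizer #

# (players who are not starting would be filtered out in a similar fashion) #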

The results should resemble the following image:



(Specific player data and all associated variables will differ depending on the date of download)

Now that the data has been formatted, we’re ready to code!

###################################################################

library(lpSolveAPI)

library(tidyverse)

# It is easier to input the data as an Excel file if possible #

# Player names (ID) have the potential to upset the .CSV format #

library(readxl)

# Be sure to set the player data file path to match your directory / file name #

PlayerPool <- read_excel("C:/Users/Your_Modified_Players_List.xlsx")

# Create some positional identifiers in the pool of players to simplify linear constraints #

# This code creates new position column variables, and places a 1 if a player qualifies for a position #

PlayerPool$PG_Check <- ifelse(PlayerPool$Position == "PG",1,0)

PlayerPool$SG_Check <- ifelse(PlayerPool$Position == "SG",1,0)

PlayerPool$SF_Check <- ifelse(PlayerPool$Position == "SF",1,0)

PlayerPool$PF_Check <- ifelse(PlayerPool$Position == "PF",1,0)

PlayerPool$C_Check <- ifelse(PlayerPool$Position == "C",1,0)

PlayerPool$One <- 1

# This code modifies the position columns so that each variable is a vector type #

PlayerPool$PG_Check <- as.vector(PlayerPool$PG_Check)

PlayerPool$SG_Check <- as.vector(PlayerPool$SG_Check)

PlayerPool$SF_Check <- as.vector(PlayerPool$SF_Check)

PlayerPool$PF_Check <- as.vector(PlayerPool$PF_Check)

PlayerPool$C_Check <- as.vector(PlayerPool$C_Check)

# This code orders each player ID by position #

PlayerPool <- PlayerPool[order(PlayerPool$PG_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$SG_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$SF_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$PF_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$C_Check),]

# Appropriately establish variables in order to perform the "solver" function #

Num_Players <- length(PlayerPool$One)

lp_model = make.lp(0, Num_Players)

set.objfn(lp_model, PlayerPool$FPPG)

lp.control(lp_model, sense= "max")

set.type(lp_model, 1:Num_Players, "binary")

# Total salary points available to the player #

# In the case of Yahoo, the salary points are set to ($)200 #

add.constraint(lp_model, PlayerPool$Salary, "<=",200)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$PG_Check, "<=",3)

add.constraint(lp_model, PlayerPool$PG_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$SG_Check, "<=",3)

add.constraint(lp_model, PlayerPool$SG_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$SF_Check, "<=",3)

add.constraint(lp_model, PlayerPool$SF_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$PF_Check, "<=",3)

add.constraint(lp_model, PlayerPool$PF_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type (only require one (C)enter) #

add.constraint(lp_model, PlayerPool$C_Check, "=",1)

# Total Number of Players Needed for the entire Fantasy Line-up #

add.constraint(lp_model, PlayerPool$One, "=",8)

# Perform the Solver function #

solve(lp_model)

# Projected_Score provides the projected score summed from the optimized projected line-up (FPPG) #

Projected_Score <- crossprod(PlayerPool$FPPG,get.variables(lp_model))

get.variables(lp_model)

# The optimal_lineup data frame provides the optimized line-up selection #

optimal_lineup <- subset(data.frame(PlayerPool$ID, PlayerPool$Position, PlayerPool$Salary), get.variables(lp_model) == 1)


If we take a look at our:

Projected_Score

We should receive an output which resembles the following:

> Projected_Score
    [,1]
[1,] 279.5

Now, let’s take a look at our:

optimal_lineup

Our output should resemble something like:

PlayerPool.ID PlayerPool.Position PlayerPool.Salary
3 Marcus Smart PG 20
51 Bradley Beal SG 43
108 Tyrese Haliburton SG 16
120 Jerami Grant SF 27
130 Eric Gordon SF 19
148 Brandon Ingram SF 36
200 Darius Bazley PF 19
248 Steven Adams C 20

With the above information, we are prepared to set our line up.

You could also run this line of code:

optimal_lineup <- subset(data.frame(PlayerPool$ID, PlayerPool$Position, PlayerPool$Salary, PlayerPool$FPPG), get.variables(lp_model) == 1)

optimal_lineup


Which provides a similar output that also includes point projections:

PlayerPool.ID PlayerPool.Position PlayerPool.Salary PlayerPool.FPPG
3 Marcus Smart PG 20 29.8
51 Bradley Beal SG 43 50.7
108 Tyrese Haliburton SG 16 26.9
120 Jerami Grant SF 27 38.4
130 Eric Gordon SF 19 30.7
148 Brandon Ingram SF 36 43.2
200 Darius Bazley PF 19 29.7
248 Steven Adams C 20 30.1

Summing up PlayerPool.FPPG, we reach the value: 279.5. This was the same value which we observed within the Projected_Score matrix.
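If you’d like to confirm that figure in code rather than by hand, the following line sums the FPPG column of the second optimal_lineup data frame shown above:

# Verify that the summed FPPG of the selected players matches Projected_Score #

sum(optimal_lineup$PlayerPool.FPPG)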

Conclusion:

While this article demonstrates a very interesting concept, I would be remiss if I did not advise you to NOT gamble on daily fantasy. This post was all in good fun, and for educational purposes only. By all means, defeat your friends and colleagues in free leagues, but do not turn your hard-earned money over to gambling websites.

The code presented within this entry may provide you with a minimal edge, but shark players are able to make projections based on far more robust data sets than league FPPG.

In any case, the code above can be repurposed for any other daily fantasy sport (football, soccer, hockey, etc.). Remember to only play for fun and for free.

-RD 

Sunday, July 9, 2023

(R) Benford's Law

In today’s article, we will be discussing Benford’s Law, specifically as it is utilized as an applied methodology to assess financial documents for potential fraud:

First, a bit about the phenomenon which Benford sought to describe:

The discovery of Benford's law goes back to 1881, when the Canadian-American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages (that started with 1) were much more worn than the other pages. Newcomb's published result is the first known instance of this observation and includes a distribution on the second digit, as well. Newcomb proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N).

The phenomenon was again noted in 1938 by the physicist Frank Benford, who tested it on data from 20 different domains and was credited for it. His data set included the surface areas of 335 rivers, the sizes of 3259 US populations, 104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of Reader's Digest, the street addresses of the first 342 persons listed in American Men of Science and 418 death rates. The total number of observations used in the paper was 20,229. This discovery was later named after Benford (making it an example of Stigler's law).


Source: https://en.wikipedia.org/wiki/Benford%27s_law

So what does this actually mean in layman’s terms?

Essentially, given a series of numerical elements from a similar source, we should expect the leading digits to occur at frequencies which correspond to a particular distribution pattern.



If a series of elements perfectly corresponds with Benford’s Law, then the elements within the series should follow the above pattern as it pertains to leading digit frequency. Ex. Numbers which begin with the digit “1” should occur 30.1% of the time. Numbers which begin with the digit “2” should occur 17.6% of the time. Numbers which begin with the digit “3” should occur 12.5% of the time.

The distribution is derived as follows:
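In R, the expected proportions can be generated directly from the formula P(d) = log10(1 + 1/d), for each leading digit d from 1 through 9:

# Benford's Law expected frequency for each leading digit d: log10(1 + 1/d) #

Benford_Probabilities <- log10(1 + 1 / (1:9))

round(Benford_Probabilities, 4)

# 0.3010 0.1761 0.1249 0.0969 0.0792 0.0669 0.0580 0.0512 0.0458 #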



The utilization of Benford’s Law is applicable to numerous scenarios:

1. Accounting fraud detection

2. Use in criminal trials

3. Election data 

4. Macroeconomic data

5. Price digit analysis 

6. Genome data 

7. Scientific fraud detection 

As it relates to screening for financial fraud, if the sample elements do not correspond with the Benford’s Law distribution, fraud is not necessarily the conclusion which we would immediately draw. However, such findings may indicate that additional scrutinization of the data is necessary.

Example:

Let’s utilize Benford’s Law to analyze Cloudflare’s (NET) Balance Sheet (12/31/2021).



Even though it’s an unnecessary step as it relates to our analysis, let’s first discern the frequency of each leading digit. These digits are underlined in red within the graphic above.
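For those who would rather let R do the tallying, the snippet below tabulates the leading digits directly from the balance sheet figures. The vector here is the same NET vector which appears in the full analysis further down:

# Tally the leading digit of each balance sheet entry #

NET <- c(2372071.00, 1556273.00, 815798.00, 1962675.00, 815798.00, 134212.00, 791014.00, 1667291.00, 1974792.00, 791014.00, 1293206.00, 845217.00, 323612.00, 323612.00)

table(substr(as.character(NET), 1, 1))

# 1 2 3 7 8 #
# 6 1 2 2 3 #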



What the Benford’s Law analysis seeks to assess is how the leading digits, as they occurred within our experiment, compare to our expectations as they exist within the Benford’s Law distribution.



The above table illustrates the frequency of occurrence of each leading digit within our analysis, versus the expected percentage frequency as stated by Benford’s Law.

Now let’s perform the analysis:

# H0: The first digits within the population counts follow Benford's law #

# H1: The first digits within the population counts do not follow Benford's law #

# requires benford.analysis #

library(benford.analysis)

# Element entries were gathered from Cloudflare’s (NET) Balance Sheet (12/31/2021) #

NET <- c(2372071.00, 1556273.00, 815798.00, 1962675.00, 815798.00, 134212.00, 791014.00, 1667291.00, 1974792.00, 791014.00, 1293206.00, 845217.00, 323612.00, 323612.00)

# Perform Analysis #

trends <- benford(NET, number.of.digits = 1, sign = "positive", discrete=TRUE, round=1)

# Display Analytical Output #

trends

# Plot Analytical Findings #

plot(trends)


Which provides the output:

Benford object:

Data: NET
Number of observations used = 14
Number of obs. for second order = 10
First digits analysed = 1

Mantissa:

Statistic Value
    Mean 0.51
    Var 0.11
Ex.Kurtosis -1.61
    Skewness 0.25

The 5 largest deviations:

digits absolute.diff
1 8 2.28
2 1 1.79
3 2 1.47
4 4 1.36
5 7 1.19

Stats:


Pearson's Chi-squared test

data: NET
X-squared = 14.729, df = 8, p-value = 0.06464


Mantissa Arc Test

data: NET
L2 = 0.092944, df = 2, p-value = 0.2722

Mean Absolute Deviation (MAD): 0.08743516
MAD Conformity - Nigrini (2012): Nonconformity
Distortion Factor: 8.241894

Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!

~ Graphical Output Provided by Function ~



(The most important aspects of the output are bolded)

Findings:

Pearson's Chi-squared test

data: NET
X-squared = 14.729, df = 8, p-value = 0.06464
Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!




A chi-square goodness of fit test was performed to examine whether the first digits of balance sheet items from the company Cloudflare (12/31/2021) adhere to Benford's law. Entries were found to be in adherence, with non-significance at the p < .05 level, χ2 (8, N = 14) = 14.73, p = .06.

As it relates to the graphic, in ideal circumstances, each blue data bar should have its uppermost portion touching the broken red line.

Example(2):

If you’d prefer to instead run the analysis simply as a chi-squared test which does not require the “benford.analysis” package, you can effectively utilize the following code. The image below demonstrates the concept being employed.



Model <- c(6, 1, 2, 0, 0, 0, 2, 3, 0)

Results <- c(0.30102999566398100, 0.17609125905568100, 0.12493873660830000, 0.09691001300805650, 0.07918124604762480, 0.06694678963061320, 0.05799194697768670, 0.05115252244738130, 0.04575749056067510)


chisq.test(Model, p=Results, rescale.p = FALSE)

Which provides the output:

    Chi-squared test for given probabilities

data: Model
X-squared = 14.729, df = 8, p-value = 0.06464


Which are the same findings that we encountered while performing the analysis previously.

That’s all for now! Stay studious, Data Heads! 

-RD

Saturday, July 1, 2023

Money Hustle II


Following up on the prior article, a reader of this site wrote to me and asked (paraphrasing):

“I really enjoyed your last entry as it pertains to randomly selected portfolios. However, I remain skeptical. How would another random selection of stocks perform within a historical bear market? Also, how would these stocks perform across a longer duration of time?”

Great questions! We would not be scientists if we didn’t attempt to replicate our results!

We’ll start with a new list of random equities. The same rules apply: no commodity funds or bond funds will be included within our Random Portfolio. We will be going back to the late 1990’s, to the zenith of the dot com bubble, and then into the lethargy of the early 2000’s. This experiment will be imperfect, as, amongst other confounding factors, our random stock picker can only pick from equities which are still public and still in existence. Any equity which did not make it to the present, unfortunately, cannot be included within our experiment.

#### WARNING – INVESTMENT OF ANY SORT BEARS RISK! THIS ARTICLE IS NOT FINANCIAL ADVICE! DO NOT REPLICATE THIS EXPERIMENT AND EXPECT TO MAKE MONEY! ####

To fairly decide as to which equities our fund would hold, I utilized the websites:

https://www.rayberger.org/random-stock-picker/

~AND~

https://www.randstock.ca/selector

So, let’s meet our new components:

Fair Isaac Corporation (FICO) - Fair Isaac Corporation develops analytic, software, and data decisioning technologies and services that enable businesses to automate, enhance, and connect decisions in the Americas, Europe, the Middle East, Africa, and the Asia Pacific. Sector(s): Technology

Par Technology Corporation (PAR) - PAR Technology Corporation, together with its subsidiaries, provides technology solutions to the restaurant and retail industries worldwide. Sector(s): Software Application

IMAX Corporation (IMAX) - IMAX Corporation, together with its subsidiaries, operates as a technology platform for entertainment and events worldwide. Sector(s): Communication Services

Spectrum Brands Holdings, Inc. (SPB) - Spectrum Brands Holdings, Inc. operates as a branded consumer products company worldwide. It operates through three segments: Home and Personal Care; Global Pet Care; and Home and Garden. Sector(s): Consumer Defensive

Patterson Companies, Inc. (PDCO) - Patterson Companies, Inc. engages in distribution of dental and animal health products in the United States, the United Kingdom, and Canada. Sector(s): Healthcare

The Walt Disney Company (DIS) - The Walt Disney Company, together with its subsidiaries, operates as an entertainment company worldwide. Sector(s): Communication Services

Regis Corporation (RGS) - Regis Corporation owns, operates, and franchises hairstyling and hair care salons in the United States, Canada, Puerto Rico, and the United Kingdom. Sector(s): Consumer Cyclical

Synovus Financial Corp (SNV) - Synovus Financial Corp. operates as the bank holding company for Synovus Bank that provides commercial and consumer banking products and services. Sector(s): Financial Services

Royal Bank of Canada (RY) - Royal Bank of Canada operates as a diversified financial service company worldwide. Sector(s): Financial Services

Incyte Corporation (INCY) - Incyte Corporation, a biopharmaceutical company, engages in the discovery, development, and commercialization of therapeutics for hematology/oncology, and inflammation and autoimmunity areas in the United States, Europe, Japan, and internationally. Sector(s): Healthcare

With a purchase date of each equity being set at 1/1/1998 (closing price), our fictitious Random Fund performs as shown below:



Again, I chose a few benchmarks to compare this performance against:

Fidelity Magellan Fund (FMAGX) – A famous actively managed mutual fund which possesses the following strategy, “The Fund seeks capital appreciation. Fidelity Management & Research may buy "growth" stocks or "value" stocks or a combination of both. They rely on fundamental analysis of each issuer and its potential for success in light of its current financial condition, its industry position, and economic and market conditions.”

Source: https://www.marketwatch.com/investing/fund/fmagx

Vanguard Total Stock Market Index Fund (VTSAX) – Description provided, “Created in 1992, Vanguard Total Stock Market Index Fund is designed to provide investors with exposure to the entire U.S. equity market, including small-, mid-, and large-cap growth and value stocks. The fund’s key attributes are its low costs, broad diversification, and the potential for tax efficiency. Investors looking for a low-cost way to gain broad exposure to the U.S. stock market who are willing to accept the volatility that comes with stock market investing may wish to consider this fund as either a core equity holding or your only domestic stock fund.”

Source: https://investor.vanguard.com/investment-products/mutual-funds/profile/vtsax

### And Introducing a New Challenger! ###

Columbia Seligman Technology & Info Fd;A (SLMCX) – “The Fund seeks to provide shareholders with capital gain. The Fund will invest at least 80% of its net assets in securities of companies operating in the communications, information and related industries, including companies operating in the information technology and telecommunications sectors.”

Source: https://www.marketwatch.com/investing/fund/slmcx

Why (SLMCX) and not the previous competitive benchmark (AIEQ)?

As the AI Powered Equity ETF (AIEQ) did not exist in the 1990’s, I decided to select a fund which is managed by a company which once employed Charles Kadlec. Who is Charles Kadlec? He is an author and financier, having written the book: Dow 100,000: Fact or Fiction.


Dow 100,000: Fact or Fiction was released in September 1999, having likely been composed throughout 1998. So, in the spirit of the technical optimism which existed from that era, and which still exists within the hearts of AIEQ investors, I chose the Columbia Seligman Technology and Information Fund Class A (SLMCX) as our final benchmark. 

First, let’s consider our returns over the span of a five year period:


As was mentioned within the prior article, we should typically expect actively managed funds to hold together more consistently throughout market downturns. In this case, the downturn was the collapse of the tech-bubble. 

As we stretch our assessment period out to the present, our returns are as follows:



Over a larger period of time, our poor Random Fund got clobbered. The total stock market index fund outperformed the actively managed fund – Magellan. However, in this instance, the (managed) tech sector heavy SLMCX fund destroyed all competitors.

Let’s take a look at the top holdings of the well performing SLMCX:



This mix heavily resembles the top holdings of Fidelity’s NASDAQ Composite Index Fund:



However, SLMCX is a bit heavier in chip sector allocation.

Though the gains of SLMCX are impressive in comparison, it would appear that SLMCX's composition is mostly derived from the NASDAQ composite. When compared against the broad US index, SLMCX is a stud. When compared against the NASDAQ composite over a 5-year period, SLMCX underperforms.



Why?

As it pertains to the SLMCX and the NASDAQ both trouncing their competition, it might have something to do with the composition of US markets.

For the past few decades, the US technology sector has benefitted from both a lax regulatory environment and significant taxpayer investment. National Defense spending allocations often find their way into tech sector contracts. Research grant money and public infrastructure investments also help drive rapid growth.

Random Fund melted on the pavement in comparison to both its index and managed counterparts, most likely due to its holdings not being predominantly comprised of composite components.

Managed funds are forced to dink and dunk, spending more time out of the market than their index counterparts. These transactions cause taxes and other costs to be assessed. There is also the management fee. Composites themselves are almost a self-fulfilling prophecy. Stocks included within a composite appreciate in value when funds which benchmark the composite are purchased. The stocks themselves also can be purchased individually, which further drives mutual appreciation.

So why do investors even consider actively managed funds / hedge funds as investment options? 

1. Diversification. Typically, wealthier individuals prefer to very broadly diversify their capital allocation. To assist in achieving aspects of an individualized allocation strategy, managed funds offer the opportunity to seek out alternative or exotic investments. 

2. Liquidity. As most financial crises, both individual and societal, are caused by liquidity contractions, investors with excess capital who are savvy, may want to maintain both excess liquidity and market exposure at the cost of potential sub-alpha returns. This offset is accepted for the sake of liquidity access in times of financial uncertainty. 

3. Financial Planning. Financial advisors can assist in personal financial guidance and estate planning. In some instances, the advisor relationship entails that the advisee be invested in financial products provided by the advisor's firm. These funds, while sometimes bearing a premium, may assist the client in maintaining the above listed attributes. This is achieved while also providing the client with a personalized financial plan.

That is it for this article.

I'll see you next week!

-RD

Monday, June 26, 2023

Money Hustle

Following up on prior entries related to the stock market and market timing strategies, I’d like to present the following article for review:

https://prosperitythinkers.com/personal-finance/three-monkeys-and-cat-pick-stocks/

This article will be the topic for today’s post.

Returning again to the wisdom of Burton Malkiel, who was the subject of our last article, we find this sage of finance being quoted as stating,

“A blindfolded monkey throwing darts at a newspaper’s financial pages could select a portfolio that would do just as well as one carefully selected by experts.”

It would seem that some researchers took this quote a bit too literally, as the article suggests, and decided to perform variations of the above-described experiment.

The article discusses these stock-picking contests in detail. Apparently, in different instances, a cat, school children, and dart throwers were employed to construct differing investment portfolios. In each instance described, the seemingly random selection of equities, regardless of the employed methodologies, outperformed both actively managed funds and broad indexes.

I decided to perform my own trial research as it relates to this sort of investment approach.

#### WARNING – INVESTMENT OF ANY SORT BEARS RISK! THIS ARTICLE IS NOT FINANCIAL ADVICE! DO NOT REPLICATE THIS EXPERIMENT AND EXPECT TO MAKE MONEY! ####

First, I chose a few financial benchmarks.

AI Powered Equity ETF (AIEQ) – An ETF which is described as, “AIEQ uses artificial intelligence to analyze and identify US stocks believed to have the highest probability of capital appreciation over the next 12 months, while exhibiting volatility similar to the overall US market. The fund selects 30 to 125 constituents and has no restrictions on the market cap of its included securities. The model suggests weights based on capital appreciation potential and correlation to other included companies, subject to a 10% cap per holding. It is worth noting that while AIEQ relies heavily on its quantitative model, the fund is actively-managed, and follows no index.”

Source: https://www.etf.com/AIEQ

Fidelity Magellan Fund (FMAGX) – A famous actively managed mutual fund which possesses the following strategy, “The Fund seeks capital appreciation. Fidelity Management & Research may buy "growth" stocks or "value" stocks or a combination of both. They rely on fundamental analysis of each issuer and its potential for success in light of its current financial condition, its industry position, and economic and market conditions.”

Source: https://www.marketwatch.com/investing/fund/fmagx

Vanguard Total Stock Market Index Fund (VTSAX) – Description provided, “Created in 1992, Vanguard Total Stock Market Index Fund is designed to provide investors with exposure to the entire U.S. equity market, including small-, mid-, and large-cap growth and value stocks. The fund’s key attributes are its low costs, broad diversification, and the potential for tax efficiency. Investors looking for a low-cost way to gain broad exposure to the U.S. stock market who are willing to accept the volatility that comes with stock market investing may wish to consider this fund as either a core equity holding or your only domestic stock fund.”

Source: https://investor.vanguard.com/investment-products/mutual-funds/profile/vtsax

With these benchmarks defined, I set off to create my own randomly established equity fund. To fairly decide which equities my fund would hold, I utilized the two websites:

https://www.rayberger.org/random-stock-picker/ 

~AND~

https://www.randstock.ca/selector

The only equity selections which I outright rejected from inclusion were fixed income ETFs, and ETFs which sought to replicate the performance of a commodity.

All funds would receive an equal allocation of capital ($1,000), and the initial issue price of my Random Fund would be set at $10.00 a share.
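To make the bookkeeping explicit, below is a minimal R sketch of how such a fictitious fund’s share price can be computed. The tickers and prices are hypothetical placeholders, and I am assuming that the starting capital is split evenly across the components:

# Hypothetical illustration of the Random Fund's share price arithmetic #
# (tickers and prices below are placeholders, not the actual components) #

starting_capital <- 1000 # total capital allocated to the fictitious fund

issue_price <- 10 # initial price per share of the fictitious fund

purchase_prices <- c(AAA = 80, BBB = 25, CCC = 230) # hypothetical closing prices on the purchase date

later_prices <- c(AAA = 95, BBB = 20, CCC = 280) # hypothetical closing prices on a later date

per_component <- starting_capital / length(purchase_prices) # equal dollar allocation per component

shares_held <- per_component / purchase_prices # shares of each component purchased

fund_shares <- starting_capital / issue_price # fund shares outstanding at issue

nav <- sum(shares_held * later_prices) / fund_shares # current price per fund share

nav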

Oshkosh Corporation (OSK) - Oshkosh Corporation designs, manufacture, and markets specialty trucks and access equipment vehicles worldwide. Sector(s): Industrials

Franklin FTSE Brazil ETF (FLBR) - The FTSE Brazil Capped Index is based on the FTSE Brazil Index and is designed to measure the performance of Brazilian large- and mid-capitalization stocks.  Sector(s): ETF

Public Storage (PSA) - Public Storage, a member of the S&P 500 and FT Global 500, is a REIT that primarily acquires, develops, owns, and operates self-storage facilities. Sector(s) - Real Estate

Dorian LPG Ltd. (LPG) - Dorian LPG Ltd., together with its subsidiaries, engages in the transportation of liquefied petroleum gas (LPG) through its LPG tankers worldwide. The company owns and operates very large gas carriers (VLGCs). Sector(s) - Energy

Delta Air Lines, Inc. (DAL) - Delta Air Lines, Inc. provides scheduled air transportation for passengers and cargo in the United States and internationally. Sector(s) – Industrials

Grupo Industrial Saltillo, S.A.B. de C.V. (SALT) (MX) - Grupo Industrial Saltillo, S.A.B. de C.V. engages in the design, manufacture, wholesale, and marketing of products for automotive, construction, and houseware industries in Mexico, Europe, and Asia. Sector(s) - Consumer Cyclical

Equity LifeStyle Properties, Inc. (ELS) - We are a self-administered, self-managed real estate investment trust (REIT) with headquarters in Chicago. Sector(s) – Real Estate

SPDR Portfolio S&P 500 ETF (SPLG) - Under normal market conditions, the fund generally invests substantially all, but at least 80%, of its total assets in the securities comprising the index. Sector(s) – ETF

DHT Holdings, Inc. (DHT) - DHT Holdings, Inc., through its subsidiaries, owns and operates crude oil tankers primarily in Monaco, Singapore, and Norway. As of March 16, 2023, it had a fleet of 23 very large crude carriers. Sector(s) - Energy

Norfolk Southern Corporation (NSC) - Norfolk Southern Corporation, together with its subsidiaries, engages in the rail transportation of raw materials, intermediate products, and finished goods in the United States. Sector(s) – Industrials

With a purchase date of each equity being set at 5/1/2019 (closing price), our fictitious Random Fund performed as shown below:

               
Now let’s compare the fund’s performance against our previously decided upon benchmarks:



Graphing the performance of each fund across multiple years:



Or looking at returns over the span of a five year period:



Strangely enough, our Random Fund outperforms the advisor managed fund (Magellan Fund), the broad based index fund (VTSAX), and the AI managed fund (AIEQ). While actively managed funds typically underperform index benchmarks for a multitude of reasons, I found it incredibly odd that my e-monkey Random Fund outperformed even the index itself. This is also what happened to be the case in the similarly conducted experiments mentioned within the article.

Let’s think of a few informed reasons as to why this might be:

1. Actively managed funds often turn over equities with far greater frequency when compared to their index counterparts. Index funds typically owe the majority of their equity turnover to modifications made to the underlying index. This causes both tax implications, and a lack of dividend opportunities.

2. Our Random Fund contains fewer components, and is less balanced than its competitors. Therefore, we would expect the fund to be more impacted by variance generally. In bull markets, I would expect such a selection of equities to outperform an underlying index. However, in bear markets, the inverse should hold true. In that case, our Random Fund would likely lose more value than its contemporaries.

3. The time frame which is being utilized to assess performance is both short in duration and directionally positive for equities.

4. Actively managed funds attempt to protect investors from downside, which also limits the upside potential for returns.

5. As it relates to #4, actively managed funds must also keep greater amounts of cash and cash equivalents on hand. This equates to time out of the market.

6. Actively managed funds have higher management fees, which are utilized to compensate fund managers.

7. Index components are themselves popular equities. This means that incoming investment money either chases the price of individual equities upward through individual stock purchases, or through periodic fund purchases.

8. Randomly selecting equities is far less biased than making an “informed selection”. Such bias often manifests in vastly overestimating one’s abilities, and under-assessing the abilities of others. Such an overestimation may cause an individual manager to go overweight in a particular sector or individual equity, and in doing so, miss opportunities elsewhere. As indexes are broad, there is always exposure to opportunity within a bull market. This, combined with the other previously numbered aspects, likely partially explains the underperformance of actively managed funds.

9. As it relates to point #8, in a bull market, over a short time span, randomly selecting a small number of equities may be the best method of employing a strategy which beats alpha, as bias will be eliminated and beta will be increased.

10. Active fund managers may not possess the fiduciary ability to allow their investment strategies to materialize. As the annual alpha is always the metric which high net worth clients measure all performance against, an informed, but otherwise risk incurring position must deliver results in the intermediate term, or risk liquidation at a loss in both capital and opportunity. 

That's all for today.

Until next time, stay curious data heads!

-RD

Saturday, June 17, 2023

(R) Is Wall Street Random?


Anyone who has spent any serious time within the investment world has probably, at some point, encountered the book, A Random Walk Down Wall Street. While the book does contain some interesting historical anecdotes, and disproves numerous methods of stock picking quackery, the title itself refers to the following theory:

Burton G. Malkiel, an economics professor at Princeton University and writer of A Random Walk Down Wall Street, performed a test where his students were given a hypothetical stock that was initially worth fifty dollars. The closing stock price for each day was determined by a coin flip. If the result was heads, the price would close a half point higher, but if the result was tails, it would close a half point lower. Thus, each time, the price had a fifty-fifty chance of closing higher or lower than the previous day. Cycles or trends were determined from the tests. Malkiel then took the results in chart and graph form to a chartist, a person who "seeks to predict future movements by seeking to interpret past patterns on the assumption that 'history tends to repeat itself'." The chartist told Malkiel that they needed to immediately buy the stock. Since the coin flips were random, the fictitious stock had no overall trend. Malkiel argued that this indicates that the market and stocks could be just as random as flipping a coin.

Source: https://en.wikipedia.org/wiki/Random_walk_hypothesis

It would seem that within the field of contemporary finance, some critics believe that Malkiel's book is premised upon a flawed theory. In this article, we will perform our own analysis in order to determine which side is correct in its assumption. This isn’t in any way meant to provide further evidence to either side of the age-old Managed Fund vs. Index Fund argument, but instead to take the evidence which we have as it pertains to this proposed theory, and see if it withstands a thorough statistical assessment.

Random Walking

To begin our foray into proving / disproving, “The Random Walk Hypothesis”, let’s take a random walk through R-Studio.

First, we’ll set a number of sample observations (n = 101). Then, we’ll perform the same experiment that Malkiel performed with his students. We’ll do this by randomly generating one of two values (-1, 1), with 1 equating to a step upward, and -1 equating to a step downward.

However, we’ll make a few slight modifications to our experiment. As prices for an equity index cannot (theoretically) go negative, or in most cases, reach zero only to rebound, I’ve added a few caveats to our simulation.

We will be creating a random walk. However, in every instance in which our random walk would take us below a zero threshold, an absolute value of the outcome will instead be returned. For example, in the case of a typical random walk, the values: (1,-1,1,-1,-1,-1), provide the corresponding cumulative elements of: (1,0,1,0,-1,-2), as each element is being summed against the previous sum of the prior elements.


Example:

Random Walk Generated Values = {1,-1,1,-1,-1,-1}

0 + 1 = 1

1 – 1 = 0

0 + 1 = 1

1 – 1 = 0

0 – 1 = -1

-1 – 1 = -2

Thus, our cumulative elements are: {1, 0, 1, 0, -1, -2}

In our variation of the simulation, instead of returning negative values, we will only return the absolute values of the cumulative elements. 

Example:

|1| = 1

|0| = 0

|1| = 1

|0| = 0

|-1| = 1

|-2| = 2


Another modification that we will be making, is that every element of 0 will be changed to the value of 1. 


Example:

1 = 1

0 = 1

1 = 1

0 = 1

1 = 1

2 = 2

# Set the random seed to 7 so that this example’s outcome can be reproduced #

set.seed(7)

# 101 random elements are needed for our example #

n <- 101

# Create a random walk with caveats (absolute values returned for negative numbers) #

Random_Walk <- abs(cumsum(sample(c(-1, 1), n, TRUE)))

# Further modify the random walk values (values of 0 will be modified to 1) #

Random_Walk <- ifelse(Random_Walk == 0, 1, Random_Walk)

# We’ll be attempting to simulate Dow Index returns from 1923 – 2023 #

year <- 1923:2023

# Graph the Random_Walk simulation #

Random_Walk_Plot <- data.frame(year, Random_Walk)

plot(Random_Walk_Plot, type = "l", xlim = c(1923, 2023), ylim = c(0, 30),

    col = "blue", xlab = "n", ylab = "Rw")


This produces the following graphic:



Which in some ways resembles:


Let’s overlay a resized version of the random walk graphic against the Dow Jones Industrial Average’s annualized returns:


Obviously, this proves nothing. It only demonstrates that by selectively choosing a randomly generated pattern, one can draw aesthetic comparisons to an existing pattern. 

Instead of taking the Random Walk Hypothesis at face value, or postulating ill-informed criticisms, let’s attack this hypothesis like good data scientists. First, we’ll forget about amateurish comparative assessment, and run a few tests in order to reach a well-researched conclusion. 

To test whether or not the market is random, we’ll need real world (US) market data. Of the available indexes, I decided to choose the Dow Jones Industrial Average. The reasons supporting this decision are as follows: 


1. The S&P 500 Stock Composite Index was not created until March of 1957. Therefore, not as many data points are available as compared to the Dow Jones Industrial Average.

2. The NASDAQ Composite Index was not created until February of 1971. It also lacks the broad market exposure which is found within the Dow Jones Industrial Average.

The data gathered to perform the following analysis, which also provided the Dow Jones Annual Average graphic above, originated from the source below.

Source: https://www.macrotrends.net/1319/dow-jones-100-year-historical-chart


# Dow Jones Industrial Average Years Vector #

Dow_Year <- c(2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983, 1982, 1981, 1980, 1979, 1978, 1977, 1976, 1975, 1974, 1973, 1972, 1971, 1970, 1969, 1968, 1967, 1966, 1965, 1964, 1963, 1962, 1961, 1960, 1959, 1958, 1957, 1956, 1955, 1954, 1953, 1952, 1951, 1950, 1949, 1948, 1947, 1946, 1945, 1944, 1943, 1942, 1941, 1940, 1939, 1938, 1937, 1936, 1935, 1934, 1933, 1932, 1931, 1930, 1929, 1928, 1927, 1926, 1925, 1924, 1923)

# Dow Jones Industrial Average Annual Closing Price Vector #

Dow_Close <- c(33301.87, 33147.25, 36338.3, 30606.48, 28538.44, 23327.46, 24719.22, 19762.6, 17425.03, 17823.07, 16576.66, 13104.14, 12217.56, 11577.51, 10428.05, 8776.39, 13264.82, 12463.15, 10717.5, 10783.01, 10453.92, 8341.63, 10021.57, 10787.99, 11497.12, 9181.43, 7908.3, 6448.27, 5117.12, 3834.44, 3754.09, 3301.11, 3168.83, 2633.66, 2753.2, 2168.57, 1938.83, 1895.95, 1546.67,
1211.57, 1258.64, 1046.54, 875, 963.99, 838.74, 805.01, 831.17, 1004.65, 852.41, 616.24, 850.86, 1020.02, 890.2, 838.92, 800.36, 943.75, 905.11, 785.69, 969.26, 874.13, 762.95, 652.1, 731.14, 615.89,
679.36, 583.65, 435.69, 499.47, 488.4, 404.39, 280.9, 291.9, 269.23, 235.41, 200.13, 177.3, 181.16, 177.2, 192.91, 152.32, 135.89, 119.4, 110.96, 131.13, 150.24, 154.76, 120.85, 179.9, 144.13, 104.04, 99.9, 59.93, 77.9, 164.58, 248.48, 300, 200.7, 157.2, 151.08, 120.51, 95.52)


# Combine both vectors into a singular data frame #

Dow_Data_Frame <- data.frame(Dow_Year, Dow_Close)

# Preview the data frame #

Dow_Data_Frame


This produces the output:

> Dow_Data_Frame
Dow_Year Dow_Close
1 2023 33301.87
2 2022 33147.25
3 2021 36338.30
4 2020 30606.48
5 2019 28538.44
6 2018 23327.46
7 2017 24719.22
8 2016 19762.60

Though we aren’t testing for this hypothesis directly through the application of a singular methodology, our hypothesis for this general experiment would resemble something like:

H0 (null): The Dow Jones Industrial Average Index’s annual returns are NOT random.

Ha (alternative): The Dow Jones Industrial Average Index’s annual returns are random.

The primary test that we will perform is the Phillips-Perron Unit Root Test. This particular method assesses time series data for order of integration. In simplified terms, the order of integration is the minimum number of differences required to obtain a covariance-stationary series. In the case of the Phillips-Perron Unit Root Test, we will be utilizing this underlying methodology to assess for random walk potential.
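As a quick illustration of what a "difference" means here, the first difference of the series is simply the year-over-year change in the closing price. Using the Dow_Data_Frame constructed above (reversed so that the series runs chronologically):

# First difference of the annual closing prices (chronological order) #

Dow_Chronological <- rev(Dow_Data_Frame$Dow_Close)

Dow_First_Difference <- diff(Dow_Chronological)

head(Dow_First_Difference)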

# The package: ‘tseries’ must be downloaded and enabled in order to utilize the adf.test() function used further below; the PP.test() function is included in base R #

library(tseries)

# Phillips-Perron Unit Root Test - A methodology utilized to test data for random walk potential #

# Null - The time series IS integrated of order (not-random) #

# Alternative - The time series is NOT integrated of order (random) #

PP.test(Dow_Data_Frame$Dow_Close)


This produces the output:

    Phillips-Perron Unit Root Test

data: Dow_Data_Frame$Dow_Close
Dickey-Fuller = -3.6678, Truncation lag parameter = 4, p-value = 0.03055


The secondary analysis which we will perform on our time series data is the Augmented Dickey-Fuller Unit Root Test. This test assesses the data for stationarity.

# Augmented Dickey-Fuller Unit Root Test - A methodology utilized to test data for stationarity #

# Null - Data is NOT stationary #

# Alternative - Data IS stationary #

adf.test(Dow_Data_Frame$Dow_Close)


This produces the output:

    Augmented Dickey-Fuller Test

data: Dow_Data_Frame$Dow_Close
Dickey-Fuller = -5.0152, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(Dow_Data_Frame$Dow_Close) :
p-value smaller than printed p-value


Assuming an alpha value of .05, we would reject the null hypothesis in both instances. Thus, we would conclude with 95% confidence, that the annual Dow Jones Industrial Average closing prices are not integrated of order, and are also stationary. The combination of these results would therefore indicate that The Dow Jones Industrial Average returns are random.

I believe that the Dickey-Fuller test provides the more interesting insight. Unless a type I error was committed, we would eventually expect to witness either a Seneca Cliff event or a parabolic downturn within the market, given a long enough time frame. WOULD, EXPECT, EVENTUALLY, and UNLESS being the key terms here. (Don’t time the market or trade on the basis of a singular statistical methodology).

Some of you might be wondering why The Wald-Wolfowitz Test was not utilized. As a reminder, this particular method is only applicable to factor series data. However, if we were to inappropriately apply it in this instance, it would resemble the following:

# THIS IS NOT APPROPRIATE FOR OUR EXAMPLE #

# THIS CODE IS FOR DEMONSTRATION PURPOSES ONLY #

# The package: ‘trend’ must be downloaded and enabled in order to utilize the ww.test() function #

library(trend)

# Wald Wolfowitz Test #

# Null - Each element in the sequence is (independently) drawn from the same distribution (not-random) #

# Alternative - Each element in the sequence is not (independently) drawn from the same distribution (random) #

Dow_Close_Factor <- as.factor(Dow_Data_Frame$Dow_Close)

ww.test(Dow_Close_Factor)


So that is it for today’s article, Data Heads. It would seem that Malkiel is vindicated, at least as it pertains to the methodologies which we applied within this particular entry. I’ll be back again soon with more data content.

-RD