Monday, October 16, 2023

The Friendship Paradox

Like many of the critical attributes of life, that which is most evident often lies obscured by monotony. This is especially true of mathematical paradoxes, as the most enlightening insights within the field have a habit of appearing obvious and universally evident only after discovery. Like many mystical traditions, these insights are best discovered through contradiction and reduction.

The Paradox

The Friendship Paradox, in simple terms, identifies the common phenomenon in which an individual typically possesses fewer friends than his friends do. Additionally, the sum of friends which his friends possess will be greater than his own total number of friends.

This paradox possesses wide-reaching implications, as it describes a phenomenon which is self-arising and irrefutable. However, before we can detail its applicability, we must demonstrate the paradox as it was initially discerned.

First, let’s get some terminology down.

In graph theory, the circular figures are known as nodes, or vertices. The lines which illustrate relationships between the nodes are known as edges.


Now, let’s utilize this style of graphical representation to demonstrate the relationships between 5 individuals.


The chart below represents the above relationships, but in a different format.


Each relationship is symmetric: if one individual considers himself to be a friend of another, then that other individual considers him a friend as well. As shown above, A is friends with B and E. B is friends with both A and E, and also with C.

If we compute the mean number of friends possessed by the individuals within our experiment, we arrive at a value of 2.8.


In this instance, E possesses the most friends, and every individual who is friends with E possesses fewer. Therefore, the average number of friends within the group (2.8) will likely be greater than the actual number of friends that a singular individual possesses.
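Since the original relationship diagram is not reproduced here, the sketch below assumes a hypothetical edge set consistent with the stated values (five individuals, a mean of 2.8 friends, with E the most connected). It computes both the group's mean friend count and, for each individual, the average friend count among his friends:

```r
# Hypothetical edge set (assumed; the original diagram is not shown):
# each row is one symmetric friendship
edges <- rbind(
  c("A", "B"), c("A", "E"), c("B", "C"), c("B", "E"),
  c("C", "D"), c("C", "E"), c("D", "E")
)

people <- sort(unique(c(edges)))

# Friend count (degree) for each individual
deg <- sapply(people, function(p) sum(edges == p))

mean(deg)  # 2.8 -- the group's average number of friends

# For each individual, the mean friend count among his friends
friend_avg <- sapply(people, function(p) {
  rows <- edges[edges[, 1] == p | edges[, 2] == p, , drop = FALSE]
  friends <- setdiff(c(rows), p)
  mean(deg[friends])
})

mean(friend_avg)  # 3.1 -- on average, your friends have more friends than you
```

Under this assumed graph, only E beats his friends' average; everyone else ties or trails it, which is the paradox in miniature.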

The Philosophical Implications   

If a single individual begins to quantify a particular phenomenon as it relates to his own person, or even as it relates to a novel phenomenon, then the natural consequence of this endeavor is that, from the onset, he will find himself at a disadvantage.

For example, a new creation, upon its genesis and possessing autonomy, will immediately be concerned with attaining sustenance. This was not a concern within the prior state of non-being. In contemplating her beauty, a young woman immediately begins to compare herself to those whom she perceives as more beautiful. We would never anticipate the inverse to occur.

This is the paradox of living: striving to possess more while the value of that which we possess becomes diminished. This is due to the passage of time, but also because the singular possession of a resource itself diminishes in value. Something within our possession loses value from the moment of possession, as both the possessor and the possession are diminished by the natural passage of time.

Example (2):

Here is another example. If an individual walks into a crowded elevator filled with random strangers, then there is a greater probability that this individual will have the same number of friends as, or more friends than, each stranger within the elevator. However, if the same individual were invited to a party hosted by a friend, then there is a lesser probability of this individual possessing more friends than, or a similar number of friends to, each of the other party attendees. In the elevator scenario, there is no guarantee that any individual within the elevator possesses a single friend; this includes the individual entering the already crowded elevator. In the party scenario, however, each partygoer has at least one friend: the party’s host. In this case, each count begins at a baseline value of 1, except in the case of the party’s host, who is friends with every individual in attendance.

As described above, the Friendship Paradox also seeks to demonstrate that “the sum of friends which his friends possess will be greater than the sum of his total number of friends.”

In the case of our first example, this value would be calculated as follows:


To better illustrate this phenomenon, I’ve constructed a new example relationship diagram below:


In this instance, E has more friends than A, B, C, D.

E has 4 friends. While A, B, C, D each have 1 friend (E).

In total, A, B, C, and D together possess the same number of friends as E: 4 (1 + 1 + 1 + 1).

If A, B, C, or D possessed one additional friend – F – then together they would possess more friends in sum than E (1 + 1 + 1 + 2).

If this were the case, the paradox would hold, as E would have a total of 4 friends, but the total number of friends possessed by his friends would be greater (5).
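This star-shaped example is small enough to check mechanically. The sketch below (assuming the hypothetical extra friend F attaches to A) tallies each individual's friend count and sums the friend counts of E's friends:

```r
# Star graph: E is friends with A, B, C, D; hypothetically, A gains one extra friend F
edges <- rbind(
  c("E", "A"), c("E", "B"), c("E", "C"), c("E", "D"),
  c("A", "F")
)

# Friend count for each individual (each name appears once per friendship)
deg <- table(c(edges))

deg[["E"]]  # 4 -- E's own friend count

# Sum of the friend counts possessed by E's friends
sum(deg[c("A", "B", "C", "D")])  # 5 -- exceeds E's 4, so the paradox holds
```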

Conclusion

That's all for today.

I hope that you enjoyed this entry and will visit again soon.

-RD

(R) Utilizing Crowd Prediction Methodologies to Draft the Optimal Fantasy Football Team (II)

What would an application be without proof? A notion?

To prove that the ADP drafting strategy is superior to other ranking methodologies, I performed the following analysis.

#############################################################################

Data Source(s):

#############################################################################


ADP

https://fantasydata.com/nfl/fantasy-football-leaders?position=1&season=2022&seasontype=1&scope=1&subscope=1&scoringsystem=2&startweek=1&endweek=1&aggregatescope=1&range=1

Offense

https://fantasydata.com/nfl/ppr-adp?season=2022&leaguetype=2&type=ppr

Kickers

https://fantasydata.com/nfl/fantasy-football-leaders?position=6&season=2022&seasontype=1&scope=1&subscope=1&scoringsystem=2&startweek=1&endweek=1&aggregatescope=1&range=1

DST

https://fantasydata.com/nfl/fantasy-football-leaders?position=7&season=2022&seasontype=1&scope=1&subscope=1&scoringsystem=2&startweek=1&endweek=1&aggregatescope=1&range=1

#############################################################################

The Analysis (n = 300)

#############################################################################


ADP <- c(1.4, 2.4, 2.7, 4.2, 4.9, 5.8, 7, 7.2, 8.8, 10, 10.6, 12.4, 13.1, 14, 15.3, 16, 16.6, 17.9, 18.5, 19.1, 19.6, 19.8, 21.8, 22.7, 24.7, 25.9, 26.1, 27.3, 27.9, 29.3, 30.5, 31.6, 32.4, 33.2, 33.2, 34.1, 36.1, 38, 39.1, 39.3, 39.4, 39.7, 42.3, 43.2, 43.5, 44.2, 45.1, 46.5, 46.8, 47.3, 48.4, 50.2, 50.9, 51.5, 52.2, 53.9, 54.1, 56, 56.7, 57.9, 59.4, 61.2, 61.5, 62.1, 62.9, 63, 63.8, 63.9, 68, 68.1, 69.2, 69.9, 70.1, 71, 71.4, 71.7, 72, 72.9, 75.3, 75.4, 77.1, 77.9, 82.3, 83.8, 84.4, 84.8, 85, 86.7, 87.9, 89.5, 90.2, 90.2, 90.7, 91.3, 91.7, 91.8, 94.4, 95.9, 96.2, 98.5, 98.7, 99.1, 101.9, 102.6, 102.7, 103.6, 103.7, 105.4, 106.5, 106.7, 107.8, 108.6, 109.7, 109.9, 110.6, 111.6, 113.1, 113.6, 113.9, 114.9, 117.3, 117.7, 118.5, 119, 119.6, 120.5, 121, 121.2, 121.3, 122.7, 124.2, 125, 128.9, 129.8, 130.2, 130.2, 130.3, 130.8, 131.2, 132, 135.1, 135.9, 136.1, 136.3, 137.9, 138.9, 140.1, 140.9, 141.8, 142.7, 143.8, 144.8, 145.3, 145.4, 145.7, 147, 147, 148.1, 148.4, 148.9, 149, 149.4, 151.3, 151.4, 152.1, 152.5, 152.6, 153.8, 154.9, 155.6, 155.7, 156.7, 156.8, 156.8, 158, 158.5, 158.8, 158.9, 159.4, 160.1, 160.5, 160.7, 161, 161.5, 162, 163, 163.5, 164, 164.6, 164.6, 165, 165.2, 165.6, 166, 167, 168, 169, 169.5, 170, 171, 172, 173, 174, 175, 176, 177, 178, 178, 179, 180, 180, 181, 182, 183, 184, 185, 186, 187, 187, 188, 189, 190, 190, 191, 192, 192, 193, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265)


PPR <- c(146.4, 356.36, 372.7, 302.76, 368.66, 201.4, 223.46, 237.8, 242.4, 239.5, 211.7, 335.5, 191.1, 316.6, 248.6, 284, 316.3, 301.6, 281.4, 395.52, 168.4, 42, 226.1, 347.2, 216.5, 190.5, 225.4, 185.8, 200.2, 164, 299.6, 75.6, 220.9, 281.26, 141.3, 205.1, 199.1, 229, 417.4, 177.7, 81.2, 176.5, 43.6, 200.5, 180.7, 115.1, 159.4, 259.2, 350.7, 328.3, 167.6, 84.8, 226.8, 236.08, 51.1, 84.9, 98.3, 378.04, 222.8, 145.6, 90.9, 204.2, 216.7, 200.52, 142.7, 52.2, 171.6, 180, 156, 101.5, 126.8, 248.8, 74.2, 166.4, 79, 177.9, 225.76, 267.6, 141.2, 185.3, 249.1, 239.2, 53.5, 246, 87.1, 198.6, 254.6, 271.66, 88.1, 115.6, 227.8, 151.7, 174.8, 105.7, 108.38, 148.7, 69.4, 202.5, 241.9, 88.6, 148.2, 12.46, 168.3, 219.08, 115.7, 178.6, 147.3, 57.3, 87.9, 159.4, 291.58, 55.8, 198.2, 112.7, 237.3, 126, 98.2, 73.5, 135.7, 165.9, 150, 43.4, 135, 105.4, 215.4, 225.9, 70.4, 230.92, 167.12, 139.1, 166.5, 87.7, 88.4, 102, 122.4, 7, 94.1, 114, 105.04, 155.28, 102.9, 160, 103.9, 101.6, 159, 102, 98, 161, 295.98, 295.62, 115, 87.9, 112, 133, 101, 103, 55.2, 43.92, 116, 97, 131, 180.3, 121, 130.6, 143, 142, 119.8, 215.7, 25.5, 125, 123.6, 99, 39, 110, 97.1, 130, 129, 97, 57, 18, 118, 154, 168, 100, 116.9, 20.1, 104, 98.2, 186, 97, 60.2, 110, 101, 52.2, 155.6, 93.8, 3.2, 110, 169.3, 97.6, 34.4, 161.1, 78.9, 25.8, 51.6, 112.3, 115, 176.3, 198.1, 106, 26.4, 82.7, 149.1, 117.8, 88.3, 81.7, 110.1, 83, 4.5, 50.1, 164.1, 73, 12, 57.68, 139, 61.6, 112, 77.4, 45.4, 46.5, 15.1, 35.5, 72.7, 75.2, 84.5, 110, 0.2, 142.1, 34, 12.2, 83, 82.6, 21.1, 77.2, 196.3, 11.4, 51.7, 8.4, 54.1, 161.24, 46.2, 8.6, 8.4, 13.8, 289, 37.8, 170.08, 128.4, 89.6, 112.8, 104.1, 284.32, 154.16, 59.3, 24.9, 114.5, 121.42, 158.5, 59.8, 98.92, 0, 115.1, 10.2, 181.52, 14.8, 4.2, 12.8, 18.8, 103.9, 196.56, 1.3, 11.8, 16.2, 26, 39.1, 43.1, 53.6, 103.8, 27.3, 303.88, 30.8, 64.5, 3.5, 73.88, 110.2, 64.7, 84.3, 30.5, 10.2, 70.3)

cor.test(ADP, PPR)

Which produces the output:

Pearson's product-moment correlation

data: ADP and PPR

t = -13.394, df = 298, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

-0.6791003 -0.5370390

sample estimates:

cor

-0.6130004
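As a quick sanity check on the reported output, the t statistic follows directly from the sample correlation and the degrees of freedom via t = r√df / √(1 − r²):

```r
# Recover the reported t statistic from the correlation and degrees of freedom
r <- -0.6130004
df <- 298

t_stat <- r * sqrt(df) / sqrt(1 - r^2)

round(t_stat, 3)  # -13.394 -- matches the cor.test() output above
```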


#############################################################################

Creating the Visual Output

#############################################################################


my_data <- data.frame(ADP, PPR)

library("ggpubr")

ggscatter(my_data, x = "ADP", y = "PPR",

add = "reg.line", conf.int = TRUE,

cor.coef = TRUE, cor.method = "pearson",

xlab = "ADP (2022)", ylab = "Player Score (PPR)")

#############################################################################

Conclusion

#############################################################################


There was a negative correlation between the two variables: ADP (n = 300) and Player Score (PPR). ADP (M = 134.81, SD = 73.31), Player Score (PPR) (M = 133.33, SD = 86.66); t(298) = -13.394, p < .01. Pearson product-moment correlation coefficient: r = -0.61.

#############################################################################

As the findings indicate, there is a significant negative correlation between ADP and player performance (PPR, 2022 season). In plain terms, this means that we should typically expect better fantasy performance from players with lower ADP values. I hope that everyone enjoyed this article and my adherence to APA reporting standards. <3

Until next time. 

Stay cool, Data Heads.

-RD

Monday, October 9, 2023

Utilizing Crowd Prediction Methodologies to Draft the Optimal Fantasy Football Team

To win your league in Fantasy Football, or at least qualify for the playoffs, you don’t need to buy magazines, study tape, or watch ESPN. All that is required is a general understanding of crowd psychology.

What is ADP?

ADP represents the average draft position for players in fantasy drafts. Each league, for each fantasy sport, typically displays this player value during the drafting process. This value, as the name suggests, is derived from where a particular player was selected by other “fantasy team owners”, during prior drafts.

As was discussed within a previous article demonstrating draft order and its impact on a fantasy team’s final placement, if each fantasy team owner drafted optimally within their respective position, we would expect the final standings to directly reflect the initial draft order.

However, how can an individual be sure that he is drafting optimally? The answer is simpler than one might assume. To draft optimally, one must adhere to selecting the player with the best available ADP value throughout the drafting process.

Why does it Work?

Following the crowd consensus should provide a fantasy participant with their best opportunity for victory. The strictest adherent to this methodology will benefit most from the number of non-adherents within their league. Let’s consider why this is the case.

While pundit or site rankings of players are often determined by a single individual, or a group of informed individuals, ADP rankings are determined by draft consensus, meaning that there are more minds at work in determining a player’s draft value. These ranks are also assigned through the drafting process itself: the act of drafting establishes the value. This is similar to how market participants set prices through the buying and selling of assets, whereas the pundit ranking process is more akin to the way in which planned economies function.

We’ll assume that ADP perfectly correlates with the eventual points scored by each player within the league. With this assumption in place, and also assuming that each league participant drafts optimally, we should expect to see point distributions resembling the graphic below.


(In our example scenario, each subsequent player is valued at one point less than the previous player.)

Therefore, all things being equal, we would expect the final point totals for each league participant to be:


Gaining the Edge

In every instance, the largest advantage belongs to the team which drafts first, with the advantage diminishing sequentially throughout the remaining draft order. To compensate for this diminishment, or to expand one’s edge regardless of draft position, a league participant should adhere strictly to the ADP value ranking system while drafting. By not attempting to gain an edge through self-perceived insight, opportunities will arise as a result of opponents who attempt otherwise.

Every draft misstep is the micro-process of reallocating points from your team to another team within your league. In the example below, the teams highlighted in green adhere to a strict ADP drafting strategy, while the teams highlighted in red instead take a less strict approach.


As is shown in the graphic, the ADP-adhering teams were able to benefit from the mistakes made by their opponents. In each instance, the green teams were able to draft the players which were passed over by their red counterparts. Thus, the ADP-adhering teams increased their edge at the expense of the non-adhering teams.


I Know that I Know Nothing


The above strategy functions on the foundation of two reinforcing cognitive biases: the overestimation of one’s own abilities and talents, and the discounting of the abilities and talents of others.

As far as football is concerned, I have personally witnessed friends who watch far more football than I do, and who know far more about the players than I do, blow out their drafts and fail to make their league’s playoffs in complex and interesting ways. In almost every case, the culprit tends to be impatience and exotic maneuvering. What’s also strange about this cohort of individuals is that they tend to quickly abandon the strange individualized strategies which initially required a high level of conviction to attempt. This phenomenon itself might warrant an article in the future.

Be sure to watch the waiver wire, as further edge can be gained from managers who prematurely release underperforming players. It should also be noted that ADP rankings, as a drafting criterion, are only applicable in leagues which do not utilize custom rule sets.

Sunday, July 23, 2023

Odd Man Out: The Problem with Serpentine Drafts

In this article, we’re going to continue the trend of discussing topics related to Fantasy Sports. Specifically, the innate problem which I see within the serpentine draft format. I feel that this topic is particularly appropriate for this time of year, as football fans are gearing up for their own fantasy league drafts.

If you’re unfamiliar with the serpentine draft format, it is best described as:

A serpentine draft, or sometimes referred to as a "Snake" draft, is a type in which the draft order is reversed every round (e.g. 1..12, 12..1, 1..12, 12..1, etc.). For example, if you have the first pick in the draft, you will pick first in round one, and then last in round two.

Source: https://help.fandraft.com/support/solutions/articles/61000278703-draft-types-and-formats

I’ve created a few examples of this draft type below. The innate issue which I see within this draft format pertains to the differentiation in the projected point value of each draft selection, as determined by a team’s draft position. The more teams present within a league, the greater the point disparity between teams.

Assuming that each team executed an optimized drafting strategy, we would expect the outcome to resemble something like the illustration below.

Each number within a cell represents the best player value available to each team, each round. The green cells contain starting player values, and the grey cells contain back-up player values.


As you can observe from the summed values below each outcome column, each team possesses a one-point advantage over the team which selected subsequently, and a one-point disadvantage against the team which selected previously. The greatest differentiation occurs between the team which made the first selection within the draft order and the team which made the last: 11 (1026 – 1015).

As previously mentioned, the fewer the teams within a league, the fewer the number of selection rounds. As a result, there is less of a disparity between the teams which pick earlier within the order and those which pick later.

Below is the optimal outcome of a league comprised of ten teams.


While the single-point differentiation persists between consecutive teams within the draft order, the differentiation between the first selector and the last selector has been reduced to 9 (856 – 847).

This trend continues across ever smaller league sizes: 7 (1024 – 1017).


In each instance, we should expect the total differentiation in points between the first draft participant and the last draft participant (if optimal drafting occurred) to be equal to N – 1, where N = the total number of draft participants within the league.
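The N – 1 result can be checked with a quick simulation. The sketch below assumes 12 teams, 15 rounds, and player values descending by one point per pick (the round count and top value are assumptions, since the original tables are not reproduced), with every team taking the best player available:

```r
# Serpentine draft simulation: values descend by 1 per pick, teams draft optimally
snake_totals <- function(n_teams, n_rounds, top_value) {
  values <- top_value - 0:(n_teams * n_rounds - 1)
  totals <- numeric(n_teams)
  pick <- 1
  for (r in 1:n_rounds) {
    # Odd rounds run 1..N, even rounds reverse to N..1
    draft_order <- if (r %% 2 == 1) 1:n_teams else n_teams:1
    for (team in draft_order) {
      totals[team] <- totals[team] + values[pick]
      pick <- pick + 1
    }
  }
  totals
}

totals <- snake_totals(n_teams = 12, n_rounds = 15, top_value = 180)

totals[1] - totals[12]  # 11, i.e. N - 1
diff(totals)            # each adjacent team differs by exactly one point
```

One wrinkle worth noting: the N – 1 gap depends on the round count being odd; with an even number of rounds, each forward/reverse pair of picks sums to the same value for every team, and the totals equalize.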

All things being equal, if each team is managed optimally, we should expect the first team within each draft to finish first within each league. Second place would belong to the team which drafted second. Third place belonging to the team which drafted third, and so on, etc.

If all players on each fantasy team are equally at risk of injury, then injuries do little to upset the overall ranking of teams by draft order. It must be remembered that teams which drafted earlier within the order will also possess better replacement players than their competitors. Therefore, when injuries do occur, later-drafting teams will be disproportionately impacted.

I would imagine that as AI integration begins to seep into all aspects of existence, the opportunity for each team owner to draft with consistent optimization will further stratify the inherent edge attributed to serpentine draft position. As it stands currently, there is still an opportunity for lower-draft-order teams to compete if one or more of their higher-order competitors blunders a selection.

In any case, I hope that what I have written in this article helped to describe what I like to refer to as the, “Odd Man Out” phenomenon. I hope to see you again soon, with more of the statistical content which you crave.

-RD

Monday, July 17, 2023

(R) Daily Fantasy Sports Line-up Optimizer (Basketball)

I’ve been mulling over whether or not I should give away this secret sauce on my site, and I’ve come to the conclusion that anyone who seriously contends within the Daily Fantasy medium is probably already aware of this strategy.

Today, through the magic of R software, I will demonstrate how to utilize code to optimize your daily fantasy sports line-up. This particular example will be specific to the Yahoo daily fantasy sports platform, and to the sport of basketball.

I also want to give credit, where credit is due.

The code presented below is a heavily modified variation of code initially created by: Patrick Clark.

The original code source can be found here: http://patrickclark.info/Lineup_Optimizer.html

Example:

First, you’ll need to access Yahoo’s Daily Fantasy page. I’ve created an NBA Free QuickMatch, which is a 1 vs. 1 contest against an opponent where no money changes hands.



This page will look a bit different during the regular season, as the NBA playoffs are currently underway. That aside, our next step is to download all of the current player data. This can be achieved by clicking on the “i” bubble icon.



Next, click on the “Export players list” link. This will download the previously mentioned player data.

The player data should resemble the (.csv) image below:



Prior to proceeding to the subsequent step, we need to do a bit of manual data clean up.

I removed any player who was injured or not starting from the data set. I also concatenated the First Name and Last Name fields, and placed that concatenation within the ID variable. Next, I removed all variables except for the following: ID (newly modified), Position, Salary, and FPPG (Fantasy Points Per Game).

The results should resemble the following image:



(Specific player data and all associated variables will differ depending on the date of download)

Now that the data has been formatted, we’re ready to code!

###################################################################

library(lpSolveAPI)

library(tidyverse)

# It is easier to input the data as an Excel file if possible #

# Player names (ID) have the potential to upset the .CSV format #

library(readxl)

# Be sure to set the player data file path to match your directory / file name #

PlayerPool <- read_excel("C:/Users/Your_Modified_Players_List.xlsx")

# Create some positional identifiers in the pool of players to simplify linear constraints #

# This code creates new position column variables, and places a 1 if a player qualifies for a position #

PlayerPool$PG_Check <- ifelse(PlayerPool$Position == "PG",1,0)

PlayerPool$SG_Check <- ifelse(PlayerPool$Position == "SG",1,0)

PlayerPool$SF_Check <- ifelse(PlayerPool$Position == "SF",1,0)

PlayerPool$PF_Check <- ifelse(PlayerPool$Position == "PF",1,0)

PlayerPool$C_Check <- ifelse(PlayerPool$Position == "C",1,0)

PlayerPool$One <- 1

# This code modifies the position columns so that each variable is a vector type #

PlayerPool$PG_Check <- as.vector(PlayerPool$PG_Check)

PlayerPool$SG_Check <- as.vector(PlayerPool$SG_Check)

PlayerPool$SF_Check <- as.vector(PlayerPool$SF_Check)

PlayerPool$PF_Check <- as.vector(PlayerPool$PF_Check)

PlayerPool$C_Check <- as.vector(PlayerPool$C_Check)

# This code orders each player ID by position #

PlayerPool <- PlayerPool[order(PlayerPool$PG_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$SG_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$SF_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$PF_Check),]

PlayerPool <- PlayerPool[order(PlayerPool$C_Check),]

# Appropriately establish variables in order to perform the "solver" function #

Num_Players <- length(PlayerPool$One)

lp_model = make.lp(0, Num_Players)

set.objfn(lp_model, PlayerPool$FPPG)

lp.control(lp_model, sense= "max")

set.type(lp_model, 1:Num_Players, "binary")

# Total salary points available to the player #

# In the case of Yahoo, the salary points are set to ($)200 #

add.constraint(lp_model, PlayerPool$Salary, "<=",200)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$PG_Check, "<=",3)

add.constraint(lp_model, PlayerPool$PG_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$SG_Check, "<=",3)

add.constraint(lp_model, PlayerPool$SG_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$SF_Check, "<=",3)

add.constraint(lp_model, PlayerPool$SF_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type #

add.constraint(lp_model, PlayerPool$PF_Check, "<=",3)

add.constraint(lp_model, PlayerPool$PF_Check, ">=",1)

# Maximum / Minimum Number of Players necessary for each position type (only require one (C)enter) #

add.constraint(lp_model, PlayerPool$C_Check, "=",1)

# Total Number of Players Needed for the entire Fantasy Line-up #

add.constraint(lp_model, PlayerPool$One, "=",8)

# Perform the Solver function #

solve(lp_model)

# Projected_Score provides the projected score summed from the optimized projected line-up (FPPG) #

Projected_Score <- crossprod(PlayerPool$FPPG,get.variables(lp_model))

get.variables(lp_model)

# The optimal_lineup data frame provides the optimized line-up selection #

optimal_lineup <- subset(data.frame(PlayerPool$ID, PlayerPool$Position, PlayerPool$Salary), get.variables(lp_model) == 1)


If we take a look at our:

Projected_Score

We should receive an output which resembles the following:

> Projected_Score
    [,1]
[1,] 279.5

Now, let’s take a look at our:

optimal_lineup

Our output should resemble something like:

PlayerPool.ID PlayerPool.Position PlayerPool.Salary
3 Marcus Smart PG 20
51 Bradley Beal SG 43
108 Tyrese Haliburton SG 16
120 Jerami Grant SF 27
130 Eric Gordon SF 19
148 Brandon Ingram SF 36
200 Darius Bazley PF 19
248 Steven Adams C 20

With the above information, we are prepared to set our line up.

You could also run this line of code:

optimal_lineup <- subset(data.frame(PlayerPool$ID, PlayerPool$Position, PlayerPool$Salary, PlayerPool$FPPG), get.variables(lp_model) == 1)

optimal_lineup


Which provides a similar output that also includes point projections:

PlayerPool.ID PlayerPool.Position PlayerPool.Salary PlayerPool.FPPG
3 Marcus Smart PG 20 29.8
51 Bradley Beal SG 43 50.7
108 Tyrese Haliburton SG 16 26.9
120 Jerami Grant SF 27 38.4
130 Eric Gordon SF 19 30.7
148 Brandon Ingram SF 36 43.2
200 Darius Bazley PF 19 29.7
248 Steven Adams C 20 30.1

Summing up PlayerPool.FPPG, we reach the value 279.5. This is the same value which we observed within the Projected_Score matrix.

Conclusion:

While this article demonstrates a very interesting concept, I would be remiss if I did not advise you to NOT gamble on daily fantasy. This post was all in good fun, and for educational purposes only. By all means, defeat your friends and colleagues in free leagues, but do not turn your hard-earned money over to gambling websites.

The code presented within this entry may provide you with a minimal edge, but shark players are able to make projections based on far more robust data sets as compared to league FPPG. 

In any case, the code above can be repurposed for any other daily fantasy sport (football, soccer, hockey, etc.). Remember: only play for fun, and for free.

-RD 

Sunday, July 9, 2023

(R) Benford's Law

In today’s article, we will be discussing Benford’s Law, specifically as it is utilized as an applied methodology to assess financial documents for potential fraud:

First, a bit about the phenomenon which Benford sought to describe:

The discovery of Benford's law goes back to 1881, when the Canadian-American astronomer Simon Newcomb noticed that in logarithm tables the earlier pages (that started with 1) were much more worn than the other pages. Newcomb's published result is the first known instance of this observation and includes a distribution on the second digit, as well. Newcomb proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N).

The phenomenon was again noted in 1938 by the physicist Frank Benford, who tested it on data from 20 different domains and was credited for it. His data set included the surface areas of 335 rivers, the sizes of 3259 US populations, 104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in an issue of Reader's Digest, the street addresses of the first 342 persons listed in American Men of Science and 418 death rates. The total number of observations used in the paper was 20,229. This discovery was later named after Benford (making it an example of Stigler's law).


Source: https://en.wikipedia.org/wiki/Benford%27s_law

So what does this actually mean in layman’s terms?

Essentially, given a series of numerical elements from a similar source, we should expect certain leading digits to occur with frequencies corresponding to a particular distribution pattern.



If a series of elements perfectly corresponds with Benford’s Law, then the elements within the series should follow the above pattern as it pertains to leading-digit frequency. For example, numbers which begin with the digit “1” should occur 30.1% of the time, numbers which begin with the digit “2” should occur 17.6% of the time, and numbers which begin with the digit “3” should occur 12.5% of the time.

The distribution is derived as follows:
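The expected frequencies can also be generated in one line of R, since P(d) = log10(1 + 1/d) for each leading digit d:

```r
# Benford's Law: expected frequency of each leading digit d = 1..9
benford_p <- log10(1 + 1 / (1:9))

round(benford_p * 100, 1)  # 30.1 17.6 12.5  9.7  7.9  6.7  5.8  5.1  4.6

sum(benford_p)  # 1 -- the nine probabilities cover every possible leading digit
```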



The utilization of Benford’s Law is applicable to numerous scenarios:

1. Accounting fraud detection

2. Use in criminal trials

3. Election data

4. Macroeconomic data

5. Price digit analysis

6. Genome data

7. Scientific fraud detection

As it relates to screening for financial fraud, if applying the Benford’s Law distribution returns a result in which the sample elements do not correspond with the distribution, fraud is not necessarily the conclusion which we would immediately assume. However, the findings may indicate that additional scrutiny of the data is necessary.

Example:

Let’s utilize Benford’s Law to analyze Cloudflare’s (NET) Balance Sheet (12/31/2021).



Even though it’s an unnecessary step as it relates to our analysis, let’s first discern the frequency of each leading digit. These digits are underlined in red within the graphic above.
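Rather than tallying by hand, the leading digits can be extracted directly in R (using the balance-sheet figures listed later in this post):

```r
# Balance sheet entries from Cloudflare (NET), 12/31/2021
NET <- c(2372071, 1556273, 815798, 1962675, 815798, 134212, 791014,
         1667291, 1974792, 791014, 1293206, 845217, 323612, 323612)

# The first character of each value is its leading digit
lead_digit <- substr(as.character(NET), 1, 1)

# Tabulate across all nine possible leading digits
table(factor(lead_digit, levels = as.character(1:9)))
```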



What Benford’s Law seeks to assess is the comparison of the leading digits as they occurred within our experiment against our expectations as they exist within the Benford’s Law distribution.



The above table illustrates the frequency of occurrence of each leading digit within our analysis, versus the expected percentage frequency as stated by Benford’s Law.

Now let’s perform the analysis:

# H0: The first digits within the population counts follow Benford's law #

# H1: The first digits within the population counts do not follow Benford's law #

# requires benford.analysis #

library(benford.analysis)

# Element entries were gathered from Cloudflare’s (NET) Balance Sheet (12/31/2021) #

NET <- c(2372071.00, 1556273.00, 815798.00, 1962675.00, 815798.00, 134212.00, 791014.00, 1667291.00, 1974792.00, 791014.00, 1293206.00, 845217.00, 323612.00, 323612.00)

# Perform Analysis #

trends <- benford(NET, number.of.digits = 1, sign = "positive", discrete=TRUE, round=1)

# Display Analytical Output #

trends

# Plot Analytical Findings #

plot(trends)


Which provides the output:

Benford object:

Data: NET
Number of observations used = 14
Number of obs. for second order = 10
First digits analysed = 1

Mantissa:

Statistic Value
    Mean 0.51
    Var 0.11
Ex.Kurtosis -1.61
    Skewness 0.25

The 5 largest deviations:

digits absolute.diff
1 8 2.28
2 1 1.79
3 2 1.47
4 4 1.36
5 7 1.19

Stats:


Pearson's Chi-squared test

data: NET
X-squared = 14.729, df = 8, p-value = 0.06464


Mantissa Arc Test

data: NET
L2 = 0.092944, df = 2, p-value = 0.2722

Mean Absolute Deviation (MAD): 0.08743516
MAD Conformity - Nigrini (2012): Nonconformity
Distortion Factor: 8.241894

Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!

~ Graphical Output Provided by Function ~



(The most important aspects of the output are bolded)

Findings:

Pearson's Chi-squared test

data: NET
X-squared = 14.729, df = 8, p-value = 0.06464
Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!




A chi-square goodness-of-fit test was performed to examine whether the first digits of balance sheet items from the company Cloudflare (12/31/2021) adhere to Benford's law. Entries were found to be in adherence, with non-significance at the p < .05 level, χ2 (8, N = 14) = 14.73, p = .06.

As it relates to the graphic, in ideal circumstances, each blue data bar should have its uppermost portion touching the broken red line.

Example(2):

If you’d prefer to run the analysis simply as a chi-squared test, without requiring the “benford.analysis” package, you can utilize the following code. The image below demonstrates the concept being employed.



Model <- c(6, 1, 2, 0, 0, 0, 2, 3, 0)

Results <- c(0.30102999566398100, 0.17609125905568100, 0.12493873660830000, 0.09691001300805650, 0.07918124604762480, 0.06694678963061320, 0.05799194697768670, 0.05115252244738130, 0.04575749056067510)


chisq.test(Model, p=Results, rescale.p = FALSE)

Which provides the output:

    Chi-squared test for given probabilities

data: Model
X-squared = 14.729, df = 8, p-value = 0.06464


Which are the same findings that we encountered while performing the analysis previously.

That’s all for now! Stay studious, Data Heads! 

-RD