PECOTA

The logic and methodology underlying PECOTA have been described in several publications, but the detailed formulas are proprietary and have not been shared with the baseball research community.

[citation needed] Silver described the inspiration for his approach as follows: The basic idea behind PECOTA is really a fusion of two different things – [Bill] James's work on similarity scores and Gary Huckabay's work on Vlad, [Baseball Prospectus's] previous projection system, which tried to assign players to a number of different career paths.

As is described in the Baseball Prospectus website's glossary:[10] PECOTA compares each player against a database of roughly 20,000 major league batter seasons since World War II.

Phenotypic attributes, including handedness, height, weight, career length (for major leaguers), and minor league level (for prospects).

When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached.
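The "cheating" step described above can be illustrated with a minimal sketch. PECOTA's actual formulas are proprietary, so everything here is an assumption made for illustration: the `Season` record, the single `distance` number standing in for dissimilarity, and the `start_tol`, `step`, `min_sample`, and `max_tol` parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Season:
    player: str
    distance: float  # hypothetical dissimilarity to the target player; 0 means identical


def select_comparables(seasons, start_tol=1.0, step=0.5, min_sample=3, max_tol=10.0):
    """Widen the dissimilarity tolerance until enough comparable seasons qualify.

    All thresholds are illustrative assumptions, not PECOTA's real settings.
    """
    tol = start_tol
    while True:
        chosen = [s for s in seasons if s.distance <= tol]
        # Stop once the sample is large enough, or give up at the hard cap.
        if len(chosen) >= min_sample or tol >= max_tol:
            return chosen, tol
        tol += step


pool = [
    Season("A", 0.4),
    Season("B", 0.9),
    Season("C", 1.8),
    Season("D", 2.6),
    Season("E", 7.0),
]
comps, tol = select_comparables(pool)
print([s.player for s in comps], tol)
```

With this toy pool, only two seasons qualify at the starting tolerance, so the tolerance is widened in steps until a third season is admitted.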

[11] Furthermore, Silver describes the following distinct feature: The PECOTA similarity scores are based primarily on looking at a three-year window of a pitcher’s performance.

[13] As baseball analyst and journalist Alan Schwarz writes, "Silver ... designed a sophisticated variance algorithm that has examined every big-league pitcher's statistics since 1946 to determine which numbers best forecast effectiveness, specifically earned run average.

Rather than generate one line of expected statistics, Pecota presents seven – some optimistic, some pessimistic – each with its own confidence level.

They will sometimes appear to exceed it in any given season, and other times fall short, because of the sample size problems that we described earlier. PECOTA accounts for these sorts of factors by creating not a single forecast point, as other systems do, but rather a range of possible outcomes that the player could expect to achieve at different levels of probability.
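One simple way to turn a set of comparable players' outcomes into a range of forecasts is to read off percentiles of those outcomes. The sketch below does only that and is not PECOTA's method, which is unpublished: the function name `forecast_range`, the seven percentile levels, the equal weighting of comparables, and the sample home run totals are all assumptions chosen for illustration.

```python
import statistics


def forecast_range(comparable_outcomes, levels=(10, 25, 40, 50, 60, 75, 90)):
    """Return {percentile: value} for the given outcomes.

    Every comparable is weighted equally here; a real system would likely
    weight closer comparables more heavily (an assumption, not a known detail).
    """
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(comparable_outcomes, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in levels}


# Hypothetical next-season home run totals of eleven comparable players.
home_runs = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
forecast = forecast_range(home_runs)
for level, value in forecast.items():
    print(f"{level}th percentile: {value:.1f} HR")
```

The low percentiles play the role of the pessimistic forecast lines and the high percentiles the optimistic ones, with the 50th percentile as the median projection.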

[citation needed] Surely, this approach is more complicated than the standard method of applying an age adjustment based on the 'average' course of development of all players throughout history.

In March 2009, Silver announced that PECOTA's extremely complex and laborious set of database manipulations and calculations would be moving to a different platform.

[citation needed] Beginning in 2000, the Cleveland Indians developed a proprietary analytical database called DiamondView to evaluate scouting information gathered by the team; this system later incorporated player performance indicators and financial indicators, for purposes of evaluating and projecting the performance of all major league players.

[23] First introduced in 2003,[24] PECOTA projections are produced each year and published both in the Baseball Prospectus annual monographs and on the BaseballProspectus.com website.

The 2006 version introduced metrics for the market valuation of players based on the predicted performance levels.

[30] Although Baseball Prospectus promotes PECOTA commercially as "deadly accurate," all projection systems are subject to considerable uncertainty.

Nate Silver's own comparison of the performance of alternative projection systems for hitters in 2007 also showed that PECOTA led the field, though a couple of others were close.

The number of runs a team will score and allow during the coming season is estimated based on the playing times and PECOTA's predicted individual performance of each player, using a "Marginal Lineup Value" algorithm created by David Tate and further developed by Keith Woolner.

[33] A team's expected win total is based on applying an improved version of Bill James' Pythagorean Formula to the estimated number of runs scored and allowed by the roster of players under the given playing-time assumptions.
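For reference, the classic form of the Pythagorean Formula estimates winning percentage as runs scored squared divided by the sum of runs scored squared and runs allowed squared. The sketch below implements that classic form only; the "improved version" used for the team projections is not specified here, so the adjustable `exponent` parameter, the 162-game default, and the example run totals are assumptions for illustration.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from the classic Pythagorean Formula.

    exponent=2.0 is the original value; refined variants change the
    exponent, which is exposed here as a plain parameter.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Hypothetical team projected to score 800 runs and allow 700.
print(round(pythagorean_wins(800, 700), 1))
```

A team projected to score exactly as many runs as it allows comes out at a .500 record under any exponent, which is a quick sanity check on the formula.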

[37] An independent evaluation by the website Vegas Watch showed that PECOTA had the lowest error in predicting Major League team wins in 2008 of all the best-known forecasts, both those that were sabermetrically based and those that relied on individual expertise.