On Monday’s podcast, Ouij came on the show to talk about his model and method for predictive analysis.  Specifically, we discussed the Pythagorean Win/Loss Expectation formula.  After it was over, I wanted to know: Just how does it work, and more importantly… how well does it work?

It’s not much, but I decided to run the 2012 standings through the P-formula to see how well the expectations matched up to what happened.  Some very surprising (and not surprising at all) results followed.

First, let’s review what the formula is:

[Image: the Pythagorean Win/Loss Expectation formula]

Before you get all freaked out, look again.  There are only three numbers you need to know:

• The Number of Runs a Team Scores
• The Number of Runs a Team Gives Up
• The Number of Games in a Season (162 for Baseball)
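
Written out, the formula is usually given in the form below. Treat this as a sketch: the exponent of 2 is an assumption (it is the classic version, and it reproduces the 97.57 figure in the Nationals example later in this post), and RS, RA and G are just shorthand labels for the three numbers listed above.

```latex
\text{Expected Win\%} = \frac{RS^{2}}{RS^{2} + RA^{2}},
\qquad
\text{Expected Wins} = G \times \text{Expected Win\%}
```

Here RS is runs scored, RA is runs allowed, and G is the number of games in the season (162).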

If you want to read about how Bill James came up with this formula, click here.  For now, we’re just going to go with it.  I took that formula and the 2012 standings and came up with this:

[Image: 2012 standings by division, with expected wins from the formula] Slight color coding on the sheet: no fill means the prediction was within two games, then green, yellow and pink for progressively more distant predictions.

That’s every MLB team by division (*note: I put Houston in the AL West to keep the formatting easy; they still finished dead last in the NL Central).  Along the left side you can see the steps: I took the (1) runs scored and the (2) runs allowed and put them in the Pythag W/L formula.  That gave me the (3) Expected Win%.  There are 162 games in a season, so the (4) Expected Wins are 162 × the Expected Win%.  That’s how many games a team should have won based on its runs scored vs. allowed.  I then compared that to the team’s (5) actual 2012 win total.  The final line is the (6) difference between actual wins (aW) and estimated wins (eW).  A positive number indicates outperforming the expectation; a negative number means underperforming.

Example So Far: The Nationals (1) scored 731 runs and (2) gave up 594 runs, so the formula says they should have won (3) about 60.2% of their games.  That share of 162 games is (4) 97.57 games, which rounds up to 98.  The Nationals (5) actually won 98 games, which is (6) less than a game off what would have been predicted.
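
If you’d rather check the Nationals example in code than on a spreadsheet, here is a minimal Python sketch of the six steps. The function names are made up for illustration, and the exponent of 2 is an assumption (it is the value that reproduces the 97.57 figure above); the run and win totals are the ones quoted in this post.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected win% = RS^x / (RS^x + RA^x). exponent=2 is an assumption."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins = games in the season times the expected win%."""
    return games * pythag_win_pct(runs_scored, runs_allowed, exponent)


# Steps 1, 2 and 5: the 2012 Nationals figures quoted in the post.
runs_scored, runs_allowed, actual_wins = 731, 594, 98

win_pct = pythag_win_pct(runs_scored, runs_allowed)   # step 3
e_wins = expected_wins(runs_scored, runs_allowed)     # step 4
diff = actual_wins - e_wins                           # step 6: aW - eW

print(f"Expected Win%: {win_pct:.3f}")  # 0.602
print(f"Expected Wins: {e_wins:.2f}")   # 97.57
print(f"aW - eW:       {diff:+.2f}")    # +0.43
```

Swapping in another team’s runs scored, runs allowed and actual wins gives that team’s line of the sheet.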

A couple of things really pop out on this chart to me, including that the NL East seems to have broken down exactly like the run totals suggested it should have.  That is to say, no one was very lucky or unlucky (win total wise, anyway) in the NL East, which seems to reinforce the idea that the Nationals’ success was not a fluke, and neither was the Braves’.

Take a look at the NL Central though:  The Cardinals, based on expectation, should have won the division with the Reds finishing second.  Compare that to what happened in the playoffs: the 91 eWin Reds losing to the 89 eWin Giants makes more sense than the 97 aWin Reds losing to the 94 aWin Giants (eWin is estimated wins, aWin is actual wins).  It might also be a little less shocking to think of the 94 eWin Cards upsetting the 94 eWin Braves and 98 eWin Nationals than the 88 aWin Cards team that took the field.

The biggest/strangest swing is easily in the AL East: The Baltimore Orioles, who had quite the magical season, finished 11 wins above expectation.  The Tampa Bay Rays’ six-game drop-off from what was expected was enough to cost them not just a playoff spot, but the division (or so it would seem).  Their 96 eWins should have put them slightly ahead of the 95 eWin Yankees and well ahead of the 82 eWin O’s.  Indeed, the O’s were 13-5 against the Red Sox and 10-8 against the Rays in 2012.  Had Boston and Tampa not dropped the ball compared to their eWin totals, the AL East would have looked very different.

Rounding out the league, the White Sox, apparently, should also have beaten out World Series contender Detroit.  The other surprise in the AL? Probably that the Oakland A’s, whose season was just as magical as Baltimore’s, were actually right in line with where the formula says they should be. That bodes well for Oakland repeating (even though the AL West might be tougher this year).

Now look: it’s not a perfect formula, and that’s a good thing.  If it were, it might rob the game of what makes it compelling.  This is why we play the games, as they say.  Still, the formula seems valid.  Using data we know is correct, 17 of 30 teams finished within 2 games of their predicted win total.  Twenty were within 3 games.  The formula, especially for how simple it is, seems to work very well, and that’s half the problem solved.
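
For anyone who wants to redo the “within 2 games” tally, here is a rough Python sketch of the counting. It only uses the five teams whose actual and expected wins are quoted in this post, not the full 30-team sheet, so it will not reproduce the 17-of-30 figure. The `count_within` helper is a made-up name, and treating “within 2 games” as a difference of 2 or less is an assumption.

```python
# (team, actual wins, expected wins) as quoted in the post. Expected wins are
# the post's rounded eWin figures; the Orioles' 93 actual wins comes from
# "11 wins above" their 82 eWins.
teams = [
    ("Nationals", 98, 98),
    ("Reds", 97, 91),
    ("Giants", 94, 89),
    ("Cardinals", 88, 94),
    ("Orioles", 93, 82),
]


def count_within(teams, max_diff):
    """Count teams whose |aW - eW| is at most max_diff (assumed inclusive)."""
    return sum(1 for _, actual, expected in teams
               if abs(actual - expected) <= max_diff)


# Positive means the team outperformed its expectation, negative means under.
for name, actual, expected in teams:
    print(f"{name:10s} aW - eW = {actual - expected:+d}")

print(count_within(teams, 2), "of", len(teams), "sample teams within 2 games")
```

Feeding in all 30 teams from the sheet would give the full-league counts.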

The other half, of course, is the tricky part.  While I used runs we knew happened, trying to figure out what will happen in 2013 before 2013 happens is a lot tougher.  A valid argument is only sound with good premises, and the formula is only good if you put in good run projections. Garbage in = Garbage out.  That’s where most of Ouij’s work, and others’, comes in. But the above (wholly unscientific) experiment suggests that if you do come up with the right number of runs scored and allowed, you’ll have a good picture of what will happen that year.