“Come from behind” matchwinners in Cricket

This post analyzes one-day international (ODI) games in cricket (http://en.wikipedia.org/wiki/Cricket) to identify batsmen who helped their teams to come-from-behind wins. The aim is to check whether some of the big names in cricket truly deserve their reputations.

Once matchwinners from past games have been identified (the training data), a statistical model based on Random Forest (http://en.wikipedia.org/wiki/Random_forest) is built to predict whether a particular cricketer (the test data) can be a matchwinner.

The data for this analysis was scraped with Python and Pandas from http://www.espncricinfo.com/ci/engine/series/index.html. (Not all of the gory code is published in this post, though.)

 Two criteria need to be defined to model this problem: “come from behind win” and “matchwinner.”
 Here are my criteria for a “come from behind” win:
  • The team batting second must win the match
  • They must be at least 2 wickets down at the 25th over
  • The required run rate at the fall of the 4th wicket must be at least 4.0
 A matchwinner is a batsman who:
  • Is the top scorer in the second innings, with a score of at least 50
  • Has played at least 25 matches
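
The criteria above can be written down as two small predicates. This is only a sketch: the class and field names (`Chase`, `Innings`, `wickets_at_over_25`, `required_rate_at_4th_wicket`, `career_matches` and so on) are hypothetical and do not come from the scraped data, and the thresholds are simply copied from the two lists.

```python
from dataclasses import dataclass


@dataclass
class Chase:
    """One second-innings chase. All field names are hypothetical."""
    batting_second_won: bool
    wickets_at_over_25: int
    required_rate_at_4th_wicket: float


@dataclass
class Innings:
    """One batsman's second-innings knock. All field names are hypothetical."""
    is_top_scorer: bool
    runs: int
    career_matches: int


def is_come_from_behind_win(chase: Chase) -> bool:
    # Thresholds mirror the three "come from behind" bullet points.
    return (
        chase.batting_second_won
        and chase.wickets_at_over_25 >= 2
        and chase.required_rate_at_4th_wicket >= 4.0
    )


def is_matchwinner(innings: Innings) -> bool:
    # "At least" is read as >= for both thresholds.
    return (
        innings.is_top_scorer
        and innings.runs >= 50
        and innings.career_matches >= 25
    )


if __name__ == "__main__":
    print(is_come_from_behind_win(Chase(True, 3, 5.2)))
    print(is_matchwinner(Innings(True, 49, 100)))
```

Writing the rules as explicit predicates makes the boundary cases visible: a score of exactly 50 or a career of exactly 25 matches qualifies under this reading.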

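<span class="c"># Imports needed by the snippets in this post (assumption: psql is an alias for the pandasql package)</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>
<span class="kn">import</span> <span class="nn">pandasql</span> <span class="kn">as</span> <span class="nn">psql</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span>

<span class="n">cricket_data</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"matches.csv"</span><span class="p">)</span> <span class="c"># see the Data Wrangling section below for details on how this csv file was scraped</span>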
<span class="n">cricket_data</span> <span class="o">=</span> <span class="n">psql</span><span class="o">.</span><span class="n">sqldf</span><span class="p">(</span><span class="s">"select team1, count(*) as wins from cricket_data group by team1"</span><span class="p">,</span> <span class="nb">locals</span><span class="p">())</span>
<span class="c">#print cricket_data.head()</span>

<span class="n">y</span> <span class="o">=</span> <span class="n">cricket_data</span><span class="o">.</span><span class="n">wins</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">cricket_data</span><span class="o">.</span><span class="n">team1</span>
<span class="n">N</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">N</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="n">width</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">bar</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"blue"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span> <span class="s">'Number of "come from behind" wins'</span> <span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">rotation</span><span class="o">=</span><span class="mi">90</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="s">'small'</span><span class="p">)</span> <span class="c"># Matplotlib 2.0+ centers bars on x by default, so the ticks need no offset</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>

<a href="https://springml.com/wp-content/uploads/2015/05/download1.png"><img class="alignnone size-medium wp-image-6539" src="https://springml.com/wp-content/uploads/2015/05/download1-300x227.png" alt="download1" width="300" height="227" /></a>

<span class="n">cricket_data</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"matches.csv"</span><span class="p">)</span>
<span class="n">scorer_data</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"attributes.csv"</span><span class="p">)</span> <span class="c">#see Data Wrangling section for how this file is created</span>

<span class="n">cricket_data</span> <span class="o">=</span> <span class="n">psql</span><span class="o">.</span><span class="n">sqldf</span><span class="p">(</span><span class="s">"select scorer1, scorer1Name, scorer1runs, count(*) as countz, sd.matches from cricket_data LEFT OUTER JOIN scorer_data sd ON cricket_data.scorer1=sd.player group by scorer1 order by countz desc"</span><span class="p">,</span> <span class="nb">locals</span><span class="p">())</span>
<span class="c">#cricket_data = cricket_data.head(5)</span>
<span class="c">#cricket_data = psql.sqldf("select scorer1, scorer1Name, scorer1runs, count(*) as countz from cricket_data group by scorer1 order by countz desc", locals())</span>
<span class="n">cricket_data</span> <span class="o">=</span> <span class="n">cricket_data</span><span class="p">[(</span><span class="n">cricket_data</span><span class="o">.</span><span class="n">scorer1runs</span> <span class="o">&gt;=</span> <span class="mi">50</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">cricket_data</span><span class="o">.</span><span class="n">matches</span> <span class="o">&gt;=</span> <span class="mi">25</span><span class="p">)]</span><span class="o">.</span><span class="n">reset_index</span><span class="p">()</span>
<span class="n">cricket_data</span> <span class="o">=</span> <span class="n">cricket_data</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">25</span><span class="p">)</span>

<span class="n">y</span> <span class="o">=</span> <span class="n">cricket_data</span><span class="o">.</span><span class="n">countz</span>
<span class="c">#convert this to name</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">cricket_data</span><span class="o">.</span><span class="n">scorer1Name</span>
<span class="n">N</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">N</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="c">#print x</span>
<span class="n">width</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">bar</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">bar</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"green"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span> <span class="s">'Number of "matchwinning" innings'</span> <span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">rotation</span><span class="o">=</span><span class="mi">90</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="s">'small'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>

<a href="https://springml.com/wp-content/uploads/2015/05/download2.png"><img class="alignnone size-medium wp-image-6540" src="https://springml.com/wp-content/uploads/2015/05/download2-300x225.png" alt="download2" width="300" height="225" /></a>
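
One thing to watch in the second query: `scorer1runs` is selected next to `group by scorer1` without an aggregate, so SQLite (which pandasql uses by default) keeps the value from just one row per player, and the later `scorer1runs` filter tests that single value and not every innings. If the intent is to count only innings that meet the 50-run threshold, filtering before counting is one way to do it. Here is a minimal sketch in plain Python, using made-up rows and not the post's data; the player ids and the `MIN_RUNS` name are hypothetical.

```python
from collections import Counter

# Hypothetical rows: (scorer id, runs in that innings). Not real match data.
innings = [
    ("p1", 72),
    ("p1", 31),
    ("p1", 104),
    ("p2", 50),
    ("p2", 12),
    ("p3", 45),
]

MIN_RUNS = 50  # threshold taken from the matchwinner criteria

# Keep only the qualifying innings first, then count them per player.
counts = Counter(player for player, runs in innings if runs >= MIN_RUNS)

# Players ordered by number of qualifying innings, highest first.
ranking = counts.most_common()
print(ranking)
```

The same idea carries over to the SQL version by moving the runs condition into a `where` clause ahead of the `group by`.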