Where Does Your Team Rank in the Boxing Day League Table?

Boxing Day is the most wonderful time of year for football fans.

In Britain we have a traditionally busy calendar of fixtures the day after Christmas. Television dictates that the matches no longer all kick off at 3pm on 26th December, but this year 16 of the 20 Premier League clubs are playing today.

So how does your team rank in the Boxing Day league table? [skip to find out here]

To find out, I returned to James Curley’s engsoccerdata package. This contains results of all matches in the top four divisions of English football from 1888 to 2015.

Let’s take a look at it:

library(engsoccerdata)  # provides the england data frame
str(england)
'data.frame': 192004 obs. of 12 variables:
 $ Date : Factor w/ 14273 levels "1888-09-08","1888-09-15",..: 17 23 32 15 6 20 24 7 36 14 ...
 $ Season : num 1888 1888 1888 1888 1888 ...
 $ home : Factor w/ 142 levels "Aberdare Athletic",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ visitor : Factor w/ 142 levels "Aberdare Athletic",..: 10 15 17 26 48 51 95 102 122 132 ...
 $ FT : Factor w/ 95 levels "0-0","0-1","0-10",..: 13 4 39 64 73 46 14 1 36 37 ...
 $ hgoal : num 1 0 2 5 6 3 1 0 2 2 ...
 $ vgoal : num 1 2 3 1 2 1 2 0 0 1 ...
 $ division: Factor w/ 6 levels "1","2","3","3a",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ tier : num 1 1 1 1 1 1 1 1 1 1 ...
 $ totgoal : num 2 2 5 6 8 4 3 0 2 3 ...
 $ goaldif : num 0 -2 -1 4 4 2 -1 0 2 1 ...
 $ result : Factor w/ 3 levels "A","D","H": 2 1 1 3 3 3 1 2 3 3 ...

First up, let’s isolate Boxing Day

We can see that england$Date is in the date format YYYY-MM-DD.

We can use a grep regular expression to get Boxing Day only:

boxing_day <- grep("-12-26",england$Date)
new <- england[boxing_day, ]

Here we are looking for the 26th of December of any year and filtering that into a new data frame called new.

Our new data frame will look just like our source data, but only with Boxing Day dates.
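To see what the grep step is doing, here is a minimal sketch on a made-up data frame (toy and its rows are my own invention, not part of engsoccerdata). It also shows grepl, which gives a TRUE/FALSE filter and lets us anchor the pattern to the end of the date.

```r
# Toy stand-in for the england data frame (made-up rows, not real results)
toy <- data.frame(
  Date = c("1990-12-26", "1990-12-29", "1991-12-26", "1991-01-26"),
  home = c("A", "B", "C", "D"),
  stringsAsFactors = TRUE   # mimic the factor Date column in the real data
)

# grep() returns the positions of the rows whose Date contains "-12-26"
rows <- grep("-12-26", toy$Date, fixed = TRUE)
toy_boxing <- toy[rows, ]

# Alternative: grepl() returns TRUE/FALSE per row; "$" anchors to the end
toy_boxing2 <- toy[grepl("-12-26$", as.character(toy$Date)), ]
```

Both versions keep rows 1 and 3 only. The row dated 1991-01-26 is dropped because the month has to match as well as the day.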

To calculate our league table, we’re going to assume three points for a win (in fact it was only introduced in England in 1981).

Let’s add two new columns, new$home_points and new$away_points and leave them blank for now.

new$home_points <- NA
new$away_points <- NA

Here’s the head of our data:

 Date Season home visitor FT hgoal vgoal
59 1888-12-26 1888 Derby County Bolton Wanderers 2-3 2 3
119 1888-12-26 1888 West Bromwich Albion Preston North End 0-5 0 5
144 1889-12-26 1889 Aston Villa Accrington F.C. 1-2 1 2
191 1889-12-26 1889 Derby County Bolton Wanderers 3-2 3 2
251 1889-12-26 1889 West Bromwich Albion Preston North End 2-2 2 2
256 1889-12-26 1889 Wolverhampton Wanderers Blackburn Rovers 2-4 2 4
 division tier totgoal goaldif result home_points away_points
59 1 1 5 -1 A NA NA
119 1 1 5 -5 A NA NA
144 1 1 3 -1 A NA NA
191 1 1 5 1 H NA NA
251 1 1 4 0 D NA NA
256 1 1 6 -2 A NA NA

You might be able to see that we are effectively going to need to make two tables and join them because each team will sometimes be at home on Boxing Day and sometimes away.

The ‘A’ in new$result[1] (the first row) means an away win (for Bolton against Derby).

Introducing if statements

In any league football match there are three possibilities: a home win, an away win or a draw.

We can use an if statement to fill in our points.

The basic syntax of an if statement is:

if (condition) {
   do something here
}

So a statement to work out whether the home team won would be as follows:

 if (new$result[i]=="H") {
 new$home_points[i] <- 3
 new$away_points[i] <- 0
 }

Where [i] is the row number to be examined.

This works if the home team won, but for a draw or an away win it does nothing.

We can add else if branches to take care of the other possibilities. Here is the complete statement:

if (new$result[i]=="H") {
  new$home_points[i] <- 3
  new$away_points[i] <- 0
} else if (new$result[i]=="A") {
  new$home_points[i] <- 0
  new$away_points[i] <- 3
} else if (new$result[i]=="D") {
  new$home_points[i] <- 1
  new$away_points[i] <- 1
}
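If you want to try the if / else if chain on its own before putting it in a loop, one option is to wrap it in a small function. This is only a sketch: points_for is a hypothetical helper of mine, not something from the package, and the final else branch is an extra I have added to catch unexpected codes.

```r
# Hypothetical helper: points for the home and away side from one result code
points_for <- function(result) {
  if (result == "H") {
    c(home = 3, away = 0)
  } else if (result == "A") {
    c(home = 0, away = 3)
  } else if (result == "D") {
    c(home = 1, away = 1)
  } else {
    c(home = NA_real_, away = NA_real_)  # unexpected code: leave points missing
  }
}

away_win <- points_for("A")  # home = 0, away = 3
draw     <- points_for("D")  # home = 1, away = 1
```

Note that each else sits on the same line as the closing brace before it. If you type the statement at the console outside a function or loop, R will complain about an else that starts a new line.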

Let’s bring in our loop

This is a new and powerful tool in R. Loops enable you to perform the same function multiple times.

They can be a bit fiddly to get right. But once you do, a loop performs thousands of calculations in an instant.

Here’s my for loop to calculate the home and away points for each Boxing Day fixture.

# Loop over every row of new (3,556 Boxing Day fixtures here)
for (i in seq_len(nrow(new))) {

  if (new$result[i]=="H") {
    # home win
    new$home_points[i] <- 3
    new$away_points[i] <- 0
  } else if (new$result[i]=="A") {
    # away win
    new$home_points[i] <- 0
    new$away_points[i] <- 3
  } else if (new$result[i]=="D") {
    # draw
    new$home_points[i] <- 1
    new$away_points[i] <- 1
  }

}

Let’s break this down:

My for loop contains my if statement. We are telling R: ‘for each value of i from 1 up to the number of rows in the data frame (i.e. 3,556), run the if statement’.

Hitting ‘Run’ on that executes the if statement 3,556 times. Each time, i is increased by one.

This means the first time, it runs the function on row 1 of new, the second time on row 2 etc. up to the final row.

Do you see the power of the for loop?
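For comparison, R can often do this kind of job without an explicit loop. The sketch below is a different technique from the for loop above (a named lookup vector), shown on made-up results. The names res, home_lookup and away_lookup are mine.

```r
# Toy results (made-up), standing in for new$result
res <- factor(c("A", "H", "D", "A"), levels = c("A", "D", "H"))

# Lookup vectors named by result code
home_lookup <- c(H = 3, D = 1, A = 0)
away_lookup <- c(H = 0, D = 1, A = 3)

# Index by the character value, not the factor's underlying integer code
home_points <- unname(home_lookup[as.character(res)])
away_points <- unname(away_lookup[as.character(res)])
```

This fills in all the rows in one go. The as.character call matters: indexing with the factor itself would use its internal integer codes and pick the wrong entries.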

We now have points scores for each fixture. It’s now time to create our league table.

First of all we’re going to aggregate the points to create totals for each home and away side.

Given that each club will play home or away a roughly even number of times over a long enough time period, we will effectively have two points totals for each established league club: one from home matches and one from away matches.

Here’s how we sum the home and away teams’ points:

home_ag <- aggregate(home_points ~ home, new, sum)
away_ag <- aggregate(away_points ~ visitor, new, sum)

We’ll use the merge function to combine each club’s home and away points totals. The ‘home’ and ‘visitor’ columns are our keys because they contain the same club names.

combined <- merge(home_ag, away_ag, by.x = "home",by.y = "visitor")
combined$total_points <- combined$home_points + combined$away_points

Excellent! Here’s a preview of how the league table looks:

It’s adding up all the points accrued from home wins, home draws, away wins and away draws.
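One thing worth knowing about merge: by default it only keeps clubs that appear in both tables. Here is a small sketch on invented clubs and scores (toy, both_only and all_clubs are my own names) showing the default behaviour next to all = TRUE, which keeps every club.

```r
# Made-up fixtures with points already filled in
toy <- data.frame(
  home        = c("Reds", "Blues", "Reds", "Greens"),
  visitor     = c("Blues", "Reds", "Greens", "Yellows"),
  home_points = c(3, 1, 0, 3),
  away_points = c(0, 1, 3, 0)
)

toy_home <- aggregate(home_points ~ home, toy, sum)
toy_away <- aggregate(away_points ~ visitor, toy, sum)

# Default merge: only clubs present in BOTH tables (Yellows never play at home)
both_only <- merge(toy_home, toy_away, by.x = "home", by.y = "visitor")

# all = TRUE keeps every club; the missing totals come back as NA, so set them to 0
all_clubs <- merge(toy_home, toy_away, by.x = "home", by.y = "visitor", all = TRUE)
all_clubs[is.na(all_clubs)] <- 0
all_clubs$total_points <- all_clubs$home_points + all_clubs$away_points
```

In the toy data Yellows drop out of both_only because they have no home match. For long-established league clubs this is unlikely to matter, but it could affect a club with only a handful of Boxing Day fixtures.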

The big problem with our table as it stands is that it takes no account of how many Boxing Day fixtures a club has played.

A team that has only played 30 Boxing Day fixtures in its history (perhaps it fell out of the top four divisions at some point) could earn a maximum of 90 points, not enough to make our top 10 even with an improbably perfect record.

It’s no surprise then that the top of this table is filled with the stalwarts of the English game such as Manchester United, Preston North End and West Bromwich Albion.

A much better way would be to use points per game.

We’ll use the aggregate function again, this time with length as our function to count the number of times the teams appear.

home_count <- aggregate(home_points ~ home, new, length)
away_count <- aggregate(away_points ~ visitor, new, length)

combined_count <- merge(home_count, away_count, by.x="home", by.y = "visitor")
# remember that 'home_points' and 'away_points' here hold the number of matches played, not points
combined_count$totalmatches <- combined_count$home_points + combined_count$away_points

We’ll now join them together using merge again and work out points per game:

full <- merge(combined, combined_count, by = "home")
full$ppg <- full$total_points / full$totalmatches
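As an aside, the same points-per-game figure can be reached by a slightly different route: stack the home and away rows into one long table, then aggregate once for points and once for matches. This is just a sketch on made-up fixtures, and the names long, totals, counts and table_ppg are mine.

```r
# Made-up fixtures with points already filled in
toy <- data.frame(
  home        = c("Reds", "Blues", "Reds", "Greens"),
  visitor     = c("Blues", "Reds", "Greens", "Yellows"),
  home_points = c(3, 1, 0, 3),
  away_points = c(0, 1, 3, 0)
)

# One row per club per match, whether home or away
long <- rbind(
  data.frame(club = toy$home,    points = toy$home_points),
  data.frame(club = toy$visitor, points = toy$away_points)
)

totals <- aggregate(points ~ club, long, sum)     # points won
counts <- aggregate(points ~ club, long, length)  # matches played
names(counts)[2] <- "matches"  # rename so the count is not mistaken for points

table_ppg <- merge(totals, counts, by = "club")
table_ppg$ppg <- table_ppg$points / table_ppg$matches
```

Here Reds end up with 4 points from 3 matches and Greens with 6 from 2. Because every club lands in the same column before aggregating, a club that has only ever played away still gets a row.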

We’re nearly there!

At this point I’m going to have to clean the data to remove the teams that are not currently part of the top four divisions.

I’ll use write.table to print off a CSV.
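A minimal sketch of that step might look like the following. The rows are made up, current_clubs is a hypothetical hand-typed list of today's league clubs, and filtering inside R is my assumption about one way to do the clean-up (it could equally be done by hand in the CSV).

```r
# Made-up rows standing in for the full table
full_toy <- data.frame(
  home         = c("Reds", "Blues", "Old Town"),
  total_points = c(4, 1, 6),
  totalmatches = c(3, 2, 2)
)
full_toy$ppg <- full_toy$total_points / full_toy$totalmatches

# Hypothetical list of clubs currently in the top four divisions
current_clubs <- c("Reds", "Blues")

# Keep current clubs only, best points per game first
final <- full_toy[full_toy$home %in% current_clubs, ]
final <- final[order(-final$ppg), ]

# sep = "," makes write.table produce a CSV; row.names = FALSE drops R's row labels
out_file <- tempfile(fileext = ".csv")
write.table(final, out_file, sep = ",", row.names = FALSE)
```

I have written to a temporary file here so the example does not leave anything behind. Swap out_file for a real path to keep the CSV.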

Here’s our final table:

Rank Club Points Matches Points per game
1 Burton Albion 12 4 3.0
2 Milton Keynes Dons 25 11 2.3
3 Manchester United 165 92 1.8
4 Oxford United 68 39 1.7
5 Brighton & Hove Albion 122 71 1.7
6 Preston North End 148 87 1.7
7 Liverpool 144 87 1.7
8 Nottingham Forest 138 85 1.6
9 Huddersfield Town 123 76 1.6
10 Luton Town 109 69 1.6
11 Chelsea 131 83 1.6
12 Everton 138 88 1.6
13 Tottenham Hotspur 122 78 1.6
14 Ipswich Town 91 59 1.5
15 Accrington Stanley 40 26 1.5
16 Hartlepool United 106 69 1.5
17 Cambridge United 43 28 1.5
18 Aston Villa 146 96 1.5
19 Carlisle United 95 63 1.5
20 Arsenal 123 82 1.5
20 Chesterfield 114 76 1.5
22 Sheffield United 139 93 1.5
23 Derby County 141 95 1.5
24 West Bromwich Albion 148 100 1.5
24 Coventry City 111 75 1.5
26 Fulham 115 78 1.5
27 Southend United 107 73 1.5
28 Middlesbrough 123 84 1.5
29 Hull City 120 82 1.5
30 Gillingham 98 67 1.5
31 Manchester City 137 94 1.5
32 Wigan Athletic 48 33 1.5
33 Exeter City 96 67 1.4
34 Plymouth Argyle 107 75 1.4
35 Reading 104 73 1.4
36 Brentford 101 71 1.4
37 Southampton 105 74 1.4
38 Portsmouth 102 72 1.4
39 Stoke City 127 90 1.4
40 Bradford City 110 78 1.4
41 Crystal Palace 100 71 1.4
42 Bristol Rovers 101 72 1.4
43 Cardiff City 105 75 1.4
43 Cheltenham Town 21 15 1.4
45 Barnsley 123 88 1.4
46 Swindon Town 102 73 1.4
47 Charlton Athletic 95 68 1.4
48 Mansfield Town 78 56 1.4
49 Birmingham City 122 88 1.4
50 Scunthorpe United 69 50 1.4
51 AFC Bournemouth 95 69 1.4
52 Barnet 22 16 1.4
53 Walsall 103 75 1.4
54 Blackburn Rovers 122 89 1.4
55 Northampton Town 94 69 1.4
56 Watford 92 68 1.4
57 Rotherham United 89 66 1.3
58 Millwall 91 68 1.3
59 Bristol City 111 83 1.3
60 Leeds United 100 75 1.3
61 Wolverhampton Wanderers 129 97 1.3
62 Bolton Wanderers 122 92 1.3
63 Port Vale 95 72 1.3
64 Leyton Orient 105 81 1.3
65 Sunderland 120 93 1.3
66 Morecambe 9 7 1.3
67 Shrewsbury Town 68 53 1.3
68 Sheffield Wednesday 118 92 1.3
69 Crewe Alexandra 91 71 1.3
70 Notts County 103 81 1.3
71 Doncaster Rovers 89 70 1.3
72 Leicester City 109 86 1.3
73 Norwich City 89 71 1.3
74 Swansea City 94 75 1.3
75 Bury 99 79 1.3
76 Oldham Athletic 98 80 1.2
77 Stevenage Borough 6 5 1.2
77 Crawley Town 6 5 1.2
77 Newcastle United 108 90 1.2
80 Rochdale 76 64 1.2
81 Grimsby Town 90 79 1.1
82 Colchester United 60 53 1.1
83 Blackpool 95 84 1.1
84 Queens Park Rangers 79 71 1.1
85 Peterborough United 52 47 1.1
86 Burnley 92 84 1.1
87 West Ham United 74 69 1.1
88 Wycombe Wanderers 22 21 1.0
89 AFC Wimbledon 4 4 1.0
89 Fleetwood Town 4 4 1.0
91 Yeovil Town 10 11 0.9
92 Newport County 42 48 0.9

Analysis

If past results are anything to go by, Aston Villa should be a little concerned lining up against Burton Albion today.

The Brewers have won all four of their Boxing Day fixtures in their history in the English football league – the only current side to do so (funnily enough, the only other one was the now-defunct Burton Wanderers).

Another relative newcomer to the top four divisions is Milton Keynes Dons.

They too have an excellent festive record, picking up 2.3 points a game.

However, it’s unlikely that either club will match Manchester United’s current record in 100 years’ time.

It’s little surprise that the club with the most top division titles is also the best long-term performer on Boxing Day.

Obviously United will generally be up against a tougher calibre of opposition than that of Burton Albion or the MK Dons.

Championship high-fliers Brighton are in fifth place, while in seventh comes the club with the second-highest number of top division titles, Liverpool.

At the other end, Yeovil and Newport County both have a dismal Boxing Day record, picking up less than a point a game.

Today they play Exeter and Portsmouth respectively, both of whom pick up 1.4 points per game on 26th December.

The table shows points per game rounded to one decimal place, but it is sorted on the unrounded values.

For example, fourth-placed Oxford United’s record is 1.7436 points per game compared to 1.7183 for fifth-placed Brighton.
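The Oxford and Brighton figures from the table make a handy worked example. In this sketch (ppg_tab and ppg_shown are my own names) we sort on the full-precision value and only round for display.

```r
# Points and matches taken from the final table
ppg_tab <- data.frame(
  club    = c("Brighton & Hove Albion", "Oxford United"),
  points  = c(122, 68),
  matches = c(71, 39)
)
ppg_tab$ppg <- ppg_tab$points / ppg_tab$matches

# Sort on the unrounded value, then round only for display
ppg_tab <- ppg_tab[order(-ppg_tab$ppg), ]
ppg_tab$ppg_shown <- round(ppg_tab$ppg, 1)
```

Both clubs show 1.7 in the rounded column, but Oxford come out on top because 68 / 39 is a shade higher than 122 / 71. Rounding before sorting would leave the two tied.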

In summary:

We have looked at if statements and for loops for the first time to create a Boxing Day all-time league table!

Remember you can see a list of functions and tips on the cheat sheet.
