Boxing Day is the most wonderful time of year for football fans.
In Britain we have a traditionally busy calendar of fixtures the day after Christmas. Television dictates that the matches no longer all kick off at 3pm on 26th December, but this year we have 16 of the 20 Premier League clubs playing today.
So how does your team rank in the Boxing Day league table? [skip to find out here]
To find out, I returned to James Curley’s engsoccerdata package. This contains results of all matches in the top four divisions of English football from 1888 to 2015.
Let’s take a look at it:
str(england)
'data.frame': 192004 obs. of 12 variables:
 $ Date    : Factor w/ 14273 levels "1888-09-08","1888-09-15",..: 17 23 32 15 6 20 24 7 36 14 ...
 $ Season  : num 1888 1888 1888 1888 1888 ...
 $ home    : Factor w/ 142 levels "Aberdare Athletic",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ visitor : Factor w/ 142 levels "Aberdare Athletic",..: 10 15 17 26 48 51 95 102 122 132 ...
 $ FT      : Factor w/ 95 levels "0-0","0-1","0-10",..: 13 4 39 64 73 46 14 1 36 37 ...
 $ hgoal   : num 1 0 2 5 6 3 1 0 2 2 ...
 $ vgoal   : num 1 2 3 1 2 1 2 0 0 1 ...
 $ division: Factor w/ 6 levels "1","2","3","3a",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ tier    : num 1 1 1 1 1 1 1 1 1 1 ...
 $ totgoal : num 2 2 5 6 8 4 3 0 2 3 ...
 $ goaldif : num 0 -2 -1 4 4 2 -1 0 2 1 ...
 $ result  : Factor w/ 3 levels "A","D","H": 2 1 1 3 3 3 1 2 3 3 ...
First up, let’s isolate Boxing Day
We can see that england$Date is in the date format YYYY-MM-DD.
We can use a grep regular expression to get Boxing Day only:
boxing_day <- grep("-12-26", england$Date)
new <- england[boxing_day, ]
Here we are looking for the 26th of December of any year and filtering that into a new data frame called new.
Our new data frame will look just like our source data, but only with Boxing Day dates.
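To see what grep is doing here, this is a minimal sketch on a made-up four-row data frame (the `toy` data and its team names are my own invention, not part of engsoccerdata). It also adds a `$` to the pattern, which is an optional tightening so that the match has to sit at the end of the date string.

```r
# Toy data standing in for the real `england` data frame (values are made up).
toy <- data.frame(
  Date = factor(c("1999-12-26", "1999-12-28", "2000-12-26", "2001-01-01")),
  home = c("Team A", "Team B", "Team C", "Team A")
)

# grep() returns the positions of the elements that match the pattern.
# The trailing "$" anchors the match to the end of the string.
rows <- grep("-12-26$", toy$Date)

# Use those positions to keep only the matching rows.
toy_boxing_day <- toy[rows, ]
print(toy_boxing_day)
```

grep hands back row positions rather than TRUE/FALSE values, which is why it can be dropped straight into the square brackets.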
To calculate our league table, we’re going to assume three points for a win (in fact it was only introduced in England in 1981).
Let’s add two new columns, new$home_points and new$away_points and leave them blank for now.
new$home_points <- NA
new$away_points <- NA
Here’s the head of our data:
          Date Season                    home           visitor  FT hgoal vgoal
59  1888-12-26   1888            Derby County  Bolton Wanderers 2-3     2     3
119 1888-12-26   1888    West Bromwich Albion Preston North End 0-5     0     5
144 1889-12-26   1889             Aston Villa   Accrington F.C. 1-2     1     2
191 1889-12-26   1889            Derby County  Bolton Wanderers 3-2     3     2
251 1889-12-26   1889    West Bromwich Albion Preston North End 2-2     2     2
256 1889-12-26   1889 Wolverhampton Wanderers  Blackburn Rovers 2-4     2     4

    division tier totgoal goaldif result home_points away_points
59         1    1       5      -1      A          NA          NA
119        1    1       5      -5      A          NA          NA
144        1    1       3      -1      A          NA          NA
191        1    1       5       1      H          NA          NA
251        1    1       4       0      D          NA          NA
256        1    1       6      -2      A          NA          NA
You might be able to see that we are effectively going to need to make two tables and join them because each team will sometimes be at home on Boxing Day and sometimes away.
The ‘A’ in new$result[1] (the first row) means an away win (for Bolton against Derby).
Introducing if statements
In any league football match there are three possibilities: a home win, an away win or a draw.
We can use an if statement to fill in our points.
The basic syntax of an if statement is:
if (condition) {
  do something here
}
So a statement to work out whether the home team won would be as follows:
if (new$result[i] == "H") {
  new$home_points[i] <- 3
  new$away_points[i] <- 0
}
Where [i] is the row number to be examined.
This works if the home team won, but for a draw or an away win it does nothing.
We can add else if branches after the closing brace to take care of the other possibilities:
else if (new$result[i] == "A") {
  new$home_points[i] <- 0
  new$away_points[i] <- 3
} else if (new$result[i] == "D") {
  new$home_points[i] <- 1
  new$away_points[i] <- 1
}
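If you want to try the whole chain on a single result before putting it in a loop, here is a small sketch. The `points_for` function is a hypothetical helper of my own, not something from the package, and the final `else` branch that raises an error is an extra safety net I have assumed you might want. One thing to watch in R: each `else` has to sit on the same line as the closing brace before it.

```r
# Hypothetical helper: returns the home and away points for one result code.
points_for <- function(result) {
  if (result == "H") {
    c(home = 3, away = 0)
  } else if (result == "A") {
    c(home = 0, away = 3)
  } else if (result == "D") {
    c(home = 1, away = 1)
  } else {
    # Anything other than H, A or D is treated as a data problem.
    stop("unexpected result code: ", result)
  }
}

print(points_for("A"))
print(points_for("D"))
```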
Let’s bring in our loop
This is a new and powerful tool in R. Loops enable you to run the same code multiple times.
They can be a bit fiddly to get right, but once you do, a loop performs thousands of calculations in an instant.
Here’s my for loop to calculate the home and away points for each Boxing Day fixture.
for (i in 1:3556) {
  if (new$result[i] == "H") {
    new$home_points[i] <- 3
    new$away_points[i] <- 0
  } else if (new$result[i] == "A") {
    new$home_points[i] <- 0
    new$away_points[i] <- 3
  } else if (new$result[i] == "D") {
    new$home_points[i] <- 1
    new$away_points[i] <- 1
  }
}
Let’s break this down:
My for loop contains my if statement. We are telling R: 'for each value of i from 1 up to the number of rows in the data frame (i.e. 3,556), run the if statement'.
Hitting ‘Run’ on that executes the if statement 3,556 times. Each time, i is increased by one.
This means the first time it works on row 1 of new, the second time on row 2, and so on up to the final row.
Do you see the power of the for loop?
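For comparison, the same points can be worked out without a loop at all. This is only a sketch of an alternative, not the method used in this post: it looks each result code up in a small named vector. The `results` values and the two lookup vectors are made up for illustration. (If you stick with the loop, writing `seq_len(nrow(new))` in place of `1:3556` would save hardcoding the row count.)

```r
# A few made-up result codes, stored as a factor like england$result.
results <- factor(c("A", "A", "H", "D"), levels = c("A", "D", "H"))

# Named lookup vectors: the name is the result code, the value is the points.
home_lookup <- c(H = 3, D = 1, A = 0)
away_lookup <- c(H = 0, D = 1, A = 3)

# Index by name. as.character() matters: indexing with a factor directly
# would use its integer codes rather than its labels.
home_points <- unname(home_lookup[as.character(results)])
away_points <- unname(away_lookup[as.character(results)])

print(data.frame(results, home_points, away_points))
```

The loop is easier to read when you are starting out; the lookup version is the kind of thing you tend to move to once the data gets big.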
We now have points scores for each fixture. It’s now time to create our league table.
First of all we’re going to aggregate the points to create totals for each home and away side.
Given that each club will play home or away a roughly even number of times over a long enough time period, we will effectively have two tables for each established league club.
Here’s how we sum the home and away teams’ points:
home_ag <- aggregate(home_points ~ home, new, sum)
away_ag <- aggregate(away_points ~ visitor, new, sum)
We’ll use the merge function to combine each club’s home and away points totals. The ‘home’ and ‘visitor’ columns are our keys because they contain the same club names.
combined <- merge(home_ag, away_ag, by.x = "home", by.y = "visitor")
combined$total_points <- combined$home_points + combined$away_points
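One thing worth knowing about merge: by default it only keeps clubs that appear in both tables, so a side that has only ever been at home (or only ever away) on Boxing Day would drop out. The sketch below shows this on five made-up fixtures (the `toy` data and team names are invented) and uses `all = TRUE` as one possible way to keep everyone, treating the missing side as zero points.

```r
# Five made-up fixtures. "Team D" only ever plays at home.
toy <- data.frame(
  home        = c("Team A", "Team A", "Team B", "Team C", "Team D"),
  visitor     = c("Team B", "Team C", "Team A", "Team A", "Team B"),
  home_points = c(3, 1, 0, 3, 1),
  away_points = c(0, 1, 3, 0, 1)
)

home_ag <- aggregate(home_points ~ home, toy, sum)
away_ag <- aggregate(away_points ~ visitor, toy, sum)

# Default merge: keeps only clubs present in both tables (Team D is dropped).
inner <- merge(home_ag, away_ag, by.x = "home", by.y = "visitor")

# all = TRUE keeps every club and fills the gaps with NA.
outer <- merge(home_ag, away_ag, by.x = "home", by.y = "visitor", all = TRUE)

# Treat a missing total as zero points before adding up.
outer$home_points[is.na(outer$home_points)] <- 0
outer$away_points[is.na(outer$away_points)] <- 0
outer$total_points <- outer$home_points + outer$away_points

print(inner)
print(outer)
```

With more than a century of fixtures this probably only affects a handful of short-lived clubs, but it is worth checking if a team you expect is missing.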
Excellent! Here’s a preview of how the league table looks:
It’s adding up all the points accrued from home wins, home draws, away wins and away draws.
The big problem with our table as it stands is that it takes no account of how many Boxing Day fixtures a club has played.
A team that has only played 30 Boxing Day fixtures in its history (perhaps it fell out of the top four divisions at some point) could earn a maximum of 90 points, which is not enough to make our top 10 even with an improbably perfect record.
It’s no surprise then that the top of this table is filled with the stalwarts of the English game such as Manchester United, Preston North End and West Bromwich Albion.
A much better way would be to use points per game.
We’ll use the aggregate function again, this time with length as our function to count the number of times the teams appear.
home_count <- aggregate(home_points ~ home, new, length)
away_count <- aggregate(away_points ~ visitor, new, length)
combined_count <- merge(home_count, away_count, by.x = "home", by.y = "visitor")
# remember that 'points' doesn't actually mean points here, it means the count of matches
combined_count$totalmatches <- combined_count$home_points + combined_count$away_points
We’ll now join them together using merge again and work out points per game:
full <- merge(combined, combined_count, by = "home")
full$ppg <- full$total_points / full$totalmatches
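Because both tables contain columns called home_points and away_points, merge keeps the two versions apart by adding `.x` and `.y` to the names. Here is a small sketch with invented numbers (two made-up teams) so you can see what the merged columns end up being called.

```r
# Made-up points totals, shaped like `combined`.
combined <- data.frame(
  home         = c("Team A", "Team B"),
  home_points  = c(4, 0),
  away_points  = c(3, 1),
  total_points = c(7, 1)
)

# Made-up match counts, shaped like `combined_count`.
combined_count <- data.frame(
  home         = c("Team A", "Team B"),
  home_points  = c(2, 1),
  away_points  = c(2, 2),
  totalmatches = c(4, 3)
)

full <- merge(combined, combined_count, by = "home")

# Clashing names get ".x" (first table) and ".y" (second table) appended.
print(names(full))

full$ppg <- full$total_points / full$totalmatches
print(full[, c("home", "total_points", "totalmatches", "ppg")])
```

The columns we actually need, total_points and totalmatches, have unique names, so the division works without any renaming.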
We’re nearly there!
At this point I’m going to have to clean the data to remove the teams that are not currently part of the top four divisions.
I’ll use write.table to print off a CSV.
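The clean-up and export could be sketched roughly like this. Everything here is an assumption for illustration: the `full` data is made up, `current_clubs` is a hypothetical vector you would have to fill in yourself with the 92 current league sides, and I am writing to a temporary file rather than a real path.

```r
# Made-up stand-in for the `full` table.
full <- data.frame(
  home         = c("Team A", "Team B", "Team C"),
  total_points = c(7, 1, 4),
  totalmatches = c(4, 3, 2)
)
full$ppg <- full$total_points / full$totalmatches

# Hypothetical list of clubs currently in the top four divisions.
current_clubs <- c("Team A", "Team C")

# Keep only current clubs, then sort from best to worst points per game.
current <- full[full$home %in% current_clubs, ]
current <- current[order(-current$ppg), ]

# sep = "," makes write.table produce a CSV; row.names = FALSE drops
# the row numbers so they don't appear as an extra column.
out_file <- tempfile(fileext = ".csv")
write.table(current, out_file, sep = ",", row.names = FALSE)

print(readLines(out_file))
```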
Here’s our final table:
Rank | Club | Points | Matches | Points per game |
1 | Burton Albion | 12 | 4 | 3.0 |
2 | Milton Keynes Dons | 25 | 11 | 2.3 |
3 | Manchester United | 165 | 92 | 1.8 |
4 | Oxford United | 68 | 39 | 1.7 |
5 | Brighton & Hove Albion | 122 | 71 | 1.7 |
6 | Preston North End | 148 | 87 | 1.7 |
7 | Liverpool | 144 | 87 | 1.7 |
8 | Nottingham Forest | 138 | 85 | 1.6 |
9 | Huddersfield Town | 123 | 76 | 1.6 |
10 | Luton Town | 109 | 69 | 1.6 |
11 | Chelsea | 131 | 83 | 1.6 |
12 | Everton | 138 | 88 | 1.6 |
13 | Tottenham Hotspur | 122 | 78 | 1.6 |
14 | Ipswich Town | 91 | 59 | 1.5 |
15 | Accrington Stanley | 40 | 26 | 1.5 |
16 | Hartlepool United | 106 | 69 | 1.5 |
17 | Cambridge United | 43 | 28 | 1.5 |
18 | Aston Villa | 146 | 96 | 1.5 |
19 | Carlisle United | 95 | 63 | 1.5 |
20 | Arsenal | 123 | 82 | 1.5 |
20 | Chesterfield | 114 | 76 | 1.5 |
22 | Sheffield United | 139 | 93 | 1.5 |
23 | Derby County | 141 | 95 | 1.5 |
24 | West Bromwich Albion | 148 | 100 | 1.5 |
24 | Coventry City | 111 | 75 | 1.5 |
26 | Fulham | 115 | 78 | 1.5 |
27 | Southend United | 107 | 73 | 1.5 |
28 | Middlesbrough | 123 | 84 | 1.5 |
29 | Hull City | 120 | 82 | 1.5 |
30 | Gillingham | 98 | 67 | 1.5 |
31 | Manchester City | 137 | 94 | 1.5 |
32 | Wigan Athletic | 48 | 33 | 1.5 |
33 | Exeter City | 96 | 67 | 1.4 |
34 | Plymouth Argyle | 107 | 75 | 1.4 |
35 | Reading | 104 | 73 | 1.4 |
36 | Brentford | 101 | 71 | 1.4 |
37 | Southampton | 105 | 74 | 1.4 |
38 | Portsmouth | 102 | 72 | 1.4 |
39 | Stoke City | 127 | 90 | 1.4 |
40 | Bradford City | 110 | 78 | 1.4 |
41 | Crystal Palace | 100 | 71 | 1.4 |
42 | Bristol Rovers | 101 | 72 | 1.4 |
43 | Cardiff City | 105 | 75 | 1.4 |
43 | Cheltenham Town | 21 | 15 | 1.4 |
45 | Barnsley | 123 | 88 | 1.4 |
46 | Swindon Town | 102 | 73 | 1.4 |
47 | Charlton Athletic | 95 | 68 | 1.4 |
48 | Mansfield Town | 78 | 56 | 1.4 |
49 | Birmingham City | 122 | 88 | 1.4 |
50 | Scunthorpe United | 69 | 50 | 1.4 |
51 | AFC Bournemouth | 95 | 69 | 1.4 |
52 | Barnet | 22 | 16 | 1.4 |
53 | Walsall | 103 | 75 | 1.4 |
54 | Blackburn Rovers | 122 | 89 | 1.4 |
55 | Northampton Town | 94 | 69 | 1.4 |
56 | Watford | 92 | 68 | 1.4 |
57 | Rotherham United | 89 | 66 | 1.3 |
58 | Millwall | 91 | 68 | 1.3 |
59 | Bristol City | 111 | 83 | 1.3 |
60 | Leeds United | 100 | 75 | 1.3 |
61 | Wolverhampton Wanderers | 129 | 97 | 1.3 |
62 | Bolton Wanderers | 122 | 92 | 1.3 |
63 | Port Vale | 95 | 72 | 1.3 |
64 | Leyton Orient | 105 | 81 | 1.3 |
65 | Sunderland | 120 | 93 | 1.3 |
66 | Morecambe | 9 | 7 | 1.3 |
67 | Shrewsbury Town | 68 | 53 | 1.3 |
68 | Sheffield Wednesday | 118 | 92 | 1.3 |
69 | Crewe Alexandra | 91 | 71 | 1.3 |
70 | Notts County | 103 | 81 | 1.3 |
71 | Doncaster Rovers | 89 | 70 | 1.3 |
72 | Leicester City | 109 | 86 | 1.3 |
73 | Norwich City | 89 | 71 | 1.3 |
74 | Swansea City | 94 | 75 | 1.3 |
75 | Bury | 99 | 79 | 1.3 |
76 | Oldham Athletic | 98 | 80 | 1.2 |
77 | Stevenage Borough | 6 | 5 | 1.2 |
77 | Crawley Town | 6 | 5 | 1.2 |
77 | Newcastle United | 108 | 90 | 1.2 |
80 | Rochdale | 76 | 64 | 1.2 |
81 | Grimsby Town | 90 | 79 | 1.1 |
82 | Colchester United | 60 | 53 | 1.1 |
83 | Blackpool | 95 | 84 | 1.1 |
84 | Queens Park Rangers | 79 | 71 | 1.1 |
85 | Peterborough United | 52 | 47 | 1.1 |
86 | Burnley | 92 | 84 | 1.1 |
87 | West Ham United | 74 | 69 | 1.1 |
88 | Wycombe Wanderers | 22 | 21 | 1.0 |
89 | AFC Wimbledon | 4 | 4 | 1.0 |
89 | Fleetwood Town | 4 | 4 | 1.0 |
91 | Yeovil Town | 10 | 11 | 0.9 |
92 | Newport County | 42 | 48 | 0.9 |
If past results are anything to go by, Aston Villa should be a little concerned lining up against Burton Albion today.
Analysis
The Brewers have won all four of their Boxing Day fixtures in their history in the English football league – the only current side to do so (funnily enough, the only other one was the now-defunct Burton Wanderers).
Another relative newcomer to the top four divisions are Milton Keynes Dons.
They too have an excellent festive record, picking up 2.3 points a game.
However, it’s unlikely that either club will match Manchester United’s current record in 100 years’ time.
It’s little surprise that the club with the most top division titles is also the best long-term performer on Boxing Day.
Obviously United will generally be up against a tougher calibre of opposition than that of Burton Albion or the MK Dons.
Championship high-fliers Brighton are in fifth place, while in seventh comes the club with the second-highest number of top division titles, Liverpool.
At the other end, Yeovil and Newport County both have a dismal Boxing Day record, picking up less than a point a game.
Today they play Exeter and Portsmouth respectively, both of whom pick up 1.4 points per game on 26th December.
The table shows points per game rounded to one decimal place, but it is sorted on the unrounded values.
For example, fourth-placed Oxford United’s record is 1.7436 points per game compared to 1.7183 for fifth-placed Brighton.
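Here is a quick sketch of that, using the points and matches for four clubs taken from the table above. The data frame name `top` and its column names are my own choice.

```r
# Points and matches copied from the final table (ranks 4 to 7).
top <- data.frame(
  club    = c("Brighton & Hove Albion", "Oxford United",
              "Liverpool", "Preston North End"),
  points  = c(122, 68, 144, 148),
  matches = c(71, 39, 87, 87)
)

# Sort on the full-precision value first...
top$ppg <- top$points / top$matches
top <- top[order(-top$ppg), ]

# ...and only round for display afterwards.
top$ppg_display <- round(top$ppg, 1)

print(top[, c("club", "ppg", "ppg_display")])
```

All four clubs display as 1.7, but the unrounded column still separates them.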
In summary:
We have looked at if statements and for loops for the first time to create a Boxing Day all-time league table!
Remember you can see a list of functions and tips on the cheat sheet.