Is it harder to generate xG against Atlético Madrid than any other La Liga team?

Article by Ben Griffis

This short article uses a statistical model called a “regression model” to answer the question in the title: does Diego Simeone’s Atlético actually have the best defense in La Liga? It’s a much-discussed subject, and a regression model can help us answer it. For this scenario, my null hypothesis is that there is no difference between Atlético’s defense and the other teams’ defenses. Null hypotheses are important for statistical tests: we use the data to see whether we have enough evidence to reject the null hypothesis as (most likely) false.

To answer this question, I first gathered information on all La Liga matches from the past 4 completed seasons and the current 21/22 season (after MD 37, so all but one match for each team has been played). All data is from fbref.com.

Then, using a linear regression model, I controlled for a host of factors:

  • Focal team
    • No base (essentially, with categorical variables, as opposed to continuous ones like possession, one of the entries usually has to be the “base”, which is what all the others are compared to).
  • Opponent
    • Base = Atlético Madrid
  • Focal team’s possession
  • Match number (1, 2, 3, …, 38)
  • The season
  • Whether the match was home/away (for the focal team, not the opponent)
    • Base = away
  • Day of the week
    • Base = Friday
  • Local time of kickoff
    • Base = 12:00 Noon
  • The referee in charge of the match
    • Base = Adrián Cordero
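
To make the idea of a “base” concrete, here is a minimal sketch in Python of how a categorical variable such as the opponent can be turned into 0/1 indicator columns with the base left out. The `dummy_code` helper and the four example rows are hypothetical illustrations, not the code or data used for the model in this article.

```python
def dummy_code(values, base):
    """One-hot encode a categorical column, leaving out the base level.

    Hypothetical helper for illustration only.
    """
    # Every level except the base gets its own 0/1 indicator column.
    levels = sorted(set(values) - {base})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Four made-up matches, identified only by the opponent faced.
opponents = ["Atlético Madrid", "Barcelona", "Getafe", "Barcelona"]
levels, rows = dummy_code(opponents, base="Atlético Madrid")

for opponent, row in zip(opponents, rows):
    print(opponent, dict(zip(levels, row)))
```

A match against the base (Atlético) has zeros in every opponent column, which is why each opponent’s estimate can be read as a difference from Atlético.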

The model result is shown at the bottom of this post. Basically, it suggests that, yes, it has been harder to generate xG against Atlético than against any other La Liga team in the past 5 seasons. If we use xG conceded as a measure of a team’s defensive strength, then Atlético have had the best defense over at least the last 5 seasons.

The way the model shows this is:

  1. All but two clubs (Getafe & Leganés) have significant p-values (bolded in the output, which indicates a statistically detectable difference from the base, Atlético) AND
  2. No other club has a negative “estimate”.

We can read each club’s estimate as: “Controlling for [all factors in the bulleted list above], teams should generate X more xG than when playing against Atlético Madrid”, with X being the estimate. A p-value below 0.05 is typically called “significant”, meaning that data like this would be unlikely if the null hypothesis were true.
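
The two criteria above can be checked against the opponent rows of the output table at the bottom of this post. The following is a minimal sketch in Python with the estimates and p-values transcribed from that table. One assumption to flag: p-values reported as “<0.001” are stored as 0.001, which is only an upper bound, and the variable names are my own.

```python
# (estimate, p-value) per opponent, transcribed from the regression output.
# Assumption: "<0.001" is stored as 0.001, an upper bound on the true p-value.
opponent_rows = {
    "Alavés": (0.44, 0.001), "Athletic Club": (0.20, 0.004),
    "Barcelona": (0.20, 0.004), "Betis": (0.36, 0.001),
    "Cádiz": (0.50, 0.001), "Celta Vigo": (0.41, 0.001),
    "Eibar": (0.29, 0.001), "Elche": (0.66, 0.001),
    "Espanyol": (0.32, 0.001), "Getafe": (0.06, 0.358),
    "Girona": (0.33, 0.001), "Granada": (0.45, 0.001),
    "Huesca": (0.37, 0.001), "La Coruña": (0.56, 0.001),
    "Las Palmas": (0.79, 0.001), "Leganés": (0.10, 0.192),
    "Levante": (0.63, 0.001), "Málaga": (0.44, 0.001),
    "Mallorca": (0.53, 0.001), "Osasuna": (0.33, 0.001),
    "Rayo Vallecano": (0.38, 0.001), "Real Madrid": (0.22, 0.002),
    "Real Sociedad": (0.20, 0.004), "Sevilla": (0.20, 0.004),
    "Valencia": (0.33, 0.001), "Valladolid": (0.36, 0.001),
    "Villarreal": (0.38, 0.001),
}

ALPHA = 0.05  # conventional significance threshold

# Criterion 1: which opponents are NOT significantly different from Atlético?
not_significant = [
    club for club, (_, p_value) in opponent_rows.items() if p_value >= ALPHA
]

# Criterion 2: does any opponent have a negative estimate?
all_positive = all(estimate > 0 for estimate, _ in opponent_rows.values())

print("Not significant:", not_significant)
print("All estimates positive:", all_positive)
```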

Example Case

So, let’s take Barcelona as an example. Controlling for all these factors, teams should generate 0.20 more xG against Barcelona than when playing Atlético Madrid.

And let’s look at two example scenarios: first a Cádiz match against Atlético, and then a Cádiz match against Barcelona. We’ll keep everything else the same and use the bases, so Cádiz is the away team, on a Friday, with a noon kickoff and Adrián Cordero in charge. We’ll also say it’s this season (2022) and match number 10, with Cádiz having 45% possession. The method to calculate any match is:

Team’s estimate + [Opponent Estimate] + (Possession * 0.0006748) + (Match Number * 0.0006748) + (Season End Year * -0.0280301) + [Home/Away Estimate] + [Day Estimate] + [Time Estimate] + [Ref Estimate]

If you use the base cases, you do not need to include those estimates, since we are using those as comparisons.

  • Atlético-Cádiz: 57.2159207 + (45 * 0.0006748) + (10 * 0.0006748) + (2022 * -0.0280301)
    • Cádiz should generate 0.576 xG
  • Barcelona-Cádiz: 57.2159207 + 0.2045895 + (45 * 0.0006748) + (10 * 0.0006748) + (2022 * -0.0280301)
    • Cádiz should generate 0.781 xG

From this we can see how we can interpret Barcelona’s estimate. Keeping all else constant, a team would generate 0.205 more xG against Barcelona than they would against Atlético!
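
The arithmetic above can be reproduced with a few lines of Python. This is a sketch using only the coefficients quoted in the worked example. The function and variable names are my own, and the base-case venue, day, time and referee are assumed, so those terms contribute nothing.

```python
# Coefficients as quoted in the worked example above.
CADIZ_TEAM_ESTIMATE = 57.2159207
OPPONENT_ESTIMATES = {
    "Atlético Madrid": 0.0,  # base case, so no adjustment
    "Barcelona": 0.2045895,
}
POSSESSION_COEF = 0.0006748
MATCH_NUMBER_COEF = 0.0006748
SEASON_COEF = -0.0280301


def predicted_xg(opponent, possession, match_number, season_end_year):
    """Predicted xG for Cádiz under the base-case venue, day, time and referee."""
    return (
        CADIZ_TEAM_ESTIMATE
        + OPPONENT_ESTIMATES[opponent]
        + possession * POSSESSION_COEF
        + match_number * MATCH_NUMBER_COEF
        + season_end_year * SEASON_COEF
    )


vs_atletico = predicted_xg("Atlético Madrid", 45, 10, 2022)
vs_barcelona = predicted_xg("Barcelona", 45, 10, 2022)

print(round(vs_atletico, 3))                 # 0.576
print(round(vs_barcelona, 3))                # 0.781
print(round(vs_barcelona - vs_atletico, 3))  # 0.205
```

Because everything except the opponent is held fixed, the difference between the two predictions is exactly Barcelona’s opponent estimate.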

A Final Word: Statistical Models’ Use in Football Data

Statistical models have a lot of possible uses in football. One way we can use them is to see which players might be putting in great (or poor) performances. By including many different variables, instead of just the 2-3 we can show on a scatter plot, we can then use a residual plot to visualize players who are exceeding their predicted levels of performance, or who may be far below them.
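
As a rough sketch of the residual idea in Python: a residual is simply the actual value minus the value the model predicted, so positive residuals flag over-performance. The three players and all of their numbers below are made up purely for illustration and do not come from any model in this article.

```python
# Hypothetical actual and model-predicted values for three made-up players.
actual = {"Player A": 0.55, "Player B": 0.30, "Player C": 0.20}
predicted = {"Player A": 0.40, "Player B": 0.35, "Player C": 0.45}

# Residual = actual - predicted. Positive means the player beat the prediction.
residuals = {name: actual[name] - predicted[name] for name in actual}

# Rank players from biggest over-performer to biggest under-performer.
ranked = sorted(residuals.items(), key=lambda item: item[1], reverse=True)

for name, residual in ranked:
    print(f"{name}: {residual:+.2f}")
```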

Another way to use models is to conduct a statistical test, like I did here, to begin to answer a specific question. I asked a question about teams, but we can also use models to begin answering questions about players or leagues. The key is to include the variable you want to test (here, Atlético as the opponent base case) and check the significance of the related values (in this example, we needed to look at all the other opponents’ estimates and p-values). Make sure to include other variables to act as controls. That way, you can start isolating the effect of what you want to test!

Linear Regression Output

Note: these values are rounded to 2 decimal places. None of the numbers shown as 0.00 are exactly zero.

  xG
Predictors Estimates CI p-value
Team [Atlético Madrid] 57.49 11.93 – 103.04 0.013
Team [Alavés] 57.22 11.66 – 102.77 0.014
Team [Athletic Club] 57.32 11.77 – 102.87 0.014
Team [Barcelona] 58.03 12.48 – 103.58 0.013
Team [Cádiz] 57.22 11.64 – 102.79 0.014
Team [Celta Vigo] 57.36 11.81 – 102.91 0.014
Team [Deportivo] 57.43 11.90 – 102.95 0.013
Team [Eibar] 57.29 11.74 – 102.83 0.014
Team [Elche] 57.04 11.47 – 102.61 0.014
Team [Espanyol] 57.24 11.69 – 102.78 0.014
Team [Getafe] 57.18 11.63 – 102.73 0.014
Team [Girona] 57.28 11.75 – 102.81 0.014
Team [Granada] 57.28 11.71 – 102.84 0.014
Team [Huesca] 57.23 11.68 – 102.78 0.014
Team [Las Palmas] 57.11 11.59 – 102.63 0.014
Team [Leganés] 57.17 11.63 – 102.71 0.014
Team [Levante] 57.38 11.83 – 102.93 0.014
Team [Málaga] 57.03 11.51 – 102.56 0.014
Team [Mallorca] 57.26 11.69 – 102.82 0.014
Team [Osasuna] 57.24 11.67 – 102.80 0.014
Team [Rayo Vallecano] 57.29 11.73 – 102.85 0.014
Team [Real Betis] 57.41 11.86 – 102.96 0.014
Team [Real Madrid] 57.94 12.39 – 103.50 0.013
Team [Real Sociedad] 57.51 11.96 – 103.06 0.013
Team [Sevilla] 57.52 11.97 – 103.07 0.013
Team [Valencia] 57.43 11.87 – 102.98 0.013
Team [Valladolid] 57.17 11.62 – 102.72 0.014
Team [Villarreal] 57.62 12.07 – 103.17 0.013
Opponent [Alavés] 0.44 0.31 – 0.58 <0.001
Opponent [Athletic Club] 0.20 0.06 – 0.33 0.004
Opponent [Barcelona] 0.20 0.06 – 0.34 0.004
Opponent [Betis] 0.36 0.22 – 0.50 <0.001
Opponent [Cádiz] 0.50 0.32 – 0.68 <0.001
Opponent [Celta Vigo] 0.41 0.27 – 0.55 <0.001
Opponent [Eibar] 0.29 0.15 – 0.44 <0.001
Opponent [Elche] 0.66 0.48 – 0.84 <0.001
Opponent [Espanyol] 0.32 0.18 – 0.47 <0.001
Opponent [Getafe] 0.06 -0.07 – 0.20 0.358
Opponent [Girona] 0.33 0.15 – 0.51 <0.001
Opponent [Granada] 0.45 0.29 – 0.61 <0.001
Opponent [Huesca] 0.37 0.19 – 0.54 <0.001
Opponent [La Coruña] 0.56 0.33 – 0.79 <0.001
Opponent [Las Palmas] 0.79 0.54 – 1.05 <0.001
Opponent [Leganés] 0.10 -0.05 – 0.26 0.192
Opponent [Levante] 0.63 0.49 – 0.77 <0.001
Opponent [Málaga] 0.44 0.20 – 0.68 <0.001
Opponent [Mallorca] 0.53 0.35 – 0.71 <0.001
Opponent [Osasuna] 0.33 0.17 – 0.49 <0.001
Opponent [Rayo Vallecano] 0.38 0.20 – 0.56 <0.001
Opponent [Real Madrid] 0.22 0.08 – 0.36 0.002
Opponent [Real Sociedad] 0.20 0.06 – 0.33 0.004
Opponent [Sevilla] 0.20 0.07 – 0.34 0.004
Opponent [Valencia] 0.33 0.20 – 0.47 <0.001
Opponent [Valladolid] 0.36 0.20 – 0.51 <0.001
Opponent [Villarreal] 0.38 0.24 – 0.51 <0.001
Possession 0.00 0.00 – 0.01 0.013
Match Number 0.00 -0.00 – 0.00 0.521
Season -0.03 -0.05 – -0.01 0.015
Venue [Home] 0.31 0.27 – 0.35 <0.001
Day [Mon] -0.02 -0.14 – 0.09 0.708
Day [Sat] -0.03 -0.14 – 0.08 0.567
Day [Sun] 0.01 -0.09 – 0.12 0.781
Day [Thu] 0.08 -0.07 – 0.22 0.308
Day [Tue] 0.08 -0.08 – 0.23 0.327
Day [Wed] 0.08 -0.06 – 0.22 0.239
Time 13:00 0.06 -0.09 – 0.21 0.423
Time 14:00 -0.03 -0.16 – 0.10 0.642
Time 16:00 -0.05 -0.21 – 0.12 0.577
Time 16:15 -0.09 -0.21 – 0.04 0.167
Time 17:00 -0.27 -0.50 – -0.05 0.017
Time 17:30 0.04 -0.25 – 0.33 0.800
Time 18:00 -0.20 -0.57 – 0.17 0.288
Time 18:15 0.09 -0.24 – 0.41 0.602
Time 18:30 -0.06 -0.18 – 0.05 0.283
Time 18:45 0.23 -0.70 – 1.17 0.626
Time 19:00 -0.14 -0.32 – 0.04 0.127
Time 19:15 0.10 -0.46 – 0.65 0.736
Time 19:30 -0.21 -0.36 – -0.05 0.010
Time 19:45 -0.26 -0.70 – 0.18 0.242
Time 20:00 -0.17 -0.37 – 0.02 0.077
Time 20:15 -0.03 -0.30 – 0.24 0.848
Time 20:30 -0.12 -0.43 – 0.20 0.464
Time 20:45 -0.03 -0.17 – 0.11 0.665
Time 21:00 -0.12 -0.24 – 0.01 0.077
Time 21:15 0.11 -0.33 – 0.54 0.627
Time 21:30 -0.21 -0.41 – -0.00 0.048
Time 22:00 -0.13 -0.30 – 0.03 0.114
Time 22:15 0.08 -0.20 – 0.36 0.565
Time 22:30 0.05 -0.88 – 0.97 0.920
Referee [Alberola Rojas] -0.07 -0.21 – 0.07 0.335
Referee [Alberto Undiano] -0.26 -0.45 – -0.08 0.005
Referee [Alejandro Hernández] -0.12 -0.26 – 0.03 0.120
Referee [Alejandro Muñíz] -0.05 -0.29 – 0.19 0.687
Referee [Alfonso Álvarez] -0.25 -0.51 – 0.00 0.054
Referee [Antonio Matéu Lahoz] -0.21 -0.35 – -0.06 0.005
Referee [César Soto] -0.05 -0.22 – 0.11 0.517
Referee [Carlos del Cerro] -0.17 -0.31 – -0.02 0.022
Referee [Daniel Trujillo] -0.09 -0.33 – 0.15 0.451
Referee [David Fernández] -0.06 -0.31 – 0.18 0.611
Referee [David Medié] -0.04 -0.20 – 0.11 0.607
Referee [Eduardo Prieto] -0.24 -0.43 – -0.06 0.011
Referee [Guillermo Cuadra] -0.08 -0.23 – 0.07 0.314
Referee [Ignacio Iglesias] -0.12 -0.31 – 0.06 0.201
Referee [Isidro Díaz de Mera] -0.01 -0.19 – 0.18 0.938
Referee [Javier Estrada] -0.14 -0.29 – 0.01 0.064
Referee [Jesús Gil] -0.06 -0.21 – 0.08 0.405
Referee [Jorge Figueroa] -0.15 -0.34 – 0.03 0.106
Referee [José González] -0.27 -0.44 – -0.11 0.001
Referee [José Luis Munuera] -0.10 -0.25 – 0.05 0.184
Referee [José Munuera] -0.21 -0.45 – 0.03 0.087
Referee [José Sánchez] -0.23 -0.37 – -0.09 0.002
Referee [Juan Martínez] -0.10 -0.25 – 0.04 0.167
Referee [Mario Melero] -0.19 -0.34 – -0.05 0.008
Referee [Miguel Ángel Ortiz Arias] -0.06 -0.31 – 0.19 0.625
Referee [Pablo González] -0.25 -0.39 – -0.10 0.001
Referee [Ricardo de Burgos] -0.07 -0.21 – 0.07 0.337
Referee [Santiago Jaime] -0.23 -0.37 – -0.09 0.002
Referee [Valentín Pizarro] -0.10 -0.26 – 0.07 0.253
Referee [Xavi Estrada] -0.33 -1.27 – 0.62 0.501
Observations 3780
R² / R² adjusted 0.799 / 0.793
