Article by Ben Griffis
My process for finding players with interesting data profiles is, like most people’s, to cast a wide net and then pare the initial list down with more filters. It’s an iterative process, and I have a few methods I use to find players: filtering raw datasets, making scatters, filtering event data, or generating player role scores.
Usually, if I’m not just playing around but instead actively looking for a few players to spend time watching games of, I combine several of these methods.
I’ve recently started getting into Spain’s second division, La Liga SmartBank (or LaLiga2, or Segunda División), for a few reasons, not least because of how great the football is. And how easy it is to watch every single match on replay. And the quantity & quality of data. And that maybe I’ll get lucky and someone in a club sees my work and likes it. Whenever I start diving into a new league, regardless of my level of knowledge coming into it, I like to find interesting players to focus on, using them as my entry point to the league.
For this article I’ll be looking for a couple of young (24 and younger) midfielders (CMs or DMs). My goal is two-fold: to find young prospects to potentially keep an eye on in the future, and to find interesting players to be my entry point into watching a few teams.
Step 1: Filtering Criteria
Step 0 is always setting the base criteria for my sample. I’m going to use 700 minutes minimum, which is about 25% of the season at time of writing (31 games). This is a little low, but for casting a big net it’s important to me to have a lower number than I would for, say, analyzing a player’s performances. I’m looking for interesting young players, some of whom may not start every single match. The key thing to remember is that I’m not analyzing, but in a discovery phase.
I get a sample size of 82 CMs and DMs (based on Wyscout’s main position for the player) after this. That’s a large sample, and an average of a little under 4 players per team, which means I’ve probably captured all the midfielders who play some role for their team this season.
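As a rough sketch of this base filter (not the exact code behind the article), the step could look like the following. The `Player` fields and the example rows are invented for illustration. The age cut is deliberately left out here, on the assumption that z-scores are computed against the whole 82-player sample and age is applied afterwards.

```python
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    position: str   # main position, e.g. "CM", "DM", "AM"
    age: int
    minutes: int

MIN_MINUTES = 700            # threshold from the text
POSITIONS = {"CM", "DM"}     # positions kept in the sample

def base_sample(players):
    """Keep CMs/DMs with enough minutes; age is applied later, after z-scores."""
    return [p for p in players
            if p.position in POSITIONS and p.minutes >= MIN_MINUTES]

# Made-up rows purely for illustration (not real data).
players = [
    Player("Player A", "CM", 22, 1500),
    Player("Player B", "DM", 31, 650),    # too few minutes
    Player("Player C", "AM", 23, 2000),   # wrong position
    Player("Player D", "DM", 24, 700),
]
sample = base_sample(players)
print([p.name for p in sample])  # ['Player A', 'Player D']

# Share of available minutes that the threshold represents after 31 games.
share = MIN_MINUTES / (31 * 90)
print(f"{share:.1%}")  # 25.1%
```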
The next step, step 1, is to cast my net by setting filters on several metrics I deem important to the type of player I’m looking for: just an all-rounder. I want to find some young CMs that can do a wide range of duties at a decent level. I’m not necessarily looking for the best U24 playmaker, or best U24 ball-winner… just some talented young CMs in a range of tasks.
To do this step, I calculate Z-Scores for all player metrics, ranked against the other players in my sample. This brings me to a quick tangent (I’m explaining my methods for player research more in this article than I normally would, as I’ll just link back to this in future articles so I can get straight to the point later on).
Z-Scores vs Percentiles
Z-Scores and percentile rankings are similar in nature. Both aim to grade one player’s score in a metric against all others in a given sample. However, there’s a key difference that makes me prefer z-scores for many things: percentiles lose all of the information of a metric’s distribution… while z-scores retain that info.
Percentiles take a distribution and flatten it out, spacing every player an equal distance from each other. We lose some granularity. Z-scores, however, keep that information as z-scores are simply the number of standard deviations a player’s number is from the mean. A z-score of 1 means the player is 1 standard deviation above the average of that metric. A z-score of -0.5 means they are half a standard deviation below the average.
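To make the difference concrete, here is a minimal sketch using only Python's standard library. The five passes-per-90 values are made up, and the choice of population standard deviation (`pstdev`) rather than sample standard deviation is an assumption; either works for ranking within a sample.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Number of standard deviations each value sits from the sample mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; statistics.stdev is an equally valid choice
    return [(v - mu) / sigma for v in values]

def percentile_ranks(values):
    """Percent of the sample with a value at or below each value."""
    n = len(values)
    return [100 * sum(1 for other in values if other <= v) / n for v in values]

# Invented passes-per-90 numbers: four similar players and one outlier.
passes_per_90 = [30.0, 32.0, 34.0, 36.0, 68.0]

z = z_scores(passes_per_90)
pct = percentile_ranks(passes_per_90)

print([round(x, 2) for x in z])   # [-0.71, -0.57, -0.42, -0.28, 1.98]
print(pct)                        # [20.0, 40.0, 60.0, 80.0, 100.0]
```

The percentiles space all five players evenly, while the z-scores show that the last player is far ahead of a tightly grouped pack.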
These pictures below demonstrate this point. First, this very basic excel graph visualizes 500 randomly-generated values based on a normal distribution with mean = 0, and standard deviation = 1. The data are visualized with 30 bins.
This next image shows the percentiles of the same data, again with 30 bins for comparison’s sake. Notice how the distribution disappears, and we basically have a perfectly linear distribution, as all points get laid on a line. We lose the information in our distribution.
Z-scores don’t lose that information, as we see below (again, 30 bins).
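The same comparison can be reproduced without a spreadsheet. This is a sketch under assumptions: the seed and the simple equal-width binning are arbitrary choices, so the exact counts will differ from the charts in the article.

```python
import random
from statistics import mean, pstdev

def histogram(values, bins=30):
    """Counts per equal-width bin between the min and max of the values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put the max value in the last bin
        counts[idx] += 1
    return counts

def percentile_ranks(values):
    """Rank-based percentile for each value (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = 100 * rank / len(values)
    return ranks

random.seed(0)  # arbitrary seed so the run is repeatable
raw = [random.gauss(0, 1) for _ in range(500)]

mu, sigma = mean(raw), pstdev(raw)
z = [(v - mu) / sigma for v in raw]

z_counts = histogram(z)
pct_counts = histogram(percentile_ranks(raw))

print("z-score bins:   ", z_counts)    # bell-shaped: few at the edges, many in the middle
print("percentile bins:", pct_counts)  # flat: roughly the same count in every bin
```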
Both percentiles and z-scores normalize raw data so that we can filter each metric without having to specify very specific things like this much xG per 90, or that many passes per 90, etc.
I like to keep the information of a metric’s distribution in my filtering criteria, but it’s also a personal choice. To me, filtering to show players at or above half a standard deviation of the mean is important, rather than specifically the top x% of players. When looking for extreme players in a metric, you won’t see much difference. But for the purpose I have here, I want to find decent players performing decently in several metrics, and want to use z-scores. Pick what you prefer for your projects; there’s no “right” or “wrong” method.
Next, we need to choose what metrics to filter players by. The specific metrics you use for filtering will change based on your end goal. For this article, I’m going to select some general metrics. If you’re looking at playmaking, you’ll want different metrics. If you’re writing a Recruitment Plan for your favorite club, you’ll want to use metrics that address your team’s specific needs/role requirements.
When I’m just playing around, I tend to switch up the metrics and just see who comes up for various combinations. But for an article, it might be hard to keep track of that so I’ll be more linear.
Since I’m looking for a few players who are decent in a range of midfield duties, I’ve selected these metrics to start, and a quick note as to why:
- Number of passes/90 (involved in their team’s play)
- Short & Medium pass completion % (decent at passing)
- Long pass completion % (decent at making long passes)
- Smart passes/90 (willing to take some creative passing risks)
- Progressive carries/90 (willingness & ability to carry)
- Shot assists/90 (involved higher up the pitch)
- Defensive duel win % (not a liability in defense)
The z-score threshold I’ll use at first is -0.25. Basically, I want to see which players aged 24 or younger are no worse than slightly below average in all of these metrics. Again, I’m not looking for 1-2 players who are outstanding at each metric, just several players who record at least decent numbers overall.
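A minimal sketch of this filter is below. The metric keys are hypothetical shorthand for the seven metrics listed above, and every z-score in `rows` is invented, so the names that come out are placeholders rather than real results.

```python
# Hypothetical keys for the seven metrics in the list above.
METRICS = ["passes_p90", "short_med_pass_pct", "long_pass_pct", "smart_passes_p90",
           "prog_carries_p90", "shot_assists_p90", "def_duel_win_pct"]
THRESHOLD = -0.25   # "no worse than slightly below average"
MAX_AGE = 24

def meets_all(z_scores, threshold):
    """True if every metric's z-score is at or above the threshold."""
    return all(z >= threshold for z in z_scores)

# (name, age, z-scores in METRICS order): invented values, not real players.
rows = [
    ("Player A", 22, [0.4, 0.1, -0.2, 0.9, 0.3, 0.6, -0.1]),
    ("Player B", 23, [1.2, 0.8, 0.5, 1.5, 0.7, 1.1, -0.9]),   # fails defensive duels only
    ("Player C", 33, [0.2, 0.3, 0.0, 0.1, -0.1, 0.4, 0.5]),   # passes the metrics, too old
    ("Player D", 21, [-0.6, 0.2, 0.1, 0.0, 0.3, -0.4, 0.2]),
    ("Player E", 23, [0.5, 0.2, 0.1, 0.3, 0.0, 0.7, 0.4]),
]
assert all(len(z) == len(METRICS) for _, _, z in rows)

shortlist = [n for n, age, z in rows if age <= MAX_AGE and meets_all(z, THRESHOLD)]
any_age = [n for n, age, z in rows if meets_all(z, THRESHOLD)]
at_least_average = [n for n, age, z in rows if age <= MAX_AGE and meets_all(z, 0.0)]

print(shortlist)         # ['Player A', 'Player E']
print(any_age)           # ['Player A', 'Player C', 'Player E']
print(at_least_average)  # ['Player E']
```

Keeping the threshold as a parameter makes it easy to rerun the same check with the age cut removed or with a stricter cut-off such as 0.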
Here are the players meeting those criteria:
César Gelabert of Mirandés, Enzo Loiodice of Las Palmas (who I’ve heard decent things about, so glad to see him in this basic filter), and two FC Andorra players, Jandro Orellana and Iván Gil.
To be honest, I’m relatively surprised that there are just 4 players. Of course, if we take away the age criteria we have 9 players (shown below), and it’s funny that we have four U23 players and then five O30 players… nobody right in the middle!
Going through the filters, the only one that, if taken away, gives at least six U24 players is defensive duel win rate. That output is shown below. So, we can possibly take a look at Ponferradina’s Nwakali or Gijón’s Pedro Díaz if we don’t like the look of any of our main 4.
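Checking what happens when each filter is removed in turn is a leave-one-out loop. The sketch below uses three hypothetical metric keys and invented z-scores to keep it short; it shows the idea, not the real output.

```python
def survivors(rows, metrics, threshold=-0.25, max_age=24):
    """Names of players within the age limit who clear the threshold on every metric."""
    return [name for name, age, z in rows
            if age <= max_age and all(z[m] >= threshold for m in metrics)]

def leave_one_out(rows, metrics):
    """For each metric, the survivors when that single filter is dropped."""
    return {dropped: survivors(rows, [m for m in metrics if m != dropped])
            for dropped in metrics}

METRICS = ["passes_p90", "long_pass_pct", "def_duel_win_pct"]  # hypothetical keys

# Invented z-scores for illustration only.
rows = [
    ("Player A", 22, {"passes_p90": 0.4, "long_pass_pct": -0.1, "def_duel_win_pct": 0.2}),
    ("Player B", 23, {"passes_p90": 1.0, "long_pass_pct": 0.5, "def_duel_win_pct": -0.9}),
    ("Player C", 24, {"passes_p90": 0.8, "long_pass_pct": 0.3, "def_duel_win_pct": -0.5}),
    ("Player D", 21, {"passes_p90": -0.7, "long_pass_pct": -0.6, "def_duel_win_pct": 0.1}),
]

baseline = survivors(rows, METRICS)
relaxed = leave_one_out(rows, METRICS)

print(baseline)  # ['Player A']
for dropped, names in relaxed.items():
    print(f"without {dropped}: {names}")
```

In this toy data only dropping `def_duel_win_pct` lets extra players through, which mirrors the pattern described above.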
The final look I want to have from a filtering standpoint is how many of our 4 main players record at least average (z-score = 0) numbers in all our filters:
This tells us that Loiodice and Gil might be the “better” of the 4 players initially output, or at least, may be the ones to focus on first.
Step 2: Role Ranking
The next step I like to take is see if any of the players from this output rank highly for the particular role I’m searching for. For this, I’m mainly looking at an all-around midfielder, one who can do most things decently, and of course will have their own areas of strength.
My reasoning behind seeing how players rank in a role score is mainly to validate the output players as well as see which of them might have higher levels of performance than the others, or potentially who might have just gotten into my filtering net by the skin of their teeth.
So, I have a “rounded midfielder” role score that I calculated using weighted z-scores. I won’t write out the exact metric weightings, but I have several “buckets” of metrics. They are defensive positioning, ball winning, aerial, progression, dribbling, creativity, shooting, and also minutes. Each of these buckets is created by weighting individual metric inputs. The final role score is then calculated by weighting all of my buckets. For a rounded midfielder, the weightings are essentially equal, as a rounded midfielder should naturally be able to perform most any task fairly well, whether that be on a defensive side, or by creating, or progression, etc.
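The two-level weighting can be sketched as follows. The exact weightings are not given in the article, so every bucket definition, metric key, and weight below is a placeholder, and only three of the eight buckets are shown.

```python
def weighted_average(scores, weights):
    """Weighted mean of the scores named in `weights`."""
    total = sum(weights.values())
    return sum(scores[key] * w for key, w in weights.items()) / total

# Placeholder bucket definitions: metric z-score -> weight inside the bucket.
BUCKETS = {
    "ball_winning": {"def_duels_won_z": 0.5, "interceptions_z": 0.5},
    "progression":  {"prog_passes_z": 0.6, "prog_carries_z": 0.4},
    "creativity":   {"shot_assists_z": 0.5, "smart_passes_z": 0.5},
}

# For a rounded midfielder the bucket weights are essentially equal.
ROLE_WEIGHTS = {"ball_winning": 1.0, "progression": 1.0, "creativity": 1.0}

def role_score(player_z):
    """Bucket scores first, then a weighted average of the buckets."""
    bucket_scores = {name: weighted_average(player_z, metric_weights)
                     for name, metric_weights in BUCKETS.items()}
    return weighted_average(bucket_scores, ROLE_WEIGHTS), bucket_scores

# Invented z-scores for one player.
player_z = {
    "def_duels_won_z": 0.2, "interceptions_z": 0.6,
    "prog_passes_z": 1.0, "prog_carries_z": 0.5,
    "shot_assists_z": 0.8, "smart_passes_z": 0.0,
}

score, buckets = role_score(player_z)
print({name: round(value, 2) for name, value in buckets.items()})
print(round(score, 3))  # 0.533
```

A more specialised role would reuse the same buckets and only change `ROLE_WEIGHTS`.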
Below is a table of the top 15 CMs & DMs in the league (with at least 700 minutes) based on this “rounded midfielder” score.
3 of our 4 main players rank in the top 15 players (which is about the top 18% of CMs/DMs): Iván Gil ranks 4th, Enzo Loiodice 6th, and César Gelabert ranks 9th. Not bad, and this tells me that these 3 players are all well above average midfielders. Despite having a fairly conservative initial filter (as in, we included players who were slightly below average or better in several metrics), the players who made that cut are all fairly rounded as well as putting up decent numbers overall.
Jandro Orellana of FC Andorra is the player from our initial filter who didn’t make this cut, so we might throw him off our list of potential candidates, but I’ll keep him in the back of my mind for the future.
And do you remember those 2 “extra” players that didn’t make the initial cut because of their relatively poor defensive duel win rate? Well, Nwakali ranks #3 on this list, and Díaz ranks #5. That tells me that while they may have a fairly low win rate in defensive duels, they are probably quite good in most other areas.
Because of that, they’re back on the list! Like I’ve said before, this is an iterative process and since I’m not putting a hard minimum level of performance down for any one metric, they seem interesting enough to warrant further consideration. So our list is, at this time:
- César Gelabert
- Enzo Loiodice
- Iván Gil
- Kelechi Nwakali
- Pedro Díaz
5 is actually a solid number to move forward with. Sometimes when I’m finished playing around with filters and role scores, I have 7 to 10 players, other times I end up with 1 or 2. 5 is a good number for this exercise, so I’m happy it’s worked out. Without doing this before sitting down to write, I was worried there either wouldn’t be enough players, or that this section would go on forever!
Step 3: Radars/Profiles
Before I post each player’s radar, I need to say that my radars should mainly be used as a tool for seeing basic info of player style more than “how good is this player?” You can get some good performance indicators from them, but most metrics would be better suited to speaking on player style than performance. Even something like xG per shot, while one of the more “performance-leaning” metrics, is still closer to style than anything. It could tell us how good a player is at picking positions to shoot from, but it’s much better when used in parallel with shots taken, as we can see if a player likes to take more shots from deep or maybe fewer shots and only when they’re closer to the goal (this is especially the case with midfielders).
I won’t go into detail about each player’s radar. I’ll add some thoughts after. So, here are the radars:
We can see how these players each ranked in the top 10 for the Rounded Midfielder role score. Of these players, Pedro Díaz’s and Iván Gil’s radars impressed me the most, with how many areas of the game they’re able to record high percentile scores for. Interestingly, I mentioned just above about using number of shots and npxG per shot together… well, both of these players have a relatively high number of shots, and low npxG per shot (Díaz’s npxG per shot is very low). That tells me they both like to shoot, likely from distance since they are central midfielders rather than attacking midfielders.
Overall, all of these players have a good statistical profile for an all-around central midfielder, with each having areas they may be poor in, and areas they might excel in. Díaz looks to be a player more keen on progressing the ball, making plays, and taking shots. Iván Gil looks to be heavily involved in most aspects when he’s on the pitch (just 1,179 minutes so far), and performs well in all areas. Loiodice and Gelabert are both good all-around players whose development will be fun to track. These are probably the main players I’d focus my time on. Nwakali is good, but he records a wide range of high and low numbers in all areas. At 24 he appears to be a solid option in Segunda, but I’d have questions, data-wise, about whether he’d be able to be a good option for a La Liga side, for example.
4 players is a good end point:
- Iván Gil, FC Andorra
- Pedro Díaz, Gijón
- Enzo Loiodice, Las Palmas
- César Gelabert, Mirandés
Step 4: Event Maps
At this point, in a league that I have event data for, I like to visualize a few things for the players. In leagues that I don’t have event data for, I typically am done and move forward with whatever players I identify as being the main players to follow or watch games of to really understand them as a player. So, if you don’t care about looking at some extra graphs of these players, you can take away those 4 names above as the 4 young rounded CMs who might be worth a deeper look. But since I have event data, I’ll share some different visualizations of these players.
First, here’s Iván Gil’s key attacking actions in the opponent’s half, notably shots, dribbles, and shot assists. We can see that he does not refrain from long shots. Note the large concentration of shots outside the box on the left. He’s got one goal from there, and two goals from the top of the box.
Gil also has assisted many shots, something his radar shows as well as this image. They come from all over the pitch, although the end location of some of these shot assists makes me think that while they ARE assisting shots, the receiving/shooting player likely carries the ball into a better spot to shoot. Several shot assists appear to be crosses, so he must be a decent crosser (89th percentile of cross completion % backs that up).
Continuing with Gil, here are all open-play passes he has made when he’s in the final third. We see a concentration of passes on the left, in a similar location to his shots outside the box. As a CM, he will pop up everywhere, but we can see that he’s mainly playing on the left.
Of his 267 open-play passes in the final third, 62.5% of those will keep possession in the final third, a really strong number. 7.2% of his completions assist shots, which I’ve found to be about the average rate of all the players I’ve made these graphs for. We see he’s not afraid to try a cross either.
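For readers who want to compute similar numbers from their own event data, here is one possible sketch. The definitions are assumptions, not necessarily the ones behind the figures above: the pitch is taken as 0-100 along its length, the final third starts at two thirds of that, retention is counted over all passes started in the final third, and the shot-assist share is counted over completed passes. The field names and the five example passes are invented.

```python
FINAL_THIRD_X = 100 * 2 / 3   # assumes x runs 0-100 towards the opponent's goal

def pct(part, whole):
    """Percentage, returning 0.0 when there is nothing to divide by."""
    return 100 * part / whole if whole else 0.0

def final_third_summary(passes):
    """Retention and shot-assist rates for passes that start in the final third."""
    started = [p for p in passes if p["x"] >= FINAL_THIRD_X]
    completed = [p for p in started if p["completed"]]
    retained = [p for p in completed if p["end_x"] >= FINAL_THIRD_X]
    assists = [p for p in completed if p["shot_assist"]]
    return {
        "passes": len(started),
        "retention_pct": pct(len(retained), len(started)),
        "shot_assist_pct": pct(len(assists), len(completed)),
    }

# Invented open-play passes, for illustration only.
passes = [
    {"x": 70, "end_x": 80, "completed": True,  "shot_assist": False},
    {"x": 75, "end_x": 90, "completed": True,  "shot_assist": True},
    {"x": 80, "end_x": 60, "completed": True,  "shot_assist": False},  # played back out
    {"x": 85, "end_x": 95, "completed": False, "shot_assist": False},
    {"x": 50, "end_x": 70, "completed": True,  "shot_assist": False},  # starts too deep
]

summary = final_third_summary(passes)
print(summary["passes"])                     # 4
print(round(summary["retention_pct"], 1))    # 50.0
print(round(summary["shot_assist_pct"], 1))  # 33.3
```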
Here are Gil’s top pass clusters. This graph shows the most common groups of similar completed passes, based on the start and end point of each pass. One cluster is short corners, likely a sign that FC Andorra make use of short corners. 3 other clusters are lateral or back passes, which tells me that while the graphs above show he can make plenty of plays, he’s also good at retaining and recycling possession. Finally, one cluster is passes from the left half-space onto the left flank in the final third.
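The article does not say which clustering algorithm produces these groups. One common way to group passes by start and end point is k-means on the four coordinates, sketched here in plain Python as an assumption rather than a description of the actual method. The coordinates are invented, and the farthest-point initialisation is chosen only to keep the example deterministic.

```python
import math
from collections import Counter

def kmeans(points, k, iterations=20):
    """Basic k-means on tuples of (start_x, start_y, end_x, end_y)."""
    # Deterministic farthest-point initialisation: start from the first point,
    # then repeatedly add the point farthest from all chosen centres.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each pass to its nearest centre.
        labels = [min(range(k), key=lambda i: math.dist(p, centres[i])) for p in points]
        # Move each centre to the mean of its passes.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centre if a cluster empties out
                centres[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centres

# Invented completed passes: four short lateral passes, then three passes
# from a half-space out to the flank.
passes = [
    (60, 30, 60, 45), (61, 31, 59, 46), (59, 29, 61, 44), (62, 30, 60, 44),
    (75, 35, 88, 10), (76, 36, 89, 11), (74, 34, 87, 9),
]

labels, centres = kmeans(passes, k=2)
top_cluster, size = Counter(labels).most_common(1)[0]

print(labels)             # [0, 0, 0, 0, 1, 1, 1]
print(top_cluster, size)  # 0 4
print(centres[top_cluster])
```

Sorting clusters by size and drawing each centre as an arrow from its start point to its end point gives a "top pass clusters" style of picture.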
Final graph I’ll share for Gil is his defensive actions. Most of these are ball recoveries, as expected for a midfielder who, from what I understand, is not a defensive midfielder. There are many ball recoveries in the opposition’s half, although I can’t speak to whether that’s a positive to Gil or simply because FC Andorra play in a way that means CMs will naturally have high recoveries.
I won’t explain as much of each graph as I did for Gil, since if you’ve never seen these graphs before now, you should be able to interpret them well enough! But I will share a few graphs for these other 3 players, starting with Díaz.
Overall, what stands out to me in Díaz’s passes is the seemingly long average length. A lot of his passes appear to be much longer than Gil’s, for example. Of course, without being an expert on the teams or players, this could be something inherent to Díaz, or it could be a result of Gijón’s tactics. Worth a look, at least, when watching him.
Pedro Díaz’s average pass length really comes out in the passes in the final third above (lots of long passes in all directions), and especially his passes into the final third, shown below. He plays a lot of passes from the middle of the pitch onto the flank near the touchline. He also plays quite a lot of switches, particularly from the left half-space onto the right flank. Again, this is something to look for when watching him: is this a trait of Díaz’s, or do Gijón play longer passes on average? Regardless, it appears he can make longer passes, which is a good skill to have as a midfielder.
Finally, here are his defensive actions. Compared to Iván Gil, Pedro Díaz contributes a few more tackles and interceptions, and also has a lot of defensive actions in both the defensive and attacking half. It appears that he’s both winning and able to help defend the right flank.
Pedro Díaz really stands out to me in all of his data. His defensive contributions, play making ability, and passing range seem very interesting and I will be watching some Gijón games to see if his good-looking data translates to looking good on the pitch.
Enzo Loiodice’s event maps are really pleasing to look at. There’s a ton of positives in these. Of course, the sheer number of actions after almost 2,400 minutes played will be high, but particularly look at, for example, his final third retention rate and percentage of completed passes in the final third that assist shots. Not to mention his very strong dribbling ability.
As I wrote earlier in this article: I am writing this as I’m going through everything. I’m doing this so that you get a decent glimpse into my process for finding and researching players, as well as my thought process on a couple data points, particularly when comparing a couple players I’ve identified I want to do more work on.
Enzo Loiodice is now #1 in my head of these 4 players, from a data perspective. Unless César Gelabert’s data is more interesting than Loiodice’s I will start my dive into these players with Loiodice because of these images, as well as what we found in the first few steps.
Loiodice has 74 tackles and 153 ball recoveries. That’s an oddly high rate compared with what I have seen in many non-purely-defensive midfielders’ data: almost a tackle every 2 ball recoveries. Pedro Díaz is at about 1 tackle every 4 recoveries, and Iván Gil is about 1 tackle every 6 ball recoveries.
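The ratio is simple to reproduce from the two counts quoted above. Only those counts are used here, since the raw tackle and recovery totals for the other players are not given in the text.

```python
def recoveries_per_tackle(tackles, recoveries):
    """How many ball recoveries a player makes for each tackle."""
    return recoveries / tackles

# The 74 tackles and 153 ball recoveries quoted in the text.
ratio = recoveries_per_tackle(tackles=74, recoveries=153)
print(round(ratio, 2))  # 2.07
```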
Again, like I said, all the work before the event data told me to look into Enzo Loiodice. But looking at his event data now tells me to look at Enzo Loiodice first.
As expected, Gelabert’s event data is interesting, but not enough to push him to the top of my list. Some very good elements that we can’t see in Wyscout metrics, though, especially his strength at enabling play in the final third. His passes in the final third are interesting to me because of the very high proportion of lateral passes in front of the box (both left and right side) and his high completion % of those passes. Almost all of his incomplete passes are passes we’d expect most players to see cut out: likely risky passes into the box. There are some incomplete passes outside the box, but that graph certainly makes him stand out to me.
For comparison’s sake, César Gelabert also has a high tackles-to-recoveries ratio, at about 1 tackle every 2.5 ball recoveries. Gelabert really seems like the type of player I like to call an “enabler”: a player who is very solid at several things and allows the players around him the freedom to play how they like, as he can both get up the pitch and impact play, or stay a little deeper and swing the ball side to side, as well as help cover for a fullback that likes to get high up the pitch.
Of course, everything I have said in this article is just my thoughts from the data profiled in this article. Any thoughts I shared on a player could be very wrong, which is why data can’t be used to fully scout a player. However, data can and should be used to both find several players worth your time and get a sense of their general profile so you don’t go into watching them blind (although some may disagree with that and prefer going into watching a player with zero info about their profile, which is perfectly fine).
I hope you enjoyed reading this as much as I enjoyed writing this and finding these 4 players. I’ve given myself some homework, of course, to watch a few games of these players and a) see if these players do look as promising on the pitch as they do in the data, and b) check my understanding of data to see if the general profiles I’ve put these players into are in fact decent profiles of these players.
So, if you skipped to the end of this article and only want to know the outcome, I recommend checking out these four U24 central midfielders in LaLiga SmartBank:
- Enzo Loiodice (22, Las Palmas)
- Pedro Díaz (24, Sporting Gijón)
- Iván Gil (23, FC Andorra)
- César Gelabert (22, Mirandés)
Header image courtesy of LaLiga