## Can Data Science help Novak Djokovic?

There is no doubt about it!

Tennis fans are well aware that a man recognised among the all-time tennis greats has suffered quite a drop in form over the last year, culminating in a painful quarterfinal loss at the Roland Garros tournament, where he was the defending champion.

Tennis pundits and fans alike are trying to explain Novak’s dramatic and persistent loss of form, and probably nobody more so than Djokovic himself. Is it mental, is it physical, or is it both? And what is driving it?

Well, I would say that nothing could be more valuable to him right now, if he wants to regain his old self, than a personal data science project!

What do I mean?

Let’s do a thought experiment to explain how, at least in theory, Novak could benefit from this emerging technological field.

Let’s imagine that for the last 3 years Novak Djokovic had tennis sensors in his racquets that captured many important variables, such as the speed of the swing, the type of swing and the place on the racquet where the ball was hit. Couple that with other relevant tennis statistics for each set he played, such as unforced errors, the percentage of first serves in, and the speed of first and second serves.

Let's call all these variables “inputs”. Then there would be another key variable, a simple indicator that tells whether a specific set was won or lost, and we call that the “target” variable. So, imagine we are able to construct such a data set in spreadsheet format, where each row is a specific set, most of the columns are “input” variables, and one column is the “target” variable with the values “win” or “loss”.
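To make the spreadsheet idea concrete, here is a minimal sketch of what such a table could look like. The column names, units and values are all my own illustrative assumptions, not real sensor output or real match data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SetRecord:
    # "Input" variables (hypothetical names and units, for illustration only)
    avg_swing_speed_mph: float
    first_serve_in_pct: float
    avg_first_serve_kmh: float
    avg_second_serve_kmh: float
    unforced_errors: int
    # "Target" variable: "win" or "loss"
    outcome: str


# Made-up rows, one per set, purely to show the layout
sets = [
    SetRecord(84.0, 66.0, 192.0, 158.0, 7, "win"),
    SetRecord(77.5, 57.0, 181.0, 149.0, 14, "loss"),
]


def to_csv(records):
    """Render the records as CSV text: one header row, then one row per set."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(SetRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


table = to_csv(sets)
print(table)
```

Every row is one set, the last column is the target, and everything before it is an input.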

The idea is to unleash an algorithm on such data in order to describe which values of which variables occur most often (a data pattern) when a set is won, and which values occur when a set is lost.

So, we could produce a descriptive model that would show:

- What are the most important variables that lead to a win or a loss, and the potential interactions between them
- What are the underlying data patterns that lead to a win or a loss

A data pattern is defined here as a significant number of occurrences of specific values of specific variables that led to a specific outcome (win or loss).
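One very simple way to get a first ranking of variables is to compare the average of each input in won sets against lost sets, scaled by how much that input varies overall. This is only a rough stand-in for a proper descriptive model (it ignores interactions between variables entirely), and the data and column names below are made up for illustration.

```python
from statistics import mean, pstdev

# Made-up sets; column names are hypothetical
rows = [
    {"avg_forehand_mph": 85, "first_serve_in_pct": 68, "unforced_errors": 6, "outcome": "win"},
    {"avg_forehand_mph": 83, "first_serve_in_pct": 64, "unforced_errors": 8, "outcome": "win"},
    {"avg_forehand_mph": 84, "first_serve_in_pct": 59, "unforced_errors": 7, "outcome": "win"},
    {"avg_forehand_mph": 78, "first_serve_in_pct": 62, "unforced_errors": 14, "outcome": "loss"},
    {"avg_forehand_mph": 76, "first_serve_in_pct": 55, "unforced_errors": 15, "outcome": "loss"},
    {"avg_forehand_mph": 79, "first_serve_in_pct": 58, "unforced_errors": 12, "outcome": "loss"},
]


def rank_variables(rows, target="outcome"):
    """Rank inputs by the standardised gap between their win and loss averages."""
    variables = [name for name in rows[0] if name != target]
    scores = {}
    for name in variables:
        wins = [r[name] for r in rows if r[target] == "win"]
        losses = [r[name] for r in rows if r[target] == "loss"]
        spread = pstdev([r[name] for r in rows])
        # Positive score: higher in won sets. Negative score: higher in lost sets.
        scores[name] = 0.0 if spread == 0 else (mean(wins) - mean(losses)) / spread
    # Largest absolute gap first
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_variables(rows)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

A score far from zero means the variable looks very different in wins and losses. With real data you would want far more sets than this before trusting the order.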

While it is well known among tennis coaches and pros that you can’t expect to go deep in any tournament if your first-serve percentage is below 60%, or if your average forehand or backhand speed in a set drops below 80 mph (for men), what is NOT known are the specific quantitative patterns for a specific player when he wins and when he loses, as well as their order of importance over a longer period. Anyone can have a bad day, so evaluating data accumulated over a longer time, such as the last year or two of play, is crucial to make sure we are extracting real signal and not noise.

Let me give you a specific example:

An algorithm could extract a rule from the underlying data that says “in 80% of losing sets the average forehand speed is less than 80 mph, and the combined average speed of first and second serves is below 170 km per hour”. Another rule could be “in 71% of losing sets the serve percentage drops below 60% and the share of backhand and drop-shot related unforced errors is greater than 15%”. The first rule may suggest a general, significant drop in serve speed and forehand speed that contributes to losing sets, and if these factors are more prevalent in recent losing sets, there could be a trend. In the case of the second rule, maybe focusing on the serve, backhand and drop shots during training sessions would be what is needed. Also, the first rule should be given higher priority as a “loss factor” because it is more prevalent in losses.
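Here is a small sketch of how the prevalence of such rules could be measured once a rule has been proposed. It does not discover rules by itself; it only counts how often a given rule holds among lost sets and, for comparison, among won sets. The rows, column names and resulting percentages are invented for illustration and do not reproduce the 80% and 71% figures above.

```python
# Made-up sets; column names are hypothetical
rows = [
    {"avg_forehand_mph": 78, "avg_serve_kmh": 165, "serve_in_pct": 58, "bh_dropshot_error_pct": 18, "outcome": "loss"},
    {"avg_forehand_mph": 77, "avg_serve_kmh": 168, "serve_in_pct": 55, "bh_dropshot_error_pct": 17, "outcome": "loss"},
    {"avg_forehand_mph": 79, "avg_serve_kmh": 166, "serve_in_pct": 62, "bh_dropshot_error_pct": 12, "outcome": "loss"},
    {"avg_forehand_mph": 76, "avg_serve_kmh": 169, "serve_in_pct": 59, "bh_dropshot_error_pct": 16, "outcome": "loss"},
    {"avg_forehand_mph": 83, "avg_serve_kmh": 175, "serve_in_pct": 63, "bh_dropshot_error_pct": 10, "outcome": "loss"},
    {"avg_forehand_mph": 85, "avg_serve_kmh": 178, "serve_in_pct": 66, "bh_dropshot_error_pct": 8, "outcome": "win"},
    {"avg_forehand_mph": 84, "avg_serve_kmh": 176, "serve_in_pct": 58, "bh_dropshot_error_pct": 16, "outcome": "win"},
    {"avg_forehand_mph": 82, "avg_serve_kmh": 172, "serve_in_pct": 64, "bh_dropshot_error_pct": 9, "outcome": "win"},
    {"avg_forehand_mph": 86, "avg_serve_kmh": 180, "serve_in_pct": 68, "bh_dropshot_error_pct": 7, "outcome": "win"},
    {"avg_forehand_mph": 81, "avg_serve_kmh": 169, "serve_in_pct": 61, "bh_dropshot_error_pct": 11, "outcome": "win"},
]


def slow_strokes(r):
    """Rule 1: slow forehand and slow combined serve."""
    return r["avg_forehand_mph"] < 80 and r["avg_serve_kmh"] < 170


def weak_serve_and_errors(r):
    """Rule 2: low serve percentage and many backhand/drop-shot errors."""
    return r["serve_in_pct"] < 60 and r["bh_dropshot_error_pct"] > 15


def rule_prevalence(rows, rule, outcome):
    """Share of sets with the given outcome in which the rule holds."""
    subset = [r for r in rows if r["outcome"] == outcome]
    if not subset:
        return 0.0
    return sum(1 for r in subset if rule(r)) / len(subset)


rules = {"slow_strokes": slow_strokes, "weak_serve_and_errors": weak_serve_and_errors}
report = {
    name: (rule_prevalence(rows, rule, "loss"), rule_prevalence(rows, rule, "win"))
    for name, rule in rules.items()
}
# Most prevalent loss pattern first
for name, (in_losses, in_wins) in sorted(report.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name}: {in_losses:.0%} of losses, {in_wins:.0%} of wins")
```

Looking at the win column as well matters: a rule that holds in most losses but also in most wins tells us very little about what separates the two.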

Furthermore, a data scientist could filter these rules not only by time but also by opponent or playing surface, and then even more actionable information could be derived. But the deeper you want to go, the deeper the layer of data you need to support your analytical journey.
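As a sketch of that filtering step, the same kind of rule can be checked separately per playing surface. The `surface` column, the threshold of three sets and the data are my own assumptions; the point is that small groups get no number at all, which is the "deeper layer of data" problem in miniature.

```python
from collections import defaultdict

# Made-up lost and won sets; column names are hypothetical
rows = [
    {"surface": "clay", "avg_forehand_mph": 78, "outcome": "loss"},
    {"surface": "clay", "avg_forehand_mph": 77, "outcome": "loss"},
    {"surface": "clay", "avg_forehand_mph": 79, "outcome": "loss"},
    {"surface": "clay", "avg_forehand_mph": 83, "outcome": "loss"},
    {"surface": "hard", "avg_forehand_mph": 76, "outcome": "loss"},
    {"surface": "hard", "avg_forehand_mph": 84, "outcome": "loss"},
    {"surface": "grass", "avg_forehand_mph": 78, "outcome": "loss"},
    {"surface": "grass", "avg_forehand_mph": 82, "outcome": "loss"},
    {"surface": "grass", "avg_forehand_mph": 85, "outcome": "loss"},
    {"surface": "clay", "avg_forehand_mph": 86, "outcome": "win"},
]


def slow_forehand(r):
    return r["avg_forehand_mph"] < 80


def prevalence_by_group(rows, rule, group_key, outcome="loss", min_sets=3):
    """Share of sets with the given outcome where the rule holds, per group.

    Groups with fewer than min_sets matching sets get None instead of a number.
    """
    groups = defaultdict(list)
    for r in rows:
        if r["outcome"] == outcome:
            groups[r[group_key]].append(r)
    result = {}
    for name, subset in groups.items():
        if len(subset) < min_sets:
            result[name] = None  # too little data to say anything
        else:
            result[name] = sum(1 for r in subset if rule(r)) / len(subset)
    return result


by_surface = prevalence_by_group(rows, slow_forehand, "surface")
print(by_surface)
```

The same function works for an `opponent` column, but slicing by opponent will run out of sets even faster than slicing by surface.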

When we look at a specific set that a player wins, it can be fairly obvious which parts of the game he did better than his opponent. But if we bring in 500 sets, of which maybe 400 are won and 100 are lost, the only way to extract general patterns of winning or losing based on quantifiable metrics is with powerful technologies and the skills to use them.

In other words, data science could give someone like Djokovic a true reflection of which constellations of playing mechanics are associated with winning a set and which with losing one. Patterns that lead to a win are “desirable” patterns that need to be nurtured and supported, while loss patterns need to be broken down.

Another benefit of this approach is that, after gaining an understanding of the key loss factors, a bigger picture can emerge that may point to an overall drop in strength, speed or skill in certain areas of the game. This can give his coaching staff enough clues to design a specific training regime that addresses all the right pain points and modifies the player's game to maximise the chances of the desired outcome, which is to win, and then to overlay it with a specific strategy for a specific opponent.

And while win or loss patterns are present in the game of any player, what is completely unique is how they all fit together, and that is what gives a specific player his unique footprint of a loss (or win). Here the only alternative to a data science approach is to rely on memory and hunches, which can be inaccurate and misleading. Data patterns extracted by advanced analytical methods, which a data scientist can evaluate, compare and test for predictive potential, are far more powerful in telling the story of what is really behind wins or losses.
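Testing predictive potential can be sketched very simply: take a rule that looked good on older sets and check how often it is right on more recent sets it has never seen. The split point, the rule and the data are all assumptions made for illustration.

```python
# Made-up sets, ordered oldest to newest; column names are hypothetical
history = [
    {"avg_forehand_mph": 78, "outcome": "loss"},
    {"avg_forehand_mph": 85, "outcome": "win"},
    {"avg_forehand_mph": 77, "outcome": "loss"},
    {"avg_forehand_mph": 79, "outcome": "win"},
    {"avg_forehand_mph": 84, "outcome": "win"},
    {"avg_forehand_mph": 76, "outcome": "loss"},
    {"avg_forehand_mph": 79, "outcome": "loss"},
    {"avg_forehand_mph": 83, "outcome": "win"},
    {"avg_forehand_mph": 78, "outcome": "win"},
]


def slow_forehand(r):
    """Candidate loss rule: the set was played with a slow average forehand."""
    return r["avg_forehand_mph"] < 80


def precision_of_loss_rule(rows, rule):
    """Of the sets where the rule fired, what share were actually lost?"""
    fired = [r for r in rows if rule(r)]
    if not fired:
        return None  # the rule never fired, so there is nothing to measure
    return sum(1 for r in fired if r["outcome"] == "loss") / len(fired)


# Older two thirds for finding the pattern, newest third for checking it
split = len(history) * 2 // 3
older, recent = history[:split], history[split:]

precision_older = precision_of_loss_rule(older, slow_forehand)
precision_recent = precision_of_loss_rule(recent, slow_forehand)
print(f"older sets: {precision_older:.0%}, recent sets: {precision_recent:.0%}")
```

If a rule holds up on older sets but falls apart on recent ones, it was probably noise, or the player's game has changed since.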

These same principles are applicable in any other sporting or non-sporting situation that has different possible outcomes. Once we are able to use data science to better understand which sets of factors lead to which outcomes, we can then strengthen the factors that lead to the desired outcome and counter the factors that lead to outcomes we don’t want.

Good luck to Novak, and I hope he regains his former best!

An article by Goran Dragosavac