Before the current Brazilian Championship began at the end of May, most commentators named Atlético Mineiro, Corinthians and Internacional as the favorites because they had signed the largest number of top players. Cruzeiro had also hired reinforcements, but few expected it to be leading the championship by the end of September, followed by Botafogo, Atlético Paranaense and Grêmio. The point difference between the highest-placed teams, however, was small. Everything could change by the end of the tournament, in December.
“We know that the best team doesn’t always win,” says physicist Roberto da Silva, at the Federal University of Rio Grande do Sul (UFRGS). “Soccer has a high degree of randomness.” With the help of some UFRGS colleagues, Silva created a computational model that generates virtual championships with statistical properties identical to those of national championships decided based on total points, such as the Brazilian, Spanish and Italian championships. Their findings, published this year in Computer Physics Communications and Physical Review E, suggest that the differences in the skills of the teams are important, but that randomness is what really dominates soccer dynamics.
“No one gets rich betting on soccer,” says Silva, who was born in Mauá, in the São Paulo metropolitan area, and who is a long-time fan of the Santos football club. “The number of upsets, in which the worse team wins, is enormous.” He justifies this claim by citing a study published by British statisticians. They showed that sports commentators for three UK newspapers correctly guessed only 42% of the outcomes for 1,700 Premier League games. Those who always bet on the home team, whose chance of victory tends to be slightly higher because of its familiarity with the field and the pressure of fans, would get 47% correct.
Silva and his colleagues analyzed score tables from five years of the Brazilian championship and several European national championships played between 2006 and 2011. Since 2003, the Brazilian championship has used the cumulative point system, like the European leagues. There are no qualifiers: each of the 20 participating teams plays each opponent twice, once at home and once away. Teams earn three points for every win, one point for a draw and no points for a loss. Each team plays a total of 38 matches. The team with the most points at the end of the tournament wins.
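The arithmetic of this format can be checked with a few lines of Python. This is only an illustrative sketch: the team labels and the helper names double_round_robin and points are invented here and do not come from the study.

```python
from itertools import permutations

def double_round_robin(teams):
    """Every ordered (home, away) pair appears once, so each pair meets twice."""
    return list(permutations(teams, 2))

def points(wins, draws):
    """Three points per win, one per draw, none for a loss."""
    return 3 * wins + draws

teams = [f"T{i}" for i in range(20)]          # placeholder team labels
fixtures = double_round_robin(teams)

# Count the matches involving one team, either at home or away.
matches_per_team = sum(1 for home, away in fixtures if "T0" in (home, away))
max_points = points(wins=matches_per_team, draws=0)
```

With 20 teams the schedule has 380 fixtures, each team plays 38 of them, and a team that won every match would finish with 114 points.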
“I searched the physics literature for a similar random phenomenon,” Silva explains. One of the simplest random processes is the diffusion of molecules of a solute in a solvent, such as when a pinch of sugar dissolves in a cup of water. Silva tried to describe the evolution of team scores with the same equations that model the motion of molecules in diffusion. In this first model, each team was a molecule. The displacement of each molecule corresponded to the step-wise advancement of teams throughout the season, which could occur in three ways: defeat, draw or victory.
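A rough sketch of such a walk is given below. It is not the authors' code: the outcome probabilities are invented, every team gets the same fixed odds, and each team's steps are drawn independently instead of being paired into real matches.

```python
import random
from statistics import pvariance

# Hypothetical fixed outcome probabilities, identical for every team.
P_WIN, P_DRAW = 0.375, 0.25   # the remaining 0.375 is the loss probability

def random_walk_scores(n_teams=2000, n_rounds=38, seed=1):
    """Return the variance of total points across teams after each round."""
    rng = random.Random(seed)
    totals = [0] * n_teams
    spread = []
    for _ in range(n_rounds):
        for i in range(n_teams):
            u = rng.random()
            if u < P_WIN:
                totals[i] += 3                # victory
            elif u < P_WIN + P_DRAW:
                totals[i] += 1                # draw
            # otherwise a defeat: no points added
        spread.append(pvariance(totals))
    return spread

spread = random_walk_scores()
```

Because every step has the same odds, the spread of scores in this sketch grows roughly in proportion to the number of rounds played, which is the signature of ordinary diffusion. A large number of walkers is used only to make that trend visible through the noise.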
It did not work very well. Silva noted that the difference between the total scores of the teams tended to increase faster than projected by the simple diffusion model. Actually, the evolution of the scores had the characteristics of what physicists call superdiffusion. That was the sign that the premise of the simple model—that team performance remained constant throughout the season—did not match reality. “Superdiffusion occurs when the odds of winning and losing change over time,” he explains. “The teams change: players get hurt, new players are hired and coaches are fired.”
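In the usual physics convention, the spread (variance) of positions grows as a power of time: an exponent of 1 means ordinary diffusion and an exponent above 1 means superdiffusion. One way to estimate that exponent is a straight-line fit on log-log axes, sketched below. The function name diffusion_exponent and both input series are synthetic illustrations, not championship data or the authors' procedure.

```python
import math

def diffusion_exponent(times, variances):
    """Least-squares slope of log(variance) against log(time)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in variances]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rounds = list(range(1, 39))
normal = [1.7 * t for t in rounds]            # synthetic: variance grows like t
faster = [0.5 * t ** 1.6 for t in rounds]     # synthetic: variance grows like t^1.6

alpha_normal = diffusion_exponent(rounds, normal)
alpha_faster = diffusion_exponent(rounds, faster)
```

Applied to real score tables, a fitted exponent clearly above 1 would point to the faster-than-diffusive growth Silva describes.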
One Sunday, while playing video games with his 8-year-old son, Silva thought of a way to incorporate these changes into his model. As in soccer video games, each team in the revised model is represented by a number that measures its ability, or its potential for winning a match. The results of the games are still determined randomly, but the probability of a team winning or losing depends on the potentials of both teams. After each match the potentials are updated: the winner’s potential increases and the loser’s decreases. In a tie, the potentials of both teams remain unchanged.
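The idea can be sketched in code as follows. The article does not give the model's formulas, so several details here are assumptions: a fixed draw probability, a win chance proportional to each team's potential, a unit change in potential after a decisive match, a floor that keeps potentials positive, and no home advantage. The function and parameter names are hypothetical.

```python
import random

def simulate_season(n_teams=20, phi0=30.0, delta=1.0, p_draw=0.25, seed=0):
    """Play one double round-robin; return (points, potentials)."""
    rng = random.Random(seed)
    phi = [phi0] * n_teams      # every team starts with the same potential
    pts = [0] * n_teams
    # Simplification: matches are played in plain loop order, not in rounds.
    for home in range(n_teams):
        for away in range(n_teams):
            if home == away:
                continue
            if rng.random() < p_draw:
                # Draw: one point each, potentials unchanged.
                pts[home] += 1
                pts[away] += 1
                continue
            # Decisive match: win chance proportional to potential (assumption).
            p_home = phi[home] / (phi[home] + phi[away])
            winner, loser = (home, away) if rng.random() < p_home else (away, home)
            pts[winner] += 3
            phi[winner] += delta
            phi[loser] = max(phi[loser] - delta, 1.0)  # keep potentials positive
    return pts, phi

points, potentials = simulate_season()
```

Because winning raises a team's potential and so its future odds, the outcomes are no longer independent from match to match, which is what allows score gaps to widen faster than in the simple walk.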
Virtual versus Real
With this adjustment, the model worked better. Silva was able to accurately simulate the cumulative statistics for five consecutive Brazilian championships, but not for other tournaments. The scores of the virtual tournaments did not match those of the European leagues, especially the Spanish and Italian championships.
It was not hard to find an explanation. The model assumed that all teams started the championship with the same potential for winning games. Since 2003, six different teams have won the Brazilian championship. Although there will always be favorites, no Brazilian team stands out from the others for very long because really good players are quickly sold to foreign teams. Spain is different. The two top teams—Barcelona and Real Madrid—have a much higher goals-per-game average than other teams and one or the other almost always wins the championship. The same happens in Italy with Juventus, Milan and Internazionale. The model only worked for all countries when Silva included this initial difference, setting the initial potential of the teams based on each team’s goals-per-game average during the prior season.
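A minimal sketch of that initialization step follows. The proportional mapping, the scale parameter and the goal averages are all assumptions made for illustration; the article does not say how the averages were converted into potentials.

```python
def initial_potentials(goals_per_game, scale=30.0):
    """Map each team's prior-season goals-per-game average to a starting potential.

    Assumption: the potential is simply proportional to the average, scaled so
    that a team scoring at the league mean starts at `scale`.
    """
    mean = sum(goals_per_game.values()) / len(goals_per_game)
    return {team: scale * g / mean for team, g in goals_per_game.items()}

# Invented averages for illustration only, not real league statistics.
previous_season = {"Team A": 2.8, "Team B": 2.6, "Team C": 1.3, "Team D": 1.1}
phi0 = initial_potentials(previous_season)
```

In a league with two dominant scorers, such as the invented Team A and Team B, those teams would start the virtual season with a built-in advantage instead of on equal footing.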
For now, the model only reproduces team ranking evolution in a very general sense, but Silva and his colleagues hope to track an individual team and simulate its performance, assessing its chances of winning the championship.
Physicist Haroldo Ribeiro, of the State University of Maringá, has also been observing superdiffusion in his analysis of soccer, cricket and chess matches. “There is still a lot to look into,” he says. “We can answer questions that sports fans may be wondering about or justify the statements they often make without a scientific basis.”
Scientific articles
SILVA, R. et al. Anomalous diffusion in the evolution of soccer championship scores: Real data, mean-field analysis, and an agent-based model. Physical Review E. v. 88, n. 2. Aug. 2013.
SILVA, R. et al. A simple non-Markovian computational model of the statistics of soccer leagues: Emergence and scaling effects. Computer Physics Communications. v. 184, n. 3. Mar. 2013.