Maciej Zieliński

Data Scientist at Forecast

FIFA World Cup 2022

Every four years, 32 teams from six confederations compete in the FIFA World Cup: the Asian (AFC), African (CAF), North/Central American and Caribbean (CONCACAF), South American (CONMEBOL), Oceanian (OFC) and European (UEFA) football confederations. The tournament consists of a group stage of 48 matches and a knock-out phase of 16 matches. Historically, these tournaments are attended by about three million spectators, of whom half a million need lodging in the host country.

Keywords: operational research, mathematical modelling, optimisation

Below I present only the most interesting parts of the project. The whole project repository is available here.

1. Introduction

The tournaments are usually hosted by UEFA or CONMEBOL countries with large tourism sectors. In 2022, however, the event will be hosted in Qatar, which has very little tourism infrastructure. The aim of this report is to develop a methodology for estimating the lodging requirement, based on the previous work of Ghoniem et al. (2017) [1]. The paper was written by researchers of the University of Massachusetts, the University of Qatar and the Qatar Tourism Authority with funding from the Qatar Government. Their overall method consists of two integer programming models and a further calculation-based model to determine the lodging requirement. The model only looks at the group stage, as this is when lodging requirements are highest due to the number of foreign spectators in Qatar.

2. Methodology

2.1 Framework for attendance analytics

Match attendance, and therefore the overall lodging requirement, depends on the quality and popularity of the nations that qualify and on the capacity of the stadiums in which they play. First, we need to establish which countries are most likely to qualify for the FIFA 2022 World Cup. In the report we assume that the countries with the highest FIFA ranking as of February 2021 (respecting the required proportion of teams from each confederation) are the ones that will qualify, plus the host nation, which qualifies by default.
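The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the team names, points and per-confederation slot counts are made up, and `select_qualifiers` is a hypothetical helper, not code from the project repository.

```python
from collections import defaultdict

def select_qualifiers(ranking, slots, host):
    """Pick the top-ranked teams per confederation, plus the host.

    ranking: list of (team, confederation, fifa_points) tuples
    slots:   dict mapping confederation -> number of places (host excluded)
    host:    name of the host nation, which qualifies automatically
    """
    qualified = [host]
    taken = defaultdict(int)
    # Walk down the ranking from the strongest to the weakest team.
    for team, confed, _points in sorted(ranking, key=lambda r: r[2], reverse=True):
        if team == host:
            continue  # the host is already in
        if taken[confed] < slots.get(confed, 0):
            qualified.append(team)
            taken[confed] += 1
    return qualified

# Made-up toy data (not real FIFA rankings or slot counts).
ranking = [
    ("Alpha", "UEFA", 1800), ("Bravo", "UEFA", 1750), ("Charlie", "UEFA", 1700),
    ("Delta", "CONMEBOL", 1720), ("Echo", "CONMEBOL", 1600),
    ("Foxtrot", "AFC", 1500), ("Hostland", "AFC", 1400),
]
slots = {"UEFA": 2, "CONMEBOL": 1, "AFC": 1}
qualified = select_qualifiers(ranking, slots, host="Hostland")
print(qualified)
```

In this toy run "Charlie" and "Echo" miss out even though they outrank "Foxtrot", because their confederations' places are already filled.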

2.2 Group Formation Model

Our first task is to sort the teams into groups to determine who will play whom in the group stage of the competition. The constraints are that each group can contain only one team from each confederation (with the exception of UEFA, which can have two teams per group) and only one team from each pot. Teams are sorted into pots by their FIFA points total: every team in pot 1 has more FIFA points than every team in pot 2, and so on. We aim to maximise the lowest total of FIFA points across the groups, which produces groups of similar strength. Ordinarily, teams are sorted randomly into groups (subject to the constraints), but optimising in this way gives a worst-case scenario in terms of accommodation provision, as more competitive games will likely attract the largest number of spectators.
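As a small illustration of the pot rule, the sketch below sorts teams by points and cuts the list into pots with one team per group. The names and points are invented, `make_pots` is a hypothetical helper, and the sketch ignores any special seeding (for example of the host nation).

```python
def make_pots(points, n_groups):
    """Split teams into pots of size n_groups, strongest teams first."""
    ordered = sorted(points, key=points.get, reverse=True)
    return [ordered[k:k + n_groups] for k in range(0, len(ordered), n_groups)]

# Made-up FIFA points for eight teams, to be drawn into two groups.
toy_points = {
    "A1": 1800, "A2": 1700, "B1": 1600, "B2": 1500,
    "C1": 1400, "C2": 1300, "D1": 1200, "D2": 1100,
}
pots = make_pots(toy_points, n_groups=2)
print(pots)
```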

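Mathematically, we introduce the following notation. Let $N$ be the set of teams, $G$ the set of groups, $C$ the set of confederations and $P_k$ the set of teams in pot $k$. Team $i$ has $p_i$ FIFA points, and $c_{ij}=1$ if team $i$ belongs to confederation $j$ (and 0 otherwise). At most $\kappa_j$ teams from confederation $j$ may share a group (two for UEFA, one otherwise). The binary variable $x_{ig}$ equals 1 if team $i$ is placed in group $g$, and $w$ is the lowest total of FIFA points over all groups.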

Then the optimisation problem becomes:

\[\begin{align} \max w & \mathrm{\;\;\;\; s.t.} \\ w \leq \sum_{i\in N}p_ix_{ig}, & \;\;\;\; \forall g \in G && (1)\\ \sum_{i\in N}x_{ig}=4, & \;\;\;\; \forall g \in G && (2)\\ \sum_{g\in G}x_{ig}=1, & \;\;\;\; \forall i \in N && (3)\\ x_{i_1g}+x_{i_2g}\leq 1, & \;\;\;\; \forall g \in G;\; i_1,i_2\in P_k,\; i_1\neq i_2,\; \forall k && (4)\\ \sum_{i\in N}x_{ig}c_{ij}\leq \kappa_j, & \;\;\;\; \forall g \in G, j \in C && (5)\\ w\geq0 & && (6) \end{align}\]

The objective function, together with constraint (1), ensures that the minimum total of FIFA points in a group is maximised, creating groups of similar strength. Constraint (2) ensures each group contains four teams, whilst constraint (3) ensures each team is assigned to exactly one group. Constraint (4) means no two teams from the same pot can be in the same group, and constraint (5) enforces the confederation limits. Constraint (6) is simply a non-negativity condition.
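To make the max-min objective concrete, here is a minimal sketch that solves a toy instance by exhaustive search rather than with an integer programming solver. Everything in it is an assumption for illustration: eight invented teams, two groups, made-up points and confederation labels, and a hypothetical `best_grouping` function. Enumeration is only workable at this size; the full 32-team problem needs a proper solver.

```python
from itertools import permutations, product

def best_grouping(pots, points, confed, caps):
    """Enumerate pot-respecting groupings and keep the one whose
    weakest group has the highest points total (the max-min objective)."""
    best_w, best_groups = -1, None
    # Fix the order of the first pot so relabelled groups are not recounted.
    options = [[tuple(pots[0])]] + [list(permutations(pot)) for pot in pots[1:]]
    for choice in product(*options):
        # Column g of `choice` holds one team from each pot: that is group g.
        groups = [tuple(col) for col in zip(*choice)]
        # Confederation limits, as in constraint (5).
        if any(sum(confed[t] == c for t in g) > cap
               for g in groups for c, cap in caps.items()):
            continue
        w = min(sum(points[t] for t in g) for g in groups)
        if w > best_w:
            best_w, best_groups = w, groups
    return best_w, best_groups

# Made-up toy instance: four pots of two teams, hence two groups of four.
pots = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"], ["D1", "D2"]]
points = {"A1": 1800, "A2": 1700, "B1": 1600, "B2": 1500,
          "C1": 1400, "C2": 1300, "D1": 1200, "D2": 1100}
confed = {"A1": "UEFA", "A2": "CONMEBOL", "B1": "UEFA", "B2": "UEFA",
          "C1": "CONMEBOL", "C2": "CAF", "D1": "AFC", "D2": "UEFA"}
caps = {"UEFA": 2, "CONMEBOL": 1, "CAF": 1, "AFC": 1}

w, groups = best_grouping(pots, points, confed, caps)
print(w, groups)
```

Taking one team per pot for each group satisfies constraints (2) to (4) by construction, so only the confederation limits need an explicit check. In this instance the confederation limits leave just two feasible draws, and the search picks the one that pairs the stronger pot-1 team with the weaker pot-2 team.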

2.3 Group-Letter Assignment Model

The purpose of the Group-Letter Assignment Model is to spread the matches predicted to be most popular across different stadiums, while ensuring that more popular matches are assigned to stadiums with higher capacity. Following the paper’s methodology, we introduce the match popularity index. It is described in greater detail in the full report.

2.4 Foreign spectator attendance and lodging requirements

The final model brings everything together in producing a concrete prediction for how much lodging is required. We must estimate how many foreign individuals are likely to attend each game, and from that work out how many beds will be needed each night. We first introduce two seat allocation parameters: $\alpha$ is the proportion of seats reserved for officials and $\beta$ is the proportion of the remaining seats offered to each of the two nations involved and to other nations. For the purposes of our calculation we will use $\alpha=0.09$ and $\beta=0.12$ as done by Ghoniem et al. (2017).
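The seat split can be illustrated with a short calculation. This sketch reads the definition above literally, so that each competing nation and the pool of other nations are each offered a share $\beta$ of the seats left after the officials' allocation; that reading, the 80,000-seat capacity and the `seat_allocation` helper are assumptions made for illustration, not figures or code from the paper.

```python
def seat_allocation(capacity, alpha=0.09, beta=0.12):
    """Split a stadium's seats using the two allocation parameters."""
    officials = round(alpha * capacity)      # seats reserved for officials
    remaining = capacity - officials         # seats left after that reservation
    share = round(beta * remaining)          # seats offered to each category
    return {
        "officials": officials,
        "remaining": remaining,
        "nation_1": share,
        "nation_2": share,
        "other_nations": share,
    }

# Hypothetical stadium capacity, chosen only to make the arithmetic visible.
allocation = seat_allocation(80_000)
print(allocation)
```

With these inputs, 7,200 seats go to officials, 72,800 remain, and each of the three categories is offered 8,736 seats.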

3. Conclusion

The paper was written to assess the preparedness of Qatar’s lodging capacity in anticipation of the Qatar World Cup 2022. It describes a three-part model, which optimises team groupings, fits these to a match schedule and produces an estimate of foreign spectator attendance. The paper analyses 16 different scenarios of 32 qualifying nations and three levels for the spectator index and for the probability of an extended stay by foreign spectators. This results in 67,000 required rooms averaged over the scenarios, which is 7,000 above the minimum of 60,000 recommended by FIFA. This requirement increases as the spectator index and the probability of an extended stay increase. In our model we updated the qualifying teams and considered the teams that are most likely to qualify. Following the paper’s methodology, we arrive at a maximum of 66,880 rooms required. This is similar to the paper’s estimate.

We believe that the paper overestimates the lodging requirement, at great cost. First, the group formations are optimised to create groups of close to equal strength, and the second model assigns the most popular matches to the largest stadiums. This creates a “worst-case scenario” in terms of the lodging capacity needed. In reality, groups are formed semi-randomly and it is improbable that this exact scenario will unfold. Secondly, we have identified multiple ways in which Covid-19 could impact the capacities of the stadiums. We think it is a risky assumption that Covid-19 will not impact the event at all (in terms of foreign attendance). Therefore, we think Qatar should take into account the probability that stadiums will not be able to run at full capacity or that certain nations may not be able to attend the event. Lastly, we think the spectator indices assigned to all nations are determined by a possibly invalid methodology, which relies on old data, poorly justified assumptions and the authors’ own estimates. This will contribute to general inaccuracy of the models. We think that by reviewing this methodology and taking into account the randomness of the group assignments and the impact of Covid-19, a more accurate estimate can be determined to assess the preparedness of Qatar’s lodging capacity.

For the full model description, please see the full report here.


  1. Ahmed Ghoniem, A. Ali, Mohammed Al-Salem, and Wael K. M. Alhajyaseen. “Prescriptive analytics for FIFA World Cup lodging capacity planning”. In: Journal of the Operational Research Society 68 (2017), pp. 1183–1194.