Choosing opponents in skiing sprint elimination tournaments

Anders Lunander; Niklas Karlsson

doi:10.1515/jqas-2021-0027

Article Open Access

Published/Copyright: July 11, 2023

From the journal Journal of Quantitative Analysis in Sports Volume 19 Issue 3

Abstract

In this study we analyse data from world cup cross-country skiing sprint elimination tournaments for men and women in 2015–2020. Instead of being assigned a quarterfinal according to a seeding scheme, prequalified athletes themselves choose, in sequential order, which of the five quarterfinals to compete in. Due to a time constraint on the day the competition is held, the recovery time between the elimination heats varies. This implies a clear advantage for the athlete to race in an early rather than in a late quarterfinal to maximize the probability of reaching the podium. The purpose of the paper is to analyse the athletes’ choices facing the trade-off between recovery time and expected degree of competition when choosing in which quarterfinal to compete. We find empirical support for the prediction that higher ranked athletes from the qualification round prefer to compete in early quarterfinals, despite facing expected harder competition. Nevertheless, our results also suggest that athletes underestimate the value of choosing an early quarterfinal. In addition, we propose a seeding scheme capturing the fundamental disparity across quarterfinals using the estimates from a logistic regression model.

Keywords: choosing opponent; elimination tournament; seeding; skiing sprint

1 Introduction

At the end of the ski-season 2014/2015, the International Ski Federation (FIS) changed the design of their elimination tournaments in cross-country skiing sprint for men and women. Instead of seeding 30 prequalified athletes into five quarterfinals, based on their ranking from the initial qualification round, the FIS today applies a method where the prequalified athletes sequentially choose one of the five quarterfinals themselves, according to a certain order based on their ranking in the qualification round. The main motive behind the organisation’s decision to introduce this novelty was to internalize the positive effect that athletes potentially obtain from having a longer recovery time before a future final if they compete in one of two early quarterfinals. The previous method used to distribute the 30 prequalified athletes across the five quarterfinals could best be described as a standard seeding method (see Marchand 2002). The fastest athlete from the qualification round was assigned the rank number one, the second fastest athlete was assigned rank number two, and so on. The athletes were then distributed across the five quarterfinals according to a predetermined scheme such that the sum of ranks of the six athletes in each quarterfinal was 93, i.e., a balanced seeding.

However, a look at the result lists for the contests in seasons 2009/2010–2014/2015 shows an unequal distribution of medallists across the five quarterfinals. For men, about 55 % of the medallists were assigned one of the two early quarterfinals whereas only 24 % of the medallists had competed in the two late quarterfinals. The corresponding numbers for women exhibit a similar pattern, albeit less skewed; 46 % and 38 %, respectively. Clearly, being seeded into one of two early rather than into one of two late quarterfinals, gave the athlete a greater chance to reach the podium. In the current format, where athletes choose their quarterfinals, the choice will be a trade-off between the length of recovery time prior to a final and the degree of expected competition in a quarterfinal. Choosing one of two early quarterfinals means – in the case of advancement from the quarterfinal and from the subsequent semifinal – a longer recovery time prior to the final, increasing the win probabilities in the final against those athletes with a shorter recovery time. However, the choice of an early quarterfinal likely means tougher competition, lowering the athlete’s chances to advance to the next round.

The change to the current design, from a seeding scheme to partly choosing the opponents, brought an additional tactical dimension and a complex optimization problem into the elimination tournaments. Given the athlete’s objective function, the optimal choice of quarterfinal requires information about the strength of the others to calculate the win probabilities. Also, the individual’s optimal choice will not only be based upon the observed choices of those having already made their choices, but also be conditioned upon the expected choices of the athletes to follow. Moreover, the sequential choices of quarterfinals take place within a limited period of time after the prequalification race, about 2 h. To get around the computational burden and the need to integrate all available information in the decision, the athletes are likely to rely on simple rules when choosing their quarterfinals.

In this paper, we analytically and empirically investigate decision making in these ski-sprint elimination tournaments using data from the competitions held 2015–2020. Our analysis is carried out in four steps. To gain an understanding of the complexity behind the optimal choice in this type of decision making, we first consider a simple elimination tournament model with two rounds – semifinals and final – and four players. A player increases the probability of winning the final by competing in the first of the two semifinals, ceteris paribus. This asymmetry across semifinals aims to capture the athlete’s trade-off between recovery time and expected degree of competition when making her decision. The winning probabilities are exogenous in our model, that is, players’ exerted effort is not strategically allocated across the two rounds. This is in line with our empirical observations where the duration between each round is assumed long enough for complete recovery, except between the second of two semifinals and the final. This shortage in recovery time induces the player to act strategically when choosing between matches, not when choosing exerted effort.^[1]

Our model predicts that higher ranked players tend to choose the first semifinal. In a second step we test whether this prediction is consistent with the athletes’ observed choices of quarterfinals by testing the null hypothesis that athletes choose quarterfinals randomly. To carry out this test, we propose a test statistic suited to the purpose. Our third step is to develop and apply a logistic regression analysis to examine whether athletes make choices consistent with the objective of maximizing the probability of reaching the podium. We condition the probability of reaching the podium on an athlete’s choice of quarterfinal and her capacity in terms of ranking in the prequalification round, results in previous contests and long-term ranking.

Based on our estimated results from the logistic regression analysis, we propose in our fourth step a new unbalanced seeding scheme that aims to remove the tactical dimension and the computational challenges from the contest while still capturing the athlete’s advantage of being assigned one of the two early quarterfinals. Our design of a new seeding scheme contrasts with the ideas put forward in Guyon (2022), who suggests the use of a choose-your-opponent mechanism in the knockout stage which follows the group stage in many tournaments (e.g. FIFA World Cup, UEFA Champions League, NFL). The author sees a benefit of adding this strategic element to these competitions, which could bring new dynamics to the tournaments and attract a lot of media attention. However, it is one thing to optimally choose a single opponent from a set of competitors, whose qualities and tactics can to a large extent be foreseen, and a different thing to optimally choose multiple competitors, several of whom are unknown, or at best can only be anticipated, when making the choice. This demands skills going beyond what is required to be successful in skiing sprint.

To our best knowledge, no study has yet been conducted on modelling this type of tournament design, or on using contestants’ observed choices to come up with a seeding scheme.

Our paper is organized as follows: the skiing sprint competitions and the procedure to choose quarterfinals are presented in Section 2. Section 3 provides a literature overview. The model is given in Section 4, followed by a description of the data in Section 5. The empirical methodology and the outcome of our tests are presented in Sections 6–8 and, finally, Section 9 presents the conclusions.

2 The skiing sprint competition

The skiing sprint competitions in the cross-country World Cup are held about twelve times per season at various places in Europe, Russia, and Canada, each competition lasting only one day. Men and women compete in separate classes. The competition begins with a prologue, a qualification round, where the 60 to 80 athletes ski a course of about 1.5 km, starting at intervals of about 15 s. The 30 fastest athletes qualify for the five quarterfinals, with six athletes in each quarterfinal. These knock-out races begin about two to three hours after the qualification round. The athletes who finish first and second in the first two quarterfinals are placed in the first semifinal, and the athletes who finish first and second in the last two quarterfinals are placed in the second semifinal. The winner of the third quarterfinal is placed in semifinal one while the athlete in second place goes to semifinal two. In addition to these top-ten athletes from the five quarterfinals, the two athletes with the best times among those finishing in places 3–6 in the quarterfinals (the lucky losers) also qualify for the semifinals. The faster of these two is placed in semifinal one while the other athlete is placed in semifinal two. Hence, an athlete winning the third quarterfinal or being the fastest lucky loser is rewarded. She will reach semifinal one, which implies a longer recovery time in case of advancing to the final, without facing the potentially harder competition in the first two quarterfinals.
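The advancement rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text: the function name `assign_semifinals`, the data layout, and the toy finishing times are assumptions made for the example, not part of the official FIS procedure.

```python
def assign_semifinals(quarterfinals):
    """Split quarterfinal finishers into the two semifinals.

    `quarterfinals` maps the heat number (1 to 5) to a list of
    (athlete, time_in_seconds) tuples sorted by finishing place.
    """
    semi1, semi2 = [], []
    # Top two of quarterfinals 1 and 2 go to semifinal 1.
    for heat in (1, 2):
        semi1 += [name for name, _ in quarterfinals[heat][:2]]
    # Top two of quarterfinals 4 and 5 go to semifinal 2.
    for heat in (4, 5):
        semi2 += [name for name, _ in quarterfinals[heat][:2]]
    # Quarterfinal 3: winner to semifinal 1, runner-up to semifinal 2.
    semi1.append(quarterfinals[3][0][0])
    semi2.append(quarterfinals[3][1][0])
    # Lucky losers: the two fastest athletes among places 3 to 6 overall.
    others = [entry for heat in range(1, 6) for entry in quarterfinals[heat][2:]]
    lucky = sorted(others, key=lambda entry: entry[1])[:2]
    semi1.append(lucky[0][0])  # faster lucky loser
    semi2.append(lucky[1][0])  # slower lucky loser
    return semi1, semi2


# Hypothetical toy data: athlete "Q2-3" finished third in quarterfinal 2.
example = {
    heat: [(f"Q{heat}-{place}", 180.0 + heat + 2.0 * place) for place in range(1, 7)]
    for heat in range(1, 6)
}
semi1, semi2 = assign_semifinals(example)
```

With these made-up times, the third-placed athletes of quarterfinals 1 and 2 are the two lucky losers, so each semifinal ends up with six athletes.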

Finally, the top two athletes in each semifinal together with the two lucky losers from the semifinals are qualified for the final. All finals are run on the same course as the qualification round and mass start is applied in all elimination races. The format of the competition is identical for men and women who run the races alternately. The 30 athletes qualifying for the quarterfinals are awarded World Cup ranking points, conditioned on their performance in subsequent elimination rounds. The timing of the competition during a day is illustrated in Figure 1.

Figure 1:

The timing of skiing sprint competition.

The sequential ordering of the quarterfinals and semifinals means that the recovery time for those athletes advancing from one stage to the next stage will differ. The recovery time between the quarterfinals and the semifinals is about 20 to 30 minutes, which is considered long enough not to affect the performance in subsequent heats. However, whereas the recovery time separating the first semifinal and the final is normally longer than 20 minutes, the time separating the second semifinal and the final is less than 20 minutes. This means that the athletes advancing from semifinal two are disadvantaged. In terms of recovery time, an athlete has no advantage in choosing quarterfinal one over quarterfinal two or vice versa. Advancing as one of the top two athletes from either of these two quarterfinals automatically places the athlete in semifinal one. Likewise, athletes face a shorter but equal recovery time whether they choose quarterfinal four or quarterfinal five, since advancement from either of these two races puts them in semifinal two. Hence, it is reasonable to assume that athletes are indifferent in their choice between quarterfinals one and two, and in their choice between quarterfinals four and five, respectively. This assumption will later be tested in Section 8.1.

The prequalified athletes choose their quarterfinals according to a predetermined order. The order of choices is based on their ranking numbers from the qualification round, where the eleven fastest athletes first make their choices in descending order, starting with the athlete having ranking number 11 {11,10, …, 2,1}. The remaining 19 athletes then make their choices in ascending ranking order, starting with the athlete having ranking number 12 {12,13, …, 29,30}. The choices are revealed immediately, that is, an athlete who is about to choose her quarterfinal knows about the previous choices.
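The order of choice can be written out explicitly. The short Python sketch below only restates the rule in the paragraph above; the function name `choice_order` and its parameters are illustrative assumptions.

```python
def choice_order(n_top=11, n_total=30):
    """Return qualification ranks in the order in which athletes choose.

    Ranks n_top, n_top - 1, ..., 1 choose first (descending), followed by
    ranks n_top + 1, ..., n_total (ascending).
    """
    descending = list(range(n_top, 0, -1))
    ascending = list(range(n_top + 1, n_total + 1))
    return descending + ascending


order = choice_order()
# Position (1-based) at which the qualification winner gets to choose.
winner_position = order.index(1) + 1
```

Under this rule the qualification winner chooses eleventh, after observing the choices of ranks 11 down to 2 but before the 19 lower ranked athletes.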

3 Literature

Our work can be added to the list of studies in operational research, analysing tactics and strategy in sports (Wright 2009). Adopting the classification in Wright (2014), the first part of our paper − the modelling and the empirical examination of the athletes’ choices of quarterfinals − is a post hoc analysis of a tournament rule change, whereas the second part − the derivation of an unbalanced seeding scheme − represents a proposal for changes to tournament rules, backed up by the analysis in the first part.

The number of studies analysing the impact on behaviour and outcome when teams can choose their opponents in elimination tournaments is limited. Guyon (2022) refers to a handful of tournaments in various sports and games in which this method has been applied (e.g. hockey, rugby, bridge, chess) but very little analysis seems to have been documented. The method is, however, currently applied in the Austrian Hockey League, a league consisting of twelve teams from Austria, Hungary, Italy and Slovenia. This tournament is first played in a double home-away round robin format, that is, the teams play against each other four times. The teams in first to sixth place qualify for the quarterfinals, whereas two of the four teams placed seventh to tenth advance to the quarterfinals through two best-of-three series. The seventh placed team chooses which of the teams placed ninth and tenth to play. The pairing of the eight teams in the quarterfinals is then determined by the three top teams choosing their opponents, where the top team chooses first. The fourth-placed team is not available for selection and will face the team that was not chosen in the pick.^[2] Guyon (2022) proposes a choose-your-opponent design to be considered in major elimination tournaments in sports, especially when the elimination stage is preceded by a group stage where more than one team qualifies from each group. In such cases, a huge advantage of the proposed method is that it removes the incentives for tanking.

By contrast, there is a relatively extensive literature on seeding in elimination tournaments. The standard seeding, where top ranked teams are matched against lower ranked teams in the first round, often serves as a reference point when analysing the properties of various designs of elimination tournaments (e.g. Dagaev and Suzdaltsev 2018; Groh et al. 2012; Hwang 1982; Karpov 2016; Marchand 2002). The standard seeding is widely used because the draw seeks to delay the confrontation between the tournament’s best teams until the very last rounds, increasing the quality of the matches as the tournament progresses. Our proposed seeding scheme of the five quarterfinals deviates from the standard method in that it is unbalanced, aiming to level out the positive effect of recovery and the negative effect of tougher competition across groups. A work related to our proposed design is Csató (2020a), who analyses the properties of the tournament design used by the European Handball Federation (EHF) Champions League, where the groups were deliberately seeded unbalanced between the 2015/16 and 2019/20 seasons. The EHF tournament consisted of a round-robin group stage followed by a knockout stage. In the group stage, 28 teams were competing in four groups. However, the groups were put together differently, they were of different sizes, and the rules to advance to the knockout stage differed across groups. Using simulations to obtain tournament metrics, the author compares this EHF tournament design to a classical tournament design with seven teams competing in each of four balanced groups. The simulations indicate that the former design generates a higher proportion of matches of high quality, defined as the sum of the playing teams’ pre-tournament ranks, and a higher average pre-tournament rank in the Final Four.

In the seminal paper Guyon (2015), the author illustrates the weaknesses that may arise when principles other than balance guide the building of groups. Using the data from the draw of the Fédération Internationale de Football Association (FIFA) World Cup 2014, Guyon demonstrates how the use of a geographic separation constraint, when building the eight groups, failed to satisfy the principle of balance. In the same study, Guyon proposes a new draw system for FIFA, which produces groups of similar strength and also satisfies the organisation’s seeding and geographical separation rules. Laliena and López (2019) develop the ideas presented in Guyon (2015) and suggest a draw system with pots producing almost perfectly balanced groups, based on the ranking sum of the groups’ three strongest teams instead of the ranking sum of all four teams. Putting the random draw aside when assigning teams to groups, Cea et al. (2020) present an optimization model where the ranking sum and the ranking range across groups are minimized.

The seeding system of the Union of European Football Associations (UEFA) Champions League has also attracted attention in the literature. Like the FIFA World Cup, this tournament consists of a group stage and a knockout stage, where 32 qualified teams are seeded into four pots prior to the random draw of eight groups. Before the season 2015–2016, qualified teams were split into the pots by their UEFA rating, with one exception: the titleholder was automatically seeded in the first pot. From the season 2015–2016 the UEFA changed the seeding arrangements. The national champions from the seven top-ranked associations and the titleholder were seeded in the first pot.^[3] The rest of the teams are seeded as earlier. Taking the teams in the 2015–2016 Champions League, Corona et al. (2019) evaluate in a simulation study how this change affects the probability that a team will reach the round of 16 and the final, respectively. They find that the new seeding scheme slightly reduces the probability of the higher ranked teams reaching the knockout stage compared to the previous scheme, while the chances of some of the lower ranked teams reaching the round of 16 increase. A similar study is carried out in Dagaev and Rudyak (2019), where the effect of the change in the seeding rules upon the competitiveness in the UEFA Champions League is simulated. Their results show that the change in seeding rules leads to a decrease in tournament quality, in terms of an expected lower rating of the final and of the winner. Csató (2020b) points out that the UEFA Champions League seeding rules from the 2015–2016 season suffer from incentive incompatibility. A team can be placed in a lower pot if the titleholder happens to represent the same league. In a regression discontinuity analysis, Engist, Merkus, and Schafmeister (2021) analyse match results from the UEFA Champions League and the Europa League to investigate the effect of being seeded upon the probability of advancing from the group stage to the knockout stage. Although higher seeded teams face weaker opponents, their results suggest that a higher seed does not lead to a higher probability of advancing to the knockout stage.

Our contribution to the existing literature on seeding is a result of our empirical analysis of observed behaviour in an elimination tournament, where athletes to some extent may choose their opponents in the first elimination round. Making use of our estimated regression parameters, given our sample of athletes, we propose an unbalanced seeding scheme where the goal is to internalize the inherent disadvantage of being assigned one of the two last quarterfinals in skiing sprint competitions. However, in contrast to individual seeding, our proposed seeding does not indicate in which of the quarterfinals a ranked athlete is assigned. Instead, it stipulates what the athletes’ ranking number from the qualification round should sum up to in each of the five quarterfinals.

Our study is also related to the body of literature in physiology on the effects of recovery duration in cross-country skiing sprint. To bring down the accumulation of fatigue, that is, to reduce the concentration of blood lactate, the recovery time between the heats is essential. A comprehensive survey on factors influencing the performance in skiing sprint is provided by Hébert-Losier et al. (2017). The outcome from experimental tests reflecting the format of skiing sprint competitions indicates that the recovery time should be about 20 min to fully ensure that a break does not impact the athlete’s performance in a subsequent heat (e.g. Vesterinen et al. 2009; Zory et al. 2006). Moxnes and Moxnes (2014) develop a mathematical model showing how the anaerobic portion of total energy depends on time. Given a racing time around 3 min and 20 s, they find that if the recovery time is below 20 min, then performance in a subsequent heat will deteriorate.

4 Modelling an optimal choice

In this section we set up a simple elimination tournament model where players can choose their opponents. To make the model computationally tractable, we consider a tournament with two rounds – semi-finals and final – and head-to-head competition in each round, that is, the number of players is limited to four players. The purpose with the model is to analyse how a player’s choice between two alternatives is affected when varying two parameters, capturing the degree of players’ competitiveness and recovery duration. Our analysis exposes the complex optimization problem the athletes in skiing sprint are facing under the current mechanism.

The four players, i = A, B, C, D compete head-to-head in two semi-finals, the first semifinal (s.1) and the second semifinal (s.2). The winner of each semifinal advances to the final. The objective of each player is to win the final. The players are of two types: high ranked players (H) and low ranked players (L). Players A and B are assumed to be of type H whereas players C and D are of type L. The players choose, in sequential order, which one of the two semifinals to compete in. The order of the choices is: A, B, C, D.^[4] A player’s choice of semifinal becomes public prior to the next player’s choice.

Hence, there are six possible settings j = 1, …, 6 of the semifinals, including mirrored settings in which the same pairings occur but in the opposite semifinal order. Figure 2 illustrates the decision tree of our model.

Figure 2:

The sequential order of choices and possible settings of semifinals.

We denote the player’s probability of winning the whole tournament as p_i,j. For example, p_C,4 denotes the probability that player C will win the tournament given that she faces player B in semifinal 1. A player of type H will beat a player of type L with probability p > 0.5, and if two players of the same type compete, each wins with probability 0.5. To capture the effect of having a shorter recovery time when the player advances into the final from the second semifinal, we multiply that player’s probability of winning the final by a constant c, where 0 < c < 1. The lower the value of c, the larger the negative effect – henceforth the recovery effect – of having advanced to the final from the second semifinal rather than from the first semifinal. Thus, even though semifinal settings j = 1 and j = 6 imply identical plays, we have p_A,1 = p_B,1 > p_A,6 = p_B,6 and p_C,1 = p_D,1 < p_C,6 = p_D,6 due to the recovery effect c. The players are assumed to have full information on the values of the probabilities defined above, as well as the value of c.
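As a worked illustration of these inequalities, the tournament win probabilities in settings 1 and 6 can be written out. The sketch assumes that setting j = 1 places A and B in semifinal 1 and C and D in semifinal 2, that j = 6 is the mirrored setting, and that the finalist coming from semifinal 1 wins the final with the complementary probability; the numbering of the settings is an assumption read off from the inequalities in the text.

```latex
\begin{align*}
p_{A,1} &= \tfrac{1}{2}\bigl[1 - c\,(1-p)\bigr], &
p_{A,6} &= \tfrac{1}{2}\,c\,p, &
p_{A,1} - p_{A,6} &= \tfrac{1}{2}(1-c) > 0,\\
p_{C,1} &= \tfrac{1}{2}\,c\,(1-p), &
p_{C,6} &= \tfrac{1}{2}\bigl[1 - c\,p\bigr], &
p_{C,6} - p_{C,1} &= \tfrac{1}{2}(1-c) > 0.
\end{align*}
```

In both cases the gap equals (1 − c)/2, so it vanishes only when there is no recovery effect (c = 1).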

Proposition 1

Player A will always choose the first semifinal.

Player B will choose the first semifinal if and only if

(1) $c < f(p) = \dfrac{-0.5}{p^{3} - 1.5p^{2} + 0.5p - 0.5}.$

Otherwise, player A will play against player C in the first semifinal.

Proof

See Appendix A.

In Figure 3 we illustrate the result graphically. Points below the convex graph indicate combinations of levels of the recovery effect and the probability p for which player B chooses to compete in the first semifinal. For values of c up to about 0.91, the degree of competition from low ranked players has no effect upon player B’s choice. Player B might choose to avoid player A in the first semifinal for higher values of c. For example, for c = 0.95, shown as a horizontal line in the figure, B will choose semifinal 2 for values of p in between 0.61 and 0.94, rounded to two decimal places.
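The numbers quoted above can be reproduced with a small Python sketch. It assumes that condition (1) reads c < f(p) = −0.5 / (p³ − 1.5p² + 0.5p − 0.5), and that B's win probabilities follow from the model's rules as coded below (with the rested finalist winning the final with the complementary probability). The function names and the bisection helper are illustrative assumptions, not the authors' code.

```python
import math


def b_win_prob_semifinal_1(p, c):
    # B meets A in semifinal 1 (win prob. 0.5), then, as the rested finalist,
    # faces a type L player who wins the final with probability c * (1 - p).
    return 0.5 * (1.0 - c * (1.0 - p))


def b_win_prob_semifinal_2(p, c):
    # B beats a type L player in semifinal 2 with probability p. In the final
    # B carries the penalty c and meets A (prob. p) or a type L player (1 - p).
    return p * (p * 0.5 * c + (1.0 - p) * c * p)


def threshold(p):
    # Indifference curve f(p): B prefers semifinal 1 exactly when c < f(p).
    return -0.5 / (p**3 - 1.5 * p**2 + 0.5 * p - 0.5)


def bisect(func, lo, hi, tol=1e-12):
    # Plain bisection; assumes func(lo) and func(hi) have opposite signs.
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The minimum of f on (0.5, 1) solves 3p^2 - 3p + 0.5 = 0.
p_min = (3.0 + math.sqrt(3.0)) / 6.0
c_min = threshold(p_min)

# Values of p at which the horizontal line c = 0.95 crosses the curve.
lower = bisect(lambda p: threshold(p) - 0.95, 0.5, p_min)
upper = bisect(lambda p: threshold(p) - 0.95, p_min, 1.0)

print(round(p_min, 2), round(c_min, 2))  # 0.79 0.91
print(round(lower, 2), round(upper, 2))  # 0.61 0.94
```

Under these assumptions the curve bottoms out at roughly c = 0.91 near p = 0.79, and the line c = 0.95 crosses it near p = 0.61 and p = 0.94, in line with the values stated in the text.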

Figure 3:

Player B’s choice of semifinal for various combinations of p and c.

To understand the mechanism behind B’s choice for high values of c once A has chosen semifinal 1, we initially assume that c = 1 and p = 0.5. Thus, no recovery effect is assumed, and competition is equalized across all players. This combination of c and p obviously makes player B indifferent between the two semifinals. Now, letting p increase, still assuming no recovery effect, B will choose semifinal 2 to have a positive probability of avoiding player A in the tournament, now the single most competitive opponent. Thus, as p increases, for B to be indifferent, a compensation in terms of a decrease in c is necessary, and we are moving downwards along the graph of the convex function in the figure. As p further increases, holding c fixed, there is a positive effect on the incentive for B to choose semifinal two in that the importance of avoiding A in the tournament increases due to a decrease of the relative competitiveness of type L players. However, there is also a negative effect, since the probability increases for B of ending up weakened in the final with type H player A. For p about 0.79 these opposite effects cancel each other out. This negative effect outweighs the positive effect for larger values of p, meaning that we are moving upwards along the graph for indifference. For a further increase in p, B needs to be compensated by an increase in c, i.e., a lower recovery effect, to still be indifferent and not choosing semifinal 1. For values of p close to 1 player B expects to face player A in the final with almost certainty, both advancing from different semifinals. Player B is then better off playing against player A already in the first semifinal, unless the recovery effect is negligible, i.e., c is close to 1, making B indifferent.

Corollary

At least 50 % of the high ranked players will choose semifinal 1.

Proof

It follows immediately from Proposition 1.

The result from the corollary will be extrapolated to the actual skiing sprint tournament with five quarterfinals, in that we expect high ranked athletes to be overrepresented in the first two quarterfinals, while underrepresented in the last two quarterfinals.

5 Data

Official results for all World Cup skiing sprint competitions as well as data on individual athletes are collected from the International Ski Federation’s website (Fédération Internationale de Ski, FIS 2020). The data include the results from 56 competitions – for men as well as for women – during six seasons, 2014/2015–2019/2020. For each of these competitions, we have data on the 30 athletes competing in the quarterfinals, in total 3360 observations, equally divided between men and women. For each athlete in every competition, we observe the choice of quarterfinal, the achieved result, and the official ranking. Table 1 shows the average rank sums, rounded to integer values, for different quarterfinals divided by season. The rank sums for early quarterfinals are low compared to late quarterfinals, indicating stronger competition in the first two quarterfinals.^[5] The pattern is quite stable over time, except for the latest season where the rank sum for the first quarterfinal is high for men.^[6]

Table 1:

Average rank sums for different quarterfinals by season and sex.

Quarterfinal	Season
	2014/2015	2015/2016	2016/2017	2017/2018	2018/2019	2019/2020	Total
Women
1	61	86	91	85	93	89	88
2	83	87	89	88	87	90	88
3	102	95	95	92	96	92	94
4	102	95	92	96	91	97	94
5	117	102	98	104	98	97	100
Men
1	86	89	87	89	83	96	89
2	88	90	87	84	86	91	87
3	91	86	92	101	94	93	93
4	92	97	93	93	95	91	94
5	108	103	106	98	107	94	102

In Table 2 we show the share of medallists coming from each of the five quarterfinals by season.

Table 2:

Share of medallists from the quarterfinals by season and sex.

Quarterfinal	Season
	2014/2015	2015/2016	2016/2017	2017/2018	2018/2019	2019/2020	Total
Women
1	0.17	0.31	0.37	0.40	0.28	0.30	0.32
2	0.33	0.31	0.27	0.33	0.25	0.33	0.30
3	0.33	0.06	0.13	0.20	0.25	0.07	0.15
4	0.17	0.22	0.10	0.03	0.14	0.10	0.12
5	0.00	0.11	0.13	0.03	0.08	0.20	0.11
Men
1	0.17	0.33	0.30	0.53	0.58	0.50	0.44
2	0.50	0.36	0.13	0.23	0.19	0.17	0.27
3	0.17	0.17	0.20	0.13	0.06	0.13	0.14
4	0.17	0.03	0.13	0.03	0.14	0.17	0.10
5	0.00	0.11	0.03	0.07	0.03	0.03	0.05

Tables 1 and 2 indicate that higher ranked athletes are more inclined to choose one of the two early quarterfinals rather than the late ones, and that the athletes coming from the two early quarterfinals are overrepresented among medallists. This share is even higher under the current design than under the previously applied seeding format.

6 Testing for random choice

In this section we propose a test statistic for testing the null hypothesis that the athletes choose quarterfinals in a pure random way against the alternative hypothesis that athletes with a low ranking number from the qualification round (high ranked athletes) choose to a large extent early quarterfinals rather than late quarterfinals, as indicated by the model from Section 4. In Section 6.1 the test statistic to be used is presented. Section 6.2 shows the results from the test.

6.1 The test statistic

The corollary in Section 4 suggests that early quarterfinals should be overrepresented by high ranked athletes, while the opposite holds for late quarterfinals, implying that the sum of the ranking numbers for early quarterfinals is to undercut the corresponding sum of ranking numbers for late quarterfinals.

Define U_k and V_k as the rank sum from the qualification round for the early quarterfinals, one and two, and late quarterfinals, four and five, respectively, for competition k, k = 1, 2, …, 56.

The assumption of the athletes choosing quarterfinals in a pure random way, implies the null hypothesis can be formulated as

$H_0: E(U_k) - E(V_k) = 0.$

This is to be tested against the alternative hypothesis that the expected sum of ranking numbers from early quarterfinals is smaller than for late quarterfinals, that is,

$H_A: E(U_k) - E(V_k) < 0.$

An appropriate test statistic is given by $Z = \dfrac{\bar{R}_E - \bar{R}_L}{\sqrt{465/14}}$, where $\bar{R}_E = \dfrac{1}{56}\sum_{k=1}^{56} U_k$ and $\bar{R}_L = \dfrac{1}{56}\sum_{k=1}^{56} V_k$.

Proposition 2

The test statistic $Z = \dfrac{\bar{R}_E - \bar{R}_L}{\sqrt{465/14}} \overset{\text{appr}}{\sim} N(0,1)$ under $H_0: E(U_k) - E(V_k) = 0$.

Proof

See Appendix A.
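One way to see where the constant 465/14 comes from is the standard variance formula for a linear rank statistic under a random permutation. The sketch below is not the Appendix A argument itself; it assumes that, under the null hypothesis, the 30 qualification ranks are randomly permuted over the 30 quarterfinal slots and that the 56 competitions are independent. The weights a_i and the symbol D_k are introduced here for illustration only.

```latex
% D_k = U_k - V_k = \sum_{i=1}^{30} a_i r_i, with a_i = +1 for the 12 slots in
% quarterfinals 1 and 2, a_i = -1 for the 12 slots in quarterfinals 4 and 5,
% and a_i = 0 for the 6 slots in quarterfinal 3, so that \bar{a} = 0.
\begin{align*}
\operatorname{Var}(D_k)
  &= \frac{1}{N-1}\sum_{i=1}^{N}(a_i-\bar{a})^{2}\,\sum_{j=1}^{N}(r_j-\bar{r})^{2}
   = \frac{1}{29}\cdot 24\cdot\frac{30\,(30^{2}-1)}{12} = 1860,\\[4pt]
\operatorname{Var}\bigl(\bar{R}_E-\bar{R}_L\bigr)
  &= \frac{\operatorname{Var}(D_k)}{56} = \frac{1860}{56} = \frac{465}{14}.
\end{align*}
```

The approximate normality of Z then follows from averaging 56 such differences.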

Since we reject $H_0$ in favour of $H_A$ for large negative values of z, the rejection region is $RR = \{z < -z_\alpha\}$, where $z_\alpha$ is such that $P(Z > z_\alpha) = \alpha$.

Alternative test statistics could be used. The sign test and Wilcoxon’s matched-pairs signed rank test are two non-parametric alternatives, while a paired t-test is another option, in which the population standard deviation of the differences is estimated. However, since the variance of the difference can be derived exactly under the null hypothesis, so that the conditions for a parametric test with known population variance are fulfilled, the test statistic given in Proposition 2 is preferred in terms of power.
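The null distribution behind the test statistic can also be checked by simulation. The Python sketch below assumes that random choice means a uniformly random allocation of the 30 ranks to the five quarterfinals of six athletes each; the function name, the number of draws, and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_rank_sum_differences(n_draws=20000, seed=0):
    """Simulate U - V for one competition under purely random choice."""
    rng = random.Random(seed)
    ranks = list(range(1, 31))
    differences = []
    for _ in range(n_draws):
        rng.shuffle(ranks)
        early = sum(ranks[:12])  # slots of quarterfinals 1 and 2
        late = sum(ranks[18:])   # slots of quarterfinals 4 and 5
        differences.append(early - late)
    return differences


differences = simulate_rank_sum_differences()
sim_mean = statistics.fmean(differences)
sim_var = statistics.pvariance(differences)
# Implied variance of the average difference over 56 competitions.
sim_var_of_mean = sim_var / 56
```

If the assumption holds, `sim_mean` should be close to 0 and `sim_var_of_mean` close to 465/14 ≈ 33.2, the squared denominator of the test statistic.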

6.2 Results

The two rows in Table 3, also shown as the last column in Table 1, show the average qualification rank sum for each of the five quarterfinals based on 56 competitions. As expected, the rank sum is lower for the two early quarterfinals than for quarterfinal three, and higher for the two late quarterfinals.

Table 3:

Average qualification rank sum for different quarterfinals under the current design, n = 56.

	Quarterfinal
	1	2	3	4	5
Women	88.04	87.93	94.30	94.32	100.41
Men	88.68	87.48	92.93	93.84	102.07

We apply the test statistic outlined in Section 6.1 to test for random choice. We get:

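Women: $z = \dfrac{\bar{R}_E - \bar{R}_L}{\sqrt{465/14}} = \dfrac{(88.04 + 87.93) - (94.32 + 100.41)}{\sqrt{465/14}} = -3.26 \quad (p < 0.01)$

Men: $z = \dfrac{\bar{R}_E - \bar{R}_L}{\sqrt{465/14}} = \dfrac{(88.68 + 87.48) - (93.84 + 102.07)}{\sqrt{465/14}} = -3.43 \quad (p < 0.01)$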

Our tests reject the null hypothesis, indicating that data supports the alternative hypothesis for both women and men.
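As a check on the arithmetic, the two z-values can be recomputed directly from the rounded averages in Table 3. The following is a minimal sketch using only the Python standard library; the function and variable names are ours, and the one-sided p-value is obtained from the standard normal distribution as implied by Proposition 2.

```python
import math

# Average qualification rank sums per quarterfinal, copied from Table 3.
RANK_SUMS = {
    "Women": [88.04, 87.93, 94.30, 94.32, 100.41],
    "Men": [88.68, 87.48, 92.93, 93.84, 102.07],
}

# Variance of R_E - R_L under random choice (Proposition 2).
NULL_VARIANCE = 465 / 14


def z_statistic(sums):
    """Return Z = (R_E - R_L) / sqrt(465/14) for one row of Table 3."""
    r_early = sums[0] + sums[1]  # quarterfinals 1 and 2
    r_late = sums[3] + sums[4]   # quarterfinals 4 and 5
    return (r_early - r_late) / math.sqrt(NULL_VARIANCE)


def lower_tail_p(z):
    """One-sided p-value P(Z < z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


results = {sex: z_statistic(sums) for sex, sums in RANK_SUMS.items()}
for sex, z in results.items():
    print(f"{sex}: z = {z:.2f}, one-sided p = {lower_tail_p(z):.5f}")
```

Because the inputs are the rounded table entries, small deviations in later decimals from a calculation on the raw data are to be expected.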

7 Testing for optimal choice

The results from Section 6 indicate that high ranked athletes largely choose early quarterfinals. Consequently, in terms of the probability of reaching the podium, the relatively few high ranked athletes choosing late quarterfinals might be fully compensated for the shorter recovery time before a final by weaker competition in the rounds preceding the final. Thus, the increased chance of reaching the final could balance the decreased chance of performing at the top once in the final. We think of such behaviour as optimal, i.e., when the athletes’ choices of quarterfinal as a group make the probability of reaching the podium the same irrespective of type of quarterfinal, conditioning on the athlete’s capacity. In Section 7.1 a test for such behaviour is proposed, while Section 7.2 shows the results.

7.1 Methodological approach

To test for optimal behaviour of choice, we adopt a logistic regression approach, where optimal behaviour among the athletes as a group, in terms of balanced proportions of high ranked athletes in different quarterfinals, corresponds to certain parameter restrictions within a model to be presented below. The athlete’s probability of reaching the podium is modelled as a function of her choice of quarterfinal, her individual capacity relative to other athletes (short-term, middle-term and long-term capacity), and individual specific effects.

The dependent variable to be used is Podium, a binary variable taking the value one if the podium is reached, zero otherwise. The first type of explanatory variables considers quarterfinals, where a dummy variable is used to indicate the choice of one of four quarterfinals, $Q_1$, $Q_2$, $Q_4$ and $Q_5$. Quarterfinal three serves as reference.

The second type of explanatory variables captures the athlete’s capacity relative to other athletes for different time perspectives. These variables are (i) Rankqual: The athlete’s ranking number from the qualification round (short-term capacity); (ii) Rankqualsq: Rankqual squared; (iii) Podium_1: A dummy variable indicating if the athlete reached the podium in the latest competition (middle-term capacity); (iv) Rankwcp: The athlete’s ranking number among the 30 quarterfinalists based on the current World Cup sprint points achieved in previous World Cup sprint competitions in the current season (long-term capacity); (v) Rankwcpsq: Rankwcp squared. For both ranking variables, the lower the athlete’s ranking number, the higher is the athlete’s capacity.

In addition, we also include individual specific dummy variables (I₁, I₂,…,I_m) for m out of those m + 1 athletes having variation in the dependent variable (m = 29 for men and m = 30 for women). Leaving out subscript for individual athletes, the linear predictor can be written as

(2) $\eta = \beta_0 + \beta_{Q_1} Q_1 + \beta_{Q_2} Q_2 + \beta_{Q_4} Q_4 + \beta_{Q_5} Q_5 + \beta_{\text{Rankqual}}\,\text{Rankqual} + \beta_{\text{Rankqualsq}}\,\text{Rankqualsq} + \beta_{\text{Rankwcp}}\,\text{Rankwcp} + \beta_{\text{Rankwcpsq}}\,\text{Rankwcpsq} + \beta_{\text{Podium\_1}}\,\text{Podium\_1} + \beta_{I_1} I_1 + \cdots + \beta_{I_m} I_m.$

The parameters to be estimated using the method of maximum likelihood are based on a total of 1680 observations for both men and women (56 competitions with 30 qualified athletes in each competition). We expect the ranking number from the qualification round, as well as the ranking number based on current World Cup sprint points, to have a negative but diminishing effect on the probability of reaching the podium. That is, the negative effect on the probability of reaching the podium of ending up 14th instead of 13th in the qualification round is small compared with the corresponding effect of ending up third instead of second. Thus, β_Rankqual and β_Rankwcp are both expected to be negative, while β_Rankqualsq and β_Rankwcpsq are expected to be positive. Furthermore, a good performance in the previous competition is expected to have a positive effect, implying β_{Podium_1} should have a positive sign.

Now, those relatively few high ranked athletes, choosing a late quarterfinal, expect to be fully compensated for the shorter recovery time prior to a final thanks to expected weaker competition in the late quarterfinal. Such a behaviour can be described as four parameter restrictions in the specified model, formulated in the null hypothesis below:

$H_0: \beta_{Q_1} = \beta_{Q_2} = \beta_{Q_4} = \beta_{Q_5} = 0$

$H_A$: At least one of $\beta_{Q_1}$, $\beta_{Q_2}$, $\beta_{Q_4}$ and $\beta_{Q_5}$ is not equal to zero.

The alternative hypothesis corresponds to choices where the probability of reaching the podium, conditioned on the capacity of the athlete, differs for at least one of the five quarterfinals. The null hypothesis is to be tested with a Wald test, whose test statistic approximately follows a chi-squared distribution with four degrees of freedom under H₀.

7.2 Results

The results from the estimation of three logistic regression models for both women and men are presented in Table 4 with the variable Podium as dependent variable. In the first model only the quarterfinal dummy variables are included. To avoid possible systematic bias in the regression analysis, caused by the season 2019/2020, in which the average rank sum for men for the first quarterfinal was exceptionally high, most likely due to the outstanding performance by the Norwegian athlete J. H Klæbo (see footnote 6), we have excluded this season for men in the analysis.^[7]

Table 4:

Results for logistic regression models, podium as response variable.

Sex	Men			Women
Model variable	(1)	(2)	(3)	(1)	(2)	(3)
Intercept	−2.60*	0.0681	−4.83*	−2.52*	1.18*	−2.85*
	(0.238)	(0.441)	(1.09)	(0.208)	(0.412)	(1.14)
Q ₁	1.30*	1.02*	0.825*	0.846*	0.583*	0.657*
	(0.280)	(0.314)	(0.379)	(0.256)	(0.305)	(0.351)
Q ₂	0.830*	0.554	0.467	0.846*	0.601*	0.661*
	(0.293)	(0.332)	(0.371)	(0.256)	(0.304)	(0.348)
Q ₄	−0.486	−0.135	−0.252	−0.239	0.00630	−0.104
	(0.379)	(0.424)	(0.456)	(0.310)	(0.357)	(0.406)
Q ₅	−0.907*	−0.144	0.0738	−0.411	0.107	−0.116
	(0.431)	(0.496)	(0.550)	(0.323)	(0.366)	(0.442)
Rankqual		−0.277*	−0.245*		−0.198*	−0.244*
		(0.0558)	(0.0697)		(0.0467)	(0.0541)
Rankqualsq		0.00435*	0.00331		0.00290*	0.00430*
		(0.00219)	(0.00278)		(0.00180)	(0.00200)
Rankwcp		−0.0469	0.0194		−0.303*	−0.162*
		(0.0522)	(0.0634)		(0.0517)	(0.0624)
Rankwcpsq		−0.00000214	0.000249		0.00692*	0.00331
		(0.00191)	(0.00232)		(0.00191)	(0.00233)
Podium_1		0.657*	0.219		0.521*	−0.0508
		(0.305)	(0.314)		(0.275)	(0.301)
No quarter final effect (p-value)		0.0010	0.0564		0.100	0.0706
No individual effect (p-value)			0.0036			<0.0005

Standard error in parenthesis; *significant at 5 %.

Model 2 also includes the variables intended to capture the capacity of the athlete, except for the individual specific dummy variables. In the third model the individual specific dummy variables are included as well.^[8] Considering the first model, we observe that the parameter estimates for the early quarterfinals are large, positive and significant for both men and women. The parameter estimates corresponding to late quarterfinals are negative, although not significantly negative. These results imply that the probability of reaching the podium is larger for early quarterfinalists than for athletes going in the third and late quarterfinals. The most likely explanation for these results is that athletes ranked high in the qualification round are overrepresented in early quarterfinals, an effect amplified by the positive effect on performance of the longer recovery time between the semifinal and the final for athletes going in the early quarterfinals.

Consider the results from estimation of the second model, where variables controlling for the individual’s capacity are included. Except for the variable Rankwcpsq for men, the signs of the parameter estimates for these variables are as expected, although the estimates for the variables concerning World Cup points are not significantly different from zero for men. As can be seen from Table 4, the overall null hypothesis of no quarterfinal effect, described in Section 7.1, is rejected at the 5 % level of significance for men, whereas for women it is close to being rejected. The pattern of large positive parameter estimates for early quarterfinals is still observed, albeit not as clearly as before. To summarize, the results indicate that, given the way athletes choose quarterfinals, an individual athlete has a better chance of reaching the podium when going in an early quarterfinal.

In the results from estimation of the third model, to which individual specific dummy variables are added as well, there is no dramatic change in the parameter estimates.^[9]
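To illustrate how the Model (1) estimates translate into probabilities, the sketch below applies the inverse logit transform to the intercept and quarterfinal coefficients reported in Table 4. It is only an illustration based on the rounded table entries; the names are ours. Since Model (1) contains nothing but the quarterfinal dummies, the implied probabilities can be read as podium shares per quarterfinal.

```python
import math

# Model (1) estimates from Table 4: intercept plus quarterfinal dummies.
# Quarterfinal 3 is the reference category, so its coefficient is 0.
MODEL_1 = {
    "Men": {"Intercept": -2.60, "Q1": 1.30, "Q2": 0.830,
            "Q3": 0.0, "Q4": -0.486, "Q5": -0.907},
    "Women": {"Intercept": -2.52, "Q1": 0.846, "Q2": 0.846,
              "Q3": 0.0, "Q4": -0.239, "Q5": -0.411},
}

QUARTERFINALS = ("Q1", "Q2", "Q3", "Q4", "Q5")


def inverse_logit(eta):
    """Map a linear predictor eta to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def implied_probabilities(coefs):
    """Podium probability for each quarterfinal under Model (1)."""
    return {q: inverse_logit(coefs["Intercept"] + coefs[q])
            for q in QUARTERFINALS}


probabilities = {sex: implied_probabilities(c) for sex, c in MODEL_1.items()}
for sex, probs in probabilities.items():
    line = ", ".join(f"{q}: {p:.3f}" for q, p in probs.items())
    print(f"{sex}: {line}")
```

A simple plausibility check is that six athletes per quarterfinal times the five implied probabilities should add up to roughly three podium places per competition.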

8 Seeding instead of choosing quarterfinal

In this section we propose a seeding scheme where the sum of ranking numbers differs across quarterfinals, with low sums attached to early quarterfinals and high sums attached to late quarterfinals. Hence, this sum of ranking numbers will reveal the degree of competition in each quarterfinal. The implication is that higher ranked athletes from the qualification round are likely to be placed in the first two quarterfinals, facing relatively strong competition already in the first elimination round. However, they will then, in case of advancement, be compensated by a longer recovery time before the final. Likewise, high ranked athletes, placed in the late quarterfinals, pay for the weaker competition in their first round with a shorter recovery time prior to a final. Section 8.1 presents the methodological approach, while the results are presented in Section 8.2.

8.1 Methodological approach

We include the variable Rankqualsum in the model set out in Section 7.1 to capture and control for the competition in the quarterfinal, thereby extending the linear predictor in the logistic regression model to:

(3) $\eta = \beta_0 + \beta_{Q_1} Q_1 + \beta_{Q_2} Q_2 + \beta_{Q_4} Q_4 + \beta_{Q_5} Q_5 + \beta_{\text{Rankqual}}\,\text{Rankqual} + \beta_{\text{Rankqualsq}}\,\text{Rankqualsq} + \beta_{\text{Rankwcp}}\,\text{Rankwcp} + \beta_{\text{Rankwcpsq}}\,\text{Rankwcpsq} + \beta_{\text{Podium\_1}}\,\text{Podium\_1} + \beta_{I_1} I_1 + \cdots + \beta_{I_m} I_m + \beta_{\text{Rankqualsum}}\,\text{Rankqualsum}.$

The model specification makes it possible to separate the effect of competition from the recovery effect. The sign of $\beta_{\text{Rankqualsum}}$ is expected to be positive: a larger rank sum indicates weaker competition in the quarterfinal, which should be associated with a higher probability of reaching the podium when controlling for type of quarterfinal and the athlete’s capacity. We expect $\beta_{Q_1}$ and $\beta_{Q_2}$ to be positive, and $\beta_{Q_4}$ and $\beta_{Q_5}$ to be negative, due to the different lengths of recovery time. In addition, from the discussion on recovery time in Section 2 we assume $\beta_{Q_1} = \beta_{Q_2}$ and $\beta_{Q_4} = \beta_{Q_5}$, restrictions to be tested with a Wald test. Thus, provided the restrictions are true, we may look upon the five quarterfinals as three types of quarterfinals: early quarterfinals, quarterfinal 3, and late quarterfinals. Letting Q_E and Q_L be dummy variables for early and late quarterfinals, respectively, quarterfinal 3 being the reference, the linear predictor simplifies to

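(4) $\eta = \beta_0 + \beta_{Q_E} Q_E + \beta_{Q_L} Q_L + \beta_{\text{Rankqual}}\,\text{Rankqual} + \beta_{\text{Rankqualsq}}\,\text{Rankqualsq} + \beta_{\text{Rankwcp}}\,\text{Rankwcp} + \beta_{\text{Rankwcpsq}}\,\text{Rankwcpsq} + \beta_{\text{Podium\_1}}\,\text{Podium\_1} + \beta_{I_1} I_1 + \cdots + \beta_{I_m} I_m + \beta_{\text{Rankqualsum}}\,\text{Rankqualsum}.$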

Now, let x_j define the rank sum for the chosen quarterfinal of type j, j = 1, 2, 3. Conditioning on the three types of quarterfinals, the linear predictor can be written as

$\eta_{\text{quarterfinal of type 1}} = \beta_0 + \beta_{Q_E} + \beta_{\text{Rankqualsum}}\, x_1 + \cdots$

$\eta_{\text{quarterfinal of type 2}} = \beta_0 + \beta_{\text{Rankqualsum}}\, x_2 + \cdots$

$\eta_{\text{quarterfinal of type 3}} = \beta_0 + \beta_{Q_L} + \beta_{\text{Rankqualsum}}\, x_3 + \cdots$

The probability of reaching the podium will be the same, irrespective of type of quarterfinal, conditioning on a certain capacity, if, and only if,

$\eta_{\text{type 1, capacity}} = \eta_{\text{type 2, capacity}} = \eta_{\text{type 3, capacity}}.$

Upon substitution we get

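$\beta_{Q_E} + \beta_{\text{Rankqualsum}}\, x_1 = \beta_{\text{Rankqualsum}}\, x_2$ and $\beta_{Q_L} + \beta_{\text{Rankqualsum}}\, x_3 = \beta_{\text{Rankqualsum}}\, x_2.$

Solving for x₁ and x₃, we obtain $x_1 = x_2 - \dfrac{\beta_{Q_E}}{\beta_{\text{Rankqualsum}}}$ and $x_3 = x_2 - \dfrac{\beta_{Q_L}}{\beta_{\text{Rankqualsum}}}$.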

By adding the restriction 2x₁ + x₂ + 2x₃ = 1 + 2 + ⋯ + 30 = 465 we obtain an equation system with three equations and three unknowns, x₁, x₂ and x₃, which can be solved in terms of the parameters $\beta_{Q_E}$, $\beta_{Q_L}$ and $\beta_{\text{Rankqualsum}}$. Solving the system yields:

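$x_1 = 93 + \dfrac{2\beta_{Q_L} - 3\beta_{Q_E}}{5\beta_{\text{Rankqualsum}}}, \quad x_2 = 93 + \dfrac{2(\beta_{Q_E} + \beta_{Q_L})}{5\beta_{\text{Rankqualsum}}}, \quad x_3 = 93 + \dfrac{2\beta_{Q_E} - 3\beta_{Q_L}}{5\beta_{\text{Rankqualsum}}}.$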

Expecting $\beta_{Q_L} < 0$, $\beta_{Q_E} > 0$ and $\beta_{\text{Rankqualsum}} > 0$, it follows that we should expect x₁ < x₂ < x₃. Thus, in order to make the athletes indifferent between the three types of quarterfinals, the rank sums should increase with type of quarterfinal. Moreover, from the expected signs it follows that the rank sum for early quarterfinals should be smaller than 93, while the rank sum for late quarterfinals should be larger than 93, which is the value assigned to all quarterfinals in the former system.

Once maximum likelihood estimates of x₁, x₂ and x₃ are obtained and rounded to integer values, the new design is implemented as follows: for a given value of x_j, j = 1, 2, 3, one of all possible sets of six distinct qualification ranks summing to x_j is drawn at random. These six numbers are assigned to the specific type of quarterfinal.
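The random draw for a single quarterfinal could be sketched as follows. This is a hypothetical illustration, not a procedure specified in the paper: it enumerates all sets of six distinct ranks from 1 to 30, keeps those with the required sum, and picks one uniformly at random. The function name, the seed and the target sum of 74 are chosen by us for illustration only, and the sketch does not address how to draw all five quarterfinals jointly so that no rank is used twice.

```python
import itertools
import random


def draw_quarterfinal(target_sum, rng, n_ranks=30, size=6):
    """Draw uniformly one set of `size` distinct ranks summing to target_sum."""
    candidates = [
        combo
        for combo in itertools.combinations(range(1, n_ranks + 1), size)
        if sum(combo) == target_sum
    ]
    if not candidates:
        raise ValueError(f"no set of {size} ranks sums to {target_sum}")
    return rng.choice(candidates)


# A fixed seed keeps the illustration reproducible.
rng = random.Random(2023)
heat = draw_quarterfinal(74, rng)
print(heat, sum(heat))
```

Full enumeration is feasible here because there are only 593,775 ways of choosing six ranks out of 30.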

8.2 Results

In this section we present estimation results of the models presented in Section 8.1. Consider Table 5. For the first model, Equation (3), the result from the proposed Wald test in Section 8.1 reveals no support for the recovery effect to differ between the first two quarterfinals or between the last two quarterfinals. This is true for men as well as for women. Therefore, we turn our attention to the results obtained from Equation (4), where only three types of quarterfinals will be considered.^[10] As expected, the sign associated with the variable Rankqualsum is positive. This means that a large rank sum, i.e., a weak competition, is associated with a high probability of reaching the podium, conditioning on a certain athlete going in a specific quarterfinal. For women, this effect is not as strong as for men, although the sign is positive, as expected. A 95 % confidence interval is given by (−0.00656, 0.0366), i.e., its lower limit is slightly less than zero.

Table 5:

Results for logistic regression models including degree of competition, podium as response variable.

Sex	Men		Women
Model variable	(1)	(2)	(1)	(2)
Intercept	−7.72*	−7.57*	−4.21*	−4.19*
	(1.53)	(1.45)	(1.45)	(1.42)
Q ₁	1.06*		0.750*
	(0.401)		(0.371)
Q ₂	0.640		0.750*
	(0.377)		(0.362)
Q ₄	−0.339		−0.126
	(0.471)		(0.408)
Q ₅	−0.296		−0.244
	(0.572)		(0.480)
Q _E		0.828*		0.751*
		(0.340)		(0.330)
Q _L		−0.321		−0.173
		(0.409)		(0.379)
Rankqual	−0.286*	−0.280*	−0.258*	−0.259*
	(0.0690)	(0.0679)	(0.0558)	(0.0557)
Rankqualsq	0.00423	0.00413	0.00466*	0.00467*
	(0.00269)	(0.00266)	(0.00192)	(0.00192)
Rankwcp	0.0194		−0.163*	−0.160*
	(0.0646)		(0.0633)	(0.0574)
Rankwcpsq	0.000569	0.00113	0.00341	0.00334
	(0.00236)	(0.000762)	(0.00223)	(0.00209)
Podium_1	0.254	0.193	−0.0597
	(0.326)	(0.320)	(0.325)
Rankqualsum	0.0330*	0.0325*	0.0156	0.0150
	(0.00990)	(0.00986)	(0.0112)	(0.0110)
Equal recovery effect within types of quarterfinal (p-value)	0.466		0.966
No quarter final effect (p-value)	0.0054*	0.0013*	0.0252*	0.0039*
No individual effect (p-value)	0.0028*	0.0020*	<0.0005*	<0.0005*

Standard error in parenthesis; *significant at 5 %.

The parameter estimates are expected to be positive for early quarterfinals, and negative for late quarterfinals. This is also consistent with our findings, although the estimates for late quarterfinals are not significantly negative.

Now, following the approach described in Section 8.1 and using estimates from the second model specification (Equation (4)) in Table 5, it is possible to calculate estimates of those sums of ranking numbers making a certain athlete indifferent between quarterfinals, that is, levelling out the recovery effect and the competition effect across quarterfinals. These estimates are provided in Table 6.

Table 6:

Estimated rank sums for indifference between quarterfinals for men and women*.

	Quarterfinal
	1	2	3	4	5
Men	74	74	99	109	109
Women	58	58	109	120	120

*Rounded to the nearest integer.
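The closed-form solution from Section 8.1 can be evaluated with the Model (2) estimates in Table 5. The sketch below is only an illustration with our own function and variable names, using the rounded coefficients as printed. For men it returns values that round to the entries in Table 6. For women the middle value comes out near 108.4 rather than 109, which we assume reflects rounding of the published coefficients or an adjustment so that the sums total 465.

```python
def indifference_rank_sums(beta_qe, beta_ql, beta_rankqualsum):
    """Rank sums (x1, x2, x3) equalising the linear predictor across types.

    Implements x1 = x2 - beta_QE / beta_Rankqualsum,
               x3 = x2 - beta_QL / beta_Rankqualsum,
    together with the restriction 2*x1 + x2 + 2*x3 = 465.
    """
    a = beta_qe / beta_rankqualsum
    b = beta_ql / beta_rankqualsum
    x2 = 93 + 2 * (a + b) / 5
    return x2 - a, x2, x2 - b


# (beta_QE, beta_QL, beta_Rankqualsum) from Model (2) in Table 5.
ESTIMATES = {
    "Men": (0.828, -0.321, 0.0325),
    "Women": (0.751, -0.173, 0.0150),
}

rank_sums = {sex: indifference_rank_sums(*b) for sex, b in ESTIMATES.items()}
for sex, (x1, x2, x3) in rank_sums.items():
    print(f"{sex}: early {x1:.2f}, middle {x2:.2f}, late {x3:.2f}")
```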

Compared with the observed average sums of ranking numbers for the different quarterfinals based on 56 competitions in Table 1, our rank sum estimates in Table 6 are lower for the early quarterfinals and higher for the late quarterfinals. Thus, our estimated unbalanced seeding suggests that high ranked athletes should pick early quarterfinals to an even greater extent than they do. There is a difference in the estimated sums of ranking numbers between men and women. For women, the sum should be as low as 58 for early quarterfinals compared to 74 for men. The reason for this difference is the difference in the estimated effect of the variable Rankqualsum. For men, a change in the value of this variable by a certain amount has a larger effect on performance than a corresponding change for women. A wider spread between rank sums for early and late quarterfinals is therefore needed for the women to compensate for differences in recovery time.

The validity of our proposed seeding scheme relies on the sum of ranking numbers being an unbiased measure of the true degree of competition. To illustrate, suppose there is another factor, also reflecting the degree of competition. Taking quarterfinals as the unit, we consider the joint distribution of this factor and the sum of ranking numbers with respect to two types of drawings, the random drawings according to our seeding scheme and the drawings made by the athletes. If these two joint distributions differ, our seeding scheme will produce biased proposals of sums of ranking numbers. For example, suppose (i) that the other factor, just as the sum of ranking numbers, is inversely related to the degree of competition, and (ii) that the correlation between the two variables is negative with respect to the drawings made by the athletes, while positive with respect to the random drawings of the seeding scheme. Then, we would allocate too low a sum of ranking numbers to early quarterfinals and too high a sum to late quarterfinals.^[11]

9 Conclusions and discussion

By replacing a standard seeding method with a “choose-your-opponents” format when distributing prequalified athletes across quarterfinals, the International Ski Federation brought a demanding decision task into their elimination tournaments, requiring high cognitive capacity from the athletes. Our simple decision model with four players, capturing the inherent trade-off between recovery time and competition, provides the prediction that at least one high ranked player should choose an early rather than a late round (semifinal) to maximize her probability of winning the elimination tournament. A test rejects the hypothesis that athletes choose quarterfinal randomly, indicating our prediction being consistent with the observed athletes’ choices of quarterfinals. Moreover, we also find that the probability of reaching the podium is still higher when choosing an early quarterfinal, conditioning on athletes’ capacity, despite the impact of increased competition in the early quarterfinals, suggesting sub-optimal decisions. The athlete’s rank from the qualification race is by far the strongest predictor to determine the probability to reach the podium. Hence, the athlete’s daily form seems to be a decisive factor in achieving success in the competition. In addition, an athlete’s ranking from the qualification round to some extent also hinges on the preparation of the skis (Budde and Himes 2017; Moxnes, Sandbakk, and Hausken 2014). Since the snow conditions most likely are similar in all races, an athlete managing to reduce the sliding friction of the skis, increases her chances to succeed in the qualification round as well as in the subsequent races in case of advancement.

Holding on to seeding, we present a revision of the old seeding scheme. Instead of letting the sum of athletes’ ranking numbers from the qualification round be equal, i.e., the sum 93, across quarterfinals, this sum should differ across quarterfinals to adjust for the variation in recovery time. The earlier the quarterfinal, the lower is the sum of ranking numbers. Compared to the current design, this revised seeding would thus eliminate a tricky strategic element from the competition, still capturing the fundamental disparity across quarterfinals. Our proposal does not point out in which of the quarterfinals a competitor with a certain ranking number from the qualification round should compete. For each quarterfinal, our seeding only specifies a total sum of ranking numbers, arising from possible combinations of six numbers, adding up to the specified sum. It may well be that some athletes, due to high talent for tactics, prefer a choose-your-opponents format instead of being seeded into quarterfinals. Tactical decision-making is, after all, an inherent trait in sports, where the difference between a good and a great performance is often a matter of tactical skills. However, one can argue that the current method used by the FIS promotes abilities going beyond what is reasonably required to be successful in skiing-sprint.

Clearly, an assertion that only cognitive limitations when choosing quarterfinals are behind the result is not without objection. First and foremost, it is reasonable to assume that for some of the athletes, the objective function behind the choice of quarterfinal may reflect other goals than maximizing the probability of ending up on the podium. For example, athletes may seek to maximize the expected award of World Cup points. The expected number of points awarded may be maximized by choosing a late quarterfinal with lower competition, even if it leads to a shorter recovery time in case of advancement to the final, thus lowering the chances of ending up on the podium.

Secondly, the athlete’s choice of quarterfinal surely to a large extent depends on the private perception of own capacity on the current competition day. In other words, the choice of quarterfinal may correlate with factors of capacity not controlled for in our model.

One natural step in this research is to adopt another objective function behind the athletes’ choices of quarterfinals. As suggested above, a plausible point of departure for the analysis is to assume that the athletes seek to maximize the expected World Cup points rather than the probability of ending up on the podium. Another step would be to use comparable data from the same skiing sprint competitions held in the years before the season 2014/2015 to investigate whether athletes’ results are affected by the FIS’s decision to switch design. Given that an athlete’s performance is observed under both regimes, it may be possible to assess, conditioning on the athlete’s various rankings, to what extent the athlete’s choice of quarterfinal has improved her outcome in the competitions under the current design versus the old seeding design.

Corresponding author: Anders Lunander, Örebro University School of Business, Örebro, Sweden, E-mail: anders.lunander@oru.se

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix A

A.1 Proof of Proposition 1

Given that player A initially chooses the first semifinal (s.1), we write the probability of player B winning the tournament when choosing s.1 as

$p_{B,1} = 0.5 \times 0.5\left[1 - (1-p)c\right] + 0.5 \times 0.5\left[1 - (1-p)c\right]$

The first part is the probability that player B beats player A, player C beats player D in s.2, and in the final player B beats player C. Note that player C’s probability of beating player B in the final is reduced by the factor c, that is, $(1-p)c$.

The second part identifies the same probabilities as the first part, but now it is player D who advances to the final. If player B instead chooses s.2, then player C is better off choosing s.1 than s.2. To see this, we have $p_{C,2} = (1-p)\,p\,(1 - pc) + (1-p)(1-p)(1 - 0.5c)$ and $p_{C,3} = (1-p)\,p\,(1-p)c + (1-p)(1-p)\,0.5c$, where $p_{C,2} - p_{C,3} = (1-p)(1-c) > 0$.

Thus, the probability that player B will win the tournament given that player A chooses s.1 and player B chooses s.2 can be written as $p_{B,2} = p \times 0.5\,pc + p(1-p)\,pc$.

Player B will choose to compete against player A in s.1 if $p_{B,1} - p_{B,2} > 0$. Using the expressions for $p_{B,1}$ and $p_{B,2}$, we obtain the condition for player B choosing to compete against player A in s.1 as $c < \dfrac{-0.5}{p^3 - 1.5p^2 + 0.5p - 0.5}$. Now, given this condition, the probability of player A winning the tournament if he chooses s.1 is $p_{A,1} = 0.5\left[0.5\left(1 - (1-p)c\right) + 0.5\left(1 - (1-p)c\right)\right]$. Otherwise, player A will meet player C in s.1, generating the corresponding probability

$p_{A,2} = p \times p\,(1 - 0.5c) + p(1-p)\left[1 - (1-p)c\right].$

Turning to the case where player A initially chooses the second semifinal (s.2), and player B chooses s.1, it can easily be verified that player C prefers to compete against player B in s.1 rather than facing player A in s.2. The price player C has to pay, in case of winning against player A in s.2, is a reduced probability, $(1-p)c$ or $0.5c$, of winning the final either against player B or against player D. We have $p_{C,4} - p_{C,5} = (1-p)(1-c) > 0$.

Player B’s probability of winning the tournament when choosing s.1 is then

$p_{B,4} = p \times p\,(1 - 0.5c) + p(1-p)\left[1 - (1-p)c\right].$

Making use of the inequalities $1 - 0.5c > 0.5$ and $1 - (1-p)c > 0.5$, we get

$p_{B,4} = p \times p\,(1 - 0.5c) + p(1-p)\left[1 - (1-p)c\right] > p \times p \times 0.5 + p(1-p) \times 0.5 = 0.5p.$

Since player B’s probability of winning the tournament when choosing s.2 is $p_{B,6} = 0.5pc$, we have established that $p_{B,4} - p_{B,6} > 0$. To summarize, if player A initially chooses s.2 then he will face player D in this semifinal. Player A’s probability of winning the tournament when choosing s.2 is then $p_{A,4} = p \times p\,(0.5c) + p(1-p)\,pc$. However, player A will never choose s.2. For the case $c < \dfrac{-0.5}{p^3 - 1.5p^2 + 0.5p - 0.5}$, it is relevant for player A to compare $p_{A,1}$ with $p_{A,4}$. For this case, we have earlier found that $p_{B,1} > p_{B,2}$. By definition of the plays we also have $p_{B,1} = p_{A,1}$ and $p_{B,2} = p_{A,4}$. Hence, $p_{A,1} > p_{A,4}$.

For the case $c > \dfrac{-0.5}{p^3 - 1.5p^2 + 0.5p - 0.5}$ we compare $p_{A,2}$ with $p_{A,4}$. It is easily verified that

$p_{A,2} - p_{A,4} = p(1 - c) > 0.$
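The identities used in the proof can be checked numerically. The sketch below evaluates the expressions for the plays on a grid of values and confirms the two difference formulas as well as the equivalence between $p_{B,1} > p_{B,2}$ and the stated bound on c. The grid ranges, 0.5 < p < 1 and 0 < c < 1, are our assumption about the model in Section 4, and all names are ours.

```python
def plays(p, c):
    """Winning probabilities for the plays discussed in Appendix A.1."""
    return {
        "B1": 0.5 * 0.5 * (1 - (1 - p) * c) + 0.5 * 0.5 * (1 - (1 - p) * c),
        "B2": p * 0.5 * p * c + p * (1 - p) * p * c,
        "C2": (1 - p) * p * (1 - p * c) + (1 - p) * (1 - p) * (1 - 0.5 * c),
        "C3": (1 - p) * p * (1 - p) * c + (1 - p) * (1 - p) * 0.5 * c,
        "A2": p * p * (1 - 0.5 * c) + p * (1 - p) * (1 - (1 - p) * c),
        "A4": p * p * 0.5 * c + p * (1 - p) * p * c,
    }


def threshold(p):
    """Upper bound on c below which player B prefers s.1."""
    return -0.5 / (p**3 - 1.5 * p**2 + 0.5 * p - 0.5)


def check(p, c, tol=1e-12):
    """Return True if all three claims hold at the point (p, c)."""
    v = plays(p, c)
    ok_c = abs((v["C2"] - v["C3"]) - (1 - p) * (1 - c)) < tol
    ok_a = abs((v["A2"] - v["A4"]) - p * (1 - c)) < tol
    gap = v["B1"] - v["B2"]
    # Skip the comparison only if the gap is numerically zero.
    ok_b = abs(gap) < tol or (gap > 0) == (c < threshold(p))
    return ok_c and ok_a and ok_b


# Assumed parameter ranges: 0.5 < p < 1 and 0 < c < 1.
grid = [(p / 100, c / 100) for p in range(51, 100) for c in range(1, 100)]
all_hold = all(check(p, c) for p, c in grid)
print(all_hold)
```

A grid check of this kind does not replace the algebraic argument, but it guards against sign and bracketing errors in the expressions.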

A.2 Proof of Proposition 2

To show that the test statistic follows a standard normal distribution under the null hypothesis, we need to prove that $\bar{R}_E - \bar{R}_L \overset{\text{appr}}{\sim} N\!\left(0, \tfrac{465}{14}\right)$ under H₀, where the second argument denotes the variance. First, from the way the null hypothesis is formulated we have $E(\bar{R}_E - \bar{R}_L) = E(\bar{R}_E) - E(\bar{R}_L) = 0$. Second, to prove that $V(\bar{R}_E - \bar{R}_L) = 465/14$, we define $X_{ik}$ and $Y_{ik}$, $i = 1, 2, \ldots, 12$, for the kth competition, as the rank for the ith athlete in an early quarterfinal and in a late quarterfinal, respectively. Thus, $U_k = \sum_{i=1}^{12} X_{ik}$ and $V_k = \sum_{i=1}^{12} Y_{ik}$.

From the way a competition is designed, under the assumption of the null hypothesis that the athletes choose quarterfinals at random, it follows that $X_{ik}$ and $Y_{ik}$ are discrete uniformly distributed from 1 to 30, implying that $E(X_{ik}) = E(Y_{ik}) = 15.5$ and $V(X_{ik}) = V(Y_{ik}) = 74\tfrac{11}{12}$ (see Casella and Berger 2002). Referring to the same assumption of random choice, it also follows that $X_{ik}$ and $X_{jm}$ are independent for all combinations of $i, j$ and $k, m$, except for combinations where k = m. This result holds true for $Y_{ik}$ and $Y_{jm}$, as well as for $X_{ik}$ and $Y_{jm}$. For combinations where k = m and i ≠ j we get

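$\mathrm{Cov}(X_{ik}, X_{jk}) = E\left[\left(X_{ik} - E(X_{ik})\right)\left(X_{jk} - E(X_{jk})\right)\right] = E(X_{ik} X_{jk}) - E(X_{ik})E(X_{jk}) = \sum_{x_{ik}=1}^{30} \sum_{x_{jk}=1}^{30} x_{ik}\, x_{jk}\, P(X_{ik} = x_{ik}, X_{jk} = x_{jk}) - 15.5^2 = \dfrac{1}{870} \mathop{\sum\sum}_{x_{ik} \neq x_{jk}} x_{ik}\, x_{jk} - 15.5^2 = -2\tfrac{7}{12},$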

where the second to last step follows from the fact that $X_{ik}$ and $X_{jk}$ cannot take on the same value and there are 870 possible outcomes $(x_{ik}, x_{jk})$, all equally likely to occur. The same result holds for $\mathrm{Cov}(Y_{ik}, Y_{jk})$. The result is also valid for $\mathrm{Cov}(X_{ik}, Y_{jk})$, here for the case i = j as well. Now, making use of the results for $V(X_{ik})$ and $\mathrm{Cov}(X_{ik}, X_{jk})$, we get

$V(U_k) = V\!\left(\sum_{i=1}^{12} X_{ik}\right) = \sum_{i=1}^{12} V(X_{ik}) + 2 \mathop{\sum\sum}_{1 \le i < j \le 12} \mathrm{Cov}(X_{ik}, X_{jk}) = 12\, V(X_{ik}) + 132\, \mathrm{Cov}(X_{ik}, X_{jk}) = 558.$

Likewise, we obtain $V(V_k) = 558$. We also get

$\mathrm{Cov}(U_k, V_k) = \mathrm{Cov}\!\left(\sum_{i=1}^{12} X_{ik}, \sum_{j=1}^{12} Y_{jk}\right) = \sum_{i=1}^{12} \sum_{j=1}^{12} \mathrm{Cov}(X_{ik}, Y_{jk}) = 144\, \mathrm{Cov}(X_{ik}, Y_{jk}) = -372.$

In order to find $V(\bar{R}_E - \bar{R}_L)$, we also need $V(\bar{R}_E)$, $V(\bar{R}_L)$ and $\mathrm{Cov}(\bar{R}_E, \bar{R}_L)$. Since U₁, U₂, …, U₅₆ are i.i.d., we have $V(\bar{R}_E) = V\!\left(\dfrac{\sum_{k=1}^{56} U_k}{56}\right) = \dfrac{1}{56^2} \sum_{k=1}^{56} V(U_k) = \dfrac{1}{56^2} \times 56\, V(U_k) = \dfrac{279}{28}$ using the result $V(U_k) = 558$. Likewise, we obtain $V(\bar{R}_L) = \dfrac{279}{28}$.

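Using that $\mathrm{Cov}(U_k, V_k) = -372$ and $\mathrm{Cov}(U_k, V_m) = 0$ for k ≠ m, we have

$\mathrm{Cov}(\bar{R}_E, \bar{R}_L) = \mathrm{Cov}\!\left(\dfrac{\sum_{k=1}^{56} U_k}{56}, \dfrac{\sum_{m=1}^{56} V_m}{56}\right) = \dfrac{1}{56^2} \sum_{k=1}^{56} \sum_{m=1}^{56} \mathrm{Cov}(U_k, V_m) = \dfrac{1}{56^2} \sum_{k=1}^{56} \mathrm{Cov}(U_k, V_k) = \dfrac{1}{56^2} \times 56\, \mathrm{Cov}(U_k, V_k) = -\dfrac{93}{14}.$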

We obtain the variance $V(\bar{R}_E - \bar{R}_L)$ as $V(\bar{R}_E - \bar{R}_L) = V(\bar{R}_E) + V(\bar{R}_L) - 2\,\mathrm{Cov}(\bar{R}_E, \bar{R}_L) = \dfrac{465}{14}$. Third, to prove that $\bar{R}_E - \bar{R}_L$ follows an approximate normal distribution, we note that U₁, U₂, …, U₅₆ as well as V₁, V₂, …, V₅₆ are i.i.d., meaning that $\bar{R}_E$ and $\bar{R}_L$ are approximately normally distributed by the Central Limit Theorem. Therefore, the difference between $\bar{R}_E$ and $\bar{R}_L$, a linear combination, is also approximately normally distributed.
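The moments in this proof can be reproduced with exact rational arithmetic. The sketch below is a plain recomputation under the null hypothesis of random choice, using Python’s `fractions` module; the variable names are ours.

```python
from fractions import Fraction
from itertools import permutations

RANKS = range(1, 31)   # qualification ranks 1..30
GROUP_SIZE = 12        # athletes in the two early (or two late) quarterfinals
COMPETITIONS = 56

# Moments of a single rank drawn uniformly from 1..30.
mean_x = Fraction(sum(RANKS), 30)
var_x = Fraction(sum(r * r for r in RANKS), 30) - mean_x**2

# Covariance of two distinct ranks: 870 equally likely ordered pairs.
cross_moment = Fraction(sum(a * b for a, b in permutations(RANKS, 2)), 870)
cov_x = cross_moment - mean_x**2

# Rank sums within one competition.
var_u = GROUP_SIZE * var_x + GROUP_SIZE * (GROUP_SIZE - 1) * cov_x
cov_uv = GROUP_SIZE * GROUP_SIZE * cov_x

# Averages over 56 independent competitions.
var_rbar = var_u / COMPETITIONS
cov_rbar = cov_uv / COMPETITIONS
var_diff = 2 * var_rbar - 2 * cov_rbar

print(var_x, cov_x, var_u, cov_uv, var_diff)
# prints: 899/12 -31/12 558 -372 465/14
```

Note that 899/12 equals the mixed number 74 11/12 and −31/12 equals −2 7/12, matching the values in the proof.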

References

Amegashie, J. A., C. B. Cadsby, and Y. Song. 2007. “Competitive Burnout: Theory and Experimental Evidence.” Games and Economic Behavior 59 (2): 213–39. https://doi.org/10.1016/j.geb.2006.08.009.Search in Google Scholar

Budde, R., and A. Himes. 2017. “High-Resolution Friction Measurements of Cross-Country Ski Bases on Snow.” Sports Engineering 20 (4): 299–311. https://doi.org/10.1007/s12283-017-0230-5.Search in Google Scholar

Casella, G., and R. L. Berger. 2002. Statistical Inference, 2nd ed. Pacific: Duxbury.Search in Google Scholar

Cea, S., G. Durán, M. Guajardo, D. Sauré, J. Siebert, and G. Zamorano. 2020. “An Analytics Approach to the FIFA Ranking Procedure and the World Cup Final Draw.” Annals of Operations Research 286 (1): 119–46. https://doi.org/10.1007/s10479-019-03261-8.Search in Google Scholar

Corona, F., D. Forrest, J. D. D. Tena, and M. Wiper. 2019. “Bayesian Forecasting of UEFA Champions League under Alternative Seeding Regimes.” International Journal of Forecasting 35 (2): 722–32. https://doi.org/10.1016/j.ijforecast.2018.07.009.Search in Google Scholar

Csató, L. 2020a. “Optimal Tournament Design: Lessons from the Men’s Handball Champions League.” Journal of Sports Economics 21 (8): 848–68. https://doi.org/10.1177/1527002520944442.Search in Google Scholar

Csató, L. 2020b. “The UEFA Champions League Seeding is not Strategy-Proof since the 2015/16 Season.” Annals of Operations Research 292 (1): 161–9. https://doi.org/10.1007/s10479-020-03637-1.

Csató, L. 2021. Tournament Design: How Operations Research can Improve Sports Rules. Palgrave Pivots in Sports Economics. Cham, Switzerland: Palgrave Macmillan. https://doi.org/10.1007/978-3-030-59844-0.

Dagaev, D., and A. Suzdaltsev. 2018. “Competitive Intensity and Quality Maximizing Seedings in Knock-Out Tournaments.” Journal of Combinatorial Optimization 35 (1): 170–88. https://doi.org/10.1007/s10878-017-0164-7.

Dagaev, D., and V. Rudyak. 2019. “Seeding the UEFA Champions League Participants: Evaluation of the Reforms.” Journal of Quantitative Analysis in Sports 15 (2): 129–40. https://doi.org/10.1515/jqas-2017-0130.

Engist, O., E. Merkus, and F. Schafmeister. 2021. “The Effect of Seeding on Tournament Outcomes: Evidence from a Regression-Discontinuity Design.” Journal of Sports Economics 22 (1): 115–36. https://doi.org/10.1177/1527002520955212.

Fédération Internationale de Ski, FIS. 2020. https://www.fis-ski.com/en/cross-country/ (Accessed March, 2020).

Groh, C., B. Moldovanu, A. Sela, and U. Sunde. 2012. “Optimal Seedings in Elimination Tournaments.” Economic Theory 49 (1): 59–80. https://doi.org/10.1007/s00199-008-0356-6.

Guyon, J. 2015. “Rethinking the FIFA World Cup™ Final Draw.” Journal of Quantitative Analysis in Sports 11 (3): 169–82. https://doi.org/10.1515/jqas-2014-0030.

Guyon, J. 2022. “‘Choose Your Opponent’: A New Knockout Design for Hybrid Tournaments.” Journal of Sports Analytics 8 (1): 9–29. https://doi.org/10.3233/jsa-200527.

Harbaugh, R., and T. Klumpp. 2005. “Early Round Upsets and Championship Blowouts.” Economic Inquiry 43 (2): 316–29. https://doi.org/10.1093/ei/cbi021.

Hébert-Losier, K., C. Zinner, S. Platt, T. Stöggl, and H. C. Holmberg. 2017. “Factors that Influence the Performance of Elite Sprint Cross-Country Skiers.” Sports Medicine 47 (2): 319–42. https://doi.org/10.1007/s40279-016-0573-2.

Hwang, F. K. 1982. “New Concepts in Seeding Knockout Tournaments.” The American Mathematical Monthly 89 (4): 235–9. https://doi.org/10.1080/00029890.1982.11995420.

Karpov, A. 2016. “A New Knockout Tournament Seeding Method and its Axiomatic Justification.” Operations Research Letters 44 (6): 706–11. https://doi.org/10.1016/j.orl.2016.09.003.

Laliena, P., and F. J. López. 2019. “Fair Draws for Group Rounds in Sport Tournaments.” International Transactions in Operational Research 26 (2): 439–57. https://doi.org/10.1111/itor.12565.

Marchand, É. 2002. “On the Comparison between Standard and Random Knockout Tournaments.” Journal of the Royal Statistical Society: Series D (The Statistician) 51 (2): 169–78. https://doi.org/10.1111/1467-9884.00309.

Moxnes, J. F., and E. D. Moxnes. 2014. “Mathematical Simulation of Energy Expenditure and Recovery during Sprint Cross-Country Skiing.” Open Access Journal of Sports Medicine 5: 115. https://doi.org/10.2147/oajsm.s62020.

Moxnes, J. F., Ø. Sandbakk, and K. Hausken. 2014. “Using the Power Balance Model to Simulate Cross-Country Skiing on Varying Terrain.” Open Access Journal of Sports Medicine 5: 89. https://doi.org/10.2147/oajsm.s53503.

Ryvkin, D. 2011. “Fatigue in Dynamic Tournaments.” Journal of Economics and Management Strategy 20 (4): 1011–41. https://doi.org/10.1111/j.1530-9134.2011.00314.x.

Vesterinen, V., J. Mikkola, A. Nummela, E. Hynynen, and K. Häkkinen. 2009. “Fatigue in a Simulated Cross-Country Skiing Sprint Competition.” Journal of Sports Sciences 27 (10): 1069–77. https://doi.org/10.1080/02640410903081860.

Wright, M. 2014. “OR Analysis of Sporting Rules–A Survey.” European Journal of Operational Research 232 (1): 1–8. https://doi.org/10.1016/j.ejor.2013.03.043.

Wright, M. B. 2009. “50 Years of OR in Sport.” Journal of the Operational Research Society 60 (1): S161–68. https://doi.org/10.1057/jors.2008.170.

Zory, R., G. Millet, F. Schena, L. Bortolan, and A. Rouard. 2006. “Fatigue Induced by a Cross-Country Skiing KO Sprint.” Medicine & Science in Sports & Exercise 38 (12): 2144. https://doi.org/10.1249/01.mss.0000235354.86189.7e.

Received: 2021-03-18

Accepted: 2023-06-16

Published Online: 2023-07-11

Published in Print: 2023-09-28

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/jqas-2021-0027
