Right on Time: An Electoral Audit for the Publication of Vote Results

  • Leonardo Antenangeli and Francisco Cantú
Published/Copyright: November 21, 2019

Abstract

The publication of electoral results in real time is a common practice in contemporary democracies. However, delays in the reporting of electoral outcomes often stir up skepticism and suspicion about the vote-counting process. This issue urges us to construct a systematic test to distinguish delays attributable to manipulation from those resulting from limited administrative capacity. This paper proposes a method to assess the potential sorting of the electoral results given the moment at which polling stations publish their vote totals. To do so, we model the time span for a polling station to report its electoral results, identify those observations whose reported times are poorly explained by the model, and assess a potential bias in the candidates’ vote trends. We illustrate this method by analyzing the 2006 Presidential Election in Mexico, a contest that aroused suspicion from opposition parties and public opinion alike regarding how the electoral results were reported. The results suggest that polling stations’ time logs mostly respond to their specific geographic, logistic, and sociodemographic features. Moreover, those observations that took longer than expected to report their returns had no systematic effect on the electoral outcome. The proposed method can be used as an additional post-election audit to help officials and party representatives evaluate the integrity of an election.

Acknowledgement

We are grateful to Dorothy Kronick, Michelle Torres, and Walter Mebane for their useful comments. Previous versions of the paper were presented at the annual meetings of the Midwest Political Science Association, Southern Political Science Association, and the American Political Science Association.

6 Appendix

6.1 Analysis of the Vote Shares

This test looks for a systematic bias in the reported results of the outlying observations. We do so by comparing the vote shares for each candidate between our outlying observations and their corresponding control units. This approach allows us to identify potential trends in the vote results over time and assess their overall effect on the electoral result.

The comparisons are illustrated in Figure 10, which shows the differences between the vote shares of the three main candidates for each outlying observation and its corresponding control unit. We separate the figure by each of the matching approaches referenced above. Dots above the horizontal line indicate a higher vote share for the candidate in the outlying observation than in its control units. By contrast, a value below the horizontal line denotes a lower vote share for the candidate in the outlying observation than in its control units. The plots show no observable trend between the reported time of the outlying observation and the differences between the candidates’ vote shares. Overall, the discrepancies for each candidate cancel each other out, failing to back the argument of a potential sorting of the results in favor of or against a particular candidate.

Figure 6: Histogram for the values of Crossings. Notes: The plots show the frequency for the number of times in which the leading candidate overturns the vote aggregates of the runner-up in the simulated elections (Plot 1(A)) and permuted results (Plot 1(B)) described in Section 3.1. The dashed line represents the observed number of crossings in the candidates’ vote shares reported by the PREP. We log-transform the number of overturns per iteration to facilitate its visualization.

Figure 7: Histogram for the values of Last overturns. Notes: The plots show the frequency for the number of reported tallies before the leading candidate gets an unsurpassable vote share in the simulated elections (Plot 2(A)) and permuted results (Plot 2(B)) described in Section 3.1. The dashed line represents the observed last overturn in the candidates’ vote shares reported by the PREP. We log-transform the number of last overturns per iteration to facilitate its visualization.
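The two statistics summarized in Figures 6 and 7 can be illustrated with a minimal sketch. It assumes one plausible reading of the captions: a crossing is counted each time the cumulative lead changes hands, and the last overturn is the number of tallies reported when the lead changed for the final time. The function name `lead_changes` and the toy vote counts are hypothetical and are not taken from the PREP data.

```python
from itertools import accumulate

def lead_changes(votes_a, votes_b):
    """Return (crossings, last_overturn) for tallies listed in reporting order."""
    cum_a = list(accumulate(votes_a))
    cum_b = list(accumulate(votes_b))
    crossings = 0
    last_overturn = 0  # number of tallies reported when the lead last changed hands
    leader = 0         # +1 if A leads, -1 if B leads, 0 before anyone leads
    for i, (a, b) in enumerate(zip(cum_a, cum_b), start=1):
        if a == b:
            continue  # in this sketch, a tie does not change the current leader
        current = 1 if a > b else -1
        if leader != 0 and current != leader:
            crossings += 1
            last_overturn = i
        leader = current
    return crossings, last_overturn

# Hypothetical per-station vote counts for two candidates, in reporting order.
votes_a = [10, 5, 20, 5, 30]
votes_b = [8, 12, 5, 25, 10]
result = lead_changes(votes_a, votes_b)
```

Applying the same function to each simulated or permuted ordering of the polling stations would give the reference distributions against which the observed values are compared.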

Figure 8: Presidential candidates’ vote shares by the time the results of the polling stations were reported. Mexico, 2006. Notes: The figure shows the vote shares for the candidates given the reported time of the preliminary results. The dashed lines show the vote shares for the candidates after adjusting the reporting times of the outlying units.

Table 3:

Hazard Ratios of Binary Variables with Time Interactions.

Variable Hazard ratio (minimum) Hazard ratio (mean)
PAN observers 111.6232 1.4131
PRD observers 15.0043 1.2715
PRI observers 2613.607 2.3721
  1. Columns display hazard ratios at the minimum and mean (logged) reporting times, respectively. For each covariate, a hazard ratio for xi=1 is compared to a baseline of xi=0.
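As a hedged illustration of how a hazard ratio varies once a binary covariate is interacted with logged time, the sketch below evaluates exp(β + γ·log t) for x=1 against x=0. The coefficients are those reported for PAN observers under Model (1) in Table 10. The value of the mean logged reporting time is an assumption: it is not reported in the text and was back-solved here so that the result is close to the corresponding entry in Table 3.

```python
import math

def hazard_ratio(beta, gamma, log_minutes):
    """Hazard ratio for x=1 versus x=0 when x is interacted with log(time)."""
    return math.exp(beta + gamma * log_minutes)

# "PAN observers" main effect and its log(minutes) interaction, Model (1), Table 10.
beta_pan, gamma_pan = 14.7544, -2.4492

# Assumption, not reported in the text: a mean logged reporting time of about
# 5.883 (roughly 359 minutes), back-solved to approximate Table 3.
assumed_mean_log_minutes = 5.883

hr_mean = hazard_ratio(beta_pan, gamma_pan, assumed_mean_log_minutes)
```

Because the interaction coefficient is negative, the ratio shrinks as the logged reporting time grows, which is why the value at the minimum time is much larger than the value at the mean.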

Table 4:

Absolute Standardized Differences for the Covariates Before and After Sparse Optimal Matching.

Variable Treated (unbalanced) Control (unbalanced) Standardized difference (unbalanced) Treated (balanced) Control (balanced) Standardized difference (balanced)
Log driving time 1.68 2.41 0.548 1.71 1.71 0.002
Registered voters 0.70 0.76 0.421 0.71 0.71 0.011
65 years old 0.07 0.06 0.180 0.07 0.07 0.0003
Illiteracy 0.07 0.08 0.250 0.07 0.07 0.005
High school 0.12 0.11 0.220 0.12 0.12 0.003
No Spanish 0.001 0.003 0.078 0.001 0.001 0.008
Computer 0.24 0.20 0.240 0.24 0.24 0.019
Absentees 0.70 0.72 0.033 0.69 0.69 0.003
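For readers unfamiliar with the balance statistic, the following sketch computes an absolute standardized difference under one common definition: the absolute difference in group means divided by the square root of the average of the two group variances. The text does not state which variant was used for Table 4, so this formula is an assumption, and the input values are hypothetical.

```python
import math
from statistics import mean, variance

def abs_std_diff(treated, control):
    """Absolute difference in means scaled by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd

# Hypothetical covariate values (for example, logged driving times).
treated = [1.2, 1.9, 1.5, 2.1]
control = [2.0, 2.6, 2.3, 2.9]
d = abs_std_diff(treated, control)
```

Values near zero after matching indicate that treated and control units are similar on that covariate.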
Figure 9: Geographic location of the outliers identified in Section 3.2.

Figure 10: Difference in vote shares for each outlier by candidate and comparison method.
Notes: The plots show the difference in the vote shares for each candidate between those observations identified as outliers and their matched polling stations. Dots above (below) the red line depict higher (lower) vote shares of the outliers when compared to their matched observations.

To estimate the overall effect of the outlying observations, we compare their mean vote share for each candidate with those reported by the control units. The results are summarized in Figure 11, which shows the estimated difference of means and 95% confidence intervals of the vote shares between the outliers and their respective counterfactual sets when comparing adjacent polling stations and using sparse optimal matching, respectively. In both cases, the differences are never statistically different from 0, suggesting that the results reported later than expected had no effect on the reported trend for each candidate.
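The comparison described above can be sketched as a difference in means with a normal-approximation confidence interval. The sketch assumes each outlier is paired with a single matched control value, which simplifies the counterfactual sets used in the analysis; the vote shares below are hypothetical.

```python
import math
from statistics import mean, stdev

def mean_diff_ci(outlier_shares, control_shares, z=1.96):
    """Mean paired difference and an approximate 95% confidence interval."""
    diffs = [o - c for o, c in zip(outlier_shares, control_shares)]
    estimate = mean(diffs)
    std_error = stdev(diffs) / math.sqrt(len(diffs))
    return estimate, estimate - z * std_error, estimate + z * std_error

# Hypothetical vote shares for one candidate: outliers and their matched controls.
outliers = [0.36, 0.41, 0.30, 0.38, 0.33]
controls = [0.35, 0.43, 0.31, 0.36, 0.34]

estimate, lower, upper = mean_diff_ci(outliers, controls)
includes_zero = lower <= 0 <= upper
```

An interval that contains zero corresponds to the situation reported in Figure 11, where the difference is not statistically distinguishable from 0.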

Figure 11: Difference in means for the candidates’ vote shares. Notes: The dots and horizontal lines show the mean differences and 95% confidence intervals for comparing the vote shares of the outliers to the matched observations.

In sum, the method described above suggests a way to identify suspicious observations given the time in which a polling station reported the results. In the case of the 2006 presidential election in Mexico, we find that polling stations reporting their results with an undue delay represent less than 0.5% of all observations. Moreover, and against the conventional wisdom regarding the preliminary vote count in the country, we did not find significant biases in the vote shares of these outlying units. Our model shows that the reporting time of every polling station mostly corresponds to its geographic and logistic context, and that the late results show no particular pattern.

As an extension of the proposed methodology, the next subsection shows the analysis of similar cases for the 2016 local elections in two Mexican states. In both elections, the losing candidates claimed fraud and blamed the electoral authorities for multiple irregularities during the preliminary vote count. Using data for those elections, we find that the (un)timely reporting of the results had no effect on the overall trend for each of the candidates in their respective elections.

6.2 2016 Local Elections

We use the proposed methodology to assess the fraud allegations on the preliminary counts of two gubernatorial elections in Oaxaca and Veracruz on June 5, 2016. Both states are often ranked among those with the lowest levels of sub-national democracy (Gibson 2013; Giraudy 2015). In the case of Oaxaca, the PRI’s candidate won the election with 32% of the votes, followed by the PAN and PRD coalition and the recently formed National Regeneration Movement (MORENA) with 25% and 23% of the vote, respectively. This result gave control of the state back to the PRI after the party lost the election six years earlier. Meanwhile, the PAN’s candidate won the governorship in Veracruz with 34% of the vote, followed by the candidates of the PRI and MORENA with 30% and 27%, respectively. This result ended the PRI’s 87-year rule of the state.

In both elections, losing candidates claimed fraud and blamed the electoral authorities for multiple irregularities during the preliminary vote count. In Oaxaca, after the PREP showed a lead for the PRI candidate, opposition parties alleged that the PREP was biased, suggesting a potential manipulation of the results to match them with the pre-electoral polls.[28] Similarly, the candidate of MORENA in Veracruz, Cuitláhuac García, accused the PREP of being “statistically manipulated,” and argued that the electoral authorities inflated the preliminary results with more than 100,000 fake votes.[29]

To assess the validity of these claims, we proceed in a similar way to the analysis described in the previous section. First, we obtain the coordinates of the local district offices in each state and estimate their driving distance to the polling stations from which they should receive the results. Then, using sociodemographic data from the 2010 census, we estimate a duration model for the time at which the results were reported in the PREP.[30] Unlike the analysis for the 2006 election, we lack information on the presence of party representatives and poll workers. For Oaxaca, we also lack information on the number of registered voters.

Following the procedures described above, we estimate the survival models and correct for non-proportionality. The results of the survival models for Oaxaca and Veracruz are shown in the Appendix in Tables 5 and 6, respectively. We then obtain the Deviance residuals of the final model for each election and identify as outliers those observations with a residual value below −3; the geographic locations of these outliers are shown in Figure 12. Finally, we create potential counterfactuals to estimate the potential effect of the unexpected survival time of these outliers on the electoral result. The complete results are in the Appendix.
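The outlier rule can be illustrated with the standard deviance residual for a Cox model, computed from the martingale residual M = δ − Ĥ, where δ is the event indicator and Ĥ the estimated cumulative hazard at the observed time. The sketch below is not the authors' code: the cumulative hazards are hypothetical inputs that would normally come from the fitted model, and only the −3 threshold is taken from the text.

```python
import math

def deviance_residual(event, cum_hazard):
    """Deviance residual from the event indicator and estimated cumulative hazard."""
    martingale = event - cum_hazard
    # For observed events, event - martingale equals cum_hazard (must be > 0).
    log_term = math.log(cum_hazard) if event else 0.0
    inner = martingale + log_term
    return math.copysign(math.sqrt(max(-2.0 * inner, 0.0)), martingale)

# Hypothetical cumulative hazards at each station's observed reporting time.
# A large value means the model expected the station to have reported much earlier.
stations = {"A": 0.4, "B": 1.0, "C": 10.0}

residuals = {name: deviance_residual(1, h) for name, h in stations.items()}
flagged = [name for name, r in residuals.items() if r < -3]
```

Station C is flagged because its residual is roughly −3.66, the kind of large negative value that marks a reporting time far longer than the model predicts.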

Figure 12: Geographic location of the outliers. Notes: The points in each map show the geographic location of those polling stations whose reporting times were underestimated by the model.

Figure 13 shows the overall effects of the outliers on vote shares in Oaxaca and Veracruz. The differences in means for each candidate’s vote share are plotted for both states, along with 95% confidence intervals. As in the case of the 2006 election, the analysis once again shows no statistically significant difference in outcomes when comparing the outliers to their respective counterfactuals, suggesting that the (un)timely reporting of the results had no effect on the overall trend for each of the candidates in their respective elections.

Figure 13: Difference in means for the candidates’ vote shares. Notes: The dots and horizontal lines show the mean differences and 95% confidence intervals comparing the vote shares of the outliers to the matched observations.

Table 5:

Cox Model on the Reporting Time for the Vote Counts in Every Polling Station, Oaxaca.

Variable Coef.
Log (Driving distance) −0.233
(0.0173)
65-Year-Olds 0.0784
(0.4024)
Illiteracy 330.8147
(4.5249)
Computer 199.5805
(3.1975)
No Spanish −142.741
(4.0912)
High school 5.0218
(0.3341)
Illiteracy*log (minutes) −51.1548
(0.7053)
Computer*log (minutes) −32.4932
(0.5333)
No Spanish*log (minutes) 22.0642
(0.6213)
θ 0.958
I-likelihood −25980.2
Table 6:

Cox Model on the Reporting Time for the Vote Counts in Every Polling Station, Veracruz.

Variable Coef.
Log (Registered voters) 0.0373
(0.0522)
Log (Driving distance) −0.2031
(0.0151)
65-Years-Old 0.4642
(0.3974)
Illiteracy 251.0217
(2.5864)
Computer 126.0903
(1.3068)
No Spanish −157.4623
(7.568)
High school 0.9299
(0.229)
Illiteracy*log (minutes) −38.3146
(0.4041)
Computer*log (minutes) −19.6523
(0.2084)
No Spanish*log (minutes) 24.0149
(1.1028)
θ 0.965
I-likelihood −54040.2
Figure 14: Schoenfeld residuals showing violations of proportionality assumption (Oaxaca Elections, Table 5).

Figure 15: Schoenfeld residuals for corrected model (Oaxaca Elections).

Figure 16: Deviance residuals and time in which each observation was reported by the PREP. Oaxaca.

Figure 17: Schoenfeld residuals showing violations of proportionality assumption (Veracruz Elections, Table 6).

Figure 18: Schoenfeld residuals for corrected model (Veracruz elections).

Figure 19: Deviance residuals and time in which each observation was reported by the PREP. Veracruz.

Figure 20: Residuals Oaxaca (under-neighbors).

Figure 21: Residuals Veracruz (under-neighbors).

Figure 22: Residuals Oaxaca (under-optimal).

Figure 23: Residuals Veracruz (under-optimal)

6.3 Alternative Specifications

The baseline model contains the same base covariates as the model in Table 2, but does not correct for violations of the proportionality assumption.

Table 7 shows the results of the Cox Proportional Hazard Model prior to correcting for violations of the proportionality assumption. Overall, the direction of each covariate’s effect is as expected, and most variables attain statistical significance. However, as detailed above, because these variables violate the proportionality assumption, this model is incorrectly specified and, therefore, inaccurate. This is further supported by comparing the plot of the Deviance Residuals of the baseline model (Figure 24) to the plot of the Deviance Residuals of the corrected model (Figure 4). In the former, the residuals are far from symmetrically distributed around 0, skewing instead toward negative values for the units with the longest durations. This suggests that the model progressively underpredicts observations with the passage of time, as demonstrated by the larger residuals in the lower right quadrant of the figure. In contrast, the residuals for the time-corrected model are more symmetrically distributed around 0, and the flatness of the loess line suggests that the corrected model outperforms the baseline model. Moreover, as seen in Table 8 and Figure 26, the Schoenfeld Residuals further support the presence of violations of the proportionality assumption in the baseline model.

Table 7:

Baseline Cox Model on the Reporting Time for the Vote Counts in Every Polling Station.

Variable Coef.
Absent poll workers −0.1924***
(0.0167)
PAN observers −0.0425***
(0.0091)
PRD observers −0.0251**
(0.0086)
PRI observers −0.0192
(0.0115)
Registered voters −0.9616***
(0.0230)
log (Driving distance) −0.6475***
(0.0033)
65-years-old −1.4157***
(0.1072)
High school −0.0688
(0.1355)
Computer −0.0720*
(0.0337)
No Spanish −0.5429**
(0.1772)
Illiteracy −1.3082***
(0.1196)
θ 0.988
I-likelihood −1069670.4
  1. ***p<0.001, **p<0.01, *p<0.05.

  2. Mexico, 2006.

Figure 24: Deviance residuals and time in which each observation was reported by the PREP. Mexico, 2006. Baseline Model. Note: The scatter plot shows the relationship between the deviance residual of each observation and the time at which it reported its results. We denote outlying observations in light blue and define them as those with a deviance residual outside the [−3, 3] range. The plot displays the results of the baseline model, which does not correct for violations of the proportionality assumption. Locally weighted regression lines are shown in blue.

Figure 25: Survival function of the baseline Cox model. Note: The baseline model contains the same base covariates as the model in Table 2, but does not correct for violations of the proportionality assumption.

Figure 26: Schoenfeld residuals for baseline model, Table 7.

As an additional check, we rerun the hazard model with the same specification as our main model (Table 2), but with the right-censored observations included. Results of this model are shown in Table 9, and a plot of this model’s Deviance Residuals is displayed in Figure 27. The right-censored model is not directly comparable to our main model because violations of proportionality cannot be corrected using time interactions. This is because, by definition, right-censored observations are those still present after the reporting period is over; that is, these observations do not have a measure of time with which to create the interactions. However, the Deviance Residuals display a pattern similar to that of the uncorrected, baseline model: we observe a larger clustering of residuals in the bottom-right quadrant. Compared to the plot of the Deviance Residuals of our main model, this suggests that the main model outperforms the right-censored model and is therefore more suitable for the analyses conducted above.

Table 8:

Schoenfeld Residuals for Initial, Baseline Model.

Variable ρ χ2 p-value
Log (Registered voters) 0.073096 42.6000 0.0000
Log (Driving distance) 0.263091 627.0000 0.0000
65-years-old 0.010798 1.1900 0.2760
Illiteracy 0.04525 22.0000 0.0000
Computer −0.01912 3.9300 0.0475
No Spanish 0.035912 13.0000 0.0003
High school −0.000504 0.0026 0.9600
Global 1420.0000 0.0000
  1. Mexico, 2006.

Table 9:

Cox Model on the Reporting Time for the Vote Counts in Every Polling Station.

Variable Coef.
Absent poll workers −0.216***
(0.0198)
Registered voters −1.06***
(0.0234)
Log (Driving distance) −0.688***
(0.0036)
PAN observers −0.0262**
(0.0091)
PRD observers −0.0184*
(0.0086)
PRI observers −0.0104
(0.0115)
65-years-old 0.3364**
(0.1074)
High school −0.1162
(0.1361)
Computer 0.0974
(0.0336)
No Spanish −0.9883***
(0.1803)
Illiteracy −0.6900***
(0.1201)
θ 1.02
I-likelihood −912231.5
  1. ***p<0.001, **p<0.01, *p<0.05.

  2. Mexico, 2006.

Figure 27: Deviance residuals and time in which each observation was reported by the PREP. Mexico, 2006. Censored Model. Note: The scatter plot shows the relationship between the deviance residual of each observation and the time at which it reported its results. We denote outlying observations in light blue and define them as those with a deviance residual outside the [−3, 3] range. The plot displays the results from a model with the same specification as the one found in Table 2, but includes the right-censoring information with the survival time. Locally weighted regression lines are shown in blue.

6.4 Robustness Checks

To further verify that the outliers identified in the above analyses are not exerting undue influence on our model estimates, we conduct two additional robustness checks. The first of these is shown in Table 10. Here, Model (1) is the model used for the above analyses, whereas Model (2) excludes all outlying observations. A comparison of the two models reveals no major differences, providing evidence that the outlying polling stations do not have a disproportionate effect on our model estimates. For the second robustness check, we take the following steps. First, we drop outlying observations with 0.20 probability. Then, we rerun the model and store the new model’s estimates. We repeat this process one hundred times, each time storing the new coefficient estimates. Finally, we plot a histogram displaying the distribution of the coefficients derived from each of the one hundred simulations. Results can be seen in Figure 28A–S. In each panel, the red, vertical, dotted line represents our model’s coefficient estimate. As we can see, with the exception of illiteracy (and its interaction term), our model’s estimates lie close to the center of the simulated distributions, providing further evidence that the outliers are not biasing our model estimates.
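The second robustness check can be sketched as a simple resampling loop. To keep the example self-contained, a simple regression slope stands in for the Cox model; this substitution is an assumption made purely for illustration, and the function names and toy data are hypothetical. Only the drop probability (0.20) and the number of repetitions (100) come from the text.

```python
import random

def ols_slope(rows):
    """Stand-in estimator (an assumption): a regression slope, not a Cox model."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    return sxy / sxx

def drop_outlier_estimates(rows, is_outlier, fit, p_drop=0.20, n_iter=100, seed=0):
    """Refit after dropping each flagged row with probability p_drop, n_iter times."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        kept = [row for row, flagged in zip(rows, is_outlier)
                if not (flagged and rng.random() < p_drop)]
        estimates.append(fit(kept))
    return estimates

# Toy data: twenty well-behaved rows plus two rows flagged as outliers.
rows = [(float(x), 2.0 * x + 1.0) for x in range(20)] + [(5.0, 40.0), (15.0, 2.0)]
is_outlier = [False] * 20 + [True, True]

full_estimate = ols_slope(rows)
estimates = drop_outlier_estimates(rows, is_outlier, ols_slope)
```

Plotting a histogram of `estimates` with a vertical line at `full_estimate` would reproduce the layout of the panels described above, one panel per coefficient.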

Table 10:

Cox Model on the Reporting Time for Vote Counts in all Polling Stations.

Variable Coef. (1) Coef. (2)
Absent poll workers −0.061** −0.0917***
(0.020) (0.0198)
Registered voters 0.0075 0.118***
(0.0243) (0.0243)
Log (Driving distance) −0.175*** −0.145***
(0.0043) (0.00443)
PAN observers 14.7544*** 15.5638***
(0.1362) (0.1354)
PRD observers 8.3793*** 8.0982***
(0.1261) (0.1239)
PRI observers 23.9628*** 27.5000***
(0.1838) (0.1914)
65-years-old 114.6258*** 128.4566***
(1.2901) (1.3326)
High school 207.6447*** 303.4073***
(1.3329) (1.5291)
Computer 44.1353*** 48.0154***
(0.4896) (0.5022)
No Spanish −99.8115*** −175.8127***
(3.1297) (2.8276)
Illiteracy 227.0505*** 319.8438***
(1.0154) (1.3269)
PAN observers*log (minutes) −2.4492*** −2.6133***
(0.0223) (0.0223)
PRD observers*log (minutes) −1.3835*** −1.3532***
(0.0208) (0.0205)
PRI observers*log (minutes) −3.9264*** −4.5648***
(0.0294) (0.0309)
65-years-old*log (minutes) −19.3735*** −21.6802***
(0.2218) (0.2287)
High school*log (minutes) −35.0485*** −51.1750***
(0.2318) (0.2632)
Computer*log (minutes) −7.4468*** −8.1142***
(0.0854) (0.0873)
No Spanish*log (minutes) 16.0395*** 29.1216***
(0.5039) (0.4469)
Illiteracy*log (minutes) −37.7975*** −53.3439***
(0.1715) (0.2230)
θ 1.024 0.984
I-likelihood −912221.1 −891358.8
  1. ***p<0.001; **p<0.01; *p<0.05.

  2. Model (1) is the model used for the above analyses, whereas Model (2) excludes all outlying observations.

  3. Mexico, 2006.

Table 11:

Summary Statistics of the Variables by Cluster.

Cluster Duration time Driving time Registered voters Absent poll workers PAN agents PRD agents PRI agents Computer Population over 65 Illiteracy No Spanish
1 360.90 13.99 0.79 0.56 0.97 0.96 1.00 0.18 0.05 0.07 0.00
2 351.92 14.22 0.79 0.19 1.00 0.00 1.00 0.21 0.05 0.07 0.00
3 322.93 6.37 0.80 0.12 0.98 0.92 1.00 0.51 0.05 0.03 0.00
4 348.54 13.22 0.78 0.18 1.00 0.65 0.00 0.28 0.07 0.06 0.00
5 901.62 258.33 0.69 0.18 0.90 0.75 0.93 0.12 0.07 0.10 0.00
6 293.53 5.19 0.69 0.15 0.93 0.79 0.98 0.42 0.14 0.03 0.00
7 357.91 11.71 0.87 0.12 1.00 1.00 1.00 0.16 0.04 0.07 0.00
8 593.82 80.66 0.72 0.13 0.74 0.86 0.96 0.01 0.05 0.34 0.24
9 346.77 12.70 0.79 0.20 0.00 0.00 1.00 0.23 0.06 0.06 0.00
10 508.55 76.24 0.44 0.11 0.89 0.89 0.97 0.04 0.11 0.14 0.00
11 638.74 79.44 0.77 0.14 0.97 0.97 1.00 0.09 0.07 0.11 0.00
12 336.22 9.12 0.78 0.17 0.00 0.86 0.00 0.28 0.07 0.05 0.00
13 341.27 12.10 0.79 0.18 0.00 1.00 1.00 0.21 0.05 0.07 0.00
14 270.10 10.71 0.66 0.12 1.00 1.00 1.00 0.17 0.06 0.07 0.00
15 486.34 51.33 0.73 0.12 0.89 0.93 0.98 0.02 0.06 0.22 0.01

6.5 k-means Approach

This section shows the results of our identification strategy when we cluster observations using the k-means classifier. This approach separates observations into a given number of clusters with similar characteristics by estimating the Euclidean distance between observations. The outliers are those observations that lie farthest from the mean of the cluster to which they are assigned.

For our purposes, we apply this approach as follows. First, we estimate the intra-cluster distances between points and their variance to choose the optimal number of clusters using the “elbow method.” This approach finds the number of clusters that explains most of the variance. We then group the observations into that number of clusters and estimate the Euclidean distance between each observation and the mean of its respective cluster, labeling as outliers those observations whose distances fall within the largest 5%.

We use the same battery of variables that we employ in our survival model. In this case, we standardize our time measurements to facilitate the clustering. As Figure 29 shows, the elbow method suggests that the optimal number of clusters is 15. We then group the observations into 15 clusters and estimate the Euclidean distance between the center of each cluster and each of its observations. The summary statistics of every cluster are reported in Table 11. Finally, we label as outliers those observations whose Euclidean distance to their cluster’s center is in the top five percent.
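The clustering steps can be sketched with a plain implementation of Lloyd's algorithm. This is an illustrative toy, not the code used for the analysis: the two-dimensional points, the random initialization, and the helper names (`assign`, `kmeans`, `sse`, `flag_far_points`) are all hypothetical, and the standardization of the variables is omitted.

```python
import math
import random

def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans(points, k, n_iter=25, seed=0):
    """Plain Lloyd's algorithm, initialized at k randomly chosen observations."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign(points, centers)

def sse(points, centers, labels):
    """Sum of squared distances to the assigned center (used for the elbow plot)."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def flag_far_points(points, centers, labels, share=0.05):
    """Points whose distance to their own cluster center is in the top `share`."""
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    n_flag = max(1, round(share * len(points)))
    cutoff = sorted(dists, reverse=True)[n_flag - 1]
    return [p for p, d in zip(points, dists) if d >= cutoff]

# Toy data: two tight groups of 19 points, each with one distant observation.
grid = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(19)]
points = grid + [(3.0, 3.0)] + [(x + 10.0, y + 10.0) for x, y in grid] + [(13.0, 13.0)]

# Elbow method: inspect how the SSE falls as the number of clusters grows.
sse_by_k = {}
for k in (1, 2, 3):
    centers_k, labels_k = kmeans(points, k)
    sse_by_k[k] = sse(points, centers_k, labels_k)

centers, labels = kmeans(points, 2)
flagged = flag_far_points(points, centers, labels)
```

In this toy example the two distant observations are the ones flagged. In the analysis above, the same logic is applied with 15 clusters and the full set of covariates.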

Figure 28: Comparison of coefficient estimates from Table 2 to coefficient distributions of bootstrapped model. (A) Absent poll workers; (B) registered voters; (C) driving distance; (D) PAN observers; (E) PRD observers; (F) PRI observers; (G) 65-years-old; (H) high school; (I) computer; (J) no Spanish; (K) illiteracy; (L) PAN*log(min); (M) PRD*log(min); (N) PRI*log(min); (O) 65-years-old*log(min); (P) high school*log(min); (Q) computer*log(min); (R) no Spanish*log(min); (S) illiteracy*log(min). Note: Each figure above displays the distribution of each of the Model’s Coefficients, derived from the simulations described above. The red vertical line represents the estimated coefficients from our original model.

Figure 29: k-means clustering sum of squared errors (SSE). Note: The figure shows the sum of squared errors of the first 100 clusters using the k-means classifier.

We replicate the analysis described in the main body of the manuscript considering only those observations classified as outliers in both the k-means approach and the deviance analysis. This leaves us with 202 observations. Figure 30 shows that the differences between these observations and their respective control units are statistically indistinguishable from zero.

Figure 30: Results. Note: The dots and horizontal lines show the mean differences and 95% confidence intervals for comparing the vote shares of (1) those observations classified as outliers by both the deviance analysis and the k-means classifier and (2) their corresponding control units.

Note to Table 11: The table shows the average values of the variables for the 15 largest clusters.

Published Online: 2019-11-21
Published in Print: 2019-12-18

©2019 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 27.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/spp-2019-0001/html