Robust baseball pitch reconstruction using artificial neural networks with noisy data

Ryan D. DeBoskey; Veeraraghava Raju Hasti; Venkateswaran Narayanaswamy

doi:10.1515/jqas-2025-0045


Published/Copyright: January 22, 2026


From the journal Journal of Quantitative Analysis in Sports

Abstract

The use of high-fidelity camera-vision and Doppler tracking systems in Major League Baseball (MLB) has created an influx of advanced analytics that have transformed game and personnel strategy. However, due to the cost and complexity of these systems, advanced statistics are severely limited in most amateur games. In this work, we develop a robust pitch reconstruction methodology using artificial neural networks (ANN) and a three-point reconstruction technique, requiring knowledge of only three baseball spatiotemporal locations. The ANN models were trained to predict the initial baseball speed and spin rate, which are then used as initial conditions to integrate the full pitch trajectory. We used numerically simulated baseball pitches to train two ANN models, one with clean training inputs and one with noisy training inputs. We then performed a robustness analysis to test ANN performance with increasingly noisy data to simulate low-fidelity camera tracking systems. Probability distributions of the predicted model output were calculated using the Monte Carlo method to quantify model uncertainty. We show that the ANN models accurately predict the true baseball trajectory from both high-fidelity and noise-injected testing data. The results demonstrate the effectiveness of ANN models for quick and robust pitch trajectory reconstruction using minimal input data, even in the presence of noise. The present methodology provides a first step towards enabling pitch tracking and advanced analytics in a wider variety of baseball games.

Keywords: baseball; pitch reconstruction; artificial neural network; robustness

Corresponding author: Veeraraghava Raju Hasti, School of Modeling, Simulation, and Training, College of Engineering and Computer Science, University of Central Florida, Orlando, FL, USA, E-mail: vhasti@ucf.edu

Research ethics: Not applicable.
Informed consent: Not applicable as there were no human subjects and IRB approval is not applicable.
Author contributions: R. DeBoskey – Conceptualization, Methodology, Software, Visualization, Writing – original draft, Writing – reviewing and editing. V. Hasti – Conceptualization, Methodology, Software, Resources, Supervision, Writing – reviewing and editing. V. Narayanaswamy – Supervision, Resources, Writing – reviewing and editing.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Research funding: This research did not receive any specific grants from funding agencies in the public, commercial, or not-for-profit sectors. Ryan DeBoskey was supported by the National Defense Science and Engineering Graduate Fellowship.
Data availability: Data will be made available upon reasonable request.

Appendix A: Baseball trajectory model

Equation (1) is the governing equation for baseball trajectory, derived from Newton’s second law of motion. The equation is a simplified version of that derived by Aguirre-López et al. (2018), and it assumes that the centrifugal and Coriolis body forces are both negligible. The centrifugal and Coriolis forces are approximately two orders of magnitude smaller than the Magnus force and are generally negligible for trajectory calculation (Robinson and Robinson 2017). The drag force, f_D, was given by Giordano (1996):

(9) $\mathbf{f}_D = -\left[0.0039 + \frac{0.0058}{1 + \exp\left((v - v_d)/\Delta\right)}\right] v\,\mathbf{v}$

where $\mathbf{v}$ is the velocity vector, $v = |\mathbf{v}|$ is the velocity magnitude, and v_d = 35 m/s and Δ = 5 m/s are calibrated model constants. The Magnus force, f_M, plays a prominent role in the physics of ball trajectory in many sports (Kray et al. 2014; Sayers and Hill 1999). The force is oriented orthogonal to both the spin axis and the velocity, resulting in the well-known curving of baseball pitches. By Bernoulli’s principle, the net force is given by the cross product (×) of the spin vector, ω, and the velocity:

(10) $\mathbf{f}_M = B\,(\boldsymbol{\omega} \times \mathbf{v})$

where B = 0.00041 is a non-dimensional scaling factor given by Giordano (1996) for baseballs. We assumed a constant acceleration due to gravity, g = −9.81 m/s², in the z-direction. We solved Equation (1) using 4th-order Runge–Kutta time integration (Lomax et al. 2001) with a fixed time step, Δt = 2.083 ms.
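The integration described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code. Equation (1) is not reproduced in this appendix, so the sketch assumes it has the form dv/dt = f_D + f_M + g, with the forces expressed per unit mass. The axis convention (pitch travelling along x), the placement of the spin vector in the y-z plane, the spin rate in rad/s, and the names `acceleration`, `rk4_step`, and `simulate_pitch` are all assumptions made for the example.

```python
import math

V_D = 35.0                    # m/s, drag model constant quoted in the text
DELTA = 5.0                   # m/s, drag model constant quoted in the text
B = 0.00041                   # Magnus scaling factor quoted in the text
GRAVITY = (0.0, 0.0, -9.81)   # m/s^2, acting in the z-direction
DT = 2.083e-3                 # s, fixed time step quoted in the text


def cross(a, b):
    """Cross product of two 3-vectors stored as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def acceleration(vel, omega):
    """Sum of drag (Eq. 9), Magnus (Eq. 10) and gravity, per unit mass."""
    speed = math.sqrt(sum(c * c for c in vel))
    coeff = 0.0039 + 0.0058 / (1.0 + math.exp((speed - V_D) / DELTA))
    drag = tuple(-coeff * speed * c for c in vel)
    magnus = tuple(B * c for c in cross(omega, vel))
    return tuple(d + m + g for d, m, g in zip(drag, magnus, GRAVITY))


def derivative(state, omega):
    """Time derivative of the state (x, y, z, vx, vy, vz)."""
    vel = state[3:]
    return vel + acceleration(vel, omega)


def rk4_step(state, omega, dt=DT):
    """One classical fourth-order Runge-Kutta step."""
    def advance(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = derivative(state, omega)
    k2 = derivative(advance(k1, dt / 2), omega)
    k3 = derivative(advance(k2, dt / 2), omega)
    k4 = derivative(advance(k3, dt), omega)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_pitch(speed, spin_rate, n_steps=240, phi_deg=45.0, alpha_deg=1.0):
    """Integrate from the origin; returns the list of (x, y, z) positions.

    Assumed geometry (not taken from the paper): launch angle alpha is
    measured from the x-axis in the x-z plane, and the spin axis lies in
    the y-z plane at angle phi from the y-axis.
    """
    alpha = math.radians(alpha_deg)
    phi = math.radians(phi_deg)
    vel = (speed * math.cos(alpha), 0.0, speed * math.sin(alpha))
    omega = (0.0, spin_rate * math.cos(phi), spin_rate * math.sin(phi))
    state = (0.0, 0.0, 0.0) + vel
    path = [state[:3]]
    for _ in range(n_steps):
        state = rk4_step(state, omega)
        path.append(state[:3])
    return path
```

Calling `simulate_pitch(35.76, 200.0)` advances a pitch of roughly 80 mph through 240 steps, or about 0.5 s. Under the assumed geometry, setting the spin rate to zero removes all lateral deflection, which is a quick sanity check on the Magnus term.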

Several simplifying assumptions were made in the current work. First, we neglected the orientation of the baseball seams. Extremely complicated physics, including the seam-shifted wake phenomenon (Smith and Smith 2021), arise from the transition and break-up of the boundary layer that develops over the ball due to the presence and orientation of the seams on an otherwise smooth sphere (Higuchi and Kiura 2012). This results in small fluctuations in the baseball trajectory, which may not be negligible compared to Magnus forces (Borg and Morrissey 2014; Higuchi and Kiura 2012; Smith and Smith 2021).

Secondly, the batch simulations assume a fixed spin axis, ϕ = 45°, a fixed launch angle, α = 1°, and a constant release point, x_o = (0, 0, 0). In addition to the speed and spin rate, the spin axis is very important for pitch classification (the constant angles of ϕ = 45° and α = 1° used in the present study correspond roughly to a curveball from a right-handed pitcher). The Magnus force is directed orthogonal to the spin axis and velocity (via Eq. (10)), which sets the breaking direction of the pitch. With differing amounts of break and velocity, the pitcher must adjust the launch angle accordingly between pitches. The fixed angles and constant release point were used in the present work to reduce dimensionality and demonstrate the use of an ANN as a proof of concept for pitch reconstruction. Higher-dimensional prediction output (necessary to predict spin axis, launch angle, and initial condition) requires a much larger ANN model and substantially more training data (the curse of dimensionality), which is outside the scope of the present work. The spatiotemporal release point of the baseball must be known to determine the relative deflection of the pitch. We assumed a known release point in this paper, which in practice must be determined by the system to fully reconstruct the trajectory. This introduces an additional data point (and associated uncertainty) to collect and process within a real-world pitch tracking system.

Lastly, we sampled the training data for the ANN at fixed times (t₁ = 0.467 s, t₂ = 0.483 s, and t₃ = 0.5 s). The selection of these times in the present work was arbitrary, where t₃ is the (approximate) time required for an 80 mph pitch to reach home plate without consideration of the pitcher’s extension from the rubber. Sampling at the same three time points (relative to release) cannot be realized in practice, particularly for a low-cost vision tracking system. The relative time from release would need to be varied for a real-world system, providing another layer of complexity.
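The arithmetic behind these sampling times can be checked with a short sketch. It is illustrative only: the step indices and the implied spacing between samples are inferences from the quoted values of Δt and t₁ to t₃, not figures stated by the authors, and the distance estimate ignores drag. The names `nearest_step` and `drag_free_distance_m` are made up for the example.

```python
DT = 2.083e-3                        # s, integration time step from Appendix A
SAMPLE_TIMES = (0.467, 0.483, 0.5)   # s, sampling times quoted in the text

MPH_TO_MS = 0.44704                  # exact conversion, miles per hour to m/s
FT_TO_M = 0.3048                     # exact conversion, feet to metres


def nearest_step(t, dt=DT):
    """Index of the integration step closest to time t."""
    return round(t / dt)


# Which integration steps the three samples correspond to.
steps = [nearest_step(t) for t in SAMPLE_TIMES]

# Time between consecutive samples, implied by the step indices.
spacing_s = (steps[1] - steps[0]) * DT

# Distance covered by an 80 mph pitch in t3 = 0.5 s if drag is ignored,
# compared with the 60 ft 6 in distance from the rubber to home plate.
drag_free_distance_m = 80.0 * MPH_TO_MS * SAMPLE_TIMES[2]
rubber_to_plate_m = 60.5 * FT_TO_M

print(steps, round(spacing_s, 4))
print(round(drag_free_distance_m, 2), round(rubber_to_plate_m, 2))
```

The three samples land on steps 224, 232, and 240, eight steps apart. The drag-free distance of about 17.9 m is slightly short of the 18.4 m from the rubber to the plate, which is consistent with the text calling t₃ approximate.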

Appendix B: ANN sensitivity study

We performed a sensitivity study to determine the training batch size, the activation function for the hidden layers, and the number of hidden layers to be used by the final model. Table 8 summarizes the range of hyperparameters tested in this sensitivity study. We used the mean-squared training and testing loss as the metric to evaluate ANN performance. The sensitivity study was performed only on the Quiet ANN model, as a larger number of training epochs was required to train this model. The shorter time required to train the Noisy ANN model was indicative of a less complex mapping due to lower-fidelity input data.

Table 8: Summary of ANN hyperparameter sensitivity study.

Hyperparameter	A	B	C	D
Batch size	10	20	30	40
Activation function	sigmoid	tanh	relu	leakyrelu
Hidden layers	1	2	3	4
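One way to enumerate the cases in Table 8 is to vary a single hyperparameter at a time around a baseline, as in the sketch below. Whether the study was run this way is an assumption (Figures 13–15 each isolate one hyperparameter), and the baseline used here is simply the final configuration reported in this appendix. The names `SWEEP`, `BASELINE`, and `one_at_a_time` are hypothetical.

```python
# Values tested for each hyperparameter, copied from Table 8.
SWEEP = {
    "batch_size": [10, 20, 30, 40],
    "activation": ["sigmoid", "tanh", "relu", "leakyrelu"],
    "hidden_layers": [1, 2, 3, 4],
}

# Assumed baseline: the final configuration reported in this appendix.
BASELINE = {"batch_size": 20, "activation": "relu", "hidden_layers": 3}


def one_at_a_time(sweep, baseline):
    """Build one configuration per tested value, changing a single
    hyperparameter and keeping the others at their baseline value."""
    configs = []
    for name, values in sweep.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            configs.append(config)
    return configs


configs = one_at_a_time(SWEEP, BASELINE)

# The baseline itself appears once in each of the three sweeps.
unique_configs = {tuple(sorted(c.items())) for c in configs}

for config in configs:
    print(config)
```

A full grid over the same values would need 64 training runs, whereas this one-at-a-time design needs 12, of which 10 are distinct.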

Figures 13–15 show the results of the sensitivity analysis for batch size, activation function, and number of hidden layers, respectively. For all plots, the logarithm of the testing and training loss was taken to better visualize the relative differences between model parameters. The ANN training time was largely insensitive to batch size. Although the loss decreased in fewer epochs with a smaller batch size, a smaller batch size required more computation per epoch because the network parameters were updated more frequently. A batch size of 20 was used in the final model architecture as a compromise between number of epochs and speed of training. The ANN was insensitive to activation function, and we used the ReLU activation function, R(ζ), in the hidden layers, represented mathematically as:

(11) $R(\zeta) = \max(0, \zeta)$
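For reference, the four activation functions listed in Table 8 can be written as scalar functions in plain Python. These are the standard textbook definitions rather than the authors' implementation, and the leaky-ReLU slope of 0.01 is an assumed value, since the source does not state which slope was used.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit, Eq. (11): max(0, z)."""
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    """Leaky ReLU; the slope for negative inputs is an assumed value."""
    return z if z > 0.0 else slope * z


for z in (-2.0, 0.0, 3.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```

ReLU and leaky ReLU agree for positive inputs and differ only in how they treat negative ones.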

Figure 13: Sensitivity study on batch size showing its effect on the training and testing MSE loss (Eq. (3)) versus epoch for the Quiet ANN model.

Figure 14: Sensitivity study on activation function showing its effect on the training and testing MSE loss (Eq. (3)) versus epoch for the Quiet ANN model.

Figure 15: Sensitivity study on number of hidden layers showing its effect on the training and testing MSE loss (Eq. (3)) versus epoch for the Quiet ANN model.

The ANN required at least two hidden layers to accurately model the nonlinear relationship of velocity and spin rate. Beyond two hidden layers, the model was relatively insensitive to the number of layers. To ensure that the model was able to capture the nonlinear trends, the final model architecture contains three hidden layers.
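The shape of such a network can be illustrated with a forward pass written in plain Python. This is a sketch under stated assumptions, not the authors' model. The input size of 9 (three positions with three coordinates each), the hidden width of 16, and the weight initialization are hypothetical choices, and the loss is the usual mean-squared error because Eq. (3) is not reproduced in this appendix. Only the three ReLU hidden layers and the two outputs (speed and spin rate) come from the text.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights (scaled by fan-in) and zero biases for one dense layer."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activate):
    """Affine map followed by ReLU when activate is True."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if activate else out


def build_mlp(n_in, hidden, n_out, seed=0):
    """Stack of dense layers: the hidden layers plus a linear output layer."""
    rng = random.Random(seed)
    sizes = [n_in] + list(hidden) + [n_out]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(model, x):
    """Apply ReLU on every layer except the last (linear regression output)."""
    for i, layer in enumerate(model):
        x = dense(x, layer, activate=i < len(model) - 1)
    return x


def mse(pred, target):
    """Mean-squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical sizes: 9 inputs, three hidden layers of 16 units, 2 outputs.
model = build_mlp(n_in=9, hidden=(16, 16, 16), n_out=2)
prediction = forward(model, [0.1] * 9)
print(prediction)
```

Training would adjust the weights to minimize `mse` over batches of simulated pitches. That step is omitted here because the optimizer settings are not given in this appendix.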

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2016). Tensorflow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.

Aguirre-López, M.A., Morales-Castillo, J., Díaz-Hernández, O., Santos, G.J.E., and Almaguer, F.-J. (2018). Trajectories reconstruction of spinning baseball pitches by three-point-based algorithm. Appl. Math. Comput. 319: 2–12, https://doi.org/10.1016/j.amc.2017.01.016.

An, P.-Y. (2024). Sports broadcasting: how big data technology impacts the viewer experience in baseball broadcasting. Eng. Proc. 74, https://doi.org/10.3390/engproc2024074060.

Borg, J.P. and Morrissey, M.P. (2014). Aerodynamics of the knuckleball pitch: experimental measurements on slowly rotating baseballs. Am. J. Phys. 82: 921–927, https://doi.org/10.1119/1.4885341.

Collins, H. and Evans, R. (2008). You cannot be serious! public understanding of technology with special reference to “hawk-eye”. Publ. Understand. Sci. 17: 283–308, https://doi.org/10.1177/0963662508093370.

Deshpande, S.K. and Wyner, A. (2017). A hierarchical bayesian model of pitch framing. J. Quant. Anal. Sports 13: 95–112, https://doi.org/10.1515/jqas-2017-0027.

Escalera Santos, G.J., Aguirre-López, M.A., Díaz-Hernández, O., Hueyotl-Zahuantitla, F., Morales-Castillo, J., Almaguer, F. (2019). On the aerodynamic forces on a baseball, with applications. Front. Appl. Math. Statistics 4: 379640, https://doi.org/10.3389/fams.2018.00066.

Fawzi, A., Moosavi-Dezfooli, S.-M., and Frossard, P. (2016). Robustness of classifiers: from adversarial to random noise. Adv. Neural Inf. Process. Syst. 29.

Fawzi, A., Moosavi-Dezfooli, S.-M., and Frossard, P. (2017). The robustness of deep networks: a geometrical perspective. IEEE Signal Process. Mag. 34: 50–62. https://doi.org/10.1109/MSP.2017.2740965.

Giordano, N.J. (1996). Computational physics. Prentice Hall, Hoboken, New Jersey.

He, Z., Rakin, A.S., and Fan, D. (2019). Parametric noise injection: trainable randomness to improve deep neural network robustness against adversarial attack. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Computer Vision Foundation, New York, NY, pp. 588–597.

Healey, G. (2017). The new moneyball: how ballpark sensors are changing baseball. Proc. IEEE 105: 1999–2002, https://doi.org/10.1109/JPROC.2017.2756740.

Higuchi, H. and Kiura, T. (2012). Aerodynamics of knuckle ball: Flow-structure interaction problem on a pitched baseball without spin. J. Fluids Struct. 32: 65–77, https://doi.org/10.1016/j.jfluidstructs.2012.01.004.

Hintz, E.S. (2022). Moneyball: the computational turn in professional sports management. In: Papers of the business history conference. https://par.nsf.gov/biblio/10346961.

Hsieh, J. (2024). Neural network-based tracking and 3d reconstruction of baseball pitch trajectories from single-view 2d video, https://arxiv.org/abs/2405.16296.

Huang, M.-L. and Li, Y.-Z. (2021). Use of machine learning and deep learning to predict the outcomes of major league baseball matches. Appl. Sci. 11: 4499, https://doi.org/10.3390/app11104499.

Jeong, K.-S., Kim, J.-H., and Han, Y.-H. (2017). A prediction of baseball game results using recurrent neural networks. In: Proceedings of the Korea information processing society conference. Korea Information Processing Society, Seoul, South Korea, pp. 873–876.

Kim, J., Ra, M., Lee, H., Kim, J., and Kim, W.-Y. (2019). Precise 3d baseball pitching trajectory estimation using multiple unsynchronized cameras. IEEE Access 7: 166463–166475. https://doi.org/10.1109/ACCESS.2019.2953340.

Kingma, D.P. and Ba, J. (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Koseler, K. and Stephan, M. (2017). Machine learning applications in baseball: a systematic literature review. Appl. Artif. Intell. 31: 745–763, https://doi.org/10.1080/08839514.2018.1442991.

Kray, T., Franke, J., and Frank, W. (2014). Magnus effect on a rotating soccer ball at high reynolds numbers. J. Wind Eng. Ind. Aerod. 124: 46–53, https://doi.org/10.1016/j.jweia.2013.10.010.

Lee, J.S. (2022). Prediction of pitch type and location in baseball using ensemble model of deep neural networks. J. Sports Anal. 8: 115–126, https://doi.org/10.3233/JSA-200559.

Lomax, H., Pulliam, T.H., and Zingg, D.W. (2001). Fundamentals of computational fluid dynamics (scientific computation), 2001st ed. Springer-Verlag, Berlin, Germany.

Metropolis, N. and Ulam, S. (1949). The monte carlo method. J. Am. Stat. Assoc. 44: 335–341, https://doi.org/10.1080/01621459.1949.10483310.

Nathan, A.M. (2007). Analysis of pitchf/x pitched baseball trajectories. In: The Physics of Baseball. University of Illinois, Champaign, IL, https://baseball.physics.illinois.edu/Analysis.pdf.

Nickels, K. and Hutchinson, S. (2002). Estimating uncertainty in SSD-based feature tracking. Image Vis. Comput. 20: 47–58, https://doi.org/10.1016/S0262-8856(01)00076-2.

Park, D.J., Kim, B.W., Jeong, Y.-S., Ahn, C.W., and Jeong, Y.S. (2018). Deep neural network based prediction of daily spectators for korean baseball league: focused on gwangju-kia champions field. Smart Media J. 7: 16–23, https://doi.org/10.30693/SMJ.2018.7.1.16.

Park, J. and Park, S. (2017). A study on prediction of attendance in korean baseball league using artificial neural network. KIPS Trans. Software Data Eng. 6: 565–572, https://doi.org/10.3745/KTSDE.2017.6.12.565.

Rahimian, P. and Toka, L. (2022). Optical tracking in team sports: a survey on player and ball tracking methods in soccer and other team sports. J. Quant. Anal. Sports 18: 35–57, https://doi.org/10.1515/jqas-2020-0088.

Robinson, G. and Robinson, I. (2017). Are inertial forces ever of significance in cricket, golf and other sports? Phys. Scr. 92: 043001, https://doi.org/10.1088/1402-4896/aa634e.

Rubinstein, R.Y. and Kroese, D.P. (2016). Simulation and the Monte Carlo method. John Wiley & Sons, Hoboken, New Jersey.

Santos-Fernandez, E., Wu, P., and Mengersen, K.L. (2019). Bayesian statistics meets sports: a comprehensive review. J. Quant. Anal. Sports 15: 289–312, https://doi.org/10.1515/jqas-2018-0106.

Sayers, A. and Hill, A. (1999). Aerodynamics of a cricket ball. J. Wind Eng. Ind. Aerod. 79: 169–182, https://doi.org/10.1016/S0167-6105(97)00299-7.

Shum, H. and Komura, T. (2004). A spatiotemporal approach to extract the 3d trajectory of the baseball from a single view video sequence. In: 2004 IEEE International conference on multimedia and expo, 3. IEEE, pp. 1583–1586.

Shum, H. and Komura, T. (2005). Tracking the translational and rotational movement of the ball using high-speed camera movies. In: IEEE International conference on image processing 2005, 3. IEEE, p. III-1084.

Sietsma, J. and Dow, R.J. (1991). Creating artificial neural networks that generalize. Neural Netw. 4: 67–79, https://doi.org/10.1016/0893-6080(91)90033-2.

Smith, L. and Downey, J. (2009). Predicting baseball hall of fame membership using a radial basis function network. J. Quant. Anal. Sports 5, https://doi.org/10.2202/1559-0410.1157.

Smith, A.W. and Smith, B.L. (2021). Using baseball seams to alter a pitch direction: the seam shifted wake. Proc. Inst. Mech. Eng. - Part P: J. Sports Eng. Technol. 235: 21–28, https://doi.org/10.1177/1754337120961609.

Theobalt, C., Albrecht, I., Haber, J., Magnor, M., and Seidel, H.-P. (2004). Pitching a baseball: tracking high-speed motion with multi-exposure images. In: ACM SIGGRAPH 2004 papers. Association for Computing Machinery, Los Angeles, California, pp. 540–547.

Umemura, K., Yanai, T., and Nagata, Y. (2021). Application of vbgmm for pitch type classification: analysis of trackman’s pitch tracking data. Jpn. J. Stat. Data Sci. 4: 41–71, https://doi.org/10.1007/s42081-020-00079-8.

Weller, E. (2020). The data revolution: an examination of the use of scouting and analytics in major league baseball front offices, https://egrove.olemiss.edu/hon_thesis/1306.

Willman, D. (2023). Baseball savant, https://baseballsavant.mlb.com.

Young, W.A., Holland, W.S., and Weckman, G.R. (2008). Determining hall of fame status for major league baseball using an artificial neural network. J. Quant. Anal. Sports 4, https://doi.org/10.2202/1559-0410.1131.

Zou, J., Han, Y., and So, S. (2009). Overview of artificial neural networks. Artif. Neural Net.: Methods Appl.: 14–22, https://doi.org/10.1007/978-1-60327-101-1_2.

Received: 2025-02-21

Accepted: 2025-11-26

Published Online: 2026-01-22
