Article Open Access

Statistical, machine learning, and deep learning models for COVID-19 forecasting in Kenya

  • Joyce Kiarie, Samuel Musili Mwalili, Rachel Mbogo, John Kamwele Mutinda and Amos Kipkorir Langat
Published/Copyright: September 15, 2025

Abstract

This study aims to enhance coronavirus disease 2019 forecasting in Kenya by comparing the predictive performance of statistical, machine learning, and deep learning (DL) models for total cases, critical cases, severe cases, and total deaths, using data from April 2020 to August 2021. Six models – autoregressive integrated moving average (ARIMA), support vector regression, random forest (RF), recurrent neural network, long short-term memory, and gated recurrent unit (GRU) – were evaluated with an 80–20 train-test split, employing root mean squared error, mean absolute error, mean absolute percentage error, and $R^2$ metrics. The Diebold-Mariano (DM) test assessed the statistical significance of error differences. Results reveal RF as the top performer, consistently achieving the lowest errors and the highest $R^2$ across all datasets, indicating superior accuracy in capturing nonlinear epidemic patterns. GRU outperformed the other DL models, while ARIMA showed the weakest performance. The DM test confirmed significant differences in forecasting errors, with RF generally outperforming the other models.

MSC 2010: 62M10; 62P10

Nomenclature

LSTM: long short-term memory
GRU: gated recurrent unit
RNN: recurrent neural network
ARIMA: autoregressive integrated moving average
SVR: support vector regression
RF: random forest
AIC: Akaike information criterion
BIC: Bayesian information criterion
PACF: partial autocorrelation function
ACF: autocorrelation function
RMSE: root mean square error
MAE: mean absolute error
MAPE: mean absolute percentage error
$R^2$: coefficient of determination
ML: machine learning
DL: deep learning
COVID-19: coronavirus disease 2019

1 Introduction

The coronavirus disease 2019 (COVID-19) pandemic, initiated by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in late 2019, precipitated an unprecedented global health and economic crisis, with profound effects on diverse regions, including Africa [7,20,22,43]. In Kenya, the virus challenged a healthcare system already constrained by limited infrastructure, personnel, and financial resources, exacerbating vulnerabilities during multiple infection waves. The Kenyan government responded with measures such as lockdowns, testing campaigns, and vaccination drives, but their success hinged on accurate forecasting to inform timely resource allocation and policy interventions [17,30,31]. Accurate predictions of epidemic trends, including case numbers and mortality rates, are essential for optimizing healthcare preparedness, prioritizing interventions, and mitigating socioeconomic impacts [15,23,34,35,38]. Beyond Kenya, the pandemic underscored the global need for robust forecasting models to manage infectious diseases, particularly in low-resource settings where data and computational limitations pose significant challenges.

Epidemic forecasting leverages diverse modeling approaches to capture the complex dynamics of infectious diseases. Statistical models, such as the autoregressive integrated moving average (ARIMA), are valued for their simplicity and ability to model linear trends and seasonality in time series data [4,18,21]. However, their reliance on linear assumptions limits their effectiveness in capturing the nonlinear, volatile patterns typical of pandemics. Machine learning (ML) models, such as support vector regression (SVR) and random forest (RF), address these limitations by modeling complex, nonlinear relationships, offering improved predictive accuracy [12,14,25,29,32,33,36,44]. Deep learning (DL) models, including recurrent neural networks (RNN), long short-term memory (LSTM), and gated recurrent units (GRU), excel in capturing temporal dependencies in sequential data, making them well suited for epidemic time series [1,2,42]. These advanced models have informed global health strategies, particularly in resource-limited settings where efficient forecasting can optimize scarce resources [24,28].

A study by Arora et al. [2] employed DL models, specifically RNN-based LSTM variants (deep LSTM, convolutional LSTM, and bi-directional LSTM), to predict COVID-19-positive cases across 32 Indian states and union territories, selecting the LSTM variant with the lowest error for daily and weekly forecasts. Sinha et al. [40] forecasted COVID-19-confirmed cases in the USA, India, Brazil, Russia, and France, comparing artificial neural network (ANN) and RNN-based LSTM models validated with mean squared error; LSTM outperformed ANN, indicating higher accuracy for epidemic predictions and suggesting its potential for informing timely public health interventions in highly affected countries. Ghafouri-Fard et al. [10] used ML and DL methods, including adaptive neuro-fuzzy inference system (ANFIS), LSTM, RNN, multilayer perceptron (MLP), and ARIMA, to predict COVID-19 case trends, comparing model performance using root mean squared error, mean absolute error (MAE), mean absolute percentage error (MAPE), and $R^2$; bidirectional LSTM, ANFIS, and MLP achieved high accuracy with $R^2$ values near 1, while ARIMA and LSTM showed higher MAPE. Ibrahim et al. [16] employed ANN, ANFIS, support vector machine (SVM), multiple linear regression (MLR), and ensemble approaches (ANN-Ensemble, SVM-Ensemble) to predict daily COVID-19 cases in ten African countries with limited healthcare infrastructure; ANFIS outperformed the standalone models, while ANN-Ensemble and SVM-Ensemble significantly improved accuracy, achieving up to an 83% enhancement with $R^2 = 0.9616$. Sujath et al. [41] utilized linear regression, MLP, and vector autoregression models to predict the epidemiological trend and rate of COVID-19 cases in India using Kaggle data, forecasting potential patterns of confirmed, death, and recovered cases and providing insights into the disease's near-future spread. Wang et al. [45] applied ARIMA, SARIMA, and Prophet models to predict daily new and cumulative COVID-19 cases in the USA, Brazil, and India; the Prophet model excelled at forecasting daily new cases in the USA, capturing periodic trends, while ARIMA performed better for cumulative cases in Brazil and India, with performance evaluated via RMSE, MAE, and MAPE. These findings inform outbreak trends and support epidemiological control and policy formulation in affected regions. Xu et al. [46] developed convolutional neural network (CNN), LSTM, and CNN-LSTM models to forecast COVID-19 cases in Brazil, India, and Russia, addressing the global health crisis caused by SARS-CoV-2 mutations; the LSTM model achieved the highest forecasting accuracy, outperforming existing DL models with significant improvements. Cumbane and Gidófalvi [9] developed a multi-layer BiLSTM model incorporating mobility, temperature, and humidity data to predict daily COVID-19 cases in low-income countries (Mozambique, Rwanda, Nepal, Myanmar) and other nations (Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, UK). The BiLSTM model outperformed multi-layer LSTM, ARIMA, and stacked LSTM models, achieving up to 1.6 times lower RMSE and 0.6 times lower average relative error, and also excelled at city-level forecasting in six Japanese regions (Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka) over seven 28-day periods.

Rustam et al. [37] applied ML models – linear regression (LR), least absolute shrinkage and selection operator (LASSO), SVM, and exponential smoothing (ES) – to forecast COVID-19 cases, deaths, and recoveries over 10 days. ES outperformed the other models, followed by LR and LASSO, while SVM performed poorly across all prediction scenarios. The findings highlight the potential of ML, particularly ES, for effective COVID-19 forecasting to support decision-making.

This study contributes to epidemic forecasting by evaluating a range of models for predicting COVID-19 trends in Kenya, focusing on total cases, critical cases, severe cases, and total deaths, using data from April 2020 to August 2021, with the aim of enhancing epidemic preparedness in resource-constrained environments. It conducts a comparative analysis of six models – ARIMA, SVR, RF, RNN, LSTM, and GRU – evaluating their performance across multiple dimensions to identify the most effective approach for public health applications.

The methodology employs an evaluation framework to compare model performance. Data from April 2020 to August 2021 are split into an 80:20 train-test ratio to ensure robust validation. Four evaluation metrics – RMSE, MAE, MAPE, and the coefficient of determination ($R^2$) – are used to assess predictive accuracy comprehensively. The Diebold-Mariano (DM) test is applied to statistically evaluate differences in forecasting errors across models, using mean squared error (MSE), MAE, and MAPE as loss functions, providing a robust measure of comparative performance. This study contributes to epidemic forecasting in the following ways:

  • Comparative evaluation of six models – ARIMA, SVR, RF, RNN, LSTM, and GRU – for COVID-19 forecasting in Kenya, using RMSE, MAE, MAPE, and $R^2$ to identify RF as the most effective model for resource-constrained settings.

  • Application of the DM test with MSE, MAE, and MAPE as loss functions to confirm statistically significant differences in forecasting performance, highlighting RF’s superior accuracy.

  • Demonstration of RF’s effectiveness in predicting total cases, critical cases, severe cases, and total deaths, offering practical insights for public health forecasting.

This article is organized as follows: Section 2 details the dataset and methodology, including model specifications, evaluation metrics, and the DM test. Section 3 presents and discusses the results, including descriptive statistics, forecasting outcomes, and robustness checks, supported by visualizations such as time series plots and heatmaps. Section 4 concludes with key insights and implications for epidemic preparedness in Kenya and similar contexts. Section 5 addresses study limitations and suggests future research directions, such as incorporating external features and cross-validation.

2 Data and methods

2.1 Data

The dataset, obtained from the Ministry of Health website in Kenya, spans from April 15, 2020, to August 26, 2021, and includes four columns: total cases, severe cases, critical cases, and total deaths [27]. It provides daily records of these metrics, capturing the progression and impact of the COVID-19 pandemic. The analysis aims to model and predict these variables using time series forecasting and ML techniques to address the ongoing global health crisis.

2.1.1 ARIMA

ARIMA is a statistical model for forecasting univariate time series with autocorrelation and non-stationarity, integrating autoregressive (AR), differencing (I), and moving average (MA) components [13,19]. The AR component links current observations to past values, while the MA component models dependencies on past forecast errors. Differencing achieves stationarity. The ARIMA( p , d , q ) model, where p is the AR order, d is the differencing order, and q is the MA order, is expressed as follows:

(1) $\Delta^d Y_t = \mu + \sum_{i=1}^{p} \phi_i \, \Delta^d Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},$

where $\mu$ is a constant, $\phi_i$ are the AR coefficients, $\theta_j$ are the MA coefficients, and $\varepsilon_t$ is white noise. The autocorrelation function (ACF) and partial autocorrelation function (PACF) guide the selection of $p$ and $q$, with PACF cutoffs indicating AR processes and ACF cutoffs suggesting MA processes.
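As a concrete illustration (not the study's actual pipeline), the following minimal Python sketch fits an ARIMA model with statsmodels on a synthetic placeholder series; the (4, 1, 0) order mirrors the configuration reported later in Section 2.1.9.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a daily cumulative series (placeholder for the MoH data).
rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.poisson(50, size=500)).astype(float))

# ARIMA(p, d, q): AR order 4, one round of differencing, no MA terms.
model = ARIMA(y, order=(4, 1, 0))
fitted = model.fit()
forecast = fitted.forecast(steps=14)  # 14-day-ahead point forecasts
print(forecast.tail())
```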

2.1.2 SVR

SVR is an extension of the SVM framework for regression tasks. It aims to find a function $f(x)$ that deviates from the true target values $y_i$ by no more than a small margin $\varepsilon$, while keeping the model as flat as possible [3,11]. The function is defined as follows:

(2) $f(x) = \langle w, x \rangle + b,$

where $w$ is the weight vector, $x$ is the input vector, and $b$ is the bias term. To enforce flatness, the norm $\|w\|^2$ is minimized. This leads to the following primal optimization problem:

(3) $\min_{w, b} \; \frac{1}{2}\|w\|^2,$

subject to the constraint:

(4) $|y_i - \langle w, x_i \rangle - b| \le \varepsilon, \quad i = 1, \dots, n.$

To allow for infeasible constraints due to noise or outliers, slack variables $\xi_i, \xi_i^* \ge 0$ are introduced. The problem becomes

(5) $\min_{w, b, \xi_i, \xi_i^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$

subject to

(6) $y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i,$

(7) $\langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*,$

(8) $\xi_i, \xi_i^* \ge 0.$

Here, $C > 0$ is a regularization parameter that balances model complexity against the extent to which deviations larger than $\varepsilon$ are tolerated.

Several commonly used kernels in SVR include the following:

Linear kernel:

(9) $K(x, x') = \langle x, x' \rangle,$

Polynomial kernel:

(10) $K(x, x') = (\langle x, x' \rangle + c)^d,$

Radial basis function (RBF) kernel:

(11) $K(x, x') = \exp(-\gamma \|x - x'\|^2),$

where $c$ is a constant, $d$ is the degree of the polynomial, and $\gamma$ controls the width of the RBF kernel.

Regularization through the parameter $C$ controls the trade-off between model complexity and tolerance to errors. A large $C$ assigns a higher penalty to errors, leading to less margin violation, while a small $C$ allows for a more generalized model by tolerating larger deviations. The choice of kernel and regularization parameters significantly affects the SVR model's ability to capture nonlinear relationships and its generalization performance.
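The following hedged scikit-learn sketch shows an RBF-kernel SVR on synthetic lagged features; the C and gamma values are those the paper later reports as optimal (Section 2.1.9), while epsilon is an illustrative default, not a value stated in the study.

```python
import numpy as np
from sklearn.svm import SVR

# Toy lagged-feature matrix: 30 past observations per sample (placeholder data).
rng = np.random.default_rng(1)
X = rng.random((200, 30))
y = X.sum(axis=1) + rng.normal(0, 0.1, size=200)

# RBF kernel; epsilon sets the width of the insensitive tube around f(x).
svr = SVR(kernel="rbf", C=10.0, gamma=1e-6, epsilon=0.1)
svr.fit(X, y)
print(svr.predict(X[:5]))
```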

2.1.3 RF

RF is an ensemble learning algorithm that builds multiple decision trees and aggregates their outputs to perform regression. It was introduced by Breiman [5] and is particularly effective in handling high-dimensional data, nonlinear relationships, and reducing overfitting.

In regression settings, the goal is to predict a continuous target variable $y \in \mathbb{R}$ from an input vector $x \in \mathbb{R}^d$. RF achieves this by constructing a collection of $T$ decision trees, each trained independently on a different subset of the training data. The final prediction for a new input $x$ is obtained by averaging the predictions of all individual trees:

(12) $\hat{y} = \frac{1}{T} \sum_{t=1}^{T} h_t(x),$

where $h_t(x)$ denotes the prediction made by the $t$-th decision tree.

Each tree is trained on a bootstrap sample, which is created by randomly sampling (with replacement) from the original training dataset. This introduces variability among the trees. Additionally, during the tree-building process, RF introduces further randomness by selecting a random subset of features at each split rather than considering all features. This decorrelates the trees and enhances generalization.

At each node of a decision tree, the algorithm selects the best split point among the randomly chosen features based on a splitting criterion that minimizes prediction error. For regression, the most commonly used criterion is the mean-squared error between predicted and actual values in the resulting child nodes. This local optimization helps guide the recursive partitioning of the input space.

Unlike a single decision tree, which may overfit the training data, RF reduces the variance of predictions by averaging across many trees. The ensemble effect ensures that individual overfitting errors are averaged out, leading to more robust and accurate predictions.
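A short scikit-learn sketch on synthetic data illustrates equation (12): the forest prediction is exactly the average of the individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.random((300, 30))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=100, random_state=0)  # T = 100 bootstrap trees
rf.fit(X, y)

# rf.predict is exactly the tree average of equation (12):
manual_mean = np.mean([tree.predict(X[:3]) for tree in rf.estimators_], axis=0)
assert np.allclose(manual_mean, rf.predict(X[:3]))
```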

2.1.4 RNN

RNNs are a class of neural architectures tailored for sequential data modeling. They maintain a hidden state $h_t$ that evolves over time by incorporating information from both the current input $x_t$ and the previous hidden state $h_{t-1}$, thereby capturing temporal dependencies [26]. The hidden state is updated using the recurrence relation:

(13) $h_t = \sigma(W_h h_{t-1} + W_x x_t + b_h),$

where $W_h$ and $W_x$ are the weight matrices for the hidden state and the input, respectively, $b_h$ is the bias term, and $\sigma$ is a nonlinear activation function such as tanh or ReLU. The output at each time step $y_t$ is computed as follows:

(14) $y_t = W_y h_t + b_y,$

where $W_y$ is the weight matrix for the output and $b_y$ is the output bias term. This recurrent structure, combined with the output transformation, allows the network to retain and propagate information through time steps, making it suitable for tasks involving time series or sequences.

2.1.5 LSTM

LSTM is a specialized type of RNN designed to capture long-range dependencies in sequential data by using gates to control the flow of information. Each LSTM cell contains three primary gates: the forget gate, input gate, and output gate, which regulate how information is retained, updated, and outputted at each time step [6].

The forget gate $f_t$ determines what proportion of the previous cell state $C_{t-1}$ should be forgotten. It outputs values between 0 and 1, where 0 means "completely forget" and 1 means "completely retain." The forget gate is calculated as follows:

(15) $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f),$

where $f_t$ is the forget gate at time $t$, $W_f$, $U_f$, and $b_f$ are the weights and bias for the forget gate, and $\sigma$ is the sigmoid activation function.

The input gate $i_t$ determines how much new information should be added to the cell state $C_t$. It also uses a sigmoid function to output values between 0 and 1. The input gate is computed as follows:

(16) $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i),$

where $i_t$ is the input gate at time $t$, and $W_i$, $U_i$, and $b_i$ are the weights and bias for the input gate.

The cell candidate $\tilde{C}_t$ represents the new information to be added to the cell state. This candidate value is filtered by the input gate $i_t$. The cell state $C_t$ is updated as follows:

(17) $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,$

where $C_{t-1}$ is the previous cell state, $f_t$ is the forget gate, $i_t$ is the input gate, and $\tilde{C}_t$ is the cell candidate.

The output gate $o_t$ determines what part of the cell state should be output as the hidden state $h_t$. It acts as a filter deciding how much of the cell state contributes to the next hidden state. The output gate is calculated as follows:

(18) $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o).$

Finally, the hidden state h t is updated as

(19) $h_t = o_t \odot \tanh(C_t),$

where $o_t$ is the output gate, $C_t$ is the cell state, and tanh is the hyperbolic tangent activation function applied to the cell state.

Figure 1 presents the LSTM network, which incorporates memory cells to handle long-term dependencies effectively.

Figure 1: Architecture of the LSTM network. Source: From ref. [39].

2.1.6 GRU

The GRU is a simpler version of LSTM with fewer parameters. It combines the forget and input gates of LSTM into a single update gate. The GRU has two main gates: the update gate and the reset gate [39].

The update gate $z_t$ controls the extent to which the previous hidden state $h_{t-1}$ is retained. A higher value means more of the previous state is retained, while a lower value means more of the current candidate hidden state $\tilde{h}_t$ is used. The update gate is computed as

(20) $z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z),$

where $z_t$ is the update gate, and $W_z$, $U_z$, and $b_z$ are the weights and bias for the update gate.

The reset gate $r_t$ determines how much of the previous hidden state is used when computing the candidate hidden state. It is calculated as follows:

(21) $r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r),$

where $r_t$ is the reset gate, and $W_r$, $U_r$, and $b_r$ are the weights and bias for the reset gate.

The candidate hidden state $\tilde{h}_t$ is computed using the reset gate to decide how much of the previous hidden state to incorporate. It is calculated as follows:

(22) $\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h),$

where $\tilde{h}_t$ is the candidate hidden state and $r_t \odot h_{t-1}$ is the element-wise product of the reset gate and the previous hidden state.

Finally, the hidden state $h_t$ is updated as a linear interpolation between the previous hidden state $h_{t-1}$ and the candidate hidden state $\tilde{h}_t$, controlled by the update gate $z_t$:

(23) $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$

where $h_{t-1}$ is the previous hidden state, $\tilde{h}_t$ is the candidate hidden state, and $z_t$ is the update gate.

The structure of the GRU is illustrated in Figure 2, where gating mechanisms help regulate information flow and improve efficiency.

Figure 2: Architecture of the GRU. Source: From ref. [39].

2.1.7 Evaluation metrics

The predictive performance of the models is evaluated using four standard metrics: RMSE, MAE, MAPE, and $R^2$. These metrics assess the accuracy of point forecasts in both absolute and relative terms. Additionally, to quantify the improvement of the best-performing model (RF) over the other models, the percentage improvement in RMSE, MAE, and MAPE is calculated, providing a relative measure of performance enhancement:

(24) $\mathrm{RMSE} = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},$

(25) $\mathrm{MAE} = \dfrac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|,$

(26) $\mathrm{MAPE} = \dfrac{100}{n} \sum_{i=1}^{n} \left| \dfrac{y_i - \hat{y}_i}{y_i} \right|,$

(27) $R^2 = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$

(28) $\text{Percentage improvement} = \dfrac{M_{\text{other}} - M_{\text{RF}}}{M_{\text{other}}} \times 100,$

where $M_{\text{other}}$ denotes the RMSE, MAE, or MAPE of a competing model (ARIMA, SVR, RNN, LSTM, or GRU), and $M_{\text{RF}}$ is the corresponding metric for RF, the best-performing model.
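A minimal Python sketch of equations (24)-(28) is given below; it assumes the actual series contains no zero values when computing MAPE.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE (%), and R^2 as in equations (24)-(27)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))  # assumes no zeros in y_true
    r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, mape, r2

def pct_improvement(m_other, m_rf):
    """Equation (28): relative gain of RF over a competing model, in percent."""
    return (m_other - m_rf) / m_other * 100
```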

To statistically evaluate whether the forecasting performance of two competing models differs significantly, the DM test is applied [8]. The test is based on the loss differential series $d_t$, defined as the difference in forecast losses between the two models:

(29) $d_t = L(e_{1t}) - L(e_{2t}),$

where $L(\cdot)$ is a loss function such as MSE, MAE, or MAPE, and $e_{1t}$, $e_{2t}$ are the forecast errors at time $t$ from the two models being compared.

The DM test evaluates the following hypotheses:

(30) $H_0: \mathbb{E}[d_t] = 0$ (equal predictive accuracy),

(31) $H_1: \mathbb{E}[d_t] \neq 0$ (unequal predictive accuracy).

The test statistic is given by

(32) $\mathrm{DM} = \dfrac{\bar{d}}{\sqrt{\hat{\sigma}_d^2 / T}},$

where $\bar{d} = \frac{1}{T} \sum_{t=1}^{T} d_t$ is the sample mean of the loss differential, $T$ is the number of out-of-sample forecasts, and $\hat{\sigma}_d^2$ is a consistent estimate of the variance of $d_t$.

Under the null hypothesis, the DM statistic asymptotically follows a standard normal distribution:

(33) $\mathrm{DM} \sim N(0, 1).$

If the absolute value of the DM statistic exceeds the critical value from the standard normal distribution, the null hypothesis is rejected, indicating a statistically significant difference in predictive performance between the two models. In this study, the DM test is applied using MSE, MAE, and MAPE as loss functions to ensure comprehensive evaluation.
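The following sketch implements a simplified DM test for one-step-ahead forecasts (it omits the autocovariance correction in the variance estimate that the full test requires for longer horizons); the example error series are synthetic.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, loss=np.square):
    """Two-sided DM test on the loss differential d_t = L(e1_t) - L(e2_t).

    Simplified variant for one-step-ahead forecasts: the variance of d_t
    is estimated from the sample without autocovariance corrections.
    """
    d = loss(np.asarray(e1, float)) - loss(np.asarray(e2, float))
    T = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / T)      # equation (32)
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))     # DM ~ N(0, 1) under H0
    return dm, p_value

# Example: compare absolute-error losses of two synthetic forecast error series.
rng = np.random.default_rng(3)
e_arima, e_rf = rng.normal(0, 2, 100), rng.normal(0, 1, 100)
print(diebold_mariano(e_arima, e_rf, loss=np.abs))
```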

2.1.8 Experimental design

The experimental setup follows an 80:20 data split, where 80% of the time series is used for training and 20% for testing. We evaluate the forecasting performance of six models: ARIMA, SVR, RF, RNN, GRU, and LSTM. These models are applied to four key variables, with all learning carried out under a consistent framework.

To prepare the time series for supervised learning with SVR, RF, and the recurrent networks, we apply a sliding-window approach with a 30-day window. Given a univariate sequence $y_1, y_2, \dots, y_t$, each input-output pair is constructed as:

(34) $X_t = (y_{t-w}, y_{t-w+1}, \dots, y_{t-1}), \qquad Y_t = y_t,$

where $w$ is the window size and $Y_t$ denotes the prediction target. This transformation allows the models to capture and learn temporal dependencies by treating past observations as features for predicting the next value.
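A minimal sketch of the windowing transformation in equation (34), with $w = 30$ as used in this study:

```python
import numpy as np

def make_windows(series, w=30):
    """Build (X_t, Y_t) pairs per equation (34) with window size w."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - w:t] for t in range(w, len(series))])
    Y = series[w:]
    return X, Y

# Each row of X holds the previous 30 observations; Y is the next value.
X, Y = make_windows(np.arange(100), w=30)
assert X.shape == (70, 30) and Y[0] == 30.0
```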

For the DL models (RNN, GRU, LSTM), feature scaling is performed using Min-Max normalization to stabilize and accelerate the training process. The normalization formula is

(35) $X_{\text{scaled}} = \dfrac{X - X_{\min}}{X_{\max} - X_{\min}}.$

After prediction, the inverse transformation is applied to recover the original scale:

(36) $X_{\text{original}} = X_{\text{scaled}} \times (X_{\max} - X_{\min}) + X_{\min}.$

Rescaling ensures that input features are within a uniform range, preventing dominance by features with larger numerical values. This enhances the convergence behavior of DL algorithms and contributes to better generalization performance.
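Equations (35) and (36) correspond directly to scikit-learn's MinMaxScaler, as the sketch below shows; in practice the scaler should be fitted on the training split only, to avoid leaking test-set information.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

raw = np.arange(100, dtype=float).reshape(-1, 1)  # one feature column

scaler = MinMaxScaler()                       # implements equation (35)
scaled = scaler.fit_transform(raw)            # values mapped into [0, 1]
restored = scaler.inverse_transform(scaled)   # equation (36): back to original scale
assert np.allclose(restored, raw)
```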

2.1.9 Parameter settings

Hyperparameter tuning was carried out for RF, ARIMA, and SVR using grid search or automated routines, while DL models (RNN, LSTM, GRU) employed fixed configurations.

For ARIMA, the model orders were selected automatically using the auto_arima routine from the pmdarima package in Python, yielding an optimal configuration of ARIMA(4,1,0) for all datasets. A differencing order of $d = 1$ was used to make the time series stationary.
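A hedged sketch of this automated selection with pmdarima on a synthetic placeholder series (the routine performs a stepwise search minimizing AIC by default):

```python
import numpy as np
import pmdarima as pm

rng = np.random.default_rng(4)
y = np.cumsum(rng.poisson(50, size=500)).astype(float)  # placeholder series

# Stepwise search over (p, d, q); d is chosen via unit-root tests.
model = pm.auto_arima(y, seasonal=False, stepwise=True, suppress_warnings=True)
print(model.order)  # the paper reports (4, 1, 0) on its datasets
```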

For SVR, hyperparameter tuning was performed using grid search, which explored the following parameter space:

  • $C \in \{1, 10, 100\}$,

  • $\gamma \in \{10^{-6}, 10^{-5}, 10^{-4}\}$,

  • Kernel: RBF, linear, and polynomial.

The optimal configuration for SVR was found to be $C = 10$, an RBF kernel, and $\gamma = 10^{-6}$ for all time series variables.
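A scikit-learn sketch of this grid search on placeholder data follows; the TimeSeriesSplit cross-validator is an assumption, since the paper does not state the validation scheme used inside the grid search.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X_train = rng.random((200, 30))          # placeholder lagged features
y_train = X_train.mean(axis=1)

param_grid = {
    "C": [1, 10, 100],
    "gamma": [1e-6, 1e-5, 1e-4],
    "kernel": ["rbf", "linear", "poly"],
}
# TimeSeriesSplit keeps folds in temporal order (an assumption, not stated in the paper).
search = GridSearchCV(SVR(), param_grid, cv=TimeSeriesSplit(n_splits=3),
                      scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)
print(search.best_params_)  # the paper reports C=10, gamma=1e-6, RBF kernel
```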

For the RF model, hyperparameter tuning was performed using a grid search over a comprehensive parameter space to optimize predictive performance. The following hyperparameters were explored (a grid-search sketch follows the list):

  • Number of trees (n_estimators): {50, 100, 200},

  • Maximum tree depth (max_depth): {None, 10, 20, 30},

  • Minimum samples required to split a node (min_samples_split): {2, 5, 10},

  • Minimum samples required at a leaf node (min_samples_leaf): {1, 2, 4},

  • Maximum number of features considered for splits (max_features): {auto, max_features},

  • Splitting criterion (criterion): {mse, mae}.
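The sketch below reproduces this grid with scikit-learn on placeholder data; note that recent scikit-learn versions rename the 'mse'/'mae' criteria to 'squared_error'/'absolute_error' and drop max_features='auto', so the max_features entry is omitted here. The TimeSeriesSplit cross-validator is again an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(6)
X_train = rng.random((200, 30))
y_train = X_train.mean(axis=1)

# Grid from the list above, with criteria translated to current scikit-learn names.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "criterion": ["squared_error", "absolute_error"],
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=TimeSeriesSplit(n_splits=3),
                      scoring="neg_root_mean_squared_error", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)
```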

For the DL models, we used a fixed architecture with three layers and ReLU activation. The optimizer for all DL models was Adam with a learning rate of 0.001. Dropout was applied at a rate of 0.2 to prevent overfitting, and early stopping halted training when the validation loss did not improve for 10 consecutive epochs. The models were trained for up to 100 epochs with a batch size of 16.

The parameter settings for the DL models are summarized in Table 1; a minimal Keras sketch follows the parameter explanations below. These settings ensure consistency across all models, enabling a fair comparison of their performance.

Table 1

Parameter settings for RNN, LSTM, and GRU models

Parameter RNN LSTM GRU
Number of layers 3 3 3
Activation ReLU ReLU ReLU
Loss function MSE MSE MSE
Optimizer Adam Adam Adam
Learning rate 0.001 0.001 0.001
Dropout rate 0.2 0.2 0.2
Epochs 100 100 100
Batch size 16 16 16
Units per layer 100, 50, 25 100, 50, 25 100, 50, 25
Early stopping Yes (monitor = val_loss, patience = 10) Yes (monitor = val_loss, patience = 10) Yes (monitor = val_loss, patience = 10)

Explanation of parameters for RNN, LSTM, and GRU models:

  • Number of layers: All models have three layers of recurrent units (RNN, LSTM, or GRU). More layers allow the models to capture more complex patterns in sequential data.

  • Activation function: The activation function used in all models is ReLU (rectified linear unit), which helps mitigate the vanishing gradient problem and accelerates convergence.

  • Loss function: The MSE is used as the loss function for all models. It measures the average of the squares of the errors, making it sensitive to larger errors and ensuring that the model minimizes them.

  • Optimizer: Adam (Adaptive Moment Estimation) is used for optimization in all models. It adjusts the learning rate during training to improve convergence speed and accuracy.

  • Learning rate: Set to 0.001 for all models; this determines the step size during model training. A small learning rate helps with more stable training.

  • Dropout rate: Dropout of 0.2 is used to prevent overfitting. It randomly disables 20% of neurons in the network during training, forcing the model to learn more robust representations.

  • Epochs: All models are trained for 100 epochs to ensure enough iterations for convergence.

  • Batch size: A batch size of 16 is used, meaning 16 samples are processed at once during each training iteration.

  • Units per layer: The number of units in each layer is set to 100 for the first layer, 50 for the second, and 25 for the third layer, with the goal of progressively reducing the complexity of the model.

  • Early stopping: This mechanism halts training when the validation loss stops improving, preventing overfitting and unnecessary training.
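The following minimal Keras sketch is one plausible reading of Table 1, with the 0.2 dropout applied between the recurrent layers and early stopping on the validation loss with patience 10; the validation_split value is an assumption, since the paper does not state how the validation set was carved out.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(cell, window=30):
    """Three stacked recurrent layers (100/50/25 units) per Table 1.

    `cell` is layers.SimpleRNN, layers.LSTM, or layers.GRU.
    """
    model = keras.Sequential([
        keras.Input(shape=(window, 1)),
        cell(100, activation="relu", return_sequences=True),
        layers.Dropout(0.2),
        cell(50, activation="relu", return_sequences=True),
        layers.Dropout(0.2),
        cell(25, activation="relu"),
        layers.Dense(1),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
model = build_model(layers.GRU)
# model.fit(X_train, y_train, epochs=100, batch_size=16,
#           validation_split=0.1, callbacks=[early_stop])
```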

The Python packages used in this study include Numpy for numerical operations and array handling, Pandas for data manipulation and analysis, Seaborn for data visualization, Matplotlib for creating static, animated, and interactive visualizations, Scikit-learn for ML models including SVR, RF, and performance metrics, Statsmodels for statistical models and time series analysis, Pmdarima for automatic ARIMA model selection, and Keras for building and training DL models.

3 Results and discussion

3.1 Descriptive statistics

Figure 3 presents the time series plots of four key variables: total cases, severe cases, critical cases, and total deaths. The patterns indicate an increasing trend over time with noticeable fluctuations. The presence of volatility suggests the need for further statistical analysis, such as stationarity tests and autocorrelation diagnostics.

Figure 3: Time series plots of total cases, severe cases, critical cases, and total deaths. Source: Created by the authors.

Table 2 presents the summary statistics of the four key variables. The mean and standard deviation highlight high variability in the dataset. The skewness values indicate a slight positive skewness, suggesting that the distributions are slightly right-tailed. The negative kurtosis values suggest that the distributions are flatter than a normal distribution, indicating fewer extreme events.

Table 2

Summary statistics

Count Mean Std Dev Min 25% 50% 75% Max Kurtosis Skewness
Total cases 499 8841.58 5911.12 146.05 3737.57 8218.49 12979.51 21393.05 −0.8752 0.4086
Severe cases 499 4210.27 2814.82 69.55 1779.80 3913.57 6180.72 10187.17 −0.8752 0.4086
Critical cases 499 1472.06 983.35 25.06 616.25 1359.01 2158.80 3570.59 −0.8753 0.4116
Total deaths 499 588.83 393.34 10.03 246.50 543.60 863.52 1428.24 −0.8753 0.4116

Table 3 shows the results of the augmented Dickey-Fuller (ADF) test for stationarity. Since the p-values are greater than 0.05, we fail to reject the null hypothesis of a unit root, indicating that the time series are non-stationary. This suggests the need for differencing or transformation before applying forecasting models (a statsmodels sketch follows the table).

Table 3

ADF test results for stationarity

Variable ADF statistic p-value
Total cases −1.6906 0.4360
Severe cases −1.6906 0.4360
Critical cases −1.6968 0.4327
Total deaths −1.6968 0.4328
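A statsmodels sketch of the ADF workflow on a synthetic random walk (a stand-in for the non-stationary series above), including the first differencing that the ARIMA models apply with $d = 1$:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
y = np.cumsum(rng.normal(size=500))  # random walk: non-stationary by construction

adf_stat, p_value, *_ = adfuller(y)
print(adf_stat, p_value)             # p > 0.05: fail to reject the unit-root null

d1 = np.diff(y)                      # first difference, as ARIMA's d = 1 does
print(adfuller(d1)[1])               # p should now fall well below 0.05
```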

Figure 4 displays the ACF and PACF plots for the four variables. The slow decay in the ACF plots suggests the presence of long memory in the time series. Long memory, also known as long-range dependence, implies that past observations significantly influence future values over a prolonged period.

Figure 4 
                  ACF and PACF for total cases, severe cases, critical cases, and total deaths. Source: Created by the authors.
Figure 4

ACF and PACF for total cases, severe cases, critical cases, and total deaths. Source: Created by the authors.

3.2 Forecasting results

Table 4 summarizes the forecasting performance of the six models – ARIMA, SVR, RNN, GRU, LSTM, and RF – across four datasets: total cases, critical cases, severe cases, and total deaths, with metrics reported to four decimal places. RF consistently outperforms all models across all datasets, achieving the lowest RMSE, MAE, and MAPE and the highest $R^2$ values, nearing 1.0000. For total cases, RF yields an RMSE of 93.4117, MAE of 35.9370, MAPE of 0.2668, and $R^2$ of 0.9995; for critical cases, RMSE of 17.5342, MAE of 7.3318, MAPE of 0.3330, and $R^2$ of 0.9994; for severe cases, RMSE of 44.4818, MAE of 17.1128, MAPE of 0.2668, and $R^2$ of 0.9995; and for total deaths, RMSE of 7.0137, MAE of 2.9327, MAPE of 0.3330, and $R^2$ of 0.9994. These results highlight RF's superior accuracy and robustness in capturing nonlinear patterns in COVID-19 data.

Table 4

Performance metrics (RMSE, MAE, MAPE, and $R^2$) for six models – ARIMA, SVR, RNN, LSTM, GRU, and RF – across four datasets: total cases, critical cases, severe cases, and total deaths

Dataset Model RMSE MAE MAPE (%) $R^2$
Total cases ARIMA 13870.6770 10916.6357 82.0056 −9.2272
SVR 5589.6811 3874.5758 26.4893 0.6609
RNN 365.3872 289.7108 2.3229 0.9929
LSTM 338.4482 281.7235 2.4740 0.9939
GRU 215.3505 177.0998 1.5449 0.9975
RF 93.4117 35.9370 0.2668 0.9995
Critical cases ARIMA 2055.3009 1599.7675 70.9180 −6.7976
SVR 467.2761 216.0696 6.8270 0.5970
RNN 62.5152 54.0835 2.8693 0.9928
LSTM 185.1695 144.1515 6.6211 0.9367
GRU 42.5913 35.1251 1.8999 0.9967
RF 17.5342 7.3318 0.3330 0.9994
Severe cases ARIMA 6593.2169 5188.9595 81.8556 −9.1904
SVR 2223.5193 1341.9956 17.5188 0.1590
RNN 140.1730 110.0014 2.3796 0.9954
LSTM 281.8612 221.9638 3.9355 0.9814
GRU 140.4860 113.2804 1.9507 0.9954
RF 44.4818 17.1128 0.2668 0.9995
Total deaths ARIMA 822.0758 639.8712 70.9140 −6.7967
SVR 89.1646 34.3085 2.6381 0.9083
RNN 23.3445 17.9225 2.0940 0.9937
LSTM 40.4583 31.6118 4.0341 0.9811
GRU 21.3337 18.6039 2.5657 0.9947
RF 7.0137 2.9327 0.3330 0.9994

All values are reported to four decimal places. Bold rows indicate the RF model, which consistently outperforms the other models.

ARIMA shows the weakest performance, with high error metrics and negative $R^2$ values, indicating poor fit. For total cases, ARIMA's RMSE is 13870.6770, MAE is 10916.6357, MAPE is 82.0056, and $R^2$ is −9.2272, reflecting its struggle with nonlinear epidemic dynamics. SVR outperforms ARIMA but is inferior to RF, with higher errors, e.g., RMSE of 5589.6811, MAE of 3874.5758, MAPE of 26.4893, and $R^2$ of 0.6609 for total cases. Among DL models, GRU generally surpasses RNN and LSTM, notably for critical cases (RMSE: 42.5913, MAE: 35.1251, MAPE: 1.8999, $R^2$: 0.9967) and severe cases (RMSE: 140.4860, MAE: 113.2804, MAPE: 1.9507, $R^2$: 0.9954). However, RNN and LSTM exhibit higher errors and lower $R^2$ than RF, with LSTM performing poorly for critical cases (RMSE: 185.1695, MAE: 144.1515, MAPE: 6.6211, $R^2$: 0.9367). While the DL models capture some temporal dependencies, RF handles feature interactions and nonlinearities efficiently without heavy computational demands.

RF's consistent performance across datasets underscores its suitability for the epidemic forecasting task of this study, where accuracy and efficiency are vital. Its low errors and near-perfect $R^2$ values indicate strong generalization, ideal for public health applications. ARIMA's and SVR's limitations on complex data, and the moderate performance of RNN, GRU, and LSTM, suggest that the recurrent models are better suited to long-term temporal patterns. These findings support using ensemble methods like RF, with potential for future hybrid models combining RF and DL to improve accuracy.

Figure 5 illustrates the comparison between predicted and actual values for all four datasets. The close alignment of RF's predictions with the actual values, with minimal deviations relative to the other models, confirms its consistent predictive accuracy.

Figure 5: Prediction vs actual values for the total cases, critical cases, severe cases, and total deaths datasets. Source: Created by the authors.

Figure 6 presents a bar plot comparing the forecasting performance metrics of six models.

Figure 6: Bar plot of performance metrics (RMSE, MAE, MAPE, and $R^2$) for six models – ARIMA, SVR, RNN, LSTM, GRU, and RF – across four COVID-19 datasets: total cases, critical cases, severe cases, and total deaths. Source: Created by the authors.

3.3 Quality checks

Figures 7, 8, 9, and 10 confirm that RF predictions exhibit a strong linear correlation with the actual values across all datasets. The alignment of the majority of points along the diagonal in each scatterplot suggests that RF consistently provides accurate predictions with minimal deviations.

Figure 7: Scatterplot of observed vs predicted values for the total cases dataset. The RF model demonstrates a strong linear relationship, indicating accurate predictions. Source: Created by the authors.

Figure 8: Scatterplot of observed vs predicted values for the critical cases dataset. The RF model demonstrates a strong linear relationship, indicating accurate predictions. Source: Created by the authors.

Figure 9: Scatterplot of observed vs predicted values for the severe cases dataset. The RF model demonstrates a strong linear relationship, indicating accurate predictions. Source: Created by the authors.

Figure 10: Scatterplot of observed vs predicted values for the total deaths dataset. The RF model demonstrates a strong linear relationship, indicating accurate predictions. Source: Created by the authors.

Figure 11 illustrates the distribution of forecasting errors for the total cases, critical cases, severe cases, and total deaths datasets. The plot compares the error distributions across the six models, demonstrating that RF consistently exhibits the lowest median error, minimal variance, and fewer outliers across all datasets. In contrast, models such as ARIMA and SVR show higher error variance and more outliers, while RNN, GRU, and LSTM display moderate performance. These findings highlight RF's superior balance of accuracy and error minimization.

Figure 11: Boxplot of forecasting error distributions for six models – ARIMA, SVR, RNN, LSTM, GRU, and RF – across four COVID-19 datasets. Each model's errors are grouped by dataset, with boxes showing the interquartile range, median, and outliers of the forecasting errors. The RF model consistently exhibits the lowest median error, minimal variance, and fewer outliers across all datasets, highlighting its superior predictive accuracy and stability. Source: Created by the authors.

A heatmap, shown in Figure 12, represents the percentage improvement of RF over the competing models through color gradients, clearly highlighting differences in RMSE, MAE, and MAPE across the total cases, critical cases, severe cases, and total deaths datasets. The improvement of RF over each competing model is positive for every metric, with the largest gains relative to ARIMA.

Figure 12: Heatmap of the percentage improvement of RF over competing models in RMSE, MAE, and MAPE across the total cases, critical cases, severe cases, and total deaths datasets. The improvement is positive for every model and metric, and largest relative to ARIMA. Source: Created by the authors.

3.4 Statistical robustness checks

As a robustness check, the DM test, as shown in Table 5, assesses the statistical significance of differences in forecasting errors between the best-performing model, RF, and the benchmark models (ARIMA, SVR, RNN, LSTM, GRU) across the total cases, critical cases, severe cases, and total deaths datasets, using MSE, MAE, and MAPE as loss functions. All comparisons are significant at the 1% level (p-value < 0.01). Positive DM statistics indicate that RF's errors are significantly smaller than those of the benchmark models, reflecting RF's superior performance in most cases.

Table 5

DM test statistics comparing RF to benchmark models across datasets and loss functions

Dataset Benchmark model MSE MAE MAPE
Total cases ARIMA 8.3813*** 12.6322*** 14.5876***
SVR 5.9865*** 9.5179*** 14.0600***
RNN 7.4782*** 11.2998*** 13.6478***
LSTM 7.7133*** 13.0574*** 14.5008***
GRU 4.9375*** 9.6530*** 12.0547***
Critical cases ARIMA 8.1962*** 12.2711*** 16.9087***
SVR 4.1383*** 5.0926*** 5.3284***
RNN 8.9064*** 15.7796*** 17.3167***
LSTM 7.8068*** 12.0917*** 17.4460***
GRU 6.9497*** 10.9590*** 11.1527***
Severe cases ARIMA 8.3806*** 12.6316*** 14.5646***
SVR 5.2350*** 7.4753*** 9.4834***
RNN 5.2963*** 9.1092*** 9.4592***
LSTM 7.0562*** 11.3440*** 13.4241***
GRU 7.3211*** 11.4403*** 13.3823***
Total deaths ARIMA 8.1962*** 12.2710*** 13.4789***
SVR 3.2430*** 3.9697*** 4.0081***
RNN 7.1227*** 9.9834*** 11.4793***
LSTM 6.4443*** 10.9583*** 14.1558***
GRU 8.8013*** 13.7539*** 13.8088***

Note: *** indicates statistical significance at the 1% level (p-value < 0.01), ** at the 5% level (p-value < 0.05), and * at the 10% level (p-value < 0.10). Negative values would indicate that RF's errors are significantly larger than the benchmark model's errors.

4 Conclusion

The primary objective of this study was to evaluate the forecasting performance of statistical, ML, and DL models for predicting COVID-19 trends in Kenya, focusing on total cases, critical cases, severe cases, and total deaths. By comparing models such as ARIMA, SVR, RF, RNN, LSTM, and GRU, the study aimed to identify the most effective approach for epidemic forecasting in a resource-constrained setting. A robust evaluation framework, including multiple error metrics and the DM test, was employed to assess predictive accuracy and statistical significance of differences in forecasting errors. This comprehensive analysis sought to provide actionable insights for public health decision-making by determining which models best capture the complex dynamics of epidemic data in Kenya.

The key finding of this study is that ensemble ML methods, particularly RF, offer superior predictive accuracy and computational efficiency for COVID-19 forecasting in Kenya, making them highly suitable for resource-limited environments. While DL models such as GRU and LSTM show promise in capturing temporal dependencies, their performance is generally outshone by RF, which consistently excels across all datasets. In contrast, traditional statistical models like ARIMA struggle with the nonlinear patterns inherent in epidemic data, highlighting the advantage of ML approaches in such contexts. The DM test reinforces these findings by confirming significant differences in forecasting performance, with RF typically outperforming benchmarks, except in specific cases where other models show marginal advantages.

These findings underscore the potential of ML, particularly RF, to enhance epidemic preparedness, where efficient resource allocation is critical. The study advocates for the adoption of ensemble methods in public health forecasting while suggesting that future research explore hybrid models combining statistical and ML techniques to further improve accuracy. By leveraging such models, policymakers can make informed decisions to mitigate the impact of infectious diseases.

5 Limitations and recommendations for future work

This study, while comprehensive in evaluating six models for COVID-19 forecasting in Kenya, has several limitations. The analysis relied on a single 80:20 train-test split, focusing on one-step-ahead predictions, which may not fully capture the models' performance in multi-step forecasting scenarios. Time-aware cross-validation, such as walk-forward validation, could provide a more robust assessment but was not implemented due to computational constraints. Additionally, the study was limited to six models, excluding simpler linear models and other advanced techniques, such as hybrid approaches or transformer-based architectures, which might offer complementary insights. The absence of external features, such as new COVID-19 variants, mobility patterns, or socioeconomic indicators, limits the models' ability to account for real-world complexities. To address these limitations, future research should incorporate time-aware cross-validation to enhance model robustness and explore multi-step forecasting to better simulate real-world epidemic scenarios. Expanding the model space to include linear models, hybrid statistical-ML approaches, or transformer-based architectures, as suggested in recent studies, could improve predictive power. Incorporating external features, such as policy changes, mobility data, or socioeconomic factors, along with real-time data streams, would provide a more holistic view of epidemic dynamics.

Acknowledgments

The authors gratefully acknowledge the Ministry of Health, Kenya, for providing access to COVID-19 data.

  1. Funding information: The authors state no funding involved.

  2. Author contributions: Joyce Kiarie: writing – original draft, writing – review and editing, software, methodology, data curation, and conceptualization. Samuel Mwalili: writing – review and supervision. Rachel Mbogo: writing – review and supervision. John Mutinda: writing – review. Amos Langat: writing – review and supervision.

  3. Conflict of interest: The authors state no conflict of interest.

  4. Data availability statement: All datasets used in this study are publicly available and can be accessed through the following repository: COVID-19 Processed Datasets Repository: https://github.com/Samgoiwa/Processed-Covid-19-datasets/tree/Covid-19-Dataset. These datasets include the COVID-19 case counts, testing data, vaccination data, and related socio-demographic information that were used for the statistical, machine learning, and deep learning models presented in this paper.

References

[1] Alassafi, M. O., Jarrah, M., & Alotaibi, R. (2022). Time series predicting of COVID-19 based on deep learning. Neurocomputing, 468, 335–344. DOI: https://doi.org/10.1016/j.neucom.2021.10.035.

[2] Arora, P., Kumar, H., & Panigrahi, B. K. (2020). Prediction and analysis of COVID-19 positive cases using deep learning models: A descriptive case study of India. Chaos, Solitons & Fractals, 139, 110017. DOI: https://doi.org/10.1016/j.chaos.2020.110017.

[3] Basak, D., Pal, S., & Patranabis, D. C. (2007). Support vector regression. Neural Information Processing - Letters and Reviews, 11(10), 203–224.

[4] Bayyurt, L., & Bayyurt, B. (2020). Forecasting of COVID-19 cases and deaths using ARIMA models. medRxiv. DOI: https://doi.org/10.1101/2020.04.17.20069237.

[5] Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. DOI: https://doi.org/10.1023/A:1010933404324.

[6] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555.

[7] Ciotti, M., Ciccozzi, M., Terrinoni, A., Jiang, W.-C., Wang, C.-B., & Bernardini, S. (2020). The COVID-19 pandemic. Critical Reviews in Clinical Laboratory Sciences, 57(6), 365–388. DOI: 10.1080/10408363.2020.1783198.

[8] Coroneo, L., Iacone, F., Paccagnini, A., & Monteiro, P. S. (2023). Testing the predictive accuracy of COVID-19 forecasts. International Journal of Forecasting, 39(2), 606–622. DOI: https://doi.org/10.1016/j.ijforecast.2022.01.005.

[9] Cumbane, S. P., & Gidófalvi, G. (2024). Deep learning-based approach for COVID-19 spread prediction. International Journal of Data Science and Analytics, 1–17. DOI: https://doi.org/10.1007/s41060-024-00558-1.

[10] Ghafouri-Fard, S., Mohammad-Rahimi, H., Motie, P., Minabi, M. A., Taheri, M., & Nateghinia, S. (2021). Application of machine learning in the prediction of COVID-19 daily new cases: A scoping review. Heliyon, 7(10), e08143. DOI: https://doi.org/10.1016/j.heliyon.2021.e08143.

[11] Gunn, S. R. (1997). Support vector machines for classification and regression. Technical report, Citeseer.

[12] Gupta, V. K., Gupta, A., Kumar, D., & Sardana, A. (2021). Prediction of COVID-19 confirmed, death, and cured cases in India using random forest model. Big Data Mining and Analytics, 4(2), 116–123. DOI: 10.26599/BDMA.2020.9020016.

[13] Harvey, A. C. (1990). ARIMA models. In: Time Series and Statistics (pp. 22–24). Springer. DOI: https://doi.org/10.1007/978-1-349-20865-4_2.

[14] Herlawati, H. (2020). COVID-19 spread pattern using support vector regression. PIKSEL: Penelitian Ilmu Komputer Sistem Embedded and Logic, 8(1), 67–74. DOI: 10.33558/piksel.v8i1.2024.

[15] Hu, Z., Ge, Q., Li, S., Jin, L., & Xiong, M. (2020). Artificial intelligence forecasting of COVID-19 in China. arXiv:2002.07112.

[16] Ibrahim, Z., Tulay, P., & Abdullahi, J. (2023). Multi-region machine learning-based novel ensemble approaches for predicting COVID-19 pandemic in Africa. Environmental Science and Pollution Research, 30(2), 3621–3643. DOI: https://doi.org/10.1007/s11356-022-22373-6.

[17] Kathula, D. N. (2020). Effect of COVID-19 pandemic on the education system in Kenya. Journal of Education, 3(6), 31–52. https://stratfordjournals.org/journals/index.php/journal-of-education/article/view/640.

[18] Khan, F. M., & Gupta, R. (2020). ARIMA and NAR based prediction model for time series analysis of COVID-19 cases in India. Journal of Safety Science and Resilience, 1(1), 12–18. DOI: https://doi.org/10.1016/j.jnlssr.2020.06.007.

[19] Kinney Jr, W. R. (1978). ARIMA and regression in analytical review: An empirical test. Accounting Review, 53, 48–60.

[20] Koh, D. (2020). COVID-19 lockdowns throughout the world. Occupational Medicine, 70(5), 322. DOI: 10.1093/occmed/kqaa073.

[21] Kufel, T. (2020). ARIMA-based forecasting of the dynamics of confirmed COVID-19 cases for selected European countries. Equilibrium. Quarterly Journal of Economics and Economic Policy, 15(2), 181–204. DOI: 10.24136/eq.2020.009.

[22] Kumar, A., Singh, R., Kaur, J., Pandey, S., Sharma, V., Thakur, L., et al. (2021). Wuhan to world: The COVID-19 pandemic. Frontiers in Cellular and Infection Microbiology, 11, 596201. DOI: 10.3389/fcimb.2021.596201.

[23] Langat, A. K., Mutinda, J. K., Mwalili, S. M., & Kazembe, L. N. (2023). COVID-19 impact analysis: Assessing African sectors - commodity, service, manufacturing, and education using mixed model approach. Asian Journal of Probability and Statistics, 25(4), 43–55. DOI: 10.9734/ajpas/2023/v25i4571.

[24] Langat, A. K., Ofori, M., Ishag, M., & Bouzir, Y. (2023). Synthetic control and comparative studies on COVID-19 vaccines enrollment and hesitancy in Africa. Research Square preprint. DOI: https://doi.org/10.21203/rs.3.rs-2650802/v1.

[25] Mantoro, T., Handayanto, R. T., Ayu, M. A., & Asian, J. (2020). Prediction of COVID-19 spreading using support vector regression and susceptible infectious recovered model. In 2020 6th International Conference on Computing Engineering and Design (ICCED) (pp. 1–5). IEEE. DOI: 10.1109/ICCED51276.2020.9415858.

[26] Medsker, L. R., & Jain, L. (2001). Recurrent neural networks: Design and applications. CRC Press.

[27] Ministry of Health, Kenya (2025). COVID-19 updates. https://www.health.go.ke/COVID-19. Accessed: 24 March 2025.

[28] Mukolwe, J. A., Mutinda, J. K., & Langat, A. K. (2025). Spatial epidemiology based on the analysis of COVID-19 in Africa. Scientific African, 27, e02557. DOI: https://doi.org/10.1016/j.sciaf.2025.e02557.

[29] Nayak, D., & Tantravahi, S. L. R. (2024). On building machine learning models for medical dataset with correlated features. Computational and Mathematical Biophysics, 12(1), 20230124. DOI: https://doi.org/10.1515/cmb-2023-0124.

[30] Ngwacho, A. G. (2020). COVID-19 pandemic impact on Kenyan education sector: Learner challenges and mitigations. Journal of Research Innovation and Implications in Education, 4(2), 128–139.

[31] Ouma, P. N., Masai, A. N., & Nyadera, I. N. (2020). Health coverage and what Kenya can learn from the COVID-19 pandemic. Journal of Global Health, 10(2), 020362. DOI: 10.7189/jogh.10.020362.

[32] Özen, F. (2024). Random forest regression for prediction of COVID-19 daily cases and deaths in Turkey. Heliyon, 10(4), e25746. DOI: https://doi.org/10.1016/j.heliyon.2024.e25746.

[33] Parbat, D., & Chakraborty, M. (2020). A Python based support vector regression model for prediction of COVID19 cases in India. Chaos, Solitons & Fractals, 138, 109942. DOI: https://doi.org/10.1016/j.chaos.2020.109942.

[34] Perc, M., Gorišek Miksić, N., Slavinec, M., & Stožer, A. (2020). Forecasting COVID-19. Frontiers in Physics, 8, 127.

[35] Petropoulos, F., & Makridakis, S. (2020). Forecasting the novel coronavirus COVID-19. PLoS ONE, 15(3), e0231236. DOI: 10.1371/journal.pone.0231236.

[36] Ribeiro, M. H. D. M., da Silva, R. G., Mariani, V. C., & dos Santos Coelho, L. (2020). Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil. Chaos, Solitons & Fractals, 135, 109853. DOI: https://doi.org/10.1016/j.chaos.2020.109853.

[37] Rustam, F., Reshi, A. A., Mehmood, A., Ullah, S., On, B.-W., Aslam, W., & Choi, G. S. (2020). COVID-19 future forecasting using supervised machine learning models. IEEE Access, 8, 101489–101499. DOI: 10.1109/ACCESS.2020.2997311.

[38] Sharma, S., Gupta, Y. K., & Mishra, A. K. (2023). Analysis and prediction of COVID-19 multivariate data using deep ensemble learning methods. International Journal of Environmental Research and Public Health, 20(11), 5943. DOI: 10.3390/ijerph20115943.

[39] Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. DOI: https://doi.org/10.1016/j.physd.2019.132306.

[40] Sinha, T., Chowdhury, T., Shaw, R. N., & Ghosh, A. (2021). Analysis and prediction of COVID-19 confirmed cases using deep learning models: A comparative study. In Advanced Computing and Intelligent Technologies: Proceedings of ICACIT 2021 (pp. 207–218). Springer. DOI: https://doi.org/10.1007/978-981-16-2164-2_18.

[41] Sujath, R., Chatterjee, J. M., & Hassanien, A. E. (2020). A machine learning forecasting model for COVID-19 pandemic in India. Stochastic Environmental Research and Risk Assessment, 34, 959–972. DOI: https://doi.org/10.1007/s00477-020-01827-8.

[42] Sulthana, R., Jovith, A., & Jaithunbi, A. K. (2021). LSTM and RNN to predict COVID cases: Lethality's and tests in GCC nations and India. International Journal of Performability Engineering, 17(3), 299. DOI: 10.23940/ijpe.21.03.p5.299306.

[43] Suryasa, I. W., Rodríguez-Gámez, M., & Koldoris, T. (2021). The COVID-19 pandemic. International Journal of Health Sciences, 5(2). DOI: https://doi.org/10.53730/ijhs.v5n2.2937.

[44] Wang, J., Yu, H., Hua, Q., Jing, S., Liu, Z., Peng, X., et al. (2020). A descriptive study of random forest algorithm for predicting COVID-19 patients outcome. PeerJ, 8, e9945. DOI: 10.7717/peerj.9945.

[45] Wang, Y., Yan, Z., Wang, D., Yang, M., Li, Z., Gong, X., et al. (2022). Prediction and analysis of COVID-19 daily new cases and cumulative cases: Time series forecasting and machine learning models. BMC Infectious Diseases, 22(1), 495. DOI: https://doi.org/10.1186/s12879-022-07472-6.

[46] Xu, L., Magar, R., & Farimani, A. B. (2022). Forecasting COVID-19 new cases using deep learning methods. Computers in Biology and Medicine, 144, 105342. DOI: https://doi.org/10.1016/j.compbiomed.2022.105342.

Received: 2025-04-18
Revised: 2025-07-23
Accepted: 2025-07-23
Published Online: 2025-09-15

© 2025 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
