Simulation study on the impact of measurement errors in hierarchical Bayesian semi-parametric models

Amos Kipkorir Langat; Samuel Musili Mwalili; Lawrence Ndekeleni Kazembe

doi:10.1515/cmb-2024-0019

Article Open Access

Published/Copyright: June 25, 2025

From the journal Computational and Mathematical Biophysics Volume 13 Issue 1

Abstract

This study examines the impact of measurement errors on parameter estimates within hierarchical Bayesian semiparametric (HBS) models, with a focus on the Lotka–Volterra predator–prey model as a case study. By employing Gibbs sampling within the Markov Chain Monte Carlo framework, we simulate various levels of measurement errors to assess the robustness of these models. Results indicate that HBS models effectively account for measurement error, mitigating its adverse effects on the accuracy of parameter estimates, especially for complex, nonlinear systems like predator–prey dynamics. The study demonstrates that as sample sizes increase, the models’ ability to recover true population interaction parameters, such as prey birth rates and predator consumption rates, improves significantly. These findings underscore the importance of using advanced Bayesian methods to correct for measurement errors, ensuring reliable statistical inferences in fields like ecological modeling, environmental studies, and agricultural systems. The integration of HBS models enhances the reliability of complex data analyses, providing essential insights into nonlinear interactions in real-world systems.

Keywords: hierarchical Bayesian semi-parametric models; measurement error; Gibbs sampling; Markov Chain Monte Carlo; predator–prey model; parameter estimation; Bayesian inference

MSC 2010: 62F15; 62P10; 62J12

1 Introduction

Measurement error is a critical issue in empirical research that can lead to biased estimates and invalid conclusions. Accurate measurement is essential for reliable statistical analysis and informed decision-making. However, discrepancies between true values and observed values due to instrument imprecision, environmental factors, or human error can compromise data quality [5,6]. This study investigates the impact of measurement errors on model parameters within the framework of Hierarchical Bayesian semi-parametric (HBS) models. Bayesian methods provide a flexible framework for addressing measurement errors by incorporating prior knowledge and allowing for complex modeling of error distributions [11,20]. Recent advancements have demonstrated the utility of Bayesian methods in handling measurement errors through various approaches. For instance, Gelman and Shalizi [4] highlighted the robustness of Bayesian approaches in accommodating complex dependencies among variables [2,14], and provided a comprehensive overview of measurement error models, including regression calibration and the simulation-extrapolation (SIMEX) approach, which have been widely adopted in various fields.

Hierarchical Bayesian models extend traditional Bayesian approaches by accommodating hierarchical data structures and capturing complex dependencies among variables. These models are particularly useful in situations where data are nested or multilevel, such as in environmental studies or longitudinal data analysis [8].

Statistical modeling plays a crucial role in understanding complex ecological systems, such as the interactions between predator and prey populations. Among the most widely studied models in this field is the Lotka–Volterra predator–prey model, which captures the dynamic relationship between two species over time. This model uses a set of ordinary differential equations (ODEs) to describe how the population of prey influences the growth of predator populations and vice versa. However, in real-world applications, data often contain measurement errors, which can significantly distort the accuracy of parameter estimates and lead to biased conclusions if not properly addressed [3,18].

One of the critical challenges in ecological modeling is estimating regression coefficients and parameter values in the presence of measurement error. In this study, the Lotka–Volterra predator–prey system is used as a case study to explore how measurement error impacts the accuracy of parameter estimates in a HBS framework. Previous studies have shown that failing to account for measurement errors in regression models can lead to biased estimates and increased uncertainty in the predicted outcomes [2,14]. By employing hierarchical Bayesian models, it is possible to account for the uncertainty introduced by measurement errors and obtain more reliable estimates of key parameters [4].

Recent applications of Bayesian methods span various fields [17]. In agriculture, Bayesian models have been applied to optimize fertilizer usage and improve crop yield. Previous studies [12,13,15] utilized Bayesian optimization techniques to determine optimal planting densities and irrigation schedules. Similarly, Shirley et al. [16] applied Bayesian methods to predict maize yields, demonstrating the models’ effectiveness in agricultural planning and management. Additionally, these methods were utilized for forecasting crop disease outbreaks, optimizing irrigation schedules, and enhancing soil nutrient management [1]. The integration of Bayesian techniques also facilitated the development of early warning systems for pest invasions and improved the precision of weather-based agricultural advisories, ultimately contributing to increased food security and sustainable farming practices [10,17].

The HBS models developed in this study aim to correct measurement errors and enhance the reliability of parameter estimates in statistical analyses. By employing a Gibbs sampling method within the Markov Chain Monte Carlo (MCMC) framework, this study assesses the robustness and accuracy of parameter estimates under varying levels of measurement error. In particular, this research investigates how hierarchical Bayesian models handle the complexities of nonlinear systems, such as the predator–prey dynamics, while adjusting for the effects of measurement error. The aim is to assess the robustness and reliability of these models when applied to noisy ecological data and to explore the extent to which measurement error affects key parameters such as the prey birth rate (α) and the predator–prey interaction rate (β). Through this, the study provides insights into how measurement error can distort parameter estimates and how hierarchical Bayesian models can mitigate these effects.

2 Methodology

2.1 Topological spaces

A topological space is a set X equipped with a topology T , a collection of open subsets of X satisfying the following properties:

The empty set and X are in T .
The union of any collection of sets in T is also in T .
The intersection of any finite collection of sets in T is also in T .

Formally, ( X , T ) is a topological space where X is the set of points and T defines the structure of the space.
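For a finite set, the three axioms above can be checked mechanically. The following is a minimal sketch, not part of the original study; the function name `is_topology` and the example sets are illustrative assumptions. It relies on the fact that, for a finite family of sets, closure under pairwise unions and intersections implies closure under all unions and finite intersections.

```python
from itertools import combinations


def is_topology(X, T):
    """Check the three axioms for a candidate topology T on a finite set X."""
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    # Axiom 1: the empty set and X itself belong to T.
    if frozenset() not in T or X not in T:
        return False
    # Every member of T must be a subset of X.
    if any(not s <= X for s in T):
        return False
    # Axioms 2 and 3: pairwise closure suffices for a finite family.
    for a, b in combinations(T, 2):
        if (a | b) not in T or (a & b) not in T:
            return False
    return True


X = {1, 2, 3}
good = [set(), {1}, {1, 2}, {1, 2, 3}]
bad = [set(), {1}, {2}, {1, 2, 3}]  # the union {1, 2} is missing
print(is_topology(X, good), is_topology(X, bad))  # prints: True False
```

The second family fails because it contains {1} and {2} but not their union.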

2.2 Parameter interaction in topological spaces

Consider a parameter space Θ, which is a topological space with topology $T_\Theta$. Let θ ∈ Θ be a parameter vector and X be the observed data in the topological space $\mathcal{X}$.

Theorem 1

(Parameter interaction in space) Let θ be a vector in the parameter space Θ, and let X represent observed data in the space $\mathcal{X}$. The interaction between parameters θ in the space Θ can be understood as follows:

Parameters θ jointly influence the behavior of the system in the parameter space Θ .
The geometry of the parameter space Θ affects how parameters θ interact with each other.
Observing data X in the space X provides insights into the relationships between parameters θ .

2.2.1 Mathematical equation for the parameter interaction space theorem

(1) $f(\theta) = L(X \mid \theta) \cdot \operatorname{dist}(\theta_1, \theta_2) \cdot \operatorname{Geometry}(\Theta)$,

where $f(\theta)$ represents the overall system behavior given the parameters θ, $L(X \mid \theta)$ is the likelihood function that measures the probability of observing data X given θ, $\operatorname{dist}(\theta_1, \theta_2)$ measures the interaction between two parameters $\theta_1$ and $\theta_2$, and $\operatorname{Geometry}(\Theta)$ represents the geometry of the parameter space, influencing how the parameters interact.

This equation integrates the joint influence of parameters, their geometric interactions, and the observed data, providing a comprehensive representation of the theorem’s key points in a single mathematical expression.

In hierarchical Bayesian models, parameters θ 1 and θ 2 interact because of shared latent variables and dependencies imposed by the prior distributions. These interactions help the model capture complex relationships in the data that cannot be explained by a single parameter in isolation. For instance, the outcome X is influenced not just by one parameter but by the joint behavior of multiple parameters, capturing underlying correlations or dependencies in the data.

2.3 Bayes’ theorem

Theorem 2

(Bayes’ theorem) Let θ represent a vector of parameters of interest, and X denote observed data. Bayes’ theorem states:

(2) $P(\theta \mid X) = \dfrac{P(X \mid \theta) \cdot P(\theta)}{P(X)}$,

where

P ( θ ∣ X ) is the posterior probability distribution of parameters θ given the observed data X .
P ( X ∣ θ ) is the likelihood function representing the probability of observing data X given parameters θ .
P ( θ ) is the prior probability distribution of parameters θ , representing our beliefs about θ before observing any data.
P ( X ) is the marginal probability of observing data X , also known as the evidence.

This formulation allows us to update our beliefs about the parameters θ after observing data X , incorporating both prior knowledge and new evidence.
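As an illustration of this updating step, the sketch below evaluates Bayes' theorem on a grid for a deliberately simple case: a normal mean θ with known noise variance. The data values, the N(0, 2²) prior, and the grid limits are assumptions chosen for the example, not quantities from the study. The grid result is compared with the standard closed-form conjugate posterior mean.

```python
import math


def grid_posterior(data, grid, prior_mean=0.0, prior_sd=2.0, noise_sd=1.0):
    """Posterior weights over a grid of theta values: prior times likelihood, normalised."""
    log_post = []
    for theta in grid:
        log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        log_lik = sum(-0.5 * ((x - theta) / noise_sd) ** 2 for x in data)
        log_post.append(log_prior + log_lik)
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    evidence = sum(weights)  # plays the role of P(X), up to a constant
    return [w / evidence for w in weights]


data = [1.2, 0.7, 1.9, 1.4, 0.8]  # hypothetical observations
grid = [-5 + 0.001 * i for i in range(10001)]  # theta in [-5, 5]
post = grid_posterior(data, grid)
grid_mean = sum(t * p for t, p in zip(grid, post))

# Closed-form conjugate posterior mean for comparison.
precision = 1 / 2.0 ** 2 + len(data) / 1.0 ** 2
exact_mean = (sum(data) / 1.0 ** 2) / precision
print(round(grid_mean, 4), round(exact_mean, 4))
```

The two numbers should agree closely, which shows that normalising the product of prior and likelihood is all that Equation (2) requires.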

2.4 Measurement error variance

The relationship between the observed value $X_{\text{obs}}$ and the true value $X_{\text{true}}$ is

(3) $X_{\text{obs}} = X_{\text{true}} + W$,

where $W \sim N(0, \sigma_W^2)$ is the measurement error and $\sigma_W^2$ is the measurement error variance.

2.5 Outcome model

The relationship between the outcome Y and the variables $X_{\text{true}}$, $X_{\text{obs}}$, and Z is

(4) $Y = \beta_0 + \beta_1 X_{\text{true}} + \beta_2 Z + \varepsilon$,

(5) $Y = \beta_0 + \beta_1 (X_{\text{true}} + W) + \beta_2 Z + \varepsilon$,

(6) $Y = \beta_0 + \beta_1 X_{\text{obs}} + \beta_2 Z + \varepsilon$,

with the error term $\varepsilon \sim N(0, \sigma_Y^2)$, where $\beta_0$, $\beta_1$, and $\beta_2$ are coefficients and $\sigma_Y^2$ is the variance of the error term in the outcome model.

– Equation (4) uses the true predictor $X_{\text{true}}$, that is, it assumes the predictor is observed without error.

– Equation (5) introduces measurement error W into $X_{\text{true}}$, simulating scenarios where observed data have some error.

– Equation (6) uses $X_{\text{obs}}$, the observed value of the predictor, which includes the inherent measurement error W. By Equation (3), it is the same model as Equation (5).

Definitions of variables:

Y: outcome (dependent variable),
$X_{\text{true}}$: true predictor variable (latent),
$X_{\text{obs}}$: observed predictor variable,
W: measurement error, $W \sim N(0, \sigma_W^2)$,
Z: additional covariate,
ε: error term, $\varepsilon \sim N(0, \sigma_Y^2)$.
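To see why replacing $X_{\text{true}}$ by $X_{\text{obs}}$ matters, consider the classical attenuation (regression dilution) result for the simplest case. The derivation below is a sketch under assumptions not stated in the text: the covariate Z is dropped (or is uncorrelated with $X_{\text{true}}$), W is independent of $X_{\text{true}}$ and ε, and $\sigma_X^2$ is introduced here to denote the variance of $X_{\text{true}}$.

```latex
\begin{aligned}
\operatorname{Cov}(Y, X_{\text{obs}})
  &= \operatorname{Cov}(\beta_0 + \beta_1 X_{\text{true}} + \varepsilon,\; X_{\text{true}} + W)
   = \beta_1 \sigma_X^2, \\
\operatorname{Var}(X_{\text{obs}})
  &= \sigma_X^2 + \sigma_W^2, \\
\hat{\beta}_1^{\text{naive}}
  &\xrightarrow{\;p\;}
   \frac{\operatorname{Cov}(Y, X_{\text{obs}})}{\operatorname{Var}(X_{\text{obs}})}
   = \beta_1 \, \frac{\sigma_X^2}{\sigma_X^2 + \sigma_W^2}.
\end{aligned}
```

Under these assumptions the naive slope is shrunk toward zero by the factor $\sigma_X^2 / (\sigma_X^2 + \sigma_W^2)$, which decreases as the measurement error variance grows.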

2.6 Process model (likelihood)

The relationship between the outcome Y and the predictors, including the observed variable $X_{\text{obs}}$ and a nonlinear function of Z, is given by

(7) $Y = \beta X_{\text{obs}} + f(Z) + \eta$,

where $f(Z)$ is a semi-parametric function of Z and $\eta \sim N(0, \tau^2)$ is an additional error term.

Equation (7) extends the outcome model by introducing a nonlinear term for the covariate Z. The function $f(Z)$ captures the nonlinear effects of Z on Y and allows the model to account for complex relationships that are not well captured by a simple linear term.

In this context, $f(Z)$ is learned from the data using flexible estimation methods such as Gaussian processes. This allows $f(Z)$ to adapt to the data and capture nonlinear relationships between Z and Y, which would be missed by a purely parametric model. The flexibility of $f(Z)$ helps improve the model’s accuracy in complex scenarios.

2.7 Bayesian integration using Bayes’ theorem

In the context of our hierarchical models

(8) $P(\beta, f(z) \mid x_{\text{obs}}, Y) \propto P(Y \mid \beta, f(z), x_{\text{obs}})\, P(\beta, f(z) \mid x_{\text{obs}})$.

Therefore,

(9) $P(\beta, f(z) \mid x_{\text{obs}}, Y) = \int_{\Omega_W} P(Y \mid \beta, f(z), W)\, P(\beta, f(z) \mid W)\, P(W \mid x_{\text{obs}})\, dW$,

where W here denotes the latent true covariate, integrated over its support $\Omega_W$ (in Section 2.4 the same symbol denotes the measurement error). We consider a model where outcome Y is influenced by Z and W follows a semi-parametric form based on error-prone predictors X.

2.8 Bayesian inference

To infer the unknown parameters, we employed an MCMC sampling method, the Gibbs sampler. In each iteration of the Gibbs sampler, we update each block of model parameters in turn, conditional on the current values of the others.

2.8.1 MCMC

The MCMC method is a powerful mathematical technique for sampling from complex, high-dimensional probability distributions. Its core concept involves constructing a Markov chain with the target distribution as its stationary distribution and then simulating the chain for numerous iterations to obtain samples from the desired distribution.

As the name implies, MCMC techniques generate chains where each value generated depends on preceding values. In a Markov chain, the probability of transitioning to a specific state at a given point in the sequence depends solely on the preceding state in the chain, denoted as $\theta^{(t)}$, and is thus conditionally independent of all other preceding values

(10) $\theta^{(0)}, \theta^{(1)}, \ldots, \theta^{(t-1)}$.

This can be expressed as

(11) $P(\theta^{(t+1)} \in D \mid \theta^{(0)}, \theta^{(1)}, \ldots, \theta^{(t)}) = P(\theta^{(t+1)} \in D \mid \theta^{(t)})$.
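The Markov property and the idea of a stationary distribution can be illustrated with a two-state chain. This is a toy sketch only; the transition matrix `P` and the run length are assumed values for the example and have no connection to the models in the study.

```python
import random


def simulate_chain(P, start, steps, rng):
    """Simulate a finite-state Markov chain with row-stochastic matrix P."""
    state, path = start, []
    for _ in range(steps):
        # The next state is drawn using only the current state (Markov property).
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path


P = [[0.9, 0.1],
     [0.3, 0.7]]
rng = random.Random(0)
path = simulate_chain(P, start=1, steps=50_000, rng=rng)
freq0 = path.count(0) / len(path)

# The stationary distribution solves pi = pi P; for this matrix pi = (0.75, 0.25).
print(round(freq0, 3))
```

After many iterations the fraction of time spent in state 0 should be close to 0.75 regardless of the starting state, which is the behaviour MCMC exploits when the stationary distribution is the posterior.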

2.9 Posterior distribution

To obtain the posterior distribution of the parameters, the following steps are performed iteratively in the Gibbs sampler:

Update the distribution of β given X obs , Y , and the current values of the other parameters.
Update the distribution of f ( z ) given X obs , Y , and the current values of the other parameters.
Update the distribution of the error terms W and ε .
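The update cycle listed above can be sketched for a much simpler model than the one used in the study. The sampler below is not the authors' implementation: it omits the semi-parametric term f(z) and assumes a single predictor with no intercept, a known measurement error variance `sigma_w2`, a N(0, 1) distribution for the latent true predictor, a N(0, 10²) prior on β, and an Inverse-Gamma(2, 1) prior on the outcome variance. All of these choices are assumptions made so that every full conditional is available in closed form.

```python
import random


def gibbs_me(y, x_obs, sigma_w2, iters=1500, burn=500, seed=1):
    """Gibbs sampler for y = beta * x_true + eps with x_obs = x_true + w."""
    rng = random.Random(seed)
    n = len(y)
    x_true = list(x_obs)  # initialise the latent values at the observations
    beta, sigma2 = 0.0, 1.0
    draws = []
    for it in range(iters):
        # 1) beta | x_true, sigma2, y: conjugate normal update, prior variance 100
        sxx = sum(x * x for x in x_true)
        sxy = sum(x * yi for x, yi in zip(x_true, y))
        var_b = 1.0 / (sxx / sigma2 + 1.0 / 100.0)
        beta = rng.gauss(var_b * sxy / sigma2, var_b ** 0.5)
        # 2) sigma2 | beta, x_true, y: inverse-gamma, drawn as a reciprocal gamma
        sse = sum((yi - beta * x) ** 2 for x, yi in zip(x_true, y))
        sigma2 = 1.0 / rng.gammavariate(2.0 + n / 2.0, 1.0 / (1.0 + 0.5 * sse))
        # 3) x_true_i | rest: normal; equivalent to updating w_i = x_obs_i - x_true_i
        prec = beta * beta / sigma2 + 1.0 / sigma_w2 + 1.0
        sd = prec ** -0.5
        x_true = [rng.gauss((beta * yi / sigma2 + xo / sigma_w2) / prec, sd)
                  for yi, xo in zip(y, x_obs)]
        if it >= burn:
            draws.append(beta)
    return draws


# Synthetic data with an assumed true slope of 1.65 and error variance 1.
data_rng = random.Random(0)
n, beta_true, sigma_w2 = 500, 1.65, 1.0
xt = [data_rng.gauss(0, 1) for _ in range(n)]
xo = [x + data_rng.gauss(0, sigma_w2 ** 0.5) for x in xt]
y = [beta_true * x + data_rng.gauss(0, 1) for x in xt]

naive = sum(a * b for a, b in zip(xo, y)) / sum(a * a for a in xo)
draws = gibbs_me(y, xo, sigma_w2)
post_mean = sum(draws) / len(draws)
print(f"naive slope {naive:.2f}, Gibbs posterior mean {post_mean:.2f}")
```

In this toy setting the naive slope is expected to sit near half the true value, while the posterior mean from the sampler should lie much closer to it, because the latent true predictor is sampled rather than replaced by its noisy observation.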

2.9.1 Model application

The developed HBS models were applied to simulation studies in determining optimal fertilizer application levels. The data used in this application include information on soil characteristics, land size, and fertilizer application levels.

2.9.2 Model validation

The performance of the models was evaluated using root mean squared error (RMSE), Akaike information criterion, and Bayesian information criterion. These metrics help in assessing the accuracy and efficiency of the models.
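The three criteria can be computed as follows. This is a generic sketch using the standard definitions (AIC = 2k − 2 log L, BIC = k log n − 2 log L) with a Gaussian likelihood evaluated at the maximum likelihood variance; the example values and the parameter count `k` are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def gaussian_loglik(y_true, y_pred):
    """Maximised Gaussian log-likelihood, plugging in the MLE of the variance."""
    n = len(y_true)
    sigma2 = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations
y_pred = [1.1, 1.9, 3.2, 3.8]  # hypothetical fitted values
ll = gaussian_loglik(y_true, y_pred)
print(rmse(y_true, y_pred), aic(ll, k=3), bic(ll, k=3, n=len(y_true)))
```

Lower values indicate a better trade-off between fit and complexity for AIC and BIC, and a smaller prediction error for RMSE.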

2.9.3 Simulation study design

Data were simulated from variants of the following Gaussian data-generating process:

(12) $y_i = \beta_0 + \beta_1 x_{1i}^* + \beta_2 x_{2i}^* + \beta_3 x_{3i}^* + \beta_4 x_{4i}^* + \varepsilon_i$,

(13) $x_i^* = x_i + u_i$,

(14) $\varepsilon_i \sim N(0, 1)$,

(15) $u_i \sim N(0, \sigma_u^2)$.

We set parameters $\beta_0 = 0$, $\beta_1 = 1.65$, $\beta_2 = 1.35$, $\beta_3 = 0.7$, $\beta_4 = -0.12$ with $\sigma_u^2 = 0, 1, 2, 3, 4, 5$.

The simulation study aimed to evaluate the impact of measurement error on the independent variables (regressors) in a regression analysis. By introducing controlled measurement errors into the regressors x 1 , x 2 , x 3 , and x 4 , the simulation study assessed how these errors influenced the accuracy and coverage of the regression model’s estimates.
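A stripped-down version of such a simulation, with a single regressor and ordinary least squares instead of the HBS model, shows how bias, RMSE, and coverage can be tabulated. The sketch makes several assumptions that are not taken from the paper: the regressor is drawn from N(0, 1), the outcome depends on the error-free regressor while the analyst sees only the contaminated one (the classical error structure of Equation (3)), and coverage is based on a nominal 95% normal interval.

```python
import math
import random


def ols_slope(x, y):
    """Slope and its standard error for simple linear regression with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    resid = [b - my - slope * (a - mx) for a, b in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return slope, math.sqrt(s2 / sxx)


def simulate(beta1=1.65, sigma_u=1.0, n=200, reps=200, seed=0):
    """Monte Carlo summary of the naive slope under measurement error."""
    rng = random.Random(seed)
    est, covered = [], 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]          # assumed x ~ N(0, 1)
        y = [beta1 * a + rng.gauss(0, 1) for a in x]     # outcome uses error-free x
        x_star = [a + rng.gauss(0, sigma_u) for a in x]  # contaminated regressor
        b, se = ols_slope(x_star, y)
        est.append(b)
        covered += (b - 1.96 * se <= beta1 <= b + 1.96 * se)
    mean = sum(est) / reps
    mse = sum((b - beta1) ** 2 for b in est) / reps
    return {"mean": mean, "bias": mean - beta1,
            "rmse": math.sqrt(mse), "coverage": covered / reps}


for s in (0.0, 1.0, 2.0):
    r = simulate(sigma_u=s)
    print(s, {k: round(v, 3) for k, v in r.items()})
```

With no error the naive estimator should be nearly unbiased with coverage near the nominal level; as `sigma_u` grows, the mean estimate shrinks toward zero and coverage collapses, which is the qualitative pattern a measurement error correction is meant to counteract.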

3 Results

3.1 Empirical quantiles and density shapes

The simulation results indicate that the HBS models effectively approximate empirical quantiles and density shapes. The models demonstrate improved accuracy with increased sample size, suggesting that larger datasets enhance the reliability of parameter estimates.

As can be seen in Figure 1 and Tables 1–4, the effect of measurement error is to attenuate the regression coefficients and bias the parameter estimates, resulting in systematic deviations from the true values and potentially misleading conclusions. Furthermore, when data are subject to measurement errors, the uncertainty of outcomes increases. This greater uncertainty may necessitate broader confidence intervals, thereby affecting the statistical significance and reliability of the results, as shown in Tables 1–4.

Figure 1

Estimate of β ’s with measurement error variability. Source: Created by the authors.

Table 1

Assessing the impact of measurement error of x 1 on β 1

$\sigma_u^2$	$\beta_1$	$\hat{\beta}_1$	Standard deviation (SD)	Bias	Mean squared error (MSE)	RMSE	Coverage (%)
0	1.65	1.60	0.00	0.00	0.00	0.00	83.30
1	1.65	1.47	0.02	−0.13	0.02	0.13	23.80
2	1.65	0.69	0.11	−0.91	0.85	0.92	0.00
3	1.65	0.35	0.06	−1.25	1.56	1.25	0.00
4	1.65	0.24	0.05	−1.36	1.86	1.37	0.00
5	1.65	0.14	0.04	−1.46	2.12	1.46	0.00

Table 2

Assessing the impact of measurement error of x 2 on β 2

$\sigma_u^2$	$\beta_2$	$\hat{\beta}_2$	SD	Bias	MSE	RMSE	Coverage (%)
0	1.35	1.34	0.00	0.00	0.00	0.00	83.30
1	1.35	1.56	0.11	0.22	0.06	0.25	56.00
2	1.35	4.49	0.29	3.15	9.99	3.16	0.00
3	1.35	5.32	0.15	3.98	15.90	3.99	0.00
4	1.35	5.60	0.13	4.26	18.17	4.26	0.00
5	1.35	5.81	0.09	4.47	20.03	4.48	0.00

Table 3

Assessing the impact of measurement error of x 3 on β 3

$\sigma_u^2$	$\beta_3$	$\hat{\beta}_3$	SD	Bias	MSE	RMSE	Coverage (%)
0	0.70	0.68	0.00	0.00	0.00	0.00	83.30
1	0.70	0.68	0.01	0.00	0.00	0.01	81.80
2	0.70	0.39	0.04	−0.29	0.08	0.29	0.00
3	0.70	0.30	0.02	−0.38	0.15	0.38	0.00
4	0.70	0.27	0.01	−0.41	0.17	0.41	0.00
5	0.70	0.26	0.01	−0.42	0.18	0.42	0.00

Table 4

Assessing the impact of measurement error of x 4 on β 4

$\sigma_u^2$	$\beta_4$	$\hat{\beta}_4$	SD	Bias	MSE	RMSE	Coverage (%)
1	−0.12	−0.12	0.00	0.00	0.00	0.00	74.90
2	−0.12	−0.06	0.01	0.06	0.00	0.06	0.00
3	−0.12	−0.04	0.00	0.08	0.01	0.08	0.00
4	−0.12	−0.03	0.00	0.09	0.01	0.09	0.00
5	−0.12	−0.03	0.00	0.09	0.01	0.09	0.00

In Figure 1, “Linear” refers to estimates derived from a linear regression model where the relationship between the covariates and the outcome is assumed to be linear. “Smooth” refers to estimates from a semi-parametric model where the covariates are allowed to have nonlinear relationships with the outcome through smooth functions (Gaussian process). The blue dots represent the estimated values of the coefficients β i under different simulation settings, illustrating how measurement error impacts the coefficient estimates [7].

This figure displays how the estimates of regression coefficients ( β 1 , β 2 , β 3 , and β 4 ) change as the variability in measurement error ( σ u 2 ) increases from 0 to 5.

β 1 : As σ u 2 increases, β 1 estimates decrease, showing an increasing bias. This demonstrates that higher measurement error significantly impacts the accuracy of β 1 estimates, leading to an underestimation.
β 2 : The estimates for β 2 also show an increasing bias with higher measurement error, but the trend is more pronounced compared to β 1 .
β 3 : Similar to β 1 and β 2 , the β 3 estimates become more biased as σ u 2 increases, indicating a strong negative impact of measurement error on its accuracy.
$\beta_4$: The estimates for $\beta_4$ shrink in magnitude toward zero as measurement error increases (from −0.12 to −0.03 in Table 4), reflecting a consistent attenuation.

This figure highlights the critical need for accurate data collection and error correction methods to maintain the reliability of regression coefficient estimates [9].

3.2 Parameter estimates

The parameter estimates obtained from the HBS models are robust under varying levels of measurement error. The models consistently provide accurate estimates of the true parameter values, demonstrating their ability to handle measurement errors effectively.

Tables 1–4 provide detailed numerical insights into how measurement error affects the bias, MSE, RMSE, and coverage of the regression coefficient estimates for $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$.

Table 1 (impact on β 1 ): Shows increasing bias and MSE with higher measurement error, indicating reduced precision and accuracy.
Table 2 (impact on β 2 ): Demonstrates a significant increase in bias and MSE with higher σ u 2 , reflecting a strong impact on estimation accuracy.
Table 3 (impact on β 3 ): Indicates that higher measurement error leads to substantial biases, increasing MSE, and reducing the reliability of estimates.
Table 4 (impact on β 4 ): Reveals that the estimates for β 4 become increasingly biased and less accurate with higher measurement error, similar to the other coefficients.

These tables collectively emphasize that measurement errors can significantly distort parameter estimates, making it crucial to address these errors in statistical analyses.

3.3 Convergence and posterior distributions

The Gibbs sampling method ensures convergence to the posterior distribution, as evidenced by the stability of the parameter estimates across iterations. The posterior distributions of the parameters are well-defined, indicating that the models capture the underlying data structure accurately.

Figure 2 shows evidence of good mixing and convergence of three independent MCMC chains. The different colors in Figure 2 represent the varying levels of measurement error ( σ u 2 ). Each color corresponds to a specific level of noise introduced into the predictor variable, with darker colors indicating higher levels of error. This allows us to visually compare how increasing measurement error affects the parameter estimates. The auto-correlation function (ACF) for most parameters shows early decay (Figure 3), an indication of model convergence [9].
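The ACF diagnostic mentioned above can be computed directly from a chain of draws. The sketch below uses a synthetic AR(1) series as a stand-in for MCMC output; the coefficient `phi = 0.5` and the chain length are assumptions for the example, not values from the study.

```python
import random


def acf(chain, max_lag):
    """Sample autocorrelation of a chain at lags 0..max_lag."""
    n = len(chain)
    mean = sum(chain) / n
    centred = [c - mean for c in chain]
    var = sum(c * c for c in centred)
    return [sum(centred[i] * centred[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]


# Synthetic stand-in for MCMC draws: AR(1) with lag-1 correlation phi.
rng = random.Random(0)
phi, chain = 0.5, [0.0]
for _ in range(20_000):
    chain.append(phi * chain[-1] + rng.gauss(0, 1))

rho = acf(chain, 5)
print([round(r, 3) for r in rho])
```

For an AR(1) series the autocorrelation at lag k is approximately phi to the power k, so the values should fall off quickly. A chain whose ACF decays this fast yields many effectively independent draws, whereas a slowly decaying ACF signals poor mixing.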

Figure 2

History of parameter estimates. Source: Created by the authors.

Figure 3

ACF of β ’s with measurement error variability. Source: Created by the authors.

4 Expansion of computational comparisons: Detailed mathematics and simulation design

4.1 Mathematical expansion for computational comparisons

For more comprehensive computational results, we applied the hierarchical Bayesian approach to a Lotka–Volterra predator–prey model, which is a well-known nonlinear dynamical system.

4.1.1 Lotka–Volterra model

The Lotka–Volterra equations describe the interaction between two species: predators and prey. The system is governed by the following ODEs:

$\dfrac{dx}{dt} = \alpha x - \beta x y, \qquad \dfrac{dy}{dt} = \delta x y - \gamma y,$

where

x ( t ) : prey population at time t ,
y ( t ) : predator population at time t ,
α : prey birth rate,
β : predation rate (rate at which predators consume prey),
γ : predator death rate,
δ : rate at which predators increase by consuming prey.

Purpose of the model:

The Lotka–Volterra model captures the cyclical interaction between prey and predators, where

The prey population increases without bound if there are no predators.
The predator population decreases if there is no prey.
The populations oscillate when both are present, with the predator population lagging behind the prey population.
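The oscillatory behaviour can be reproduced numerically. The sketch below integrates the two equations with a classical fourth-order Runge–Kutta scheme; the parameter values, initial populations, and step size are illustrative assumptions rather than the settings used in the study. As a check on the integration, it evaluates the quantity δx − γ ln x + βy − α ln y, which is constant along exact solutions of the Lotka–Volterra system.

```python
import math


def lotka_volterra(state, alpha, beta, gamma, delta):
    """Right-hand side of the predator-prey equations."""
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4(f, state, dt, steps, **params):
    """Classical fourth-order Runge-Kutta integration; returns the full path."""
    path = [state]
    for _ in range(steps):
        k1 = f(state, **params)
        k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), **params)
        k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), **params)
        k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), **params)
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        path.append(state)
    return path


def invariant(x, y, alpha, beta, gamma, delta):
    """Conserved quantity of the Lotka-Volterra system."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


params = dict(alpha=1.0, beta=0.1, gamma=1.5, delta=0.075)  # illustrative values
path = rk4(lotka_volterra, (10.0, 5.0), dt=0.01, steps=2000, **params)
v_start = invariant(*path[0], **params)
v_end = invariant(*path[-1], **params)
print(max(x for x, _ in path), max(y for _, y in path), abs(v_end - v_start))
```

Both populations remain positive and cycle around the equilibrium (γ/δ, α/β), and the drift in the conserved quantity gives a quick indication of numerical accuracy.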

4.1.2 Introducing measurement error

In a real-world scenario, the observed populations x obs ( t ) and y obs ( t ) may contain measurement errors. We model the observed values as:

$x_{\text{obs}}(t) = x(t) + \varepsilon_x, \qquad y_{\text{obs}}(t) = y(t) + \varepsilon_y,$

where

$\varepsilon_x \sim N(0, \sigma_x^2)$ represents the measurement error in the prey population,
$\varepsilon_y \sim N(0, \sigma_y^2)$ represents the measurement error in the predator population.

4.2 Simulation design process

To assess the performance of the hierarchical Bayesian approach, we follow these steps:

Simulate true data from the Lotka–Volterra model
- Use known parameters for α , β , γ , and δ .
- Solve the differential equations to obtain the true prey and predator populations over time.
Introduce measurement error
- Add Gaussian noise (measurement error) to the simulated true populations to create observed data ( x obs ( t ) and y obs ( t ) ).
Hierarchical Bayesian estimation
- Use the observed data to estimate the parameters α , β , γ , and δ through a hierarchical Bayesian model.
- Incorporate prior distributions for the parameters and use the likelihood derived from the Lotka–Volterra equations.
- Perform MCMC simulations to estimate the posterior distributions of the parameters.
Evaluate the results
- Compare the true parameter values with the posterior estimates to evaluate the robustness of the hierarchical Bayesian method in the presence of measurement error.
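The first two steps, together with the likelihood used in the estimation step, can be sketched as follows. This is not the authors' code: the parameter values, noise level `sigma`, observation schedule, and the use of a common noise standard deviation for both species are assumptions made for illustration, and the MCMC step itself is omitted. The sketch only shows that the Gaussian log-likelihood built from the ODE solution prefers the true parameters over a perturbed set.

```python
import math
import random


def simulate_lv(theta, x0=10.0, y0=5.0, dt=0.01, steps=1500, every=50):
    """Solve the Lotka-Volterra equations with RK4 and record every `every` steps."""
    alpha, beta, gamma, delta = theta

    def f(x, y):
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    x, y, out = x0, y0, []
    for i in range(steps + 1):
        if i % every == 0:
            out.append((x, y))
        k1 = f(x, y)
        k2 = f(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = f(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return out


def log_likelihood(theta, obs, sigma):
    """Gaussian log-likelihood of noisy observations around the ODE solution."""
    pred = simulate_lv(theta)
    ll = 0.0
    for (xo, yo), (xp, yp) in zip(obs, pred):
        for o, p in ((xo, xp), (yo, yp)):
            ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((o - p) / sigma) ** 2
    return ll


rng = random.Random(0)
theta_true = (1.0, 0.1, 1.5, 0.075)  # illustrative values, not the paper's
sigma = 0.5                          # assumed measurement error SD
truth = simulate_lv(theta_true)                                               # step 1
obs = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in truth]  # step 2
ll_true = log_likelihood(theta_true, obs, sigma)
ll_off = log_likelihood((1.3, 0.1, 1.5, 0.075), obs, sigma)  # perturbed alpha
print(ll_true > ll_off)
```

In a full analysis this log-likelihood would be combined with prior densities for α, β, γ, and δ and passed to an MCMC sampler to obtain the posterior distributions.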

Figure 4 displays the parameter estimates for α , β , γ , and δ when no measurement error is introduced into the observed data. These parameters represent the interactions between the prey and predator populations in the Lotka–Volterra model. The results show accurate recovery of the true parameter values, demonstrating that the hierarchical Bayesian model performs well in estimating population dynamics when the data are accurate.

Figure 4

Parameter estimates with no error scenario. Source: Created by the authors.

The parameter α represents the birth rate of the prey population, and the accurate recovery of α suggests that the model can correctly capture the growth dynamics of the prey in the absence of predation. Similarly, the parameter β , which reflects the rate at which predators consume prey, is well-estimated, indicating that the predator–prey interaction is accurately modeled.

The estimates for γ , the death rate of the predator population, and δ , the rate at which predators increase by consuming prey, also show minimal bias. This confirms that the hierarchical Bayesian model captures the cyclical relationship between prey and predators effectively when the observed data are free from noise.

However, when measurement error is introduced, as shown in the subsequent figures, the accuracy of the parameter estimates for both prey and predator populations deteriorates. Specifically, the estimates for α and β become more biased, reflecting the impact of noise on the prey growth and predation rates. This increased bias leads to uncertainty in the model’s predictions of predator–prey dynamics, as indicated by the wider credible intervals.

In summary, when no measurement error is present, the model accurately estimates the population dynamics between prey and predators. However, the introduction of measurement errors increases bias and uncertainty, especially in the interaction terms ( β and δ ), affecting the reliability of the parameter estimates and predictions of the prey-predator relationship.

Figure 5 illustrates the parameter estimates for α , β , γ , and δ when measurement errors are introduced into the observed data for the prey and predator populations. The presence of measurement error significantly impacts the model’s ability to accurately estimate the dynamics of the predator–prey system.

Figure 5

Parameter estimates with measurement error scenario. Source: Created by the authors.

Impact on prey population: The parameter α , which represents the prey birth rate, becomes more difficult to estimate accurately when noise is introduced into the observed prey population. The bias in the estimation of α increases, and the credible intervals become wider, reflecting greater uncertainty in the model’s predictions for prey population growth. As a result, the hierarchical Bayesian model struggles to capture the true growth dynamics of the prey when the data contain measurement errors.

Similarly, the parameter β , which models the rate at which predators consume prey, is significantly affected by the measurement error. The estimates for β show larger bias compared to the no-error scenario, indicating that the interaction between predators and prey is more sensitive to data noise. This sensitivity can lead to overestimation or underestimation of the predation rate, depending on the extent of the error in the observed data.

Impact on predator population: The parameters γ and δ , which describe the predator population dynamics, are also impacted by the introduction of measurement error. The parameter γ , representing the predator death rate, shows a moderate increase in bias, but its estimation remains relatively stable compared to α and β . However, the parameter δ , which governs the rate at which predators increase by consuming prey, shows a significant increase in uncertainty. The credible intervals for δ are wider, reflecting the difficulty in estimating predator growth when the observed data contain noise.

Overall impact: The introduction of measurement errors not only affects individual parameter estimates but also distorts the overall predator–prey interaction captured by the model. As the error increases, the model becomes less reliable in predicting the cyclical relationship between prey and predator populations. The posterior distributions of the parameters show larger variance, and the estimated credible intervals become wider, indicating a loss of precision in the parameter estimates.

In conclusion, Figure 5 demonstrates the substantial impact that measurement error has on the accuracy and precision of parameter estimates in the Lotka–Volterra model. The prey birth rate ( α ) and the predator–prey interaction rate ( β ) are particularly sensitive to noise in the data. As a result, accounting for measurement error is critical in ensuring the reliability of the parameter estimates in ecological models like the predator–prey system.

Table 5 summarizes the posterior estimates for the parameters α , β , γ , and δ in the Lotka–Volterra predator–prey model, both in the absence and presence of measurement error. The table provides important insights into the accuracy and uncertainty of the parameter estimates under different scenarios.

Table 5

Summary statistics of the MCMC results

Mean	se_mean	SD	2.5%	25%	50%	75%	97.5%	Effective sample size ($n_{\text{eff}}$)	Rhat ($\hat{R}$)
1.0616	0.00044	0.02199	1.0192	1.0468	1.0616	1.0765	1.1043	2454.19	1.0007
0.01480	0.000077	0.00373	0.00736	0.01238	0.01478	0.01735	0.02192	2342.20	1.0010
0.01386	0.00021	0.01361	0.00041	0.00408	0.00954	0.01941	0.05133	4158.54	0.9996
0.03053	0.000021	0.00105	0.02857	0.02981	0.03050	0.03122	0.03268	2493.19	0.9997

Key findings:

Prey birth rate ( α ): The mean estimate for α remains close to the true value in the no-error scenario, with narrow credible intervals, indicating high precision. However, when measurement error is introduced, the SD increases, reflecting greater uncertainty in the estimation of the prey birth rate. The effective sample size ( n eff ) also decreases slightly, indicating reduced efficiency of the MCMC sampling process.
Predator consumption rate ( β ): The parameter β , which governs how quickly predators consume prey, shows a significant increase in uncertainty when measurement error is present. The SD and credible intervals widen, and the mean estimate becomes more biased compared to the no-error scenario. This indicates that the predator–prey interaction term is particularly sensitive to noise in the data.
Predator death rate ( γ ): The mean estimate for γ shows minimal bias in both the error-free and error scenarios. However, similar to other parameters, the introduction of measurement error increases the uncertainty in its estimation, as indicated by the wider credible intervals. Despite this, the high effective sample size ( n eff ) suggests that the MCMC chains for γ are well-mixed, even in the presence of noise.
Predator growth rate ( δ ): The parameter δ , which represents the growth rate of the predator population due to prey consumption, shows increased variability when measurement error is introduced. The wider credible intervals indicate that the model struggles to precisely estimate predator growth dynamics in the presence of data noise. However, the effective sample size remains relatively high, suggesting stable MCMC convergence.
Overall impact: The presence of measurement error leads to a clear increase in the uncertainty of all parameter estimates, as evidenced by the wider credible intervals and increased SD. The results emphasize the importance of accounting for measurement error in ecological models, as failure to do so can result in biased and less precise parameter estimates, especially for interaction terms like β .

In summary, Table 5 highlights the robustness of the hierarchical Bayesian model in estimating the parameters of the predator–prey system. However, it also demonstrates the sensitivity of the model to measurement error, particularly for interaction parameters such as β , which governs the rate at which predators consume prey.
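As an illustration of how summaries of the kind reported in Table 5 can be computed from MCMC output, the following minimal Python sketch returns the posterior mean, SD, and an equal-tailed credible interval for a vector of draws. The draws are synthetic placeholders and not the study's output: the function name `summarize_draws`, the centre 0.10, and the two spreads are assumptions chosen only to mimic a narrower and a wider posterior for β.

```python
import random
import statistics


def summarize_draws(draws, level=0.95):
    """Return mean, SD, and an equal-tailed credible interval for MCMC draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    # Nearest-rank quantiles; adequate for a sketch with many draws.
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int((1.0 - tail) * (n - 1))]
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "ci": (lower, upper),
    }


# Hypothetical stand-ins for posterior draws of beta (NOT the study's results):
# same centre, but a wider spread in the measurement-error scenario.
rng = random.Random(1)
draws_no_error = [rng.gauss(0.10, 0.005) for _ in range(4000)]
draws_with_error = [rng.gauss(0.10, 0.020) for _ in range(4000)]

for label, draws in [("no error", draws_no_error), ("with error", draws_with_error)]:
    s = summarize_draws(draws)
    print(f"{label}: mean={s['mean']:.4f} sd={s['sd']:.4f} "
          f"CI=({s['ci'][0]:.4f}, {s['ci'][1]:.4f})")
```

Comparing the SD and the interval width between the two scenarios is the same comparison the bullet points above make for each parameter.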

Figure 6 displays the trace plots for the MCMC samples of the parameters α , β , γ , and δ in the Lotka–Volterra predator–prey model. The trace plots provide a visual assessment of the convergence and mixing behavior of the MCMC chains for each parameter, both in the presence and absence of measurement error. The findings are as follows:

Convergence and mixing: All four parameters ( α , β , γ , and δ ) exhibit good mixing in the MCMC chains, as evidenced by the consistent horizontal movement of the chains across iterations. The trace plots show no visible trend or drift, indicating that the chains have converged. This is true for both the no-error and measurement-error scenarios, suggesting that the hierarchical Bayesian model performs well in terms of convergence even when measurement errors are present.
Stability of α : The trace plot for α , representing the prey birth rate, shows stable sampling behavior across iterations, with little variability in the MCMC chains. This suggests that α is well-identified in the model, and the sampling process is efficient, even in the presence of measurement error.
Variability in β : The trace plot for β , the parameter governing the predator’s consumption rate of prey, shows slightly more variability than α , particularly in the measurement-error scenario. This increased variability suggests that β is more sensitive to noise in the data, which is consistent with the increased uncertainty seen in the posterior estimates. However, the trace plot still shows good mixing, indicating that the MCMC chains are exploring the posterior distribution efficiently.
Trace plots for γ and δ: The parameters γ (predator death rate) and δ (predator growth from prey consumption) also show stable and well-mixed trace plots. The sampling behavior for these parameters remains consistent across iterations, with no visible signs of strong autocorrelation or poor mixing. This indicates that the MCMC algorithm is sampling effectively from the posterior distributions of γ and δ, even when measurement errors are introduced.
Effective sample size ($n_{\text{eff}}$): The effective sample sizes for all parameters are sufficiently large, indicating that the MCMC chains have explored the parameter space efficiently. A high effective sample size suggests that successive draws are only weakly correlated, further supporting good mixing.

Figure 6: Trace plot. Source: Created by the authors.

In summary, the trace plots in Figure 6 indicate that the hierarchical Bayesian model produces well-mixed and converged MCMC chains for all four parameters ( α , β , γ , and δ ), even when measurement errors are present. This ensures that the posterior estimates obtained from the MCMC sampling are reliable and robust. The increased variability in β suggests that this parameter is more sensitive to noise, but the overall convergence and mixing remain strong.
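Visual inspection of trace plots can be complemented by a numerical check. The sketch below computes a split potential scale reduction factor (Gelman–Rubin R-hat), a standard diagnostic that the source does not report; it is shown only as an assumption about how the "no trend or drift" reading could be quantified. The chains are synthetic, and the function name `split_rhat` and all numeric settings are hypothetical.

```python
import random
import statistics


def split_rhat(chains):
    """Split R-hat: compare between-half and within-half variance."""
    halves = []
    for chain in chains:
        mid = len(chain) // 2
        halves.append(chain[:mid])
        halves.append(chain[mid:2 * mid])
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between = n * statistics.variance(means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


# Synthetic chains (assumptions, not the study's output): four stationary
# chains and four chains with a slow upward drift.
rng = random.Random(42)
stationary = [[rng.gauss(0.5, 0.05) for _ in range(1000)] for _ in range(4)]
drifting = [[0.5 + 0.0002 * t + rng.gauss(0.0, 0.05) for t in range(1000)]
            for _ in range(4)]

rhat_stationary = split_rhat(stationary)
rhat_drifting = split_rhat(drifting)
print(f"stationary: {rhat_stationary:.3f}, drifting: {rhat_drifting:.3f}")
```

Values close to 1 are consistent with the flat, well-mixed traces described above, whereas a drifting chain pushes the statistic well above 1.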

Figure 7 presents the autocorrelation plots for the MCMC samples of the parameters α , β , γ , and δ in the Lotka–Volterra predator–prey model. Autocorrelation plots are used to assess the degree of correlation between successive samples in the MCMC chains, with lower autocorrelation indicating better chain mixing and efficiency. The findings are as follows:

Low autocorrelation for α : The autocorrelation plot for α , representing the prey birth rate, shows a rapid decay to near-zero autocorrelation after just a few lags. This suggests that the MCMC samples for α are largely independent, with minimal correlation between successive iterations. As a result, the sampling process is efficient, leading to a high effective sample size for this parameter.
Increased autocorrelation for β : The autocorrelation plot for β , the predator’s consumption rate of prey, shows a slower decay compared to α , especially in the presence of measurement error. This indicates that there is some correlation between successive MCMC samples for β , resulting in a slight reduction in the effective sample size. The increased autocorrelation reflects the sensitivity of β to noise in the observed data and suggests that the sampling process for β may be less efficient in the presence of measurement error.
Moderate autocorrelation for γ : The autocorrelation plot for γ , the predator death rate, shows a moderate decay, but still reaches near-zero levels after a small number of lags. This indicates that while there is some correlation between successive samples, it does not significantly affect the sampling process. The effective sample size for γ remains high, indicating that the MCMC chains are producing mostly independent samples.
Autocorrelation for δ : The parameter δ , representing the predator growth rate from prey consumption, shows a similar autocorrelation pattern to γ . The autocorrelation decreases rapidly after a few lags, suggesting that the MCMC samples for δ are largely independent. This indicates that the hierarchical Bayesian model is efficiently sampling the posterior distribution of δ even in the presence of measurement error.
Impact of measurement error: In the presence of measurement error, the autocorrelation for all parameters slightly increases, especially for β . This suggests that measurement error introduces some correlation between successive MCMC samples, reducing the efficiency of the sampling process. However, the autocorrelation still decays to near-zero within a reasonable number of lags, indicating that the MCMC chains remain effective in exploring the posterior distribution.

Figure 7: Autocorrelation plot. Source: Created by the authors.

In summary, the autocorrelation plots in Figure 7 demonstrate that the MCMC samples for all parameters ( α , β , γ , and δ ) exhibit low autocorrelation after a few lags, indicating efficient sampling and good chain mixing. Although measurement error introduces some autocorrelation, particularly for β , the chains remain effective in exploring the parameter space. This ensures that the posterior estimates obtained from the MCMC chains are reliable and well-sampled.
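The link between autocorrelation decay and effective sample size can be made concrete with a short sketch. The code below estimates the sample autocorrelation at a given lag and a simple effective sample size, n / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation. This truncation rule is one common simplification and is an assumption here; the source does not state which estimator was used. The AR(1) chains are hypothetical stand-ins for fast-mixing and slow-mixing MCMC output.

```python
import random


def autocorr(chain, lag):
    """Sample autocorrelation of a chain at a non-negative lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


def effective_sample_size(chain, max_lag=200):
    """n / (1 + 2 * sum of autocorrelations), truncated at the first rho <= 0."""
    n = len(chain)
    total = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(chain, lag)
        if rho <= 0.0:
            break
        total += rho
    return n / (1.0 + 2.0 * total)


def ar1_chain(phi, n, seed):
    """Hypothetical stand-in for an MCMC chain: AR(1) with coefficient phi."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


fast = ar1_chain(0.1, 5000, seed=0)  # rapid decay, as described for alpha
slow = ar1_chain(0.7, 5000, seed=0)  # slower decay, as described for beta
ess_fast = effective_sample_size(fast)
ess_slow = effective_sample_size(slow)
print(f"ESS fast-mixing: {ess_fast:.0f}, ESS slow-mixing: {ess_slow:.0f}")
```

A chain whose autocorrelation decays more slowly yields a smaller effective sample size for the same number of iterations, which is the pattern reported for β under measurement error.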

5 Discussion

This study evaluated the performance of HBS models in estimating regression coefficients and parameter values in the presence of measurement error, using the Lotka–Volterra predator–prey system as a case study [19]. The results from both simulation scenarios – one without measurement error and one with increasing levels of measurement error – provide critical insights into the impact of data precision on model accuracy and reliability.

5.1 Effect of measurement error on parameter estimates

The results demonstrate that hierarchical Bayesian models provide highly accurate parameter estimates when data are free from measurement errors, as shown in Figure 4. In this ideal scenario, the model effectively recovers the true values of key parameters such as the prey birth rate ( α ) and the predator–prey interaction rate ( β ). The narrow credible intervals and minimal bias indicate the model’s ability to capture the predator–prey dynamics accurately, reflecting the strength of the hierarchical Bayesian approach in handling nonlinear systems like the Lotka–Volterra model.

However, when measurement error is introduced (Figure 5), the accuracy of the parameter estimates declines, particularly for interaction terms. The estimates for α and β show increased bias, and their credible intervals widen significantly, reflecting the sensitivity of these parameters to noisy data. The predator growth rate ( δ ) also becomes more uncertain, as indicated by the wider credible intervals. These results highlight the challenge of recovering true population dynamics in the presence of measurement error, particularly for parameters that depend heavily on precise data.
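To make the two data scenarios concrete, the following sketch simulates a predator–prey trajectory and then contaminates it with additive Gaussian measurement error. It assumes the classical Lotka–Volterra form, dx/dt = αx − βxy and dy/dt = δxy − γy, which matches the parameter roles described in the text; the parameter values, initial conditions, step size, error SD `sigma_u`, and all function names are hypothetical and are not taken from the study.

```python
import math
import random


def lv_rhs(x, y, alpha, beta, gamma, delta):
    """Classical Lotka-Volterra right-hand side for prey x and predator y."""
    return alpha * x - beta * x * y, delta * x * y - gamma * y


def simulate_lv(x0, y0, params, dt=0.01, steps=2000):
    """Integrate the system with a fixed-step fourth-order Runge-Kutta scheme."""
    traj = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        k1 = lv_rhs(x, y, *params)
        k2 = lv_rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], *params)
        k3 = lv_rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], *params)
        k4 = lv_rhs(x + dt * k3[0], y + dt * k3[1], *params)
        x += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        traj.append((x, y))
    return traj


def lv_invariant(x, y, params):
    """Conserved quantity of the classical system; used as a sanity check."""
    alpha, beta, gamma, delta = params
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


def add_measurement_error(traj, sigma_u, seed=0):
    """Additive Gaussian error on both populations (an assumed error model)."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma_u), y + rng.gauss(0.0, sigma_u))
            for x, y in traj]


params = (1.0, 0.1, 1.5, 0.075)  # hypothetical alpha, beta, gamma, delta
truth = simulate_lv(10.0, 5.0, params)
observed = add_measurement_error(truth, sigma_u=0.5)
print(truth[-1], observed[-1])
```

Fitting the same model to `truth` and to `observed` is the kind of comparison the error-free and measurement-error scenarios represent; larger values of `sigma_u` blur the oscillation peaks that carry most of the information about the interaction terms.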

As shown in Figure 1 and Tables 1–4, measurement error has a significant impact on parameter estimates. The simulation results demonstrate that measurement error introduces bias into the estimates of regression coefficients ($\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$), systematically shifting them away from their true values. Specifically, as the measurement error variance ($\sigma_u^2$) increases, the magnitude of the bias grows. For instance, $\beta_1$ shows increasing underestimation, while $\beta_2$ and $\beta_3$ show both bias and substantial variance.

This highlights the critical need for accurate data collection and robust error correction mechanisms in statistical modeling. When data are subject to high levels of measurement error, as shown in Tables 1–4, the uncertainty in parameter estimates increases. The MSE and RMSE also increase as $\sigma_u^2$ increases, underscoring the degradation of model performance as the noise in the data rises. This finding is particularly relevant in real-world settings where perfect data collection is often not possible, making error mitigation techniques essential to maintain the reliability of model outcomes.
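The growth of bias, MSE, and RMSE with the error variance can be reproduced in a deliberately simple setting. The sketch below is not the HBS model: it fits a naive least-squares slope to a covariate observed with classical additive error, W = X + U, and reports Monte Carlo bias, MSE, and RMSE for several values of the error SD. Under these assumptions the naive slope is attenuated toward zero by the factor 1 / (1 + σ_u²) when X has unit variance. The true slope, sample size, number of replicates, and function names are all hypothetical.

```python
import random
import statistics


def ols_slope(w, y):
    """Least-squares slope of y on w."""
    mw, my = statistics.fmean(w), statistics.fmean(y)
    sxy = sum((a - mw) * (b - my) for a, b in zip(w, y))
    sxx = sum((a - mw) ** 2 for a in w)
    return sxy / sxx


def naive_slope_replicates(beta, sigma_u, n=200, reps=300, seed=0):
    """Slopes from repeated datasets that ignore the error in the covariate."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]
        w = [xi + rng.gauss(0.0, sigma_u) for xi in x]  # classical error W = X + U
        slopes.append(ols_slope(w, y))
    return slopes


def bias_mse_rmse(estimates, truth):
    """Monte Carlo bias, mean squared error, and root mean squared error."""
    bias = statistics.fmean(estimates) - truth
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    return bias, mse, mse ** 0.5


TRUE_BETA = 2.0  # hypothetical true slope, not a value from the study
results = {}
for sigma_u in (0.0, 0.5, 1.0):
    slopes = naive_slope_replicates(TRUE_BETA, sigma_u)
    results[sigma_u] = bias_mse_rmse(slopes, TRUE_BETA)
    bias, mse, rmse = results[sigma_u]
    print(f"sigma_u={sigma_u}: bias={bias:.3f} mse={mse:.3f} rmse={rmse:.3f}")
```

The increasingly negative bias mirrors the underestimation pattern described for $\beta_1$, and it is this systematic attenuation that an explicit measurement error model is intended to correct.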

5.2 Robustness and convergence of the model

Despite the challenges posed by measurement error, the hierarchical Bayesian model demonstrated robust convergence and mixing in both simulation scenarios, as indicated by the trace plots in Figure 6. The MCMC chains for all parameters, including α , β , γ , and δ , exhibited stable and well-mixed behavior, ensuring that the posterior estimates were reliable even when measurement error was present. This suggests that the hierarchical Bayesian approach is highly robust, maintaining strong convergence properties even under less-than-ideal data conditions.

The autocorrelation plots (Figure 7) confirm this robustness, showing a rapid decay in autocorrelation after only a few lags for all parameters, particularly α and γ . While the introduction of measurement error increased autocorrelation for β , the MCMC chains still effectively explored the parameter space, indicating that the sampling process remained efficient even when the data contained noise.

5.3 Comparison of linear and semi-parametric models

The results further reveal the differences in performance between linear and semi-parametric models. As shown in Figure 1, the estimates of regression coefficients under a linear model display a greater sensitivity to measurement error compared to those obtained from a semi-parametric model. The “Smooth” estimates, derived from the HBS model, are less affected by data noise due to their flexibility in capturing nonlinear relationships between covariates and outcomes. This demonstrates the advantage of using semi-parametric models in scenarios where the relationships between variables are complex and susceptible to measurement error.

5.4 Comparison across scenarios

The results across both scenarios reveal the critical role that data quality plays in statistical modeling. As summarized in Table 5, the estimates in the error-free scenario were precise and unbiased, demonstrating the model’s strength when applied to accurate data. In contrast, the presence of measurement error increased both bias and uncertainty, particularly for interaction terms like β . This suggests that the hierarchical Bayesian model is more sensitive to noise when estimating parameters that capture population interactions, compared to those like the predator death rate ( γ ), which showed more stability across both scenarios.

The increased bias and variance in parameter estimates in the measurement error scenario underscore the need for careful data collection and error-correction methods in ecological modeling. Failure to account for measurement error can lead to significant distortions in the estimates, particularly for key interaction terms that define predator–prey relationships.

5.5 Implications for ecological and statistical modeling

The implications of this study extend beyond the predator–prey model, as measurement error is a common issue in ecological and environmental studies where precise data collection is often difficult. Hierarchical Bayesian models, with their ability to incorporate prior knowledge and handle complex data structures, provide a powerful framework for addressing measurement errors. However, as demonstrated in this study, these models are not immune to the impact of noisy data, particularly when estimating interaction terms like β .

In practical applications, where measurement errors are unavoidable, it is essential to account for these errors through proper modeling techniques. The findings from this study emphasize the need for incorporating error-correction methods to ensure the reliability of parameter estimates in ecological systems. By doing so, researchers can maintain the validity of their models and provide more accurate predictions of species interactions and population dynamics.

In summary, the HBS model offers a robust and flexible approach for modeling dynamic systems like the predator–prey model. While measurement error introduces bias and uncertainty, the model remains resilient, providing reliable parameter estimates even in the presence of noisy data. These findings highlight the importance of addressing data imperfections in statistical analyses and underscore the utility of hierarchical Bayesian models in real-world ecological applications.

6 Conclusion

This study investigated the performance of HBS models in estimating the parameters of the Lotka–Volterra predator–prey system under various conditions, including the presence and absence of measurement error. The results provide key insights into the robustness and flexibility of these models, particularly in capturing complex population dynamics.

In an ideal scenario without measurement error, the hierarchical Bayesian model successfully recovered the true values of key parameters, such as the prey birth rate ( α ) and the predator–prey interaction rate ( β ), with minimal bias and narrow credible intervals. These results highlight the model’s ability to handle nonlinear systems like the Lotka–Volterra model effectively, ensuring accurate predictions of population dynamics.

However, the introduction of measurement error significantly impacted the accuracy of parameter estimates. Interaction terms, such as the predator–prey consumption rate ( β ) and the predator growth rate ( δ ), were particularly sensitive to data noise, resulting in increased bias and wider credible intervals. These findings emphasize the need for robust data collection techniques and appropriate error correction methods to minimize the impact of measurement error in practical applications.

Despite these challenges, the hierarchical Bayesian model demonstrated strong convergence and efficient sampling in both error-free and measurement-error scenarios. The MCMC trace and autocorrelation plots confirmed good mixing and stability across all parameters, ensuring reliable posterior estimates even in the presence of noisy data.

The comparison between linear and semi-parametric models further underscores the advantages of the hierarchical Bayesian approach. The semi-parametric model proved more resilient to the effects of measurement error due to its ability to capture nonlinear relationships, making it a better fit for complex ecological systems.

This research has broader implications for ecological and environmental studies where measurement error is a common issue. Hierarchical Bayesian models provide a powerful framework for integrating prior knowledge and handling complex data structures, but they are not immune to the effects of data noise. Addressing measurement error through proper correction methods is critical to ensure accurate and reliable estimates in real-world applications.

In conclusion, HBS models offer a robust and flexible solution for modeling dynamic ecological systems, such as the predator–prey model. While measurement error introduces bias and increases uncertainty, the model’s strong convergence properties and adaptability ensure that it remains a valuable tool in statistical modeling. By employing effective error-correction techniques, researchers can enhance the model’s performance and ensure accurate insights into complex ecological interactions.

Funding information: This research received support from the African Union through the Pan African University Institute for Basic Sciences, Technology, and Innovation.
Author contributions: Amos Kipkorir Langat: conceptualization, methodology, formal analysis, software implementation, manuscript drafting, and supervision. Samuel Musili Mwalili: theoretical development, critical review, and refinement of methodology. Lawrence Ndekeleni Kazembe: model validation, statistical review, and manuscript revision.
Conflict of interest: The authors declare that there are no conflicts of interest regarding the publication of this study.
Ethical approval: This study was conducted in compliance with ethical guidelines and research integrity standards. The research did not involve human participants, animal subjects, or personally identifiable data, and therefore, ethical approval was not required. The authors affirm that this work adheres to principles of academic integrity and transparency.
Data availability statement: Data supporting this study's findings are available from the corresponding author upon reasonable request. Simulated datasets and model code are provided in the supplementary materials.

References

[1] Adinarayana, S., Raju, M. G., Srirangam, D. P., Prasad, D. S., Kumar, M. R., & Veesam, M. R. (2024). Enhancing resource management in precision farming through AI-based irrigation optimization. How Machine Learning is Innovating Today’s World: A Concise Technical Guide (pp. 221–251). Hoboken, NJ, USA: Wiley. DOI: https://doi.org/10.1002/9781394214167.ch15.

[2] Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: a modern perspective. New York: Chapman and Hall/CRC. DOI: https://doi.org/10.1201/9781420010138.

[3] Curtsdotter, A., Banks, H. T., Banks, J. E., Jonsson, M., Jonsson, T., Laubmeier, A. N., et al. (2019). Ecosystem function in predator–prey food webs-confronting dynamic models with empirical data. Journal of Animal Ecology, 88(2), 196–210. DOI: https://doi.org/10.1111/1365-2656.12892.

[4] Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. DOI: https://doi.org/10.1111/j.2044-8317.2011.02037.x.

[5] Hunziker, S., Gubler, S., Calle, J., Moreno, I., Andrade, M., et al. (2017). Identifying, attributing, and overcoming common data quality issues of manned station observations. International Journal of Climatology, 37(11), 4131–4145. DOI: https://doi.org/10.1002/joc.5037.

[6] Karkouch, A., Mousannif, H., Al Moatassime, H., & Noel, T. (2016). Data quality in internet of things: A state-of-the-art survey. Journal of Network and Computer Applications, 73, 57–81. DOI: https://doi.org/10.1016/j.jnca.2016.08.002.

[7] Langat, A. K., Mwalili, S. M., & Kazembe, L. N. (2024). Hierarchical Bayesian semi-parametric models for measurement error correction in determining optimal fertilizer application levels. Scientific African, 26, e02423. DOI: https://doi.org/10.1016/j.sciaf.2024.e02423.

[8] Langat, A. K., Mwalili, S. M., & Kazembe, L. N. (2025). Advancing measurement error correction: A systematic review and meta-analysis of hierarchical Bayesian semi-parametric models. Applied and Computational Mathematics, 14(1), 23–36. DOI: https://doi.org/10.11648/j.acm.20251401.13.

[9] Langat, A. K., Mwalili, S. M., & Kazembe, L. N. (2024). Mixed effects and semi-parametric Bayesian integration models for measurement error correction in the context of fertilizer application levels: a simulation study. Communications in Mathematical Biology and Neuroscience, 2024, 105. DOI: https://doi.org/10.28919/cmbn/8744.

[10] Langat, A. K., Mwalili, S. M., & Kazembe, L. N. (2024). Mixed effects and semi-parametric Bayesian integration models for measurement error correction in the context of fertilizer application levels: a simulation study. Communications in Mathematical Biology and Neuroscience, 2024, 105. DOI: https://doi.org/10.28919/cmbn/8744.

[11] Langat, A. K., Ofori, M. A., Baranon, M. D., Biftu, D. B., & Mwalili, S. M. (2023). A Bayesian nonparametric modeling approach to settlement patterns of pastoralists population in Kenya. Asian Journal of Probability and Statistics, 25(2), 17–28. DOI: https://doi.org/10.9734/ajpas/2023/v25i2549.

[12] Majumdar, P., Bhattacharya, D., Mitra, S., Solgi, R., Oliva, D., & Bhusan, B. (2023). Demand prediction of rice growth stage-wise irrigation water requirement and fertilizer using Bayesian genetic algorithm and random forest for yield enhancement. Paddy and Water Environment, 21(2), 275–293. DOI: https://doi.org/10.1007/s10333-023-00930-0.

[13] Mashamba, A. (2010). Bayesian optimization and uncertainty analysis of complex environmental models, with applications in watershed management. PhD thesis. Montana State University.

[14] Mwalili, S. M., Lesaffre, E., & Declerck, D. (2005). A Bayesian ordinal logistic regression model to correct for interobserver measurement error in a geographical oral health study. Journal of the Royal Statistical Society Series C: Applied Statistics, 54(1), 77–93. DOI: https://doi.org/10.1111/j.1467-9876.2005.00471.x.

[15] Ribeiro, V. P. (2024). Bayesian modelling of a decision support system for irrigation. Thesis (PhD in Mechanical Engineering) - Faculdade de Engenharia e Ciências, Universidade Estadual Paulista, Guaratinguetá.

[16] Shirley, R., Pope, E., Bartlett, M., Oliver, S., Quadrianto, N., Hurley, P., et al. (2020). An empirical, Bayesian approach to modelling crop yield: Maize in USA. Environmental Research Communications, 2(2), 025002. DOI: https://doi.org/10.1088/2515-7620/ab67f0.

[17] van de Schoot, R., Depaoli, S., King, R., Kramer, B., Märtens, K., Tadesse, M. G., et al. (2021). Bayesian statistics and modelling. Nature Reviews Methods Primers, 1(1), 1. DOI: https://doi.org/10.1038/s43586-020-00001-2.

[18] Wang, Y., & Zou, X. (2020). On a predator–prey system with digestion delay and anti-predation strategy. Journal of Nonlinear Science, 30(4), 1579–1605. DOI: https://doi.org/10.1007/s00332-020-09618-9.

[19] Zhao, J. (2020). Complexity and chaos control in a discrete-time Lotka–Volterra predator–prey system. Journal of Difference Equations and Applications, 26(9–10), 1303–1320. DOI: https://doi.org/10.1080/10236198.2020.1825702.

[20] Zyphur, M. J., & Oswald, F. L. (2015). Bayesian estimation and inference: A user’s guide. Journal of Management, 41(2), 390–420. DOI: https://doi.org/10.1177/0149206313501200.

Received: 2024-06-25

Revised: 2024-12-11

Accepted: 2024-12-12

Published Online: 2025-06-25

This work is licensed under the Creative Commons Attribution 4.0 International License.

https://doi.org/10.1515/cmb-2024-0019
