
QR.break: An R Package for Structural Breaks in Quantile Regression

  • Zhongjun Qu, Tatsushi Oka, and Samuel Messer
Published/Copyright: July 4, 2025

Abstract

The QR.break package provides methods for detecting, estimating, and conducting inference on multiple structural breaks in linear quantile regression models, based on one or multiple quantiles and applicable to both time series and repeated cross-sectional data. The main function, rq.break(), returns testing and estimation results based on user specifications of the quantiles of interest, the maximum number of breaks allowed, and the minimum length of a single regime. This note outlines the underlying methods and explains how to use the main function with two datasets: a time series dataset on U.S. real GDP growth rates and a repeated cross-sectional dataset on youth drinking and driving behavior. Both datasets are included in the package available on CRAN.

JEL Classification: C21; C22; C31; C52; C63

1 Introduction

The issue of structural breaks has been extensively studied. Much of the literature has focused on the conditional mean, but in many cases, structural change in the conditional quantile function is more informative. For example, in studying income inequality, it is important to assess whether the wage gap between racial groups, conditional on covariates, has changed over time. Inequality may increase conditional dispersion without affecting the mean. Similarly, when evaluating a policy aimed at improving outcomes for low-performing students, attention should focus on lower quantiles. In both cases, it is desirable to estimate break dates from the data: in the former, the cause of change may be unclear a priori; in the latter, the policy effect may exhibit an unknown time lag.

To address these issues, Qu (2008) and Su and Xiao (2008) developed tests for detecting structural change in conditional quantile functions with unknown break dates, but did not consider estimation and inference for the number of breaks or their locations. Oka and Qu (2011) studied the estimation of multiple structural breaks at unknown dates in conditional quantile functions for two models: a time series model, useful for analyzing macroeconomic data, and a repeated cross-section model, relevant for evaluating social programs and policy effects. Oka and Qu’s (2011) framework allows for structural change in single or multiple quantiles. Analyzing multiple quantiles requires stronger assumptions but can improve estimation efficiency.

Key aspects of Oka and Qu’s (2011) procedure are as follows: assuming a known number of breaks, the methods construct estimates of break dates and coefficients as global minimizers of the check function over admissible break points. For multiple quantiles, the check function is integrated over the set of quantiles chosen by the user. The assumptions permit dynamic models and impose restrictions only in neighborhoods around quantiles of interest, leaving other quantiles unspecified. This flexibility allows researchers to examine slices of the conditional distribution without imposing global distributional assumptions. The distributions of the break estimators were derived following Picard (1985) and Yao (1987). These distributions involve consistently estimable parameters, which enables confidence interval construction without simulation. Their paper also proposes a test for the number of breaks based on subgradient methods from Qu (2008). These tests do not require variance estimation and have monotonic power even with multiple breaks. Taken together, Qu (2008) and Oka and Qu (2011) offer the most comprehensive treatment to date of multiple structural breaks in quantile regression.

To date, there has been no published software package in any language to test for, estimate, and conduct inference on multiple structural breaks in linear quantile regression models. This led us to develop an R package that performs these tasks in an easy-to-use fashion. The resulting package, QR.break, is available on CRAN. Its main function, rq.break(), returns all testing and estimation results based on user-specified quantiles of interest, the maximum number of allowed breaks, and the minimum length of a single regime. To use it, the user simply runs:

rq.break(y, x, vec.tau, N, trim.e, vec.time, m.max, v.a, v.b, verbose)

with the following arguments:

  1. y: the dependent variable, given as a vector or data frame.

  2. x: the regressors, provided as a matrix or data frame. A column of ones should not be included, as it is automatically added during estimation.

  3. vec.tau: the quantiles of interest, e.g., vec.tau <- seq(0.1, 0.9, by = 0.1).

  4. N: the size of the cross-section; set to 1 for time series data.

  5. trim.e: the minimum length of any regime, expressed as a fraction of the total time span (e.g., trim.e = 0.1).

  6. vec.time: the time indices of the data, used for reporting the estimated break dates.

  7. m.max: the maximum number of breaks allowed.

  8. v.a, v.b: v.a is the significance level used to determine the number of breaks, chosen from a pre-specified set (1, 2, or 3 for 10 %, 5 %, or 1 %); v.b is the coverage level for the confidence intervals of the break dates (1 or 2 for 90 % or 95 %).

  9. verbose: set to TRUE to display estimation results in the R console; the default is FALSE.
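For example, the time-series application in Section 3.1 assembles these inputs as follows (shown here as a preview; each value is explained in that section):

> library(QR.break)
> data(gdp)
> y = gdp[,"gdp"]
> x = gdp[, c("lag1", "lag2")]
> result = rq.break(y, x, vec.tau = seq(0.20, 0.80, by = 0.15), N = 1,
+                   trim.e = 0.15, vec.time = gdp[,"yq"], m.max = 3,
+                   v.a = 2, v.b = 2, verbose = TRUE)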

We illustrate the application of the function through two empirical examples. The first revisits the “Great Moderation” in U.S. GDP growth using quarterly data. Results suggest the decline in volatility mainly affected the upper tail, with the median and lower quantiles remaining stable. This implies that expansions became less rapid, while recessions remained as severe. The second application analyzes blood alcohol levels of young drivers in California from 1983 to 2007. Two breaks are detected, consistent with the 1984 National Minimum Drinking Age Act and a 1991 beer tax increase. The effects are more pronounced at lower quantiles, indicating a greater impact on lighter drinkers than on heavier ones. These findings are documented in Oka and Qu (2011). Here, the focus is on the detailed steps needed to apply the methods and obtain these results.

Below, we first outline the methods implemented in the package, followed by a detailed description of their application to the two empirical examples.

2 Methods for Estimation and Inference

This section explains: (1) the model and the econometric issues of interest; (2) the main steps for estimating the break locations and regression coefficients when the number of breaks is known, based on a single quantile; (3) the main steps for estimating the break locations and regression coefficients when the number of breaks is known, based on multiple quantiles; (4) a procedure for determining the number of breaks; and (5) inference on quantile regression coefficients after estimating the breaks.

2.1 Econometric Models and Issues of Interest

The methods apply to both time series and repeated cross-sectional data. We begin with time series data.

2.1.1 Time Series Models

Let $y_t$ be a random variable, $x_t$ a $p$-dimensional vector of covariates, and $Q_{y_t}(\tau \mid x_t)$ the conditional $\tau$-quantile of $y_t$ given $x_t$, where $t$ is the time index. Let $T$ denote the sample size. Assume the conditional quantile function is linear and potentially affected by $m$ structural breaks:

(1) $$Q_{y_t}(\tau \mid x_t) = \begin{cases} x_t'\beta_1^0(\tau), & t = 1, \ldots, T_1^0, \\ x_t'\beta_2^0(\tau), & t = T_1^0 + 1, \ldots, T_2^0, \\ \;\;\vdots & \\ x_t'\beta_{m+1}^0(\tau), & t = T_m^0 + 1, \ldots, T, \end{cases}$$

where $\tau \in (0, 1)$, $\beta_j^0(\tau)$ $(j = 1, \ldots, m+1)$ are the unknown parameters, and $T_j^0$ $(j = 1, \ldots, m)$ are the unknown break dates. The regressors $x_t$ may include discrete as well as continuous variables. A column of ones is automatically added to the regression when applying the methods. The following examples, taken from Oka and Qu (2011), illustrate the model.

Example 1

Oka and Qu (2011) studied the following model for U.S. quarterly real GDP growth rates for the period 1947:Q2 to 2009:Q2:

$$Q_{y_t}(\tau \mid y_{t-1}, \ldots, y_{t-p}) = \mu_j(\tau) + \sum_{i=1}^{p} \alpha_{i,j}(\tau)\, y_{t-i}, \quad t = T_{j-1}^0 + 1, \ldots, T_j^0.$$

They detect a structural break in 1984 that affects only the upper quantiles of the distribution. The coefficient estimates suggest that growth was slower during expansions, while recessions remained just as severe when they occurred. This dataset is included in the package, and we will use it in Section 3 to illustrate the application of the models in a time-series setting.

Example 2

Cox et al. (1985) considered the following model for a short-term interest rate $r_t$: $dr_t = (\alpha + \beta r_t)\,dt + \sigma r_t^{1/2}\,dW_t$, where $W_t$ is a standard Brownian motion process. The model has the following discrete-time approximation: $r_{t+1} - r_t = \alpha + \beta r_t + \sigma r_t^{1/2} u_{t+1}$, with $u_{t+1} \sim \text{i.i.d. } N(0, 1)$, implying that the $\tau$th conditional quantile of $r_{t+1}$ given $r_t$ is given by $Q_{r_{t+1}}(\tau \mid r_t) = \alpha + (1+\beta) r_t + \sigma r_t^{1/2} F_u^{-1}(\tau)$, where $F_u^{-1}(\tau)$ is the $\tau$th quantile of $N(0, 1)$. The methods in the package can be used to estimate structural changes in $\alpha$, $\beta$, and $\sigma$, assuming that the quantile function is given by the discrete-time approximation.
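As a sketch of how Example 2 could be taken to the package, the snippet below simulates an illustrative short-rate series (the series and its parameters are hypothetical, chosen only for demonstration) and casts the discrete-time approximation as a quantile regression of $r_{t+1}$ on $r_t$ and $r_t^{1/2}$; rq.break() then searches for breaks in $\alpha$, $1+\beta$, and $\sigma F_u^{-1}(\tau)$.

# Hypothetical illustration for Example 2; the simulated series is for demonstration only.
library(QR.break)
set.seed(1)
n <- 300
r <- numeric(n)
r[1] <- 5
for (t in 1:(n - 1)) {
  # a simple discretized square-root process, kept away from zero
  r[t + 1] <- max(0.01, r[t] + 0.1 * (5 - r[t]) + 0.2 * sqrt(r[t]) * rnorm(1))
}
y <- r[-1]                                           # r_{t+1}
x <- cbind(r.lag = r[-n], sqrt.r.lag = sqrt(r[-n]))  # r_t and r_t^{1/2}
result <- rq.break(y, x, vec.tau = seq(0.2, 0.8, by = 0.1), N = 1, trim.e = 0.15,
                   vec.time = NULL, m.max = 3, v.a = 2, v.b = 2)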

Example 3

Chernozhukov and Umantsev (2001) studied Value-at-Risk using $Q_{y_t}(\tau \mid x_t) = x_t'\beta(\tau)$, where $y_t$ is an asset return, and $x_t$ is a vector of information variables that may include returns on other securities, lagged values of $y_t$, and proxies for volatility. An important question is whether this risk relationship undergoes structural changes over time. The methods in this package can be applied to address this question without specifying the change points a priori.

Example 4

Piehl et al. (2003) applied structural change methods to evaluate the effect of the Boston Gun Project on youth homicides, allowing the break date to be unknown to account for potential policy delays. Their focus was on structural change in the conditional mean; however, the methods in this package could be used to analyze structural changes along the distribution.

2.1.2 Models for Repeated Cross-Sections

For repeated cross-sectional data, let $y_{it}$ be a random variable, $x_{it}$ a $p$-dimensional vector of covariates, and $Q_{y_{it}}(\tau \mid x_{it})$ the conditional quantile of $y_{it}$ given $x_{it}$, where $i$ and $t$ are individual and time indices. Let $N$ be the number of cross-sectional units, assumed constant over time, and $T$ the number of time periods. The conditional quantile is potentially affected by $m$ breaks:

(2) $$Q_{y_{it}}(\tau \mid x_{it}) = \begin{cases} x_{it}'\beta_1^0(\tau), & t = 1, \ldots, T_1^0, \\ x_{it}'\beta_2^0(\tau), & t = T_1^0 + 1, \ldots, T_2^0, \\ \;\;\vdots & \\ x_{it}'\beta_{m+1}^0(\tau), & t = T_m^0 + 1, \ldots, T, \end{cases}$$

where $i = 1, \ldots, N$; $\tau \in (0, 1)$; $\beta_j^0(\tau)$ $(j = 1, \ldots, m+1)$ are the unknown parameters; $T_j^0$ $(j = 1, \ldots, m)$ are the unknown break dates; and $x_{it}$ may include both discrete and continuous variables. A column of ones is automatically added to the regression. The following application, taken from Oka and Qu (2011), illustrates the model.

Example 5

Motor vehicle crashes are the leading cause of death among youth aged 15–20, a high proportion of which involve drunk driving. Blood alcohol concentration (BAC) is a key measure of alcohol impairment, and changes in BAC among young drivers provide useful information on how young drivers’ drinking behavior has changed over time. Motivated by this, Oka and Qu (2011) studied structural change in BAC among young drivers involved in traffic accidents using the following model:

$$Q_{y_{it}}(\tau \mid x_{it}) = \mu_j(\tau) + x_{it}'\gamma_j(\tau), \quad t = T_{j-1}^0 + 1, \ldots, T_j^0,$$

where $y_{it}$ is the BAC level. The regressors are age, gender, and a dummy variable for the fourth quarter. They detect breaks in 1985 and 1992. The changes are negative and meaningful in magnitude. However, the change is smaller for higher quantiles, suggesting that the policies are more effective for “light drinkers” than for “heavy drinkers” in the sample. This is encouraging but falls short of expectations, as heavy drinkers are more likely to cause accidents, suggesting that additional policies are needed to deter heavy drinking. This dataset is included in the package, and later in Section 3, we will use it to illustrate the application of the models in a cross-sectional setting.

2.1.3 Econometric Issues of Interest

The methods in this package address the following issues:

  1. Estimation based on a single quantile when the number of breaks is known. The method estimates both the break locations and the regression coefficients. If the user specifies more than one quantile level, the analysis is performed independently for each quantile, allowing break locations to differ across quantiles. The program returns the estimated break locations, their confidence intervals, and the corresponding estimates and intervals for the regression coefficients.

  2. Estimation based on multiple quantiles when the number of breaks is known. In this case, the break locations are assumed to be common across quantiles and are estimated using information from all specified quantiles. The program returns the estimated break locations, their confidence intervals, and the corresponding estimates and intervals for the regression coefficients.

  3. Selection of the number of breaks. The user specifies the maximum number of breaks, and the program determines the number of breaks using a dynamic programming algorithm.

The package is structured so that a main function performs all of these tasks. This will be demonstrated through empirical applications. Next, we describe the methods, starting with estimation of the model when the number of breaks is known, based on a single quantile.

2.2 Estimating Break Locations Based on a Single Quantile

Consider time series data first. Suppose that the $\tau$th quantile is affected by $m$ structural changes. Define the following function for a set of candidate break dates $T_b = (T_1, \ldots, T_m)$:

(3) $$S_T(\tau, \beta(\tau), T_b) = \sum_{j=0}^{m} \sum_{t=T_j+1}^{T_{j+1}} \rho_\tau\!\left(y_t - x_t'\beta_{j+1}(\tau)\right),$$

where $\rho_\tau(u)$ is the check function given by $\rho_\tau(u) = u\left(\tau - 1(u < 0)\right)$, see Koenker (2005); $\beta(\tau) = (\beta_1(\tau)', \ldots, \beta_{m+1}(\tau)')'$; $T_0 = 0$; and $T_{m+1} = T$. We estimate the break dates and coefficients $\beta(\tau)$ jointly by solving

(4) $$\left(\hat{\beta}(\tau), \hat{T}_b\right) = \arg\min_{\beta(\tau),\; T_b \in \Lambda_\varepsilon} S_T(\tau, \beta(\tau), T_b),$$

where $\hat{\beta}(\tau) = (\hat{\beta}_1(\tau)', \ldots, \hat{\beta}_{m+1}(\tau)')'$ and $\hat{T}_b = (\hat{T}_1, \ldots, \hat{T}_m)$. In (4), $\Lambda_\varepsilon$ denotes the set of possible partitions to ensure that each estimated regime is a positive fraction of the sample:

(5) $$\Lambda_\varepsilon = \left\{(T_1, \ldots, T_m) : T_j - T_{j-1} \ge \varepsilon T \; (j = 2, \ldots, m),\; T_1 \ge \varepsilon T,\; T_m \le (1-\varepsilon) T\right\},$$

where $\varepsilon$ is a small positive number specifying the minimum length of a regime. The user specifies $\varepsilon$ when using the package. For repeated cross-sections, the estimation procedure is similar, except that the objective function $S_T(\tau, \beta(\tau), T_b)$ is replaced by

$$S_{NT}(\tau, \beta(\tau), T_b) = \sum_{j=0}^{m} \sum_{t=T_j+1}^{T_{j+1}} \sum_{i=1}^{N} \rho_\tau\!\left(y_{it} - x_{it}'\beta_{j+1}(\tau)\right),$$

where an additional summation “$\sum_{i=1}^{N}$” is included to incorporate the cross-sectional observations. For both settings, the computation is carried out using a dynamic programming algorithm, as in Bai and Perron (2003), such that the computation is of order $O(T^2)$ irrespective of the number of breaks allowed in the model.
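To make the estimation problem in (3)–(5) concrete, the following sketch (not the package's internal code, which relies on dynamic programming) evaluates the concentrated objective for one fixed candidate partition by fitting a quantile regression on each segment with the quantreg package and summing the check-function losses; minimizing this quantity over all partitions in $\Lambda_\varepsilon$ yields the break estimates in (4).

library(quantreg)

# Check function rho_tau(u) = u * (tau - 1(u < 0))
check.loss <- function(u, tau) u * (tau - (u < 0))

# Concentrated objective S_T for a single quantile and a fixed set of candidate
# break dates; each break date is the last observation of its regime, and the
# dates are assumed to be sorted and strictly inside the sample.
objective.single.tau <- function(y, x, tau, breaks) {
  x <- as.matrix(x)
  bounds <- c(0, breaks, length(y))
  total <- 0
  for (j in seq_len(length(bounds) - 1)) {
    idx <- (bounds[j] + 1):bounds[j + 1]
    fit <- rq(y[idx] ~ x[idx, , drop = FALSE], tau = tau)  # intercept added by the formula
    total <- total + sum(check.loss(residuals(fit), tau))
  }
  total
}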

2.3 Estimating Break Locations Based on Multiple Quantiles

Suppose that the quantiles in $\mathcal{T}_\omega = [\omega_1, \omega_2]$ with $0 < \omega_1 < \omega_2 < 1$ are affected by structural changes. A natural approach is to consider a partition of this interval and examine a set of quantiles denoted by $\tau_h$, $h = 1, \ldots, q$. The estimation can then be carried out in a similar way as in the single quantile case. Specifically, define the following objective function for a set of candidate break dates $T_b = (T_1, \ldots, T_m)$ and parameter values $\beta(\mathcal{T}_\omega) = (\beta(\tau_1)', \ldots, \beta(\tau_q)')'$:

$$S_T\!\left(\mathcal{T}_\omega, \beta(\mathcal{T}_\omega), T_b\right) = \sum_{h=1}^{q} \sum_{j=0}^{m} \sum_{t=T_j+1}^{T_{j+1}} \rho_{\tau_h}\!\left(y_t - x_t'\beta_{j+1}(\tau_h)\right),$$

and solve

(6) $$\left(\hat{\beta}(\mathcal{T}_\omega), \hat{T}_b\right) = \arg\min_{\beta(\mathcal{T}_\omega),\; T_b \in \Lambda_\varepsilon} S_T\!\left(\mathcal{T}_\omega, \beta(\mathcal{T}_\omega), T_b\right),$$

where $\Lambda_\varepsilon$ has the same definition as in (5). Regarding the partition, we find that a coarse partition, such as quantiles spaced by 0.1, is sufficient to deliver informative results. For repeated cross-sectional data, again the estimation procedure is the same, except that the objective function should be replaced by

$$S_{NT}\!\left(\mathcal{T}_\omega, \beta(\mathcal{T}_\omega), T_b\right) = \sum_{h=1}^{q} \sum_{j=0}^{m} \sum_{t=T_j+1}^{T_{j+1}} \sum_{i=1}^{N} \rho_{\tau_h}\!\left(y_{it} - x_{it}'\beta_{j+1}(\tau_h)\right),$$

where an additional summation “$\sum_{i=1}^{N}$” is included to incorporate the cross-sectional observations. For both settings, the computation is carried out using a dynamic programming algorithm, as in Bai and Perron (2003), such that the computation is of order $O(T^2)$ irrespective of the number of breaks allowed in the model.
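Continuing the sketch above, the multiple-quantile objective in (6) is obtained by summing the single-quantile objective over the chosen quantile grid while holding the candidate break dates fixed:

# Multiple-quantile objective for a common set of candidate break dates,
# reusing objective.single.tau() from the previous sketch.
objective.multi.tau <- function(y, x, tau.grid, breaks) {
  sum(sapply(tau.grid, function(tau) objective.single.tau(y, x, tau, breaks)))
}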

For all cases discussed above, confidence intervals can be computed based on the limiting distributions of the break point estimates. For example, in the repeated cross-section, multiple-quantile case, the limiting distribution is given by, for j = 1, …, m:

$$\frac{\bar{\pi}_j^{*}}{\bar{\sigma}_j^{*2}}\, v_T^2 \left(\hat{T}_j - T_j^0\right) \xrightarrow{d} \arg\max_{s} \begin{cases} W(s) - |s|/2, & s \le 0, \\ \left(\bar{\sigma}_{j+1}^{*}/\bar{\sigma}_j^{*}\right) W(s) - \left(\bar{\pi}_{j+1}^{*}/\bar{\pi}_j^{*}\right) |s|/2, & s > 0, \end{cases}$$

where $W(s)$ is the standard two-sided Brownian motion, and $\bar{\pi}_j^{*}$, $\bar{\pi}_{j+1}^{*}$, $\bar{\sigma}_j^{*2}$, and $\bar{\sigma}_{j+1}^{*2}$ are constants that can be estimated consistently. This distribution has an analytical density function and no simulation is needed to obtain the critical values, therefore reducing the computational cost; see Bai (1995) and Oka and Qu (2011) for details.

2.4 Determining the Number of Breaks

The package uses two test statistics, $SQ_\tau$ (for a single quantile) and $DQ$ (for multiple quantiles), proposed in Qu (2008) to determine the number of breaks for the single and multiple quantile cases, respectively. We first briefly explain these two tests and then describe a recommended testing procedure based on them.

2.4.1 Testing for a Single Structural Break

The $SQ_\tau$ test is designed to detect the presence of a structural break in a given quantile $\tau$:

$$SQ_\tau = \sup_{\lambda \in [0,1]} \left(\tau(1-\tau)\right)^{-1/2} \left\| H_{\lambda,T}\!\left(\hat{\beta}(\tau)\right) - \lambda H_{1,T}\!\left(\hat{\beta}(\tau)\right) \right\|_\infty,$$

where

$$H_{\lambda,T}\!\left(\hat{\beta}(\tau)\right) = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1/2} \sum_{t=1}^{\lfloor \lambda T \rfloor} x_t\, \psi_\tau\!\left(y_t - x_t'\hat{\beta}(\tau)\right)$$

and $\psi_\tau(u) = \tau - 1(u < 0)$ if we have a single time series, and

$$H_{\lambda,T}\!\left(\hat{\beta}(\tau)\right) = \left(\sum_{t=1}^{T} \sum_{i=1}^{N} x_{it} x_{it}'\right)^{-1/2} \sum_{t=1}^{\lfloor \lambda T \rfloor} \sum_{i=1}^{N} x_{it}\, \psi_\tau\!\left(y_{it} - x_{it}'\hat{\beta}(\tau)\right)$$

with repeated cross-sectional data, $\hat{\beta}(\tau)$ is the parameter estimate using the full sample assuming no structural change, and $\|\cdot\|_\infty$ is the sup norm, used to reveal the strongest evidence against the null hypothesis; that is, for a generic vector $z = (z_1, \ldots, z_k)'$, $\|z\|_\infty = \max(|z_1|, \ldots, |z_k|)$.

The $DQ$ test is designed to detect structural changes in quantiles in an interval $\mathcal{T}_\omega$:

$$DQ = \sup_{\tau \in \mathcal{T}_\omega} \sup_{\lambda \in [0,1]} \left\| H_{\lambda,T}\!\left(\hat{\beta}(\tau)\right) - \lambda H_{1,T}\!\left(\hat{\beta}(\tau)\right) \right\|_\infty.$$

The expressions inside the sup norm are defined as before.
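For illustration, the following is a rough sketch of the $SQ_\tau$ statistic for a single time series, evaluated on the grid $\lambda = t/T$ and following the displayed formulas; it is not the package's internal implementation, and in practice the statistic and its critical values are obtained through rq.break().

library(quantreg)

sq.stat <- function(y, x, tau) {
  x <- as.matrix(x)
  X <- cbind(1, x)                              # a column of ones is added, as in the package
  fit <- rq(y ~ x, tau = tau)                   # full-sample fit assuming no break
  psi <- tau - (residuals(fit) < 0)             # psi_tau(y_t - x_t' beta_hat(tau))
  e <- eigen(crossprod(X), symmetric = TRUE)
  XtX.inv.sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  H <- apply(X * psi, 2, cumsum) %*% XtX.inv.sqrt   # H_{lambda,T} at lambda = t/T
  Tn <- length(y)
  lambda <- (1:Tn) / Tn
  dev <- abs(H - outer(lambda, H[Tn, ]))        # |H_{lambda,T} - lambda * H_{1,T}|
  max(dev) / sqrt(tau * (1 - tau))              # sup over lambda of the sup norm
}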

2.4.2 Testing l Against l + 1 Breaks

We also need tests for testing $l$ against $l + 1$ breaks, labelled the $SQ_\tau(l+1 \mid l)$ and $DQ(l+1 \mid l)$ tests. Suppose a model with $l$ breaks has been estimated, with the break estimates denoted by $\hat{T}_1, \ldots, \hat{T}_l$. These values partition the sample into $(l+1)$ segments, with the $j$th segment being $[\hat{T}_{j-1} + 1, \hat{T}_j]$. The strategy proceeds by testing each of the $(l+1)$ segments for the presence of an additional break. We let $SQ_{\tau,j}$ and $DQ_j$ denote the $SQ_\tau$ and $DQ$ tests applied to the $j$th segment, i.e.,

$$SQ_{\tau,j} = \sup_{\lambda \in [0,1]} \left(\tau(1-\tau)\right)^{-1/2} \left\| H_{\lambda,\hat{T}_{j-1},\hat{T}_j}\!\left(\hat{\beta}_j(\tau)\right) - \lambda H_{1,\hat{T}_{j-1},\hat{T}_j}\!\left(\hat{\beta}_j(\tau)\right) \right\|_\infty, \qquad DQ_j = \sup_{\tau \in \mathcal{T}_\omega} \sup_{\lambda \in [0,1]} \left\| H_{\lambda,\hat{T}_{j-1},\hat{T}_j}\!\left(\hat{\beta}_j(\tau)\right) - \lambda H_{1,\hat{T}_{j-1},\hat{T}_j}\!\left(\hat{\beta}_j(\tau)\right) \right\|_\infty,$$

where

$$H_{\lambda,T_{j-1},T_j}\!\left(\hat{\beta}_j(\tau)\right) = \left(\sum_{t=T_{j-1}+1}^{T_j} x_t x_t'\right)^{-1/2} \sum_{t=T_{j-1}+1}^{T_{j-1}+\lfloor \lambda (T_j - T_{j-1}) \rfloor} x_t\, \psi_\tau\!\left(y_t - x_t'\hat{\beta}_j(\tau)\right)$$

if we have a single time series, and

$$H_{\lambda,T_{j-1},T_j}\!\left(\hat{\beta}_j(\tau)\right) = \left(\sum_{t=T_{j-1}+1}^{T_j} \sum_{i=1}^{N} x_{it} x_{it}'\right)^{-1/2} \sum_{t=T_{j-1}+1}^{T_{j-1}+\lfloor \lambda (T_j - T_{j-1}) \rfloor} \sum_{i=1}^{N} x_{it}\, \psi_\tau\!\left(y_{it} - x_{it}'\hat{\beta}_j(\tau)\right)$$

for repeated cross-sectional data, where $\hat{\beta}_j(\tau)$ is the parameter estimate using the $j$th regime. $SQ_\tau(l+1 \mid l)$ and $DQ(l+1 \mid l)$ represent the maximum of the $SQ_{\tau,j}$ and $DQ_j$ over the $l + 1$ segments:

$$SQ_\tau(l+1 \mid l) = \max_{1 \le j \le l+1} SQ_{\tau,j}, \qquad DQ(l+1 \mid l) = \max_{1 \le j \le l+1} DQ_j.$$

We reject the null hypothesis in favor of a model with $l + 1$ breaks if the resulting value exceeds the corresponding critical value.

2.4.3 Critical Values

These tests are asymptotically nuisance parameter free, and tables for critical values are provided in Qu (2008). They do not require the estimation of any variance parameter, thus having monotonic power even when multiple breaks are present. Qu (2008) provided a simulation study. The results suggest that these two tests perform favorably compared to Wald-based tests (see Figure 1 in Qu 2008).

2.4.4 The Recommended Procedure

The following procedure can be used to determine the number of breaks; we consider the interval $\mathcal{T}_\omega$ and focus on the quantile grid $\tau_1, \ldots, \tau_q \in \mathcal{T}_\omega$. A schematic sketch of the resulting loop is provided after the list of steps.

  1. Step 1. Apply the DQ test. If the test does not reject, conclude that there is no break and terminate the procedure. If it rejects, then estimate the model allowing one break. Save the estimated break date and proceed to Step 2.

  2. Step 2. Apply the DQ(l + 1|l) tests starting with l = 1. Increase the value of l if the test rejects the null hypothesis. In each stage, the model is re-estimated and the break dates are the global minimizers of the objective function allowing l breaks. Continue the process until the test fails to reject the null.

  3. Step 3. Let $\hat{l}$ denote the first value for which the test fails to reject. Estimate the model allowing $\hat{l}$ breaks. Save the estimated break dates and confidence intervals.

  4. Step 4. This step treats the q quantiles separately. Specifically, for every quantile τ h (h = 1, …, q), apply the SQ τ and SQ τ (l + 1|l) tests. Carry out the same operations as in Steps 1 to 3. Examine whether the estimated breaks are in agreement with those from Step 3.
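Schematically, Steps 1–3 amount to the loop below, written as a higher-order function so that the sketch stays self-contained: dq.stat and est.breaks are placeholders for routines returning the relevant DQ statistic and the break-date estimates with $l$ breaks, and crit holds the corresponding critical values. Inside the package these computations are performed internally when rq.break() is called.

# Schematic sketch only: dq.stat(dates) returns the DQ statistic given the current
# break dates (NULL for the full-sample test of 0 vs. 1 break), est.breaks(l)
# returns the estimated break dates from a model with l breaks, and crit holds
# the critical values for testing 0|1, 1|2, ... breaks.
select.num.breaks <- function(dq.stat, est.breaks, crit, m.max) {
  if (dq.stat(NULL) <= crit[1]) return(0)       # Step 1: no break detected
  l <- 1
  while (l < m.max) {
    dates <- est.breaks(l)                      # re-estimate the model with l breaks
    if (dq.stat(dates) <= crit[l + 1]) break    # DQ(l+1|l) fails to reject
    l <- l + 1
  }
  l                                             # Step 3: selected number of breaks
}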

2.4.5 Size Control

Since this is a sequential procedure, it is important to consider its overall rejection error under the null hypothesis. Suppose there is no break, and a 5 % significance level is used. Then, there is a 95 % chance that the procedure will be terminated in Step 1, implying the probability of finding one or more breaks is 5 % in large samples. If there are m breaks with m > 0, then, similarly, the probability of finding more than m breaks will be at most 5 %. The probability of finding less than m breaks in finite samples will vary from case to case depending on the magnitude of the breaks. This means that the overall significance level of this procedure is bounded from above by 5 % under the null hypothesis. It does not over-reject the null hypothesis.

2.5 Inference on Quantile Regression Coefficients after Estimating the Breaks

After determining the break dates, the quantile regression coefficients are estimated conditional on these break dates. This is equivalent to partitioning the sample using the break dates and running standard quantile regression on each subsample. These results are reported automatically when calling the main function of the package.

For inference on coefficients, Oka and Qu (2011) showed that if the break sizes are of higher order than $T^{-1/2}$, so that they are not confounded with estimation uncertainty of order $T^{-1/2}$, then the break dates are estimated fast enough that the asymptotic distribution of the estimated coefficients is the same as if the break dates were known – a familiar result in the structural break literature. This result permits computing confidence intervals for the quantile regression coefficients treating the estimated break dates as known. Below, we outline the asymptotic distribution and explain how it is computed in the package.

For any given quantile τ and for the coefficients in the jth regime, the asymptotic distribution is given by

$$\sqrt{T}\left(\hat{\beta}_j(\tau) - \beta_j^0(\tau)\right) \xrightarrow{d} N\!\left(0, V_j\right),$$

with

$$V_j = \frac{\tau(1-\tau)\, H_j^0(\tau)^{-1} J_j^0\, H_j^0(\tau)^{-1}}{\lambda_j^0 - \lambda_{j-1}^0}, \quad \text{for } j = 1, \ldots, m+1,$$

where $\lambda_j^0 = T_j^0 / T$ is the break fraction, and $H_j^0(\tau)$ and $J_j^0$ are the limits of two second-moment matrices:

$$\frac{1}{T_j^0 - T_{j-1}^0} \sum_{t=T_{j-1}^0+1}^{T_j^0} f_t\!\left(F_t^{-1}(\tau)\right) x_t x_t' \xrightarrow{p} H_j^0(\tau), \quad \text{and} \quad \frac{1}{T_j^0 - T_{j-1}^0} \sum_{t=T_{j-1}^0+1}^{T_j^0} x_t x_t' \xrightarrow{p} J_j^0,$$

where $f_t(F_t^{-1}(\tau))$ is the conditional density of the dependent variable at the corresponding quantile. To construct confidence intervals, these limiting expressions are replaced in the package by their consistent estimates:

$$\hat{H}_j(\tau) = \frac{1}{\hat{T}_j - \hat{T}_{j-1}} \sum_{t=\hat{T}_{j-1}+1}^{\hat{T}_j} \hat{f}_t\!\left(F_t^{-1}(\tau)\right) x_t x_t', \qquad \hat{J}_j = \frac{1}{\hat{T}_j - \hat{T}_{j-1}} \sum_{t=\hat{T}_{j-1}+1}^{\hat{T}_j} x_t x_t'.$$

The density estimate $\hat{f}_t(F_t^{-1}(\tau))$ is computed using a difference quotient; see Qu (2008, pp. 176–177) for details. Instead of using the package, users can also obtain the same confidence intervals directly by using the quantreg package with the standard error option set to ‘nid’.
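As an illustration of the equivalence noted above, one can condition on an estimated break date, split the sample, and call quantreg with se = "nid"; the break date 147 used below is the estimate reported for $\tau = 0.65$ in Section 3.1.

library(QR.break)
library(quantreg)
data(gdp)
y <- gdp[, "gdp"]
x <- as.matrix(gdp[, c("lag1", "lag2")])
# The break date is the last observation of the first regime (t = 147 for tau = 0.65).
regime1 <- 1:147
regime2 <- 148:nrow(gdp)
summary(rq(y[regime1] ~ x[regime1, ], tau = 0.65), se = "nid")
summary(rq(y[regime2] ~ x[regime2, ], tau = 0.65), se = "nid")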

3 Applications

This section illustrates how to use the main function with two datasets: a time series dataset on U.S. real GDP growth rates and a repeated cross-sectional dataset on youth drinking and driving behavior. Both datasets are included in the package.

3.1 Time Series Example

After loading the package using library(QR.break), the US GDP data can be loaded using the command

> data(gdp)

Moreover, the dataset can be viewed by running:

> gdp
        yq      gdp     lag1     lag2
   1947 Q4  6.06552 -0.32868 -0.61348
   1948 Q1  6.35664  6.06552 -0.32868
   ...
   2009 Q1 -6.58992 -5.48400 -2.70336
   2009 Q2 -0.73980 -6.58992 -5.48400

The yq column contains the dates, the gdp column contains the dependent variable, and the remaining two columns contain the first and second lagged values of gdp to be used as regressors.

3.1.1 Using the Function

Below is the main function to estimate this model:

> result = rq.break(y, x, vec.tau, N, trim.e, vec.time, m.max, v.a, v.b, verbose)

We now take a closer look at the inputs to this function. The following commands define the y and x variables in the quantile regression:

> y = gdp[,"gdp"]

> x = gdp[, c("lag1", "lag2")]

A column of ones is always added to the regressors when estimating the model. Therefore, in this case, the model has three parameters that are allowed to be affected by structural breaks: the intercept, the coefficient for the first lag, and that for the second lag.

The next command specifies the quantiles of interest:

> vec.tau = seq(0.20, 0.80, by = 0.15)

Using these inputs, the function will perform two sets of calculations. First, it conducts analysis using the quantiles in vec.tau independently, which means that the number of breaks and their locations can be different across quantiles. Then, it carries out the analysis using all quantiles simultaneously. In this case, the breaks are assumed to be common across quantiles, and information across quantiles is pooled together to estimate the break dates.

Since this is a time series regression, the number of cross-sectional units should be set as:

> N=1

The program requires the specification of the minimum length of a regime. This parameter is important because if the regime length is too small, then the model fit might be non-unique and the estimation might pick up spurious breaks. In this example, we can set:

> trim.e = 0.15

This implies a regime is at least 15 % of the sample size. We recommend setting this value between 10 % and 20 % in applications.

An issue related to the minimum length of a regime is the maximum number of breaks allowed. In this example, we can set:

> m.max = 3

This means that we allow at most 3 breaks and therefore 4 regimes for the entire sample period.

The next two parameters are for the significance levels of the analysis. The input v.a controls the significance level used for determining the number of breaks; 1, 2 or 3 for 10 %, 5 % or 1 %, respectively. The input v.b controls the coverage level for the confidence intervals of break dates; 1 or 2 for 90 % and 95 %, respectively. In this example, we can set:

> v.a = 2

> v.b = 2

There are two additional inputs that control the printing and displaying functions while running the code. If we set:

> vec.time = gdp[,"yq"]

then the program will use vec.time as the time index to display the break dates. Here a break date is the final date of the existing regime, not the starting date of a new regime. Alternatively, one can set:

> vec.time = NULL

so that the break dates are expressed in terms of integers, e.g., 50 if the break is estimated to be the 50-th observation of the sample. The input verbose controls whether to automatically display the results in the R console. The default is FALSE, while setting it to TRUE will display the results:

> verbose = TRUE

Either way, the results from estimation and testing are all saved when running the code, and they can be displayed using:

> print(result)

3.1.2 Interpreting the Results

The results consist of two parts: one based on individual quantiles (result$s.out) and one based on all quantiles (result$m.out).

Output based on separate quantiles   The entries in result$s.out are ordered according to vec.tau: first for τ = 0.2, then for τ = 0.35, and so on. For each quantile, the break testing results are shown first; if at least one significant break is detected, the break locations and parameter estimation results follow.

For example, for τ = 0.2, the results are:

> $s.out
$s.out$test_0.2 #testing results at the chosen significance level
1 Breaks 2 Breaks 3 Breaks
SQ test 1.423269 1.373012 0
Critical values 1.529859 1.637547 0
$s.out$nbreak_0.2 #number of breaks detected
[1] 0

For this quantile, the test for the null hypothesis of no break against a single break is equal to 1.423269, while the critical value is 1.529859. The value is insignificant at the chosen level, so no break is detected. The final column is equal to zero because the test is not computed when the previous tests are insignificant at the 10 % level. No break estimation results are produced. The results for other quantiles also show insignificance until the quantile level 0.65. At that point, one significant break is detected, followed by its confidence intervals reported in two ways: first in terms of index values, then in terms of dates.

$s.out$test_0.65 #testing results at the chosen significance level
1 Breaks 2 Breaks 3 Breaks
SQ test 1.817933 1.023126 0
Critical values 1.529859 1.637547 0
$s.out$nbreak_0.65 #number of breaks detected
[1] 1
$s.out$br_est_0.65 #point estimate and confidence interval for break date
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 147 83 161
$s.out$br_est_time_0.65
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 "1984 Q2" "1968 Q2" "1987 Q4"

Therefore, one break is detected, with a point estimate of t = 147, corresponding to the second quarter of 1984, and a confidence interval of [83, 161], or equivalently [1968:Q2, 1987:Q4]. For this quantile, the output also includes the estimated coefficients for the two regimes:

$s.out$coef_0.65
$s.out$coef_0.65$Regime_1
Value Std. Error t value Pr(>|t|)
Intercept 4.5169837 0.63003316 7.169438 3.625833e-11
x1 0.4199469 0.09643037 4.354924 2.512308e-05
x2 −0.1051057 0.09458095 −1.111277 2.683010e-01
$s.out$coef_0.65$Regime_2
Value Std. Error t value Pr(>|t|)
Intercept 2.2855100 0.4866286 4.696620 8.711282e-06
x1 0.1724133 0.1218581 1.414869 1.603091e-01
x2 0.2386392 0.1320517 1.807164 7.383669e-02

In this case, the sum of the autoregressive coefficients changes little, but the intercept decreases significantly, indicating a notably lower 0.65 quantile after the break:

$s.out$`bsize_0.65_Regime_2_minus Regime_1`
Value Std. Error t value Pr(>|t|)
Intercept −2.2314737 0.7957371 −2.804285 0.005453863
x1 −0.2475336 0.1375265 −1.799898 0.073127343
x2 0.3437448 0.1418386 2.423492 0.016109478

Similarly, the method detects a break in the 0.80 quantile; we omit the details.

Output based on multiple quantiles   The results are structured in a similar way as the single quantile case. Here the break testing and estimation results are based on all chosen quantiles in vec.tau:

> $m.out
$m.out$test_joint #testing results at the chosen significance level
1 Breaks 2 Breaks 3 Breaks
DQ test 1.0275870 0.5892746 0
Critical values 0.9098714 0.9584567 0
$m.out$nbreak_joint #number of breaks detected
[1] 1
$m.out$br_est_joint #point estimate and confidence interval for break date
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 146 120 147
$m.out$br_est_joint_time
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 "1984 Q1" "1977 Q3" "1984 Q2"

A single break is detected, as in the analysis based on individual upper quantiles. The rest of the output contains the coefficient estimates and their confidence intervals as in the single quantile case; we omit the details.

In summary, the findings here shed light on the “Great Moderation” debate on U.S. GDP growth. Results suggest the decline in volatility mainly affected the upper tail, with the median and lower quantiles remaining stable. This implies that expansions became less rapid, while recessions remained as severe.

3.1.3 Computational Time

The package uses a dynamic programming algorithm, as in Bai and Perron (2003), to determine the globally optimal break partitions. The resulting computational cost grows with the square of the sample size, regardless of the number of breaks allowed. In this example, the program is expected to finish within a few minutes on a typical desktop computer with a single processor.

The package has built-in critical values for common configurations: for the SQ test when the number of regressors is less than 100, and for the DQ test when the number of regressors is less than 20 and the quantile trimming is symmetric, as in the example. With more than 20 regressors, or when the quantile trimming is asymmetric – as in the cross-sectional example below – the critical values for the DQ test are computed via simulation, which can add a few minutes or more of computational time when running the function.

3.1.4 Potential Error Messages

The function is structured to display error messages if the input configurations are not set up properly. Here we give some examples. Suppose we set:

> trim.e = 0.2

> m.max = 6

The product of these two values exceeds 1 because it is not possible to allow 6 breaks when each regime is at least 20 % of the sample size. Running the code will produce:

Error: m.max*trim.e exceeds 1. This occurs because too many regimes are allowed or the minimum length of a regime is too large. Consider decreasing m.max, trim.e, or both.

As another example, if we set trim.e too small, we will get an error message with a suggestion to increase it:

> trim.e = 0.01

Error: trim.e * nrow(x) must be at least the number of regressors; otherwise, the estimation results are not unique. Consider increasing trim.e.

When an error message is produced, the program will exit with no saved results and the user can modify the input parameters and re-start the program.

3.2 Repeated Cross-Section Example

The main steps and the output structure are both similar to the time series case. In particular, the data can be loaded and the variables defined by running the following commands:

> data(driver)

> Driving_data<-driver

> y <- Driving_data[,"bac"]

> x <- Driving_data[, c("age", "gender", "winter")]

> N <- 108

Here the variable y is the dependent variable, x contains the three regressors, and N is the number of entities in each time period. The data are organized such that the first N rows are for the first time period, the next N rows for the next time period, and so forth.
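Because this layout determines how the function splits the sample into time periods, a quick consistency check along the following lines can be useful (a sketch, assuming the data are stacked by period as just described):

# The number of rows should equal N times the number of time periods,
# and each period identifier should appear exactly N times.
stopifnot(nrow(Driving_data) == N * length(unique(Driving_data[, "yq"])))
stopifnot(all(table(Driving_data[, "yq"]) == N))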

Other input parameters can be specified as follows, with the same interpretations as in the time series case:

> vec.tau = seq(0.70, 0.85, 0.05)

> trim.e <- 0.05

> vec.time <- unique(Driving_data[,"yq"])

> m.max <- 3

> v.a <-2

> v.b <-2

The function to run is:

> result <- rq.break(y, x, vec.tau, N, trim.e, vec.time, m.max, v.a, v.b)

We now highlight some outputs from the function. As in the time series case, the first part of the output is for analysis based on separate quantiles. The first quantile is τ = 0.7, for which two breaks are detected, along with their confidence intervals:

$s.out$test_0.7
1 Breaks 2 Breaks 3 Breaks
SQ test 5.179373 2.205150 1.339123
Critical values 1.574681 1.679331 1.737955
$s.out$nbreak_0.7
[1] 2
$s.out$br_est_0.7
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 10 6 15
Break 2 38 35 39
$s.out$br_est_time_0.7
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 "1985 Q2" "1984 Q2" "1986 Q3"
Break 2 "1992 Q2" "1991 Q3" "1992 Q3"

The breaks are consistent with the 1984 National Minimum Drinking Age Act and a 1991 beer tax increase. The output continues with the parameter estimates for the three regimes; we omit the details. After that, it continues with the next quantile, τ = 0.75, and reports the same types of results.

The results for the multiple-quantile case are saved in:

$m.out
$m.out$test_joint
1 Breaks 2 Breaks 3 Breaks
DQ test 2.3734870 1.0105268 0.5995825
Critical values 0.7761262 0.8223711 0.8486304
$m.out$nbreak_joint
[1] 2
$m.out$br_est_joint
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 9 5 13
Break 2 38 34 39
$m.out$br_est_joint_time
Estimate CI_Lower_Bound CI_Upper_Bound
Break 1 "1985 Q1" "1984 Q1" "1986 Q1"
Break 2 "1992 Q2" "1991 Q2" "1992 Q3"

The rest of the output contains the coefficient estimates for each quantile followed by break size estimates at each quantile. We omit the details.

4 Conclusions

This paper has introduced the QR.break package, which implements a set of tools for detecting and estimating structural breaks in quantile regression models. The methods accommodate both time series and repeated cross-sectional data and allow researchers to work with a single quantile or a range of quantiles. Estimation is based on a dynamic programming algorithm, and confidence intervals for break dates are derived from analytical limiting distributions. The package also includes sequential testing procedures to determine the number of breaks.

There are several directions in which the current framework can be extended. Allowing break points to depend on covariates, adapting the methods to high-dimensional settings, incorporating partial structural breaks where only a subset of coefficients is allowed to change, and developing versions for panel data with fixed effects are all natural next steps. We hope this package provides a useful foundation for researchers working with quantile models in environments where structural stability cannot be taken for granted.


Corresponding author: Zhongjun Qu, Department of Economics, Boston University, Boston, USA, E-mail: 

We are grateful to the co-editor Tong Li and an anonymous referee for helpful comments.


References

Bai, J. 1995. “Least Absolute Deviation Estimation of a Shift.” Econometric Theory 11 (3): 403–36. https://doi.org/10.1017/s026646660000935x.

Bai, J., and P. Perron. 2003. “Computation and Analysis of Multiple Structural Change Models.” Journal of Applied Econometrics 18 (1): 1–22. https://doi.org/10.1002/jae.659.

Chernozhukov, V., and L. Umantsev. 2001. “Conditional Value-At-Risk: Aspects of Modeling and Estimation.” Empirical Economics 26 (1): 271–92. https://doi.org/10.1007/s001810000062.

Cox, J. C., J. E. Ingersoll, and S. A. Ross. 1985. “A Theory of the Term Structure of Interest Rates.” Econometrica 53 (2): 385–407. https://doi.org/10.2307/1911242.

Koenker, R. 2005. Quantile Regression. Cambridge University Press. https://doi.org/10.1017/CBO9780511754098.

Oka, T., and Z. Qu. 2011. “Estimating Structural Changes in Regression Quantiles.” Journal of Econometrics 162 (2): 248–67. https://doi.org/10.1016/j.jeconom.2011.01.005.

Picard, D. 1985. “Testing and Estimating Change-Points in Time Series.” Advances in Applied Probability 17 (4): 841–67. https://doi.org/10.2307/1427090.

Piehl, A. M., S. J. Cooper, A. A. Braga, and D. M. Kennedy. 2003. “Testing for Structural Breaks in the Evaluation of Programs.” The Review of Economics and Statistics 85 (3): 550–8. https://doi.org/10.1162/003465303322369713.

Qu, Z. 2008. “Testing for Structural Change in Regression Quantiles.” Journal of Econometrics 146 (1): 170–84. https://doi.org/10.1016/j.jeconom.2008.08.006.

Su, L., and Z. Xiao. 2008. “Testing for Parameter Stability in Quantile Regression Models.” Statistics & Probability Letters 78 (16): 2768–75. https://doi.org/10.1016/j.spl.2008.03.018.

Yao, Y. C. 1987. “Approximating the Distribution of the Maximum Likelihood Estimate of the Change-Point in a Sequence of Independent Random Variables.” The Annals of Statistics 15 (3): 1321–8. https://doi.org/10.1214/aos/1176350509.

Received: 2025-04-22
Accepted: 2025-06-09
Published Online: 2025-07-04

© 2025 Walter de Gruyter GmbH, Berlin/Boston
