
Dynamic Bunching Estimation with Panel Data

  • Benjamin M. Marx
Published/Copyright: May 28, 2024

Abstract

Bunching estimation of distortions in a distribution around a policy threshold provides a means of studying behavioral parameters. Standard cross-sectional bunching estimators rely on identification assumptions about heterogeneity that I show can be violated by serial dependence of the choice variable or attrition related to the threshold. I propose a bunching estimation design that exploits panel data to obtain identification from relative within-agent changes in income and to estimate new parameters. Simulations using household income data demonstrate the benefits of the panel design. An application to charitable organizations demonstrates opportunities for estimating elasticity correlates, causal effects, and extensive-margin responses.

JEL Classification: H00; B40; C23; D04

1 Introduction

Taxes, income eligibility limits, and other policies can change agents’ incentives at threshold values of a choice variable. Some policies create a kink in the budget set at the threshold, while others create discontinuities, or “notches” (Blinder and Rosen 1985; Slemrod 2010). If costs increase above the threshold, the incentive to locate at or below the threshold can create “bunching” in the distribution of the choice variable. Saez (2010) used bunching of incomes around a kink in the individual income tax schedule to estimate the taxable income elasticity. In a literature reviewed by Kleven (2016), bunching estimation has since been employed to quantify behavioral parameters using a variety of kinks and notches in tax and regulatory schedules.

Standard bunching estimation uses cross-sectional data or treats panel data as pooled cross-sections. A graphical example appears in Figure 1, which is explained in detail in Section 3.1. Based on the assumption that responses to the threshold are local, the researcher fits a polynomial to the binned density using observations outside of an “omitted range” near the threshold to estimate a “counterfactual” density with no distortion. Bunching is estimated as observed probability mass in excess of this counterfactual. In the case of a kink, agents are expected to bunch somewhere around the kink. In the case of a notch, the subject of this paper, the discontinuity in the budget set should result in excess mass on one side of the threshold and reduced mass on the other.
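The procedure can be sketched in a few lines. The following Python fragment is illustrative only: the simulated "incomes," the bin width, the polynomial degree, and the omitted range are all arbitrary choices, not the settings behind Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated incomes with an artificial notch at 100: agents who would land
# just above it relocate to just below it.
notch = 100.0
z = rng.uniform(50, 150, 20_000)
bunchers = (z > notch) & (z < notch + 5)
z[bunchers] = notch - rng.uniform(0.01, 0.99, bunchers.sum())

# Bin the distribution.
width = 1.0
edges = np.arange(50, 150 + width, width)
counts, _ = np.histogram(z, edges)
mids = (edges[:-1] + edges[1:]) / 2

# Regress bin counts on a polynomial plus a dummy for each bin in the
# omitted range around the notch.
degree = 3
omitted = (mids > notch - 2) & (mids < notch + 6)
X = np.vander(mids, degree + 1)          # polynomial terms of bin midpoints
D = np.eye(len(mids))[:, omitted]        # one dummy per omitted bin
coef, *_ = np.linalg.lstsq(np.hstack([X, D]), counts, rcond=None)

# Excess mass below the notch and missing mass above it.
dummy_coefs = coef[degree + 1:]
below = mids[omitted] < notch
excess = dummy_coefs[below].sum()
missing = -dummy_coefs[~below].sum()
print(excess, missing)
```

Because every omitted bin receives its own dummy, the polynomial is effectively fit only to bins outside the omitted range, which is the sense in which the counterfactual is extrapolated.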

Figure 1: 
Standard bunching estimation. Notes: The figure depicts standard bunching estimation with charity data (Panel A) and the smoothed PSID income distribution (Panel B). In both panels, the underlying data are represented by a histogram in blue circles, and each bin is treated as an observation. Bin counts are regressed on a polynomial, which estimates the counterfactual distribution depicted by the red curve, and a dummy variable for each bin in the omitted range indicated by the dashed lines. Excess “bunching” mass is calculated as the sum of coefficients on dummy variables for each bin in the bunching region between the first dashed line and the solid line at the notch. Similarly, the estimated reduction in mass above the notch is the sum of coefficients on dummies for the bins between the solid line and second dashed line. For Panel A, the polynomial has degree 3, the omitted range is $80–130,000, bin width = $1000, and N = 264,770. For Panel B, the polynomial has degree 5, the omitted range is $35–55,000, bin width = $2500, and N = 100,000.

Recent research has challenged the identifying assumptions of bunching estimation. Bertanha, McCallum, and Nathan (2023) and Blomquist et al. (2021) highlight that the polynomial used to extrapolate the counterfactual density in the omitted range assumes a particular functional form. Their critique applies to kinks, as both papers demonstrate that notches can nonparametrically identify the bunching response in a one-period model. I will show, however, that even for notches, the standard design generally fails to identify bunching when there is more than one period. The issue is particularly acute when the notch is in place for multiple periods, but I show that it can be solved with panel data. The designs that I propose also enable estimation of causal effects of bunching.

The issue arises if there is serial dependence in the choice variable. Serial dependence of one period’s income on prior years’ income levels implies that bunching in one period can distort future income at levels far from the notch. For example, suppose that agents’ income grows by a fixed percentage each year, such as in a firm offering a uniform raise to all workers. Some workers might turn down the offer if it would cause their incomes to cross over a notch, such as the income eligibility limit for receiving Medicaid. This response would appear as bunching in the year that these workers could have crossed the notch, but it would also reduce these workers’ incomes when the following year’s raise was offered. If the notch remains in place, then workers might bunch repeatedly, causing bunching to accumulate as others’ incomes approach the notch. In the extreme, if all of the workers bunch permanently, then there will be no observations above the notch with which to estimate the counterfactual. Serial dependence therefore affects not only the interpretation of the estimates but also their bias. As Supplementary Material Table A.1 shows, bunching is often estimated for choice variables that are likely to be serially dependent, such as individual or firm income, number of employees, sales, assets, donations, and consumption of goods such as drugs and electricity.
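A toy simulation can make the accumulation concrete. The decision rule below, in which every worker declines any raise that would carry income across the notch, is a stylized assumption chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

notch = 100.0
income = rng.uniform(60, 95, 10_000)   # everyone starts below the notch

share_at_notch = []
for year in range(10):
    income = income * 1.05             # uniform 5 percent raise offer
    income[income > notch] = notch     # decline any raise that would cross
    share_at_notch.append((income == notch).mean())

print(share_at_notch)
```

The share stuck at the notch rises every year while the region above it stays empty, which is precisely the scenario in which a cross-sectional counterfactual fit to the observed distribution breaks down.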

This paper proposes and evaluates new designs for panel data. I will refer to these designs as “dynamic bunching estimation” because identification is obtained by using within-agent changes in income to account for unobserved heterogeneity. The idea is depicted in Figure 2, which is explained in detail in Section 3.2. Consider a base period and two groups that are at different levels of log income, both of which are well below a notch.[1] For each of these base-year incomes we can observe the distribution of next-year log income, which is centered near the base-year level and exhibits bunching at the notch. Because the groups differ in base-year income, they also differ in terms of the level of growth of log income that would take them to the notch in the next year. Therefore, if the two groups have similar distributions of growth rates, then each provides a counterfactual for the other’s probability of growing by the amount that would take them to the notch. The extent of the distortions can therefore be estimated by comparing the shape of one group’s growth distribution around its next-year notch to the corresponding, undistorted section of the growth distribution among agents in the other group.

Figure 2: 
Illustration of identification in dynamic bunching estimation. Notes: The figure shows the distribution of next-year log income recentered relative to the notch in that year (Panel A) and growth from base-year income to next-year income (Panel B) for households in two illustrative levels of base-year income. The sample is generated from the smoothed PSID income distribution. Due to bunching, the distributions for each group exhibit a spike just below the notch and a depression just above it. The growth distribution of each group is similar except around the notch, which appears in a different part of each distribution. Thus, growth distributions for varying levels of base-year income can be compared to estimate the extent of distortion to the rates of growth that bring households close to the notch.

The researcher can implement dynamic bunching estimation in a variety of ways. I first propose a simple and transparent ordinary-least-squares extension of the standard design. In this approach, the researcher logs the choice variable that is subject to the notch. Denote this logged variable $r_{it}$ and its growth rate $g_{it} = r_{i,t+1} - r_{it}$. The researcher can then bin both $r_{it}$ and $g_{it}$, recognizing that certain combinations of the two result in next-year income $r_{i,t+1}$ that will be in a range around the notch. Bins can be made wide enough for this range to include both the bunching range below the notch and the reduced-mass range above it, implying that these bin combinations will include both agents who will bunch and agents who will not. An indicator variable for these bin combinations therefore captures a sample that is not selected on bunching. The indicator can then be included in a regression with a polynomial of $r_{it}$ for each bin of $g_{it}$ to estimate the causal effect of approaching the notch on outcomes such as next-year income and whether the agent crosses the notch.
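A minimal sketch of this design follows, under entirely hypothetical assumptions: growth is independent of base income, everyone landing within 0.04 log points above the notch bunches, and the outcome is an indicator for crossing the notch. The window width, bin widths, and quadratic controls are illustrative choices, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
notch = np.log(100.0)

# Base-year log income r0 and latent growth (independent here for simplicity).
r0 = rng.uniform(np.log(60), np.log(140), n)
g_star = rng.normal(0.05, 0.20, n)
r1_star = r0 + g_star

# Assumption: everyone who would land just above the notch bunches at it.
bunch = (r1_star > notch) & (r1_star <= notch + 0.04)
r1 = np.where(bunch, notch, r1_star)
g = r1 - r0

# Bin realized growth; within a growth bin, outcomes vary smoothly with r0.
gb = np.floor(g / 0.10).astype(int)

# Indicator for a window wide enough to contain both the bunching range and
# the reduced-mass range: bunching moves agents within the window, not out of
# it, so membership is not selected on bunching.
D = ((r1 >= notch - 0.06) & (r1 <= notch + 0.04)).astype(float)

# Outcome: did the agent cross the notch?
crossed = (r1 > notch).astype(float)

# Regress the outcome on D plus a quadratic in r0 for each growth bin.
cols = [D]
for b in np.unique(gb):
    m = (gb == b).astype(float)
    cols += [m, m * r0, m * r0 ** 2]
X = np.column_stack(cols)
beta, *_ = np.linalg.lstsq(X, crossed, rcond=None)
print(beta[0])
```

In this simulated setting the coefficient on the window indicator is negative, reflecting that approaching the notch reduces the probability of crossing it relative to the smooth counterfactual.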

The dynamic OLS implementation offers several benefits. It is easy to execute and has already been employed, based on an earlier draft of this paper, by St. Clair (2016). In addition to obtaining identification despite serial dependence, it allows the researcher to conduct heterogeneity analysis using continuous variables and those that vary over time. It is also straightforward to estimate effects of approaching the notch on longer-run outcomes such as long-run income growth. As the illustrative example above suggests, these analyses can inform interpretation questions such as whether misreporting or foresight influences the quantity of bunching.

This paper also describes two additional implementations of the panel-based strategy. In an appendix, I examine the special case of a one-time notch. For such settings, one need only bin on the choice variable in the period of the notch. As with the multi-period OLS design, an indicator for a bin around the notch that includes both bunchers and non-bunchers can be used to quantify bunching as well as causal effects of bunching on other outcomes. An alternative implementation for the general case uses maximum likelihood estimation. Rather than binning, MLE allows the researcher to estimate the joint distribution of $r_{it}$ and $g_{it}$ as a flexible function of both. As with OLS, the counterfactual is estimated from values of $r_{i,t+1}$ outside of an omitted range around the next year's notch. Bunching is estimated using censoring around the omitted range: excess mass just below the notch quantifies bunching, while reduced mass above it reflects both bunching and extensive-margin responses or other notch-related attrition such as strategic non-response. Researchers should therefore use the dynamic MLE approach when there is evidence of notch-related attrition, such as when the reduced mass is found to exceed the excess mass in the bunching range. R code is provided with this paper to facilitate adoption. MLE is the most efficient estimator when the true distribution is known; because the true data-generating process is rarely known in practice, the dynamic MLE estimator can instead be specified as a flexible function of $r_{it}$ and $g_{it}$, as with the OLS methods.
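The censoring logic can be sketched with a deliberately simple parametric model. The paper's companion code is in R and allows flexible functional forms; the Python fragment below instead assumes normal growth, a known omitted range, and a hypothetical 60 percent bunching rate, so every number in it is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 20_000
notch, lo, hi = 0.15, 0.10, 0.25   # notch and omitted range, in growth units

# Latent growth; 60% of agents at risk of crossing the notch bunch just below.
g_star = rng.normal(0.05, 0.15, n)
bunch = (g_star > notch) & (g_star < hi) & (rng.random(n) < 0.6)
g = np.where(bunch, notch - 0.001, g_star)
inside = (g >= lo) & (g < hi)

def neg_loglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    # Outside the omitted range, observations contribute their density;
    # inside it, they are censored and contribute only the range probability.
    ll_out = norm.logpdf(g[~inside], mu, sigma).sum()
    p_in = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)
    return -(ll_out + inside.sum() * np.log(p_in))

res = minimize(neg_loglik, x0=[0.0, np.log(0.1)], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Excess mass below the notch and reduced mass above it, relative to the model.
excess = ((g >= lo) & (g <= notch)).mean() - (
    norm.cdf(notch, mu_hat, sigma_hat) - norm.cdf(lo, mu_hat, sigma_hat))
reduced = (norm.cdf(hi, mu_hat, sigma_hat) - norm.cdf(notch, mu_hat, sigma_hat)
           ) - ((g > notch) & (g < hi)).mean()
print(mu_hat, sigma_hat, excess, reduced)
```

With no attrition, the excess and reduced masses should roughly agree; notch-related attrition would instead show up as reduced mass exceeding excess mass.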

This paper provides mathematical examples and simulations to support the proposed methods. The examples show that the existence of a notch in one year can result in inconsistent static bunching estimates in the next year when there are fixed costs of adjustment, smoother forms of serial dependence of income, notch-related attrition, or extensive-margin responses. I show mathematically that a dynamic design can address these issues, and then after explaining the methods in detail, I assess their performance using simulations. To avoid specification of a particular income process, these simulations use draws of household income profiles from the Panel Study of Income Dynamics (PSID). I document the biases in static estimates caused by serial dependence in income and by notch-related attrition, the latter including extensive-margin responses and effects of the notch on the probability of filing. In contrast, simulations with dynamic estimators produce bias close to zero and rejection rates close to the nominal 0.05 level.

An application to the bunching of charities at a reporting threshold demonstrates benefits of the proposed methods. Charities employ 10 percent of the private workforce in the U.S. (Salamon and Newhouse 2019). Though exempt from taxes, these organizations must file an annual information return with the Internal Revenue Service. Charities bunch below the income eligibility threshold for filing a simplified “EZ” form. Marx (2018) focuses on this application and derives welfare implications as a function of estimable parameters, and the fact that the setting offers publicly available panel data with varying notches provides an ideal illustration of the estimation methods in this paper. OLS implementations of the dynamic design reveal significant heterogeneity in the bunching response and show that the notch permanently reduces the growth of some charities. The MLE implementation of the dynamic design finds significant notch-related attrition (most likely non-filing or late filing) that appears to be at least as important as the bunching response.

Kleven and Waseem (2013) developed theory and techniques for estimating bunching at notches. Their analyses mostly treated the data as repeated cross sections, though one exercise compared the probability of income changes between taxpayers just below tax notches and taxpayers just above them. The methods proposed in this paper utilize panel data by studying income changes of agents who may be starting far from the notch and are not selected on bunching, a distinction that allows for estimation of the causal effect of approaching the notch on income and other variables. This extension of bunching estimation has parallels in the development of simulated tax instruments that use panel data to estimate the taxable income elasticity (Giertz 2008; Gruber and Saez 2002; Kopczuk 2005; Weber 2014). The issue is relevant to empirical work: Supplementary Material Table A.1 lists publications in leading journals that estimate bunching. Of these 73 papers, more than half estimate bunching at a notch. Half of these papers use panel data and all have multiple periods, but more than half face the potential issue discussed in this paper that the threshold remains at a constant (nominal or inflation-indexed) level of the choice variable.

The designs proposed here encompass some important special cases that have received recent attention. In earlier drafts, I showed how bunching can be used to estimate causal effects and extensive-margin responses (Marx 2012) and evaluated an estimator for the special case of a temporary notch (Marx 2015). Diamond and Persson (2016) also study a temporary notch (the threshold for passing a high-stakes exam in Sweden) and propose a nonlinear-least-squares estimator of the causal effect of bunching.[2] Gelber et al. (2021) apply a regression kink design to obtain a lower bound for extensive-margin responses in settings with intensive-margin frictions and an undistorted pre-period, features that are not required by the methods proposed here. Gelber, Jones, and Sacks (2020) estimate a fixed cost of adjustment by examining whether bunching persisted after elimination of the Social Security earnings test. I show that fixed costs of adjustment are one example of the general phenomenon of serial dependence of income and that this serial dependence introduces bias into static bunching estimates that can be avoided with the techniques proposed here.

Other methodological papers share important commonalities. Aronsson, Jenderny, and Lanot (2022) offer an approach that exploits information in both cross sections and differences, and their simulations show that this can improve precision relative to IV-diff-in-diff and standard bunching estimators. Dynamic bunching estimation also uses both types of information but does not require knowledge of the data generating process. Caetano et al. (2021) propose a “dummy test” of continuity of outcomes at a mass point in a control variable. They derive the probability limit of the coefficient on an indicator for the mass point and decompose this into discontinuities in the error term. Their theoretical result would likely apply to the OLS designs proposed here with minor modifications to account for binning, effects on a range of values, and a treatment indicator that applies to a range rather than a point. This paper focuses on motivating the use of panel data, describing methods for exploiting them, clarifying their assumptions, and testing the performance of both the proposed and existing methods.

The paper proceeds as follows. Section 2 lays out a conceptual framework and discusses identification. Section 3 describes the standard estimation approach in the literature and dynamic alternatives that exploit panel data using OLS or MLE. Section 4 presents simulations quantifying the issues with static estimators when there is notch-related attrition or serial dependence and showing that the latter is solved with the dynamic OLS estimator and the former with the dynamic MLE estimator. Section 5 demonstrates the benefits of the proposed tools in the context of a reporting notch for charitable organizations. Section 6 provides concluding remarks, including a summary list of diagnostics that researchers can check when estimating bunching.

2 Conceptual Framework

A simple model can highlight challenges for static bunching estimation and advantages offered by dynamic estimation. These empirical strategies can be used to estimate a variety of parameters of interest. Marx (2018) shows that for the charity reporting application, welfare effects of reforming the notch depend on not only the static intensive-margin response but also the amounts of serial dependence and notch-related attrition. That paper employs a dynamic decision model, but in this paper I focus on estimation by following the familiar model of the taxable income elasticity as in the seminal paper of Saez (2010) and subsequent literature. I consider several processes for income dynamics, while the agent's optimization problem is static so as to make as stark as possible that the problems that arise are problems of estimation and not simply of interpretation.[3] Examples consider a notch that is moved or removed because these provide the simple benchmark that there should be no bunching after the reform, but the issues also apply to notches that remain in the same location over time, perhaps especially so because the distortions can accumulate.

Consider two time periods: a base year, t = 0, and the next year, t = 1. In each period, the model follows the standard setup in the literature. An agent has quasilinear utility $u(c_t, z_t; a_t) = c_t - \frac{a_t}{1+1/\epsilon}\left(\frac{z_t}{a_t}\right)^{1+1/\epsilon}$ over exogenous ability $a_t$ and chosen consumption $c_t$ and income $z_t$. It will suffice to consider a time-invariant income tax function $T(z_t)$ and an additional lump-sum tax (or cost, such as loss of transfer-program eligibility) $T_t$ that is levied when income exceeds a threshold $\rho_t$. The agent maximizes utility subject to the budget constraint $T(z_t) + T_t + c_t = z_t$. If the notch constraint does not bind, then optimal income $z_t^* = a_t\left(1 - T'(z_t^*)\right)^{\epsilon}$ exhibits constant tax-price elasticity ϵ. The expression for optimal income shows that for a given tax schedule, heterogeneity arises entirely from differences in ability $a_t$ but can be fully characterized in terms of the implied $z_t^*$. I refer to $z_t^*$ as “potential income,” which will differ from observed income $z_t$ if the agent is bunching.
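As a quick numerical check of the interior optimum, the closed form can be compared with direct maximization under an illustrative linear tax $T(z_t) = t z_t$; the parameter values below are hypothetical.

```python
from scipy.optimize import minimize_scalar

# Quasilinear utility u = c - (a/(1+1/eps)) * (z/a)**(1+1/eps), c = z - T(z),
# with an illustrative linear tax T(z) = t*z and no binding notch.
a, eps, t = 100.0, 0.5, 0.3

def utility(z):
    cost = (a / (1 + 1 / eps)) * (z / a) ** (1 + 1 / eps)  # disutility of effort
    return z - t * z - cost

# The numerical maximum should match the closed form z* = a * (1 - t)**eps.
res = minimize_scalar(lambda z: -utility(z), bounds=(1e-6, 10 * a), method="bounded")
z_star = a * (1 - t) ** eps
print(res.x, z_star)
```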

Bunching occurs when an agent with potential income above the notch can increase utility by reducing observed income to the notch to avoid the tax $T_t$. Bunching will be desirable for potential incomes slightly above the notch but not for those well above the notch. The agent will be indifferent between the bunching income $z_t = \rho_t$ and non-bunching income for some $\Delta z_t = z_t^* - \rho_t$ such that $\Delta z_t - \left[T(\rho_t + \Delta z_t) - T(\rho_t)\right] - T_t = \frac{1 - T'(\rho_t + \Delta z_t)}{1 + 1/\epsilon}\left[\rho_t + \Delta z_t - \frac{\rho_t^{1+1/\epsilon}}{(\rho_t + \Delta z_t)^{1/\epsilon}}\right]$ (as shown in Supplementary Material A). While this equation does not have an analytical solution, it implicitly defines ϵ as a function of the observable tax schedule and estimable $\Delta z_t$, allowing the researcher to numerically solve for the elasticity ϵ after obtaining an estimate of $\Delta z_t$.[4] Bunching analysis obtains estimates of $\Delta z_t$ from the observed distribution of income. Let $F_{z_t^*}(z_t^*)$ and $F_{z_t}(z_t)$ denote, respectively, the cumulative distribution functions of potential income and observed income at time t.
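Because the indifference condition has no analytical solution, the elasticity is recovered numerically. The sketch below assumes a locally linear tax and hypothetical parameter values; it first solves the condition forward for the bunching range implied by a known elasticity, then inverts it the way a researcher would invert an estimated $\Delta z_t$.

```python
from scipy.optimize import brentq

# Illustrative schedule: linear tax T(z) = t*z plus a lump-sum notch tax
# T_notch levied above the threshold rho; all parameter values are hypothetical.
rho, t, T_notch = 100.0, 0.30, 2.0

def gap(eps, dz):
    """Utility at the notch minus utility at the interior optimum for the
    marginal buncher, whose potential income is rho + dz."""
    a = (rho + dz) / (1 - t) ** eps   # ability implied by the first-order condition
    def u(z, lump):
        return z - t * z - lump - (a / (1 + 1 / eps)) * (z / a) ** (1 + 1 / eps)
    return u(rho, 0.0) - u(rho + dz, T_notch)

# Forward: the bunching range implied by a true elasticity of 0.5 ...
dz_true = brentq(lambda d: gap(0.5, d), 1e-6, rho)
# ... and backward: recover the elasticity from an "estimated" bunching range.
eps_hat = brentq(lambda e: gap(e, dz_true), 1e-3, 10.0)
print(dz_true, eps_hat)
```

The round trip recovers the elasticity used to generate the bunching range, which is the logic a researcher applies to an empirically estimated $\Delta z_t$.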

This paper examines the empirical implications of changes in income. Income transitions, which can arise because ability changes over time, are determined by the conditional CDF $F_{z_1^*}(z_1^* \mid z_0^*, z_0)$. Thus, potential income is allowed, in general, to depend on the lagged value of potential income (e.g. a wage commensurate with accumulated human capital), observed income (e.g. a raise equal to a percentage of the prior year's wage), or both (e.g. income re-timed from one year to the next for tax purposes). Potential income is serially dependent on lagged income if $F_{z_1^*}(z_1^* \mid z_0^*, z_0)$ is not constant in $z_0$. For example, serial dependence arises mechanically when the variable determining the notch is a stock, such as assets (e.g. Brülhart et al. 2016; Seim 2017).[5]

2.1 Failure of Identification with Cross-Sectional Data Analysis

Within this simple two-period model, I construct several examples in which the static bunching estimator using cross-sectional data is not consistent. Mathematical details for each example are provided in Supplementary Material A. Each example starts with a base year in which the traditional method identifies ϵ. Researchers estimate the distribution $F_{z_t}(z_t)$ over some range of income surrounding the notch at $z_t = \rho_t$ and the reduced-mass range $z_t \in (\rho_t, \rho_t + \Delta z_t]$ from which agents bunch.[6] The width of this reduced-mass range identifies a homogeneous elasticity, as Bertanha, McCallum, and Nathan (2023) and Blomquist et al. (2021) have shown in static models.

In practice, researchers have relaxed the assumption of a homogeneous elasticity but have relied on stronger empirical assumptions. This is because real-world data rarely exhibit the range of zero mass that would be predicted if the elasticity is homogeneous and there are no optimization frictions. However, an average elasticity can be derived from the excess mass or reduced mass (Kleven and Waseem 2013). In the simpler homogeneous-elasticity model, one can equivalently estimate the excess mass, the reduced mass, or the width of the bunching range.

Here, I will formalize the parametric assumption in the bunching literature. Doing so will allow for a precise comparison of this assumption with the version for dynamic bunching estimation. Denote by $f_{z_t}(z_t)$ the PDF of $z_t$.

Assumption S (in static estimation): Choose a continuous function of finitely many parameters, $C(z_t)$, over a range $Z$ such that $\forall z_t \notin [\rho_t, \rho_t + \Delta z_t]$, $C(z_t) = f_{z_t}(z_t)$. Assume that $\forall z_t \in Z$, $f_{z_t^*}(z_t) = C(z_t)$.

The bunching literature most often assumes that $f_{z_t^*}(z_t^*)$ is a polynomial of a particular order over the range $Z$. The assumption is inherently parametric because two functions that take the same values for $z_t \notin [\rho_t, \rho_t + \Delta z_t]$ need not take the same values on the omitted range $z_t \in [\rho_t, \rho_t + \Delta z_t]$. This lack of unique identification has been the subject of recent econometric critiques of bunching estimation. This paper provides concrete examples in which Assumption S fails and demonstrates that even in these cases a panel-data version of the design can provide consistent estimates.

When is Assumption S likely to fail? I provide three simple two-period examples. In the base year of each example that follows, the distribution of potential income is uniform, i.e. the density $f_{z_0^*}(z_0^*)$ is constant except in the omitted range around the notch. The excess mass below the notch $\rho_0$ is $F_{z_0}(\rho_0) - F_{z_0^*}(\rho_0)$, and in these examples with a uniform distribution, the counterfactual mass $F_{z_0^*}(\rho_0)$ can be accurately quantified by extrapolating the constant density that is observed away from the notch. In the very next year, however, Assumption S fails to hold.

(1) Fixed costs of adjustment. Suppose that some agents face a large fixed cost of adjusting their next-year income, such as the cost of hiring a tax expert or the psychic cost of avoiding or evading. Such a setting can be modeled as a share of agents for whom $z_1^* = z_0$. If some of these agents bunch in the base year, then in the next year they will have $z_1 = z_1^* = \rho_0$, implying excess mass at this income level even if there is no notch when t = 1. Gelber, Jones, and Sacks (2020) show a version of this result for kinks (rather than notches) and use it to estimate fixed costs of adjustment using data spanning the removal of the kink. In contrast, I highlight the implication for identification. Even if the notch remains in place, the static approach does not generally provide a consistent estimator because what appears as bunching includes both the active bunching that the researcher wishes to quantify and some lingering effects of past bunching. Fixed costs of adjustment are a form of serial dependence, and the appendix derives results for examples with either discrete or continuous distributions of income.

(2) Continuous serial dependence. Income may be serially dependent if an agent's salary or hours are a function of those in the previous period or if business opportunities include those of the previous period and a random shock. Serial dependence creates issues even if the transition distribution $F_{z_1^*}(z_1^* \mid z_0^*, z_0)$ does not exhibit discrete mass points. In the next example, this distribution is continuous but gives greater probability to values of $z_1^*$ that are close to $z_0$. In other words, income growth from one year to the next is most likely to take on small absolute values, and larger values of $z_0$ increase the likelihood of larger values of $z_1$. If agents experience large-absolute-value changes in income with positive probability, then bunching in the base year affects the density even at next-year incomes that are far from the notch. Supplementary Material B provides an additional illustration of this bias using the first 10 years of earnings from each household in the PSID. In this example, one can see that serial dependence can distort the entire distribution of income, yet it may not be apparent from the observed distribution that such distortion has occurred.

Note that allowing for serial dependence in income does not imply determinism; agents make a choice in each period over whether or not to bunch. However, these examples demonstrate that when income is serially dependent, the income distribution reflects accumulated effects, and the standard methodology does not recover the behavioral parameters governing these effects. This accumulation of effects may explain findings that bunching at kinks in individual income taxes has increased over time since 1996 (Chetty, Friedman, and Saez 2013; Mortenson and Whitten 2020).

(3) Notch-related attrition/extensive-margin responses. The bunching literature has examined notch-related attrition in the form of extensive-margin responses, i.e. those who choose income of zero over either the potential income above the notch or the bunching income just below it. A point made by Kleven and Waseem (2013) and generalized by Best and Kleven (2018) is that such responses should not occur just above the notch because agents can simply bunch by giving up very little income relative to their preferred income level. Kopczuk and Munroe (2015) note that even if this holds in the limit as potential income approaches the notch, extensive-margin responses may still occur close enough to the notch to affect estimation, and that it is not possible (with the standard bunching design) to measure extensive-margin responses without making strong assumptions. Moreover, a notch may induce other forms of systematic attrition, such as if crossing the notch increases the probability of being audited and removed from the sample. If this attrition is not determined by agent maximization, then it need not approach zero near the notch.

These examples may be thought of as mechanisms for the identification concerns documented in existing papers. These concerns can be mitigated with panel data. The panel approach allows agents to make static or dynamic choices; like the static approach, it estimates reduced-form parameters rather than uniquely identifying structural parameters. The estimation is dynamic in the sense that it exploits the distribution of income growth rather than the distribution of cross-sectional income. It requires no changes in the notch policy, though such changes can be useful for illustrating identification and for identifying additional parameters.

2.2 Identification with Panel Data

The ideal setting for estimating responses to a notch would be to observe agents' choices with and without the notch in the budget set. In practice, this will rarely be possible without a within-subjects experiment. Observational studies must therefore make an assumption about what is a credible counterfactual when all agents face a notch. With panel data, one can make identifying assumptions about the distribution of income growth that are likely to hold even when the corresponding assumptions for a cross-section are violated. This can be seen in the preceding examples, where $F_{z_1}(z_1)$ fails to identify $F_{z_1^*}(z_1^*)$, but for all observations not bunching in the base year (i.e. $z_0^* = z_0$), $F_{z_1}(z_1 \mid z_0)$ identifies $F_{z_1^*}(z_1^* \mid z_0^*, z_0)$.

Figure 2 provides intuition for dynamic designs using panel data. Income levels and growth are drawn from the smoothed PSID distribution as in the simulations presented in Section 3.1, and the real-world bunching of charities produces a very similar figure (Marx 2018). Panel A of the figure shows the distribution of next-year log income (relative to the notch) for two illustrative levels of base-year income. For each group, the distribution of next-year income is distorted around the notch as one would expect, with excess mass just to the left and reduced mass just to the right. Panel B shows the distribution of growth rates for each group, defined as the difference between next-year log income and base-year log income.

Panel B of Figure 2 shows that each base-year-income group can be used as a counterfactual for the other. The two groups have similarly-shaped growth distributions, except that the two groups require different income growth rates to reach the notch, and hence the bunching distortions lie in different parts of the two groups’ growth distributions. The extent of the distortions can therefore be estimated by comparing the shape of one group’s growth distribution around its notch to the corresponding, undistorted section of the growth distribution among households in the other group.

To formalize the intuition, consider again an observed distribution of base-year income $F_{z_0}(z_0)$ and a latent distribution of potential next-year income $F_{z_1^*}(z_1^*) = \int F_{z_1^*|z_0}(z_1^*|z_0)\,dF_{z_0}(z_0)$. To generalize from the homogeneous elasticity above, allow $\Delta z_1$ to have distribution $F_{\Delta|z_0}(\Delta z_1|z_0)$ with domain $[0, \overline{\Delta z_1}]$. Note that the possibility of $\Delta z_1 = 0$ allows for optimization frictions. The resulting empirical elasticity is the appropriate subject for a paper studying estimation, and this estimate can be translated to different structural elasticities depending on the additional assumptions that the researcher makes about the setting (Einav, Finkelstein, and Schrimpf 2017). To simplify the exposition in the text, I assume away attrition and extensive-margin responses, which I incorporate into a generalization in Supplementary Material A. However, I allow for heterogeneous elasticities $\epsilon_i$.

Consider a monotonic income-growth function $g(z_0)$ mapping a base-year income to a value that income could possibly take in the next year. For example, $g(z_0) = 1.1 z_0$ would represent income growth of 10 percent. I will first show that for any such growth function there is an identifying assumption that is similar to Assumption S but that holds in the preceding examples where Assumption S fails. While this assumption remains parametric, I then note that comparing distributions for multiple growth functions $g(z_0)$ offers an identification opportunity more akin to comparing distributions over multiple budget sets.

Consider a range $Z = [\underline{z_0}, \overline{z_0}]$ such that $g(\underline{z_0}) < \rho$ and $g(\overline{z_0}) > \rho + \overline{\Delta z_1}$. Denote by $\tilde{Z} \equiv \{z_0 \in Z : g(z_0) \notin [\rho, \rho + \overline{\Delta z_1}]\}$ the set of base-year incomes in $Z$ for which income growth $g(z_0)$ does not lead to next-year income in the reduced-mass range. Thus, for $z_0 \in \tilde{Z}$, $F_{z_1|z_0}(g(z_0)|z_0) = F_{z_1^*|z_0}(g(z_0)|z_0)$. Denote by $f_{z_1^*|z_0}(g(z_0)|z_0)$ the PDF of $z_1^*$ conditional on $z_0$.

Assumption D (in dynamic estimation): Choose a continuous function of finite parameters $C(z_0)$ over a range $Z$ which is such that $\forall z_0 \in \tilde{Z}$, $C(z_0) = f_{z_1^*|z_0}(g(z_0)|z_0)$, the conditional PDF. Assume that $\forall z_0 \in Z$, $f_{z_1^*|z_0}(g(z_0)|z_0) = C(z_0)$.

Theorem:

Under Assumption D, $F_{\Delta|z_0}(\Delta z_1|z_0) = \frac{\frac{d}{dz_1} F_{z_1|z_0}(\rho + \Delta z_1|z_0)}{C(z_0)}$, and hence for any structural model determining $\epsilon_i$ for each $\Delta z_1$, the conditional distribution of $\epsilon_i$ is identified.

The proof of the theorem appears in Supplementary Material A. To interpret the result, note that the numerator is observed and is equal to $f_{z_1|z_0}(\rho + \Delta z_1|z_0)$ if $F_{z_1|z_0}$ is differentiable over $\Delta z_1 \in [0, \overline{\Delta z_1}]$. Consider a small elasticity and corresponding small $\underline{\Delta z_1}$. Conditional on $z_0$, the share of agents with small enough elasticities that $\Delta z_1 \le \underline{\Delta z_1}$ is equal to $F_{\Delta|z_0}(\underline{\Delta z_1}|z_0) = \frac{\frac{d}{dz_1} F_{z_1|z_0}(\rho + \underline{\Delta z_1}|z_0)}{C(z_0)} = \frac{f_{z_1|z_0}(\rho + \underline{\Delta z_1}|z_0)}{f_{z_1^*|z_0}(\rho + \underline{\Delta z_1}|z_0)}$, which is the share of the agents with potential income of $\rho + \underline{\Delta z_1}$ who do not bunch.

For intuition about the identification, consider a 50 percent growth rate ($g(z_0) = 1.5 z_0$) and the special case in which the distribution of growth rates does not depend on base-year income. If $\rho + \underline{\Delta z_1}$ is a value in the next-year reduced range, then 50 percent growth would take an observation with base-year income of $z_0 = (\rho + \underline{\Delta z_1})/1.5$ to the reduced range in the next year. Thus, the probability of growing by 50 percent from this level of base-year income should be reduced because some agents whose income would have grown at this rate will instead only grow income enough to reach the notch. To quantify this distortion, one can simply compare $\Pr(z_1 = \rho + \underline{\Delta z_1} \mid z_0 = (\rho + \underline{\Delta z_1})/1.5)$ with, say, $\Pr(z_1 = \rho + \underline{\Delta z_1} \mid z_0 = (\rho + \underline{\Delta z_1})/2)$. The undistorted latter probabilities provide a simple counterfactual in the special case of a growth distribution that does not vary with base-year income, in which case these probabilities should be equal if not for bunching. More generally, Assumption D maintains that the growth probability for a particular level of base-year income can be predicted using surrounding values of base-year income for which such growth should not be distorted. Such an assumption can be tested by creating placebo notches and checking whether observations surrounding any income level in the base year predict the probability of growing at any particular rate.
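This comparison can be sketched in a minimal simulation. All of the numbers below are hypothetical (the notch placement, the income and growth distributions, and the growth windows are illustrative, not the paper's): growth is drawn independently of base-year income, so the probability of any given growth rate should be equal across base-year incomes except where that growth reaches the reduced range.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000
rho = np.log(100_000)                     # hypothetical notch in log income

z0 = rng.normal(np.log(60_000), 0.6, n)   # base-year log income
g_star = rng.normal(0.2, 0.3, n)          # latent log growth, independent of z0
z1_star = z0 + g_star                     # latent next-year log income

# Agents whose latent income lands in the reduced range bunch at the notch.
reduced_width = 0.05
bunchers = (z1_star > rho) & (z1_star <= rho + reduced_width)
z1 = np.where(bunchers, rho, z1_star)
g = z1 - z0                               # observed growth

def pr_growth(z0_lo, z0_hi, g_lo=0.42, g_hi=0.45):
    """Share of agents in a base-year income band with growth in [g_lo, g_hi)."""
    sel = (z0 >= z0_lo) & (z0 < z0_hi)
    return np.mean((g[sel] >= g_lo) & (g[sel] < g_hi))

# Treated band: growth of 0.42-0.45 lands in the reduced range above the notch.
p_treated = pr_growth(rho - 0.42, rho - 0.41)
# Placebo band: the same growth lands far below the notch.
p_placebo = pr_growth(rho - 1.00, rho - 0.99)
print(p_treated, p_placebo)   # p_treated is depressed by bunching
```

The placebo band plays the role of the surrounding base-year incomes in the text: its growth probabilities are undistorted and so predict the treated band's counterfactual.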

Assumption D differs from Assumption S in that it restricts the distribution of within-agent changes in income. It can therefore be implemented with a generalized difference-in-differences design rather than simple differences in a cross section. Moreover, while the static approach imposes an assumption about the density near the threshold, dynamic estimates can be obtained from observations that are far from the threshold in the base year. This difference is likely to be particularly important for notches that are in place for multiple periods, causing distortions around the notch to accumulate. Researchers can estimate bunching separately for multiple combinations of base-year-income and growth, or these estimates can be combined, as noted in the following corollary to the theorem.

Corollary:

If $\epsilon_i$ is independent of $z_1^*$ for $z_1^* \in [\rho, \rho + \overline{\Delta z_1}]$, then for all $z_0$ we have $F_{\Delta|z_0}(\Delta z_1|z_0) = F_{\Delta}(\Delta z_1)$, and $F_{\Delta}$ is over-identified by the set of functions $g(z_0)$ and values $z_0$ for which the identifying assumption holds.

This corollary exploits an assumption in the literature that in the small range of incomes $z_1^*$ from which agents bunch, $z_1^*$ may be uncorrelated with $\epsilon_i$ and hence $\Delta z_1$, such as if the distribution of $\epsilon_i$ is the same for all agents. In this case, there are multiple potential sources of identification for the distribution of $\Delta z_1$ and hence $\epsilon_i$. In essence, different levels of base-year income provide budget set variation because the cost of exceeding the notch is imposed at different amounts of income growth.

3 Methods

Next I describe three approaches to estimating bunching. Section 3.1 provides a description of the static method for reference. Section 3.2 presents an implementation of the dynamic bunching design involving binning and using OLS. This approach offers familiarity, ease of implementation, and transparent evidence of whether agents manipulate income when approaching a notch. Section 3.3 presents an implementation using MLE, which offers precision and the opportunity to estimate notch-related attrition or extensive-margin responses.

3.1 Static Bunching Estimation

Consider a notch that imposes a discrete cost on agents $i$ if choice variable $z_i$ is greater than a threshold $\rho$. Let $z_i$ be collapsed into bins of width $\omega$. Let $bin_b$ denote the maximum value of $z_i$ in bin $b$. The bin count is $c_b = \sum_i \mathbf{1}\{z_i \in (bin_b - \omega, bin_b]\}$. The standard estimating equation for bunching at a notch is

(1) $c_b = \sum_{h=0}^{n_E - 1} \beta_{E,h}\, \mathbf{1}\{bin_b = \rho - h\omega\} + \sum_{j=1}^{n_R} \beta_{R,j}\, \mathbf{1}\{bin_b = \rho + j\omega\} + \sum_{k=0}^{K} \alpha_k\, bin_b^k + e_b$

where $n_E$ and $n_R$ are the numbers of bins in the excess and reduced ranges, respectively. The counterfactual is a polynomial of order $K$ with parameters $\alpha_k$. This equation provides two measures of bunching that should be equal if there are no extensive-margin responses or notch-related attrition: $\sum_{h=0}^{n_E - 1} \beta_{E,h}$ gives the excess mass of bunchers just below the notch, and $\sum_{j=1}^{n_R} \beta_{R,j}$ gives the reduction of mass just above the notch.
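Equation (1) can be estimated by ordinary least squares on the bin counts. The following is a hedged sketch on simulated data, not the paper's implementation: the lognormal incomes, the notch at 50, and the 80 percent bunching propensity are all hypothetical, and the counterfactual polynomial is written in the scaled variable $(bin_b - \rho)/10$, which is numerically safer than raw powers of $bin_b$ but otherwise equivalent.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.lognormal(mean=4.0, sigma=0.5, size=100_000)  # simulated choice variable

rho, omega = 50.0, 1.0            # hypothetical notch and bin width
n_E, n_R, K = 1, 3, 5             # excess bins, reduced bins, polynomial order
in_reduced = (z > rho) & (z <= rho + n_R * omega)
moves = in_reduced & (rng.random(z.size) < 0.8)       # 80% of these agents bunch
z = np.where(moves, rho - 0.5 * omega, z)             # relocate just below rho
true_bunchers = moves.sum()

edges = np.arange(30.0, 80.0 + omega, omega)
counts, _ = np.histogram(z, bins=edges)
bin_b = edges[1:]                                     # bin label: max value in bin

x = (bin_b - rho) / 10.0                              # scaled polynomial regressor
cols = [(np.abs(bin_b - (rho - h * omega)) < 1e-9).astype(float)
        for h in range(n_E)]                          # excess-range dummies
cols += [(np.abs(bin_b - (rho + j * omega)) < 1e-9).astype(float)
         for j in range(1, n_R + 1)]                  # reduced-range dummies
cols += [x ** k for k in range(K + 1)]                # counterfactual polynomial
X = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(X, counts.astype(float), rcond=None)
excess_mass = beta[:n_E].sum()                        # bunchers below the notch
reduced_mass = -beta[n_E:n_E + n_R].sum()             # missing mass above it
```

Because no attrition is simulated, the two measures should agree with each other and with the true number of relocated observations, up to polynomial fitting error.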

Visual examples of the standard approach are provided in Figure 1. These use data from the charity application in Section 5 and the PSID. Relative to the counterfactual polynomial in each figure, mass is elevated in the bunching range just below the notch and reduced in the reduced range above the notch. Often this mass is divided by the counterfactual level of the density at the notch, and this bunching ratio gives an estimate of the amount by which the average buncher has reduced the choice variable (Kleven and Waseem 2013).

The standard methodology has also been augmented to address some further complications. Recognizing that agents may reduce the choice variable from all levels above a notch or especially a kink, researchers have estimated the counterfactual separately on either side of the threshold or adjusted the counterfactual above the threshold so as to equalize the excess- and reduced-mass estimates (e.g. Chetty et al. 2011). As another example, the mass in a strictly dominated region just above the notch can be used to estimate the number of agents facing frictions that prevent bunching (Kleven and Waseem 2013). These adjustments are straightforward to incorporate into the dynamic designs proposed in this paper.

3.2 Dynamic OLS Estimation

As shown in Section 2, a dynamic design can identify bunching parameters using the panel nature of the data. Such dynamic designs can be implemented in a variety of ways. In theory, one could estimate the entire multivariate density of the choice variable in all available years, but allowing for such generality would be computationally intensive. I propose two methods of implementation that focus on pairs of the base-year logged choice variable ($r_{it} = \ln z_{it}$) and growth to its next-year value ($g_{it} = r_{it+1} - r_{it}$).

3.2.1 Concept

To develop intuition before considering the full OLS implementation, consider restricting a dataset to agents for whom $g_{it}$ falls within a particular range, i.e. a growth rate bin. Figure 3 provides an illustration using the data described in Section 5. I restrict the sample to charity-year observations with $g_{it} \in [0.1, 0.2]$, then plot the (conditional) mean growth rate by bins of $\tilde{r}_{it} = \ln(z_{it}/100{,}000)$, which is base-year income recentered relative to the notch at \$100,000. For example, among these charities with $g_{it} \in [0.1, 0.2]$, the ones that are around 0.5 log points below the notch in the base year ($\tilde{r}_{it} \approx -0.5$) have a mean of $g_{it}$ close to 0.147. This conditional mean should not be distorted by the notch because even if $g_{it} = 0.2$, which is the maximum value of growth for all depicted observations, they will have $\tilde{r}_{it+1} \approx -0.3$, which is well below the notch. Similarly, there should be no distortion for charities well above the notch. All such bins are depicted with dark circles, and the curve with standard error bands illustrates a quadratic fit to their outcomes.

Figure 3: 
Growth of treated charities versus counterfactual for an illustrative bin of growth rates. Notes: The figure shows growth of income from the current year to the next year as a function of current income (recentered around the reporting notch at $100,000). The figure sample consists of organizations in an illustrative growth bin, namely organizations with growth between 0.1 and 0.2 log points. The marker with a 95-percent confidence interval represents the bin (defining the “near notch$_{it}$” dummy described in the text) for which growth of 0.1–0.2 implies that future receipts lie in the “omitted range” straddling the notch. The conditional average growth rate of these charities is just below 0.145, which is significantly less than the counterfactual growth rate interpolated from charities with higher and lower current incomes. The difference is interpreted as a measure of bunching; some charities that approach the notch reduce their income to stay below it, and therefore conditional average growth is less than predicted. N = 152,191. Omitted range is −0.08–0.07. Bin width = 0.05.

The target of estimation is the outcome of the “treatment” bin that appears in the figure as the filled circle with a confidence interval. For this bin, $g_{it} \in [0.1, 0.2]$ translates to a range of future income that straddles the notch, thereby including those that cross the notch and also those that bunch below it. This bin is therefore not selected based on whether an agent is a buncher, but it includes agents “treated” with the opportunity to bunch, and therefore offers an instrumental variable for bunching. The instrument’s relevance can be seen from the bin’s mean growth rate, which is significantly reduced relative to the quadratic counterfactual because some charities in the bin reduce their growth in order to bunch. To avoid bias of this counterfactual, the researcher may wish to drop nearby bins, depicted with lighter gray circles, for which $g_{it} \in [0.1, 0.2]$ translates to some values of $\tilde{r}_{it+1}$ that are close enough to the notch on one side or the other to overlap with the bunching or reduced range.
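The construction behind Figure 3 can be sketched on simulated data. Everything here is hypothetical rather than taken from the paper (uniform recentered incomes, normal latent growth, a reduced range of 0.05 log points): within one growth bin, compute mean growth by bins of base-year income, fit a quadratic to the clean bins, and compare the treated bins to the fitted counterfactual.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
r = rng.uniform(-1.0, 0.5, n)            # base-year log income, recentered
g_star = rng.normal(0.1, 0.3, n)         # latent growth, independent of r

z1 = r + g_star                          # latent next-year (recentered) income
bunch = (z1 > 0) & (z1 <= 0.05)          # reduced range just above the notch
g = np.where(bunch, -r, g_star)          # bunchers land exactly at the notch

sel = (g >= 0.1) & (g < 0.2)             # one illustrative growth bin
rs, gs = r[sel], g[sel]

width = 0.05
centers = np.arange(-0.975, 0.30, width)
means = np.array([gs[(rs >= c - width / 2) & (rs < c + width / 2)].mean()
                  for c in centers])

treated = (centers > -0.2) & (centers < -0.1)    # growth can reach omitted range
guard = (((centers > -0.26) & (centers < -0.2)) |
         ((centers > -0.1) & (centers < -0.04)))  # partially overlapping bins
clean = ~(treated | guard)

coef = np.polyfit(centers[clean], means[clean], 2)  # quadratic counterfactual
gap = np.polyval(coef, centers[treated]).mean() - means[treated].mean()
print(gap)   # positive: treated bins grow less than the counterfactual predicts
```

The guard bins play the role of the lighter gray circles in the figure: they are dropped from the counterfactual fit because their growth range partially overlaps the omitted range.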

3.2.2 Estimation

A more complete use of the data combines multiple growth-rate bins. Denote the width of these bins as $\omega_g$, with the maximum value in the bin labeled $gbin_{it}$. Once again, denote by $r_{it}$ agent $i$’s log income in base year $t$, and label the growth rate to the next year’s income $g_{it} = r_{it+1} - r_{it}$. Bins of base-year income can be selected with width $\omega_r$ and maximum value in the bin labeled $rbin_{it}$. To estimate the effect of nearing the notch on outcome $Y_{it+1}$, I propose estimating equations of the form

(2) $Y_{it+1} = \beta\, NearNotch_{it} + \sum_{k=0}^{K} \sum_{\gamma} \alpha_{k\gamma}\, r_{it}^k\, \mathbf{1}\{gbin_{it} = \gamma\}$.

Here, the “treatment” variable is

$NearNotch_{it} = \mathbf{1}\{rbin_{it} - \omega_r + gbin_{it} - \omega_g < \rho_{t+1} < rbin_{it} + gbin_{it}\},$

which indicates pairs of income and growth-rate bins that produce a range of next-year income that straddles the notch $\rho_{t+1}$. The $\alpha$ coefficients describe the counterfactual, which allows for a separate polynomial of base-year income for each bin of growth. The parameter $\beta$ gives the difference between the conditional mean of the outcome and what is predicted by the counterfactual. If the outcome is either $g_{it}$ or $r_{it+1}$, then bunching would imply that $\beta < 0$.

With this design, the researcher can construct a binary outcome to estimate the share of agents that bunch. Along with the notation above, label the minimum value of a base-year income bin $rmin_{it}$. Consider a growth-rate ($g_{it}$) bin of $[\gamma, \gamma + \omega_g]$ for some $\gamma$. The treatment bin of current income has minimum value $rmin_{it} = \rho_{t+1} - \omega_E - \gamma$. Agents with income at the minimum of this bin will cross the notch if they grow by $\gamma + \omega_E$. Other agents in the bin will cross the notch if growth is greater than $\gamma + \omega_E - (r_{it} - rmin_{it})$. Let $cross_{\gamma it} = \mathbf{1}\{g_{it} > \gamma + \omega_E - (r_{it} - rmin_{it})\}$. This outcome is defined for all observations, regardless of bin, and indicates whether the agent achieves growth at a rate that would correspond to the notch if the observation were in the treatment bin. Because $cross_{\gamma it}$ only has significance for the treatment bin, the probability that $cross_{\gamma it} = 1$ should be reduced in the bin of interest by the share of agents that bunch, and it should not be affected for other bins. When pooling growth bins in a single regression, the researcher can define $cross_{it} = \sum_{\gamma} cross_{\gamma it} \cdot \mathbf{1}\{\gamma \le g_{it} < \gamma + \omega_g\}$. This indicator is useful in part because it can be used as the endogenous variable in two-stage-least-squares regressions with instrument $NearNotch_{it}$ to estimate the causal effect of bunching on long-run income or other outcomes.
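The placebo property of the cross indicator can be sketched as follows (simulated data; the 60 percent bunching propensity and all other numbers are hypothetical): within one growth bin, each base-year income bin gets a bin-specific threshold that coincides with the true notch only in the treatment bin, so the crossing rate should dip only there.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300_000
r = rng.uniform(-1.0, 0.5, n)
g_star = rng.normal(0.1, 0.3, n)
latent_cross = (r + g_star > 0) & (r + g_star <= 0.05)
bunch = latent_cross & (rng.random(n) < 0.6)      # 60% of would-be crossers bunch
g = np.where(bunch, -r, g_star)

w_r = w_g = 0.05
gamma = 0.10                                      # growth bin [0.10, 0.15)
sel = (g >= gamma) & (g < gamma + w_g)

rbin_min = np.floor(r / w_r) * w_r                # own bin's minimum value
# cross = 1{g > gamma + w_g - (r - rbin_min)}: the bin-specific threshold
# coincides with the true notch at 0 only in the treatment bin.
cross = g > gamma + w_g - (r - rbin_min)

def cross_rate(bin_min):
    cell = sel & (np.abs(rbin_min - bin_min) < 1e-9)
    return cross[cell].mean()

treated_rate = cross_rate(-0.15)   # from here, growth of 0.10-0.15 can cross 0
placebo_rate = cross_rate(-0.40)   # placebo bin far below the notch
print(treated_rate, placebo_rate)
```

The gap between the placebo and treated rates estimates the share of would-be crossers who bunch, which is the first stage for the two-stage-least-squares use described above.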

Optimal bin selection may be a fruitful subject for future research. To some extent, the choice of bin width in the dynamic method is as ad-hoc as in the static method. I am not aware of any formal results for bin selection but anecdotally find that bin width has little effect on either static or dynamic estimates. For the dynamic approach, however, it is important to bin in such a way that the treatment bins cover the full omitted range so as not to induce selection on outcomes. In Supplementary Material D, I propose a test of the exogeneity of the treatment bin that compares its bin count to those of surrounding bins.

Supplementary Material E describes the special case of estimating causal effects using a notch that is only in place at one point in time. Diamond and Persson (2016) examine an application of this form and provide a detailed description of an estimator for such cases. One benefit of a temporary notch is that the researcher can simply bin income in the year of the notch rather than multiple years’ incomes. This is because agents near the notch should not be a selected sample as they might be had there been a notch in the prior year at or near the same income level. In the appendix, I describe the simplified method for a temporary notch and provide simulations that support the use of this method for estimating bunching and its causal effects. Estimates for the charity application show that bunching at a temporary notch has a permanent effect on income, providing evidence that responses to the notch are real and not simply misreporting. Simulations with the PSID show that a temporary notch can be used to estimate bunching in the year of the notch and causal effects in later years, including IV estimates of serial dependence.

3.3 Maximum Likelihood Estimation

Dynamic OLS estimation is straightforward and provides a number of potential advantages over static OLS estimation. However, OLS estimation will not account for extensive-margin responses without additional assumptions and will not be efficient because binning the data treats observations within a bin as equivalent. A maximum likelihood bunching estimator can address these limitations.

3.3.1 Concept

Kopczuk and Munroe (2015) used MLE to estimate bunching within the static conceptual framework. They did so by estimating the parameters of an exponentiated polynomial, a function known to have the nonnegativity property of a PDF, which they also constrained to integrate to unity. In the dynamic framework there are several advantages to working instead directly with the CDF $F^*(g^*_{it}|r_{it}, \Theta)$ of latent log growth $g^*_{it}$ conditional on base-year income $r_{it}$ and parameters $\Theta$. First, it is desirable to estimate points of sample truncation or excess attrition among those who cross the notch, and the CDF gives the probabilities of these occurrences. Second, the CDF makes it straightforward to constrain the reduced mass to equal the bunching mass (except for differences due to systematic attrition). Third, truncation requires integration of the likelihood between limits that vary with the level of current receipts, a practical issue for programs performing multidimensional integration.

To first describe the concept, suppose that $g^*_{it}$ comes from a known distribution $F^*(g^*_{it}|r_{it}, \Theta)$. The parameter vector $\Theta$, as well as agents’ responses to the notch, is estimated by numerically maximizing the likelihood implied by the CDF $F(g_{it}|r_{it}, \Theta, \Omega)$ of observed growth $g_{it}$. This function includes additional parameters $\Omega$ describing bunching and any censoring or attrition. Because the handling of censoring is standard, I will leave this to the implementation section.

Bunching consists of excess mass $B(r_{it})$ in the bunching region $g_{it} + r_{it} \in [\rho - \omega_E, \rho]$ that would otherwise lie in the reduced region $g_{it} + r_{it} \in (\rho, \rho + \omega_R]$, where $\rho$ denotes the notch and $\omega_E$ and $\omega_R$ the respective widths of the affected ranges below and above the notch. Denote by $b(r_{it}) = \frac{B(r_{it})}{F^*(\rho + \omega_R - r_{it}|r_{it}, \Theta) - F^*(\rho - r_{it}|r_{it}, \Theta)}$ the share of agents whose latent growth takes them to the reduced region but who instead appear in the bunching region. This bunching propensity can be allowed to vary with $r_{it}$, as in the implementation below, but for simplicity here I will assume a constant $b$.

As with OLS methods, one should “omit” the bunching and reduced regions from estimation of the latent distribution. However, the mass of each of these regions is needed to identify bunching and attrition parameters. The researcher can achieve both goals by defining the following variable:

$$\tilde{g}_{it} = \begin{cases} \rho - r_{it} & \text{for } \rho - \omega_E - r_{it} \le g_{it} \le \rho - r_{it} \\ \rho + \omega_R - r_{it} & \text{for } \rho - r_{it} < g_{it} \le \rho + \omega_R - r_{it} \\ g_{it} & \text{otherwise} \end{cases}$$

This variable is equal to the observed growth rate except that all observations entering the bunching range are assigned one value of growth and all observations entering the reduced range are assigned a different value.[7] The number of observations at these two points quantifies the bunching response. In the absence of bunching or attrition, the probability of entering the bunching range would be $F^*(\rho - r_{it}|r_{it}, \Theta) - F^*(\rho - \omega_E - r_{it}|r_{it}, \Theta)$, and the probability of entering the reduced range would be $F^*(\rho + \omega_R - r_{it}|r_{it}, \Theta) - F^*(\rho - r_{it}|r_{it}, \Theta)$. Multiplying the latter by $b$ gives the bunching mass $B$, which in the likelihood function is added to $\Pr(\tilde{g}_{it} = \rho - r_{it})$ and subtracted from $\Pr(\tilde{g}_{it} = \rho + \omega_R - r_{it})$.
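The role of $b$ in the likelihood can be sketched in a deliberately simplified setting: a known Normal latent distribution, no attrition, and hypothetical parameter values throughout. Only the two point masses depend on $b$, so the continuous part of the likelihood can be dropped from the search.

```python
import numpy as np
from math import erf

Phi = lambda x: 0.5 * (1.0 + np.vectorize(erf)(x))   # standard Normal CDF

rng = np.random.default_rng(5)
n = 150_000
rho, wE, wR = 0.0, 0.02, 0.05            # notch and omitted-range widths
mu, sd = 0.1, 0.3                        # latent growth distribution (known here)
b_true = 0.6

r = rng.uniform(-1.0, 0.5, n)
g_star = rng.normal(mu, sd, n)
in_reduced = (r + g_star > rho) & (r + g_star <= rho + wR)
bunches = in_reduced & (rng.random(n) < b_true)
g = np.where(bunches, rho - r, g_star)

at_bunch = (g > rho - wE - r) & (g <= rho - r)     # recoded bunching point
at_reduced = (g > rho - r) & (g <= rho + wR - r)   # recoded reduced point

# Latent probabilities of the two omitted sub-ranges, observation by observation
pE = Phi((rho - r - mu) / sd) - Phi((rho - wE - r - mu) / sd)
pR = Phi((rho + wR - r - mu) / sd) - Phi((rho - r - mu) / sd)

def loglik(b):
    # Pr(bunch point) = pE + b*pR; Pr(reduced point) = (1 - b)*pR
    return (np.sum(np.log(pE[at_bunch] + b * pR[at_bunch])) +
            np.sum(np.log((1.0 - b) * pR[at_reduced])))

grid = np.linspace(0.01, 0.99, 99)
b_hat = grid[np.argmax([loglik(b) for b in grid])]
print(b_hat)   # close to b_true
```

In the full estimator, $\Theta$ is estimated jointly with $b$ (and the attrition parameters below) rather than assumed known, and the continuous part of the likelihood then matters; a grid search stands in here for a numerical optimizer.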

Next, one can incorporate attrition functions if the data include missing values or extensive-margin responses. Extensive-margin responses refer to effects on the probability that a non-negative outcome is equal to zero. This can be generalized to a lower bound $\ell$ with $F^*(\ell - r_{it}|r_{it}, \Theta) = 0$. If instead there are observations that do not appear in the next period’s data, then these can be coded as having $\tilde{g}_{it} = \ell - r_{it}$, and estimation is the same, so I will simply refer to both as attrition. In applications with attrition, one should account for a general share $\lambda(r_{it})$ of agents that will attrit. One can also estimate the share $\delta$ of agents whose attrition is caused by the notch, i.e. for whom $g^*_{it} + r_{it} > \rho$. Identification of the attrition functions is similar to the identification of bunching in the standard framework: if $\lambda(r_{it})$ is a polynomial, then it can be identified by observations with $r_{it}$ not too close to the notch, whereas $\delta$ will be identified by elevated attrition among observations that are closer to the notch and therefore more likely to cross it, as determined by $F^*(\rho - r_{it}|r_{it}, \Theta)$.

The observed conditional CDF is then:

$$F(\tilde{g}_{it}|r_{it}, \Theta, \Omega) = \begin{cases} \lambda(r_{it}) + (1 - \lambda(r_{it}))\,F^*(\tilde{g}_{it}|r_{it}, \Theta) + \delta\,(1 - F^*(\rho - r_{it}|r_{it}, \Theta)) & \text{for } \ell - r_{it} \le \tilde{g}_{it} < \rho - \omega_E - r_{it} \\ \lambda(r_{it}) + (1 - \lambda(r_{it}))\,F^*(\rho - \omega_E - r_{it}|r_{it}, \Theta) + \delta\,(1 - F^*(\rho - r_{it}|r_{it}, \Theta)) & \text{for } \rho - \omega_E - r_{it} \le \tilde{g}_{it} < \rho - r_{it} \\ (1 - \lambda(r_{it}))\left[F^*(\rho - r_{it}|r_{it}, \Theta) - F^*(\rho - \omega_E - r_{it}|r_{it}, \Theta)\right] + b\,(1 - \lambda(r_{it}) - \delta)\left[F^*(\rho + \omega_R - r_{it}|r_{it}, \Theta) - F^*(\rho - r_{it}|r_{it}, \Theta)\right] & \text{for } \tilde{g}_{it} = \rho - r_{it} \\ (1 - b)(1 - \lambda(r_{it}) - \delta)\left[F^*(\rho + \omega_R - r_{it}|r_{it}, \Theta) - F^*(\rho - r_{it}|r_{it}, \Theta)\right] & \text{for } \tilde{g}_{it} = \rho + \omega_R - r_{it} \\ \lambda(r_{it}) + \delta + (1 - \lambda(r_{it}) - \delta)\,F^*(\tilde{g}_{it}|r_{it}, \Theta) & \text{for } \rho + \omega_R - r_{it} < \tilde{g}_{it} \end{cases}$$

where the rows for $\tilde{g}_{it} = \rho - r_{it}$ and $\tilde{g}_{it} = \rho + \omega_R - r_{it}$ give the probability masses at those two points.

In practice, the distribution function is not known in advance. However, parameters can be added to a known CDF to improve the fit just as the order of the polynomial can be increased in OLS implementations. For example, one could adopt the Normal CDF but allow the standard deviation of $g^*_{it}$ to be a linear function of $r_{it}$. This generalization of $F^*(g^*_{it}|r_{it}, \Theta)$ may allow for parameter estimates for which it would no longer be a CDF, and the researcher may need to impose restrictions to ensure that it is, including that it be nondecreasing and that $\lim_{g \to -\infty} F^*(g^*_{it}|r_{it}, \Theta) = 0$ and $\lim_{g \to +\infty} F^*(g^*_{it}|r_{it}, \Theta) = 1$.

3.3.2 Implementation with a Generalized Laplace Distribution

In both the PSID and the charity application, growth exhibits a non-Normal distribution that is common for financial variables, with a peak around zero growth and fat tails. I therefore start from the Laplace (or “double exponential”) distribution.[8] The basic Laplace CDF for growth $g^*_{it}$, with mode $\theta$ and scale parameter $\chi$, would be equal to $\frac{1}{2}\exp\left(\frac{g^*_{it} - \theta}{\chi}\right)$ for $g^*_{it} < \theta$ and equal to $1 - \frac{1}{2}\exp\left(\frac{\theta - g^*_{it}}{\chi}\right)$ for $g^*_{it} \ge \theta$. To allow for more flexibility, I replace $g^*_{it}$ and $r_{it}$ with functions of those variables. Here, I use functions $P_l(g^*_{it}, r_{it}, \Theta)$ and $P_u(g^*_{it}, r_{it}, \Theta)$ respectively for the lower and upper parts of the growth distribution:

(3) $F^*(g^*_{it}|r_{it}, \Theta) = \begin{cases} \exp\left(P_l(g^*_{it}, r_{it}, \Theta)\right) & g^*_{it} < \theta \\ 1 - \exp\left(P_u(g^*_{it}, r_{it}, \Theta)\right) & g^*_{it} \ge \theta \end{cases}$

The functions $P_l(g^*_{it}, r_{it}, \Theta)$ and $P_u(g^*_{it}, r_{it}, \Theta)$ can be quite complicated; Supplementary Material D shows functions that allowed me to fit both the PSID and charity data. These can appear more arbitrary than their derivatives, such as using inverse tangents to better fit the high frequency of values of $g^*_{it}$ that are close to zero, because the derivative of $\arctan x$ is $\frac{1}{1 + x^2}$. All of these functions of $g^*_{it}$ are multiplied by quadratic functions of $r_{it}$ to allow the growth distribution to vary with base-year income. The more that the empirical distribution varies with $r_{it}$, the more flexibility the researcher will want to allow in these functions. Increasing the complexity of the MLE comes, however, at a much greater cost in terms of computation time. Researchers may therefore want to estimate dynamic OLS first and assess how many parameters are needed before employing dynamic MLE.

To ensure that $F^*(g^*_{it}|r_{it}, \Theta)$ is a CDF, it must be the case that $\lim_{g \to -\infty} P_l(g^*_{it}, r_{it}, \Theta) = -\infty$, so that $\lim_{g \to -\infty} F^*(g^*_{it}|r_{it}, \Theta) = 0$, and that $\lim_{g \to +\infty} P_u(g^*_{it}, r_{it}, \Theta) = -\infty$, so that $\lim_{g \to +\infty} F^*(g^*_{it}|r_{it}, \Theta) = 1$. In the appendix, I show that this can be achieved for the functions that I chose using simple constraints on just a few parameters, and these constraints are easily implemented by using exponentiated coefficients in the numerical maximization. In addition, a CDF must be nondecreasing. The Laplace function has only one point of nondifferentiability, at $g = \theta$, and the nondecreasing property requires $\lim_{g \to \theta^-} F^*(g^*_{it}|r_{it}, \Theta) \le \lim_{g \to \theta^+} F^*(g^*_{it}|r_{it}, \Theta)$. I require this relation to hold with equality, giving continuity of the CDF and ruling out a point mass at $g^*_{it} = \theta$. This is achieved by solving the equality $\exp\left(P_l(\theta, r_{it}, \Theta)\right) = 1 - \exp\left(P_u(\theta, r_{it}, \Theta)\right)$, which gives another parameter restriction that is easily incorporated and reduces the number of parameters that need to be estimated. I set the mode $\theta$ equal to zero by inspection for the PSID data, and doing so for the charity application gives very similar results to using a nonparametric estimate of the mode.
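A minimal instance of equation (3) makes the continuity constraint concrete. The sketch below uses the basic Laplace $P_l$ and $P_u$ with illustrative asymmetric scales (these are not the paper's fitted functions); the constant in $P_u$ is pinned down by $\exp(P_l(\theta)) = 1 - \exp(P_u(\theta))$.

```python
import numpy as np

theta, chi_l, chi_u = 0.0, 0.3, 0.4      # hypothetical mode and tail scales
ln2 = np.log(2.0)

def P_l(g):
    return (g - theta) / chi_l - ln2

def P_u(g):
    # The -ln2 constant solves exp(P_l(theta)) = 1 - exp(P_u(theta)) = 1/2.
    return (theta - g) / chi_u - ln2

def F_star(g):
    g = np.asarray(g, dtype=float)
    return np.where(g < theta, np.exp(P_l(g)), 1.0 - np.exp(P_u(g)))

gs = np.linspace(-6.0, 6.0, 24_001)
vals = F_star(gs)
print(vals[0], float(F_star(theta)), vals[-1])   # ~0, 0.5, ~1
```

With richer $P_l$ and $P_u$ (for example, the arctan terms mentioned above), the same continuity equation becomes one parameter restriction among the coefficients, exactly as described in the text.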

The implied latent density is

$f^*(g^*_{it}|r_{it}, \Theta) = \begin{cases} P_l'(g^*_{it}, r_{it}, \Theta) \exp\left(P_l(g^*_{it}, r_{it}, \Theta)\right) & g^*_{it} < \theta \\ -P_u'(g^*_{it}, r_{it}, \Theta) \exp\left(P_u(g^*_{it}, r_{it}, \Theta)\right) & g^*_{it} \ge \theta \end{cases}$

where $P_l'(g^*_{it}, r_{it}, \Theta)$ and $P_u'(g^*_{it}, r_{it}, \Theta)$ are derivatives with respect to $g^*_{it}$. These derivatives can be assured of the correct sign by exponentiating each of the relevant coefficients, but this would impose more than is required because nonnegativity of the density does not necessitate that all the coefficients have the same sign. In practice, I instead impose a prohibitive penalty on the value of the likelihood function if $f^*(g^*_{it}|r_{it}, \Theta)$ is negative for any observations. Researchers may also choose to exclude observations with current receipts in the omitted range from the estimation sample, or to allow the density to be discontinuous in $r_{it}$ at the threshold for the base-year notch, as I do in the applications. This completes the specification of the latent distribution.

Completion of the empirical distribution $F(\tilde{g}_{it}|r_{it}, \Theta, \Omega)$ requires specification of the bunching, censoring, and attrition functions. I allow the bunching propensity $b(r_{it})$ to take one value for those with base-year income below the notch and a different value for those with base-year income above the notch, as there may be a fixed cost of crossing the notch for the first time that increases bunching among those crossing from below. I set the lower bound to $\ell = \log 25{,}000$ because this level of next-year income is far from the notch and because charities do not file information returns if income falls below $25,000. For the rate of attrition that is not related to the notch, $\lambda(r_{it})$, I estimate a cubic polynomial. I allow the share that attrit instead of crossing the notch, $\delta$, to take one of two different values depending on whether growth takes the agent to the reduced range or to higher income levels where it is not worth bunching.

The full observed conditional CDF is written out in Supplementary Material D. Supplementary Material Figure D.1 displays observed and estimated distributions for the smoothed PSID distribution. For each level of base-year income, the estimated curve fits the data points closely. Estimated curves are also plotted for different levels of base-year income, showing that the distribution shifts somewhat to the right with base-year income, i.e. that richer households grow income by somewhat more than poorer households.

4 Simulations

4.1 Static Method Simulations

With the theory having demonstrated some threats to identification, I use simulations to assess the magnitude of biases. To conduct simulations, I use a panel of household taxable income from the Panel Study of Income Dynamics. Use of real-world data from the PSID shows that results are not driven by a particular distribution of known functional form. In addition, these data are publicly available, which will facilitate replication and extension of the proposed methods. I restrict attention to the nationally-representative sample within the PSID and exclude the over-sample of low-income families. The sample covers years from 1967 through 2013, and I inflate all amounts to 2013 dollars using the Consumer Price Index. The variable of interest is the household’s taxable income.[9] In order to evaluate static and dynamic estimates on the same set of observations, I use each pair of consecutive years in which household income is observed, which I label the “base year” and “next year.”

To control the true amount of bunching in these naturally occurring data, I estimate the smoothed joint distribution of income and one-year growth. I define $r_{it}$ as the log of realized base-year income and $g_{it} = r_{it+1} - r_{it}$ as the growth rate from base-year income to next-year income. I first bin the inverse CDF of $g_{it}$ and estimate it as a flexible function over the domain $(0, 1)$ so that I can generate random numbers $u_i$ from a uniform distribution and convert them to values of $g_{it}$.[10] I then bin the joint distribution of $r_{it}$ and $g_{it}$ to estimate the inverse CDF $F^{-1}(r_{it}|g_{it})$ as a flexible function over the domain $(0, 1)$ so that I can generate random numbers $v_i$ from a uniform distribution and convert them to values of $r_{it}$.[11] This allows me to generate a sample of any size according to the smoothed joint distribution of income and growth.
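The first (marginal) step of this inverse-CDF procedure can be sketched as follows, with placeholder Laplace draws standing in for the PSID growth rates; the second step in the text repeats the same construction for $r_{it}$ within bins of $g_{it}$.

```python
import numpy as np

rng = np.random.default_rng(6)
g_obs = rng.laplace(0.02, 0.2, 50_000)        # stand-in for observed growth rates

# Binned inverse CDF: quantiles on a probability grid, with linear
# interpolation between grid points acting as the "flexible function".
probs = np.linspace(0.001, 0.999, 999)
quantiles = np.quantile(g_obs, probs)

def draw_growth(m):
    """Map uniform draws u_i through the interpolated inverse CDF."""
    u = rng.uniform(0.001, 0.999, m)
    return np.interp(u, probs, quantiles)

g_sim = draw_growth(200_000)
print(np.median(g_obs), np.median(g_sim))     # the two should be close
```

Interpolation between binned quantiles smooths the empirical distribution, which is the sense in which the generated samples follow a "smoothed" version of the data.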

I generate bunching at a simulated notch with a heterogeneous distribution of abilities and elasticities. I employ the quasilinear utility function $u(c_t, z_t, a_t) = c_t - \frac{a_t}{1 + 1/\epsilon}\left(\frac{z_t}{a_t}\right)^{1 + 1/\epsilon}$ as in Section 2. I assume a linear budget constraint $c_t \le (1 - \tau) z_t - T_t \mathbf{1}\{z_t > \rho_t\}$ for marginal tax rate $\tau$ and fixed cost $T_t$ imposed when income exceeds the notch $\rho_t$. I set $\tau = 0.2$ and create a hypothetical notch for certain years $t$, as described for each exercise, with $\rho_t = \$40{,}000$ and $T_t = \$1{,}000$. I assume that $\epsilon = 0$ for half of all households (e.g. due to optimization frictions), and for the other half of households, $\epsilon \sim \text{Unif}(0, 1)$. Note that larger elasticities require a wider reduced range for both the static and dynamic methods. Larger elasticities or fewer optimization frictions also imply more bunching, which will result in greater bias for the static method. Households bunch in the base year, and in the next year the notch is removed. A notch need not be removed for biases to arise, but I remove the notch in the simulation because this means that there should be exactly zero bunching after the reform, and therefore the mean estimate gives the bias of the estimator.
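The bunching rule implied by this utility function can be sketched directly: each responsive household compares utility at the notch with utility at its frictionless optimum $z^* = a(1-\tau)^{\epsilon}$. The tax and notch parameters below follow the text, while the lognormal calibration of frictionless incomes is a hypothetical stand-in for the smoothed PSID distribution.

```python
import numpy as np

rng = np.random.default_rng(7)
tau, rho, T = 0.2, 40_000.0, 1_000.0                   # parameters from the text

n = 200_000
eps = np.where(rng.random(n) < 0.5, 0.0, rng.uniform(0.0, 1.0, n))
z_star = rng.lognormal(np.log(35_000.0), 0.7, n)       # frictionless income z*
a = z_star / (1.0 - tau) ** eps                        # implied ability

def utility(z, a_, e_):
    # quasilinear utility before subtracting the notch cost; requires e_ > 0
    return (1.0 - tau) * z - (a_ / (1.0 + 1.0 / e_)) * (z / a_) ** (1.0 + 1.0 / e_)

cand = (eps > 0) & (z_star > rho)                      # could gain by bunching
u_interior = utility(z_star[cand], a[cand], eps[cand]) - T
u_at_notch = utility(rho, a[cand], eps[cand])

bunched = np.zeros(n, dtype=bool)
bunched[np.where(cand)[0][u_at_notch > u_interior]] = True
z = np.where(bunched, rho, z_star)                     # chosen income

share_bunching = bunched.mean()
print(share_bunching)
```

Households with $\epsilon = 0$ never respond, matching the optimization-friction assumption in the text, and all bunchers come from above the notch, so removing the notch in the next year returns everyone to $z^*$.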

To examine the severity of the problem posed by notch-related attrition or extensive-margin responses, I take 10,000 draws of 100,000 observations from the smoothed distribution of incomes in the PSID and generate attrition from a fraction of those whose incomes should lie above the notch. Across simulations, I vary the extent of this notch-related attrition response between 0 and 10 percent of households; the largest value is motivated by the notch-related attrition rate for charities of 9.3 percent estimated in Marx (2018) and again below. No households bunch. I report static estimates of the average income foregone by bunchers using both the excess mass below the threshold and the reduction in mass above it. Because the true amount of bunching is zero, the average estimate gives the bias, and the percentage of statistically significant results gives the coverage, or simulated rejection rate. I also report the root mean squared error (root-MSE).
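The static estimator being evaluated can be sketched as follows. This is a stylized stand-in, not the paper's code: it uses a simple uniform income distribution rather than PSID draws, and the bin grid, function names, and tolerances are illustrative. The key mechanics match the text: bin the data, fit a polynomial excluding the omitted range, and measure excess mass below the notch and reduced mass above it.

```python
import numpy as np

def static_bunching(incomes, notch=40_000.0, lo=35_000.0, hi=55_000.0,
                    bin_width=1_000.0, order=5):
    """Static design: fit a polynomial counterfactual to binned counts,
    excluding the omitted range [lo, hi]; return the excess mass below the
    notch and the reduced mass above it, in numbers of observations."""
    edges = np.arange(10_000.0, 100_000.0 + bin_width, bin_width)
    counts, _ = np.histogram(incomes, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2.0
    omitted = (centers >= lo) & (centers <= hi)
    coefs = np.polyfit(centers[~omitted] / 1e5, counts[~omitted], order)
    counterfactual = np.polyval(coefs, centers / 1e5)
    below = omitted & (centers < notch)
    above = omitted & (centers >= notch)
    excess = np.sum(counts[below] - counterfactual[below])
    reduced = np.sum(counterfactual[above] - counts[above])
    return excess, reduced

rng = np.random.default_rng(1)
incomes = rng.uniform(10_000.0, 100_000.0, 100_000)   # smooth, no bunching
# Notch-related attrition: 10 percent of those above the notch leave the data.
drop = (incomes > 40_000.0) & (rng.uniform(size=incomes.size) < 0.10)
excess0, reduced0 = static_bunching(incomes)           # ~0: no distortion
excess1, reduced1 = static_bunching(incomes[~drop])    # spuriously positive
```

With no distortion both estimates are near zero, while attrition above the notch produces spurious "reduced mass" even though no household bunches, mirroring the mechanism described in the text.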

Table 1 shows the effect of notch-related attrition on static bunching estimates. Estimates using the reduced mass above the notch show small biases, as well as coverage rates close to 0.05, when the attrition rate is 0 or 1 percent. When the attrition rate is as small as 2 percent, the reduced-mass estimates falsely attribute the missing mass to bunching and reject too frequently. If 5 percent of agents respond on the extensive margin, the reduced-mass estimate rejects nearly three times as often as it should. The pattern is similar for the excess-mass estimates, though these show inflated coverage rates even for small rates of attrition. This may be surprising, given that the attrition response occurs entirely above the notch and not in the bunching range. Even so, these estimates are biased because the reduced mass above the omitted range lowers the counterfactual mass within the omitted range, including below the notch.

Table 1:

Bias of static estimates in simulation with notch-related attrition responses.

Responder share Reduced-mass estimates Excess-mass estimates
Bias Coverage Root MSE Bias Coverage Root MSE
0 −90 0.051 516 72 0.068 213
0.01 2 0.049 506 103 0.078 227
0.02 103 0.055 529 128 0.095 243
0.05 438 0.132 690 207 0.166 295
0.10 1023 0.450 1161 362 0.374 423
Notes: The table shows results of static bunching estimation performed on data with no bunching. Thus, the estimate for each outcome should equal zero, and the coverage rate should be close to 0.05. Each row presents results for 10,000 random samples of 100,000 observations generated from the smoothed PSID income distribution. The outcome is the bunching ratio, i.e. the bunching mass divided by bin width and counterfactual number of observations. Estimates use either the excess mass below the notch or the reduced mass above it as the bunching mass. The responder share is the percentage of observations that drop from the data when income exceeds the hypothetical threshold of $40,000. Responders are drawn at random. Estimation uses a 5th-order polynomial with an omitted range of $35,000–$55,000.

I next quantify the bias arising in static estimates when income is serially dependent. I impose bunching in the base year but then estimate bunching in the next year when there is no notch and hence no active bunching. If income is not serially dependent, then bunchers’ next-year incomes are determined by their base-year potential income. If income is serially dependent, then bunchers’ next-year incomes are determined by their base-year actual income or a combination of actual and potential income. I vary the degree of serial dependence over the simulations by choosing different weights η and determining next-year income by adding the randomly drawn growth rate g it to a weighted average of base-year potential income (with weight η) and base-year actual income (with weight 1 − η). I consider possibilities ranging from full weight on actual income, as suggested by estimated one-for-one causal effects of base-year income on future-year income in Marx (2018), to a case of double weight on potential income as in the case of pure retiming of the income that is reduced in order to bunch.
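The construction of next-year income under serial dependence amounts to one line; a minimal sketch (hypothetical function name, log-income arguments) makes the role of the weight η explicit:

```python
import numpy as np

def next_year_log_income(r_potential, r_actual, g, eta):
    """Next-year log income: drawn growth g added to a weighted average of
    base-year potential log income (weight eta) and actual log income
    (weight 1 - eta). eta = 1: no serial dependence; eta = 0: one-for-one
    dependence on actual income; eta = 2: pure retiming, so income foregone
    to bunch comes back the next year."""
    return eta * r_potential + (1.0 - eta) * r_actual + g
```

For a buncher, actual income falls short of potential income, so η = 0 carries the bunching reduction fully into the next year, while η = 2 adds the foregone income back.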

Results appear in Table 2. When income exhibits no serial dependence (η = 1), static estimates have coverage rates close to 0.05 and relatively small bias. With positive serial dependence (η < 1, i.e. bunching reduces next-year income), both estimates show a positive bias, and coverage rates rise. Indeed, for the one-for-one case of η = 0 found by Marx (2018), the reduced-mass estimator is off by nearly $1,000 on average and rejects in more than one third of samples. With negative serial dependence (η > 1, i.e. bunching increases next-year income), the bias in the excess-mass estimates becomes negative, while coverage rates are again too high for both estimates.

Table 2:

Bias of static estimates in simulation with serially-dependent income.

Weight on base-year potential income Reduced-mass estimates Excess-mass estimates
Bias Coverage Root MSE Bias Coverage Root MSE
0 879 0.358 1040 210 0.161 306
0.5 251 0.074 602 150 0.103 269
1 51 0.050 549 −7 0.051 222
1.5 250 0.074 606 −177 0.125 285
2 623 0.199 837 −315 0.289 387
Notes: The table shows results of static bunching estimation performed on data for the year after bunching. Thus, the estimate for each outcome should equal zero, and the coverage rate should be close to 0.05. The data, sample size, hypothetical notch, estimation, and outcome are as in Table 1. The weight η on base-year potential income captures serial dependence, as described in the text: η = 1 implies no serial dependence on observed income, η = 0 implies that only observed income matters, and η = 2 implies that agents who bunch in the base year grow by more in the next year.

4.2 Dynamic OLS Simulation

Dynamic OLS estimation should be consistent under serial dependence of income. I evaluate this design in the same way as the static estimator, with 10,000 random samples of 100,000 observations, base-year bunching at $40,000, and no bunching in next-year income.[12]
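A stylized stand-alone version of the dynamic OLS regression may help fix ideas. This sketch is not the paper's implementation (the design is laid out in Section 3): the data-generating process is simulated, the bin definitions are illustrative, and all names are hypothetical. The treated cells are base-income-by-growth-bin combinations projected to land just below the notch, and the coefficient on that dummy estimates the effect on next-year income.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
r = rng.uniform(10.1, 11.1, n)            # base-year log income
g = rng.normal(-0.05, 0.3, n)             # latent growth, no bunching response
r_next = r + g                            # next-year log income

# Growth-rate bins of width 0.1 and their midpoints.
g_bin = np.clip(((g + 1.0) // 0.1).astype(int), 0, 19)
g_mid = 0.1 * g_bin - 0.95

# "Near notch": cells projected to land in the omitted range just below a
# notch at log income 10.6 (omitted range 10.55-10.65).
projected = r + g_mid
near_notch = ((projected >= 10.55) & (projected < 10.65)).astype(float)

# Regress next-year income on the treatment dummy, growth-bin dummies,
# and a quadratic in base-year income.
X = np.column_stack([near_notch, np.eye(20)[g_bin], r, r ** 2])
beta, *_ = np.linalg.lstsq(X, r_next, rcond=None)
effect_on_income = beta[0]   # should be near zero absent bunching
```

Because no bunching occurs in this simulated next year, the treatment coefficient is a placebo check: it should be statistically indistinguishable from zero regardless of serial dependence.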

Table 3 presents the results of the dynamic OLS simulation. The outcome in the left half of the table is income, and the outcome in the right half is the probability of crossing the notch; both would fall in the treated bins approaching the notch if households in these bins chose to bunch. Across the different types of serial dependence shown in each row, the coverage rate for the effect on income is close to 0.05. Coverage rates for the second outcome are closer to 0.06, but they do not change much with serial dependence, and they remain lower than the rates for the static estimator in all but ideal conditions.

Table 3:

Dynamic OLS estimates in simulation with serially-dependent income.

Weight on base-year potential income Income-change estimates Share-bunching estimates
Bias Coverage Root MSE Bias Coverage Root MSE
0 −0.431 0.048 55.14 0.006 0.058 0.025
0.5 −0.377 0.049 55.84 0.006 0.059 0.025
1 0.146 0.049 55.60 0.006 0.057 0.025
1.5 0.752 0.049 56.14 0.006 0.056 0.025
2 −0.084 0.052 56.06 0.006 0.060 0.025
Notes: The table shows results of OLS dynamic bunching estimation performed on data with bunching occurring in the base year but no bunching and no notch-related attrition in the next. Thus, the estimate for each outcome should equal zero, and the coverage rate should be close to 0.05. The data, sample size, and hypothetical notch are as in Table 1. The outcomes in the two halves of the table are income and the “Cross” dummy for having income growth putting the household above the notch, as described in the text. Estimation uses log income, an omitted base-year range of 10.55–10.65 (log(40,000) ≈ 10.6), and growth-rate bins of width 0.1.

Since the bunching ratio is the static estimate of the effect on income, results for this outcome can also be compared to the static estimates in the middle row of Table 2, in which there is no serial dependence. The root-MSE for the dynamic estimate is 55.6, compared to 549 and 222 for the static reduced-mass and excess-mass estimates, respectively. This suggests that the dynamic method is more precise, though root-MSE is closely related to the bias, which is nonzero for both of these parametric methods. In general, the dynamic OLS method will lose some precision due to the dropped bins and to not estimating any bunching that occurs in the base year. This loss is immaterial if there is a base year prior to the existence of the notch, since such a year would not be used at all in the static design; otherwise, the dynamic estimator will be relatively less precise when the panel contains few years.

The dynamic OLS design estimates conditional growth probabilities as polynomials of base-year income. A potential downside of this approach is that it may not obtain an adequate fit if the distribution of conditional growth rates changes rapidly across values of base-year income r_i. To test this possibility, I run additional simulations in which I alter the growth rates of the draws from the smoothed PSID distribution. For context, the mean growth rate of log income in that distribution increases monotonically from −0.14 to −0.01 across deciles of base-year income, and the standard deviation varies non-monotonically between 0.45 and 0.53. To increase the variation in these conditional moments, I define a normalized base-year variable r̃_i = (r_i − min r_i)/(max r_i − min r_i) ∈ [0, 1] and then consider transformations of g_i using linear, quadratic, and exponential functions of r̃_i. In the first three simulations, these functions are added to g_i, thereby changing the conditional mean but not the variance. Three more simulations change both moments by multiplying g_i by the same functions of r̃_i.

Results appear in Table 4. For all three of the additive transformations, results are similar to those with the untransformed distribution, with coverage rates close to 0.05. The same is true for the first and third multiplicative transformations. However, multiplication by r̃_i² produces coverage rates in excess of 0.16 for both outcomes. This finding shows that the proposed method can fail if the conditional distribution of growth differs in particular ways across levels of base-year income. In light of this, researchers can plot the binned conditional growth-rate probabilities for undistorted bins and visually examine how well these are approximated by polynomials of base-year income. They can also test for “effects” of placebo thresholds on growth and assess robustness to varying the order of the polynomial.

Table 4:

Dynamic OLS estimates in simulation with serially-dependent income.

Transformation of growth Income-change estimates Share-bunching estimates
Bias Coverage Root MSE Bias Coverage Root MSE
g_i + r̃_i −8.151 0.048 71.32 0.002 0.051 0.031
g_i + r̃_i² −7.696 0.051 71.46 −0.008 0.054 0.032
g_i + e^r̃_i 1.643 0.049 294.60 0.039 0.062 0.133
g_i × r̃_i −12.013 0.054 60.29 0.006 0.059 0.026
g_i × r̃_i² 82.110 0.169 116.63 0.040 0.196 0.054
g_i × e^r̃_i 0.283 0.049 55.65 0.002 0.051 0.024
Notes: The table shows results of OLS dynamic bunching estimation as in Table 3. In these simulations there is no serial dependence, but the distribution of growth g_i conditional on base-year income r_i has been altered using transformations of g_i and r̃_i = (r_i − min r_i)/(max r_i − min r_i) ∈ [0, 1]. Each row displays results of a simulation using the transformation given in the first column.
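The visual diagnostic suggested above, binning conditional growth-rate probabilities and comparing them with a polynomial in base-year income, can be sketched as follows (function and variable names are hypothetical; the polynomial is fit in normalized income for numerical stability):

```python
import numpy as np

def growth_prob_fit(r, g, g_lo, g_hi, order=5, n_bins=40):
    """Diagnostic: binned probability that the growth rate lands in
    [g_lo, g_hi), by base-year income bin, together with a polynomial fit
    in normalized base-year income for visual comparison."""
    edges = np.quantile(r, np.linspace(0.0, 1.0, n_bins + 1))
    centers = (edges[:-1] + edges[1:]) / 2.0
    probs = np.empty(n_bins)
    for b in range(n_bins):
        in_bin = (r >= edges[b]) & (r < edges[b + 1])
        probs[b] = np.mean((g[in_bin] >= g_lo) & (g[in_bin] < g_hi))
    x = (centers - centers.min()) / (centers.max() - centers.min())
    fitted = np.polyval(np.polyfit(x, probs, order), x)
    return centers, probs, fitted
```

Plotting `probs` against `fitted` for undistorted bins shows whether a polynomial of the chosen order tracks the conditional growth distribution; large systematic residuals, as in the r̃_i² multiplicative case, flag a specification for which the design can fail.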

4.3 Dynamic MLE Simulation

The simulation of MLE dynamic bunching estimation follows the earlier simulations. Here, the 10,000 random samples each contain 10,000 observations (reduced from the sample size of 100,000 used for OLS estimation due to the longer estimation time). I again impose bunching in the base year but no bunching or notch-related attrition responses in the next year. To allow for ordinary attrition, I randomly select observations to attrit at the rate that I observe among high-income households in the raw data, which is roughly four percent.[13] I treat all attritors as having r_{it+1} = log(10,000), with corresponding g_it, and then estimate a probability mass for this level of g_it that can vary with r_it and g_it. I capture notch-related attrition responses among households whose next-year income should exceed the notch by incorporating a parameter that, where ρ_it denotes the growth rate that would bring a household to the notch, shifts mass from ρ_it to log(10,000).
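The likelihood's treatment of attrition as a point mass can be illustrated with a deliberately simplified sketch. This is not the paper's estimator: here the attrition probability q is a single constant, the growth distribution is normal, and the maximization is a one-dimensional grid profile; the paper's MLE instead uses a flexible distribution and lets the attrition probability vary with r_it and g_it.

```python
import numpy as np

def log_likelihood(q, mu, sigma, g_obs, attrited):
    """Mixture log-likelihood with a point mass for attrition: with
    probability q an observation exits the sample (its growth recoded to a
    floor value), otherwise growth is drawn from N(mu, sigma)."""
    n_attrit = attrited.sum()
    n_stay = (~attrited).sum()
    dens = (-0.5 * ((g_obs[~attrited] - mu) / sigma) ** 2
            - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
    return n_attrit * np.log(q) + n_stay * np.log(1.0 - q) + dens.sum()

rng = np.random.default_rng(4)
n = 10_000
attrited = rng.uniform(size=n) < 0.04      # 4 percent random attrition
g_obs = rng.normal(-0.05, 0.5, n)           # growth rates for the stayers
# Profile the likelihood over the attrition share q (other parameters fixed).
grid = np.linspace(0.005, 0.10, 96)
q_hat = grid[np.argmax([log_likelihood(q, -0.05, 0.5, g_obs, attrited)
                        for q in grid])]
```

Because the attrition point mass enters the likelihood separably, the profiled maximizer recovers the sample attrition share, which is the mechanism that lets the full model separate notch-related attrition from ordinary attrition.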

Table 5 presents results from the simulation. I report estimates describing bunching and attrition responses, both of which have a true value of zero. The top row describes estimates of the share of bunchers among the total number of households that should move to the reduced region. The second row reflects the estimated effect on the income of these households, and the third row gives the estimated share of those that should cross the notch that instead exit the sample. All of these estimates exhibit a bias close to zero and a coverage rate close to 5 percent.

Table 5:

Dynamic MLE estimates in simulation with serially-dependent income.

Variable Bias Coverage Root MSE
Share bunching −0.0003 0.049 0.008
Income reduction −9.20 0.049 241.1
Share exiting 0.0039 0.052 0.025
Notes: The table shows results of MLE dynamic bunching estimation performed with bunching occurring in the base year but no bunching and no notch-related attrition in the next year. Thus, the estimate for each outcome should equal zero, and the coverage rate should be close to 0.05. 10,000 random samples of 10,000 observations were generated with the data, hypothetical notch, and omitted range as in Table 3. “Share bunching” is the share of households that would have moved into the reduced range but instead bunch below the notch; “Share exiting” is the share of those that instead leave the sample.

The results in Table 5 also provide some information about precision. While MLE is efficient when the distribution is correctly specified, the estimator in the simulation does not use the functional form of the actual generating distribution, but rather a distribution function meant to provide enough flexibility for researchers to reasonably approximate a variety of empirical distributions. The estimated income reduction is the same quantity that is estimated using the bunching ratio in the static design, so the middle row of Table 5 can be compared to the results in Tables 1 and 2. The root-MSE of the dynamic estimate is about 10 percent larger than that of the static estimate of the excess mass and less than half that of the static estimate of the reduced mass.[14] Thus, the off-the-shelf MLE estimator performs nearly as well as the better of the static estimators, and precision can be further improved by tailoring the MLE to reduce unnecessary parameters and to better fit the empirical distribution outside of the omitted range.

5 Empirical Illustration

I apply dynamic bunching estimation to a federal reporting requirement for charitable organizations. Marx (2018) studies the welfare effects of this regulation and of adjustments to regulatory notches. Here, the setting provides a useful application for the proposed methods because there is publicly available panel data, variation in the location of the notch, and interesting patterns in each of the dimensions in which dynamic bunching estimation provides new information.

The charities dataset was provided by the National Center for Charitable Statistics (NCCS), an initiative of the Urban Institute. This dataset is the union of all annual “Core Files,” which provide digital records of the information returns that tax-exempt organizations are required to file with the IRS. The organizations must file IRS Form 990 unless both total assets and gross receipts fall below certain thresholds that convey eligibility to file Form 990-EZ. The gross receipts threshold was $100,000 from before the beginning of the panel in 1991 until 2008. I restrict the sample to years before 2008 and, for comparability between the static and the dynamic estimates that follow, observations of charities that appear in consecutive years.

I estimate several parameters of relevance. Charities whose revenues rise above the level of the notch could bunch by reducing fundraising, misreporting receipts, or retiming receipts to the next year. Alternatively, they could leave the data by not filing because charities are only required to file if they exceed a size threshold, and the Internal Revenue Service conducts a limited number of audits that would reveal whether a charity should have filed (Marx 2018). I examine these responses when the notch is first approached as well as in later years because the welfare effect of a notch depends on the total distortion caused in both the short-run and long-run.

I first note the results of static bunching estimation for charities, which appear in Supplementary Material Table D.1. The estimated reduction of mass above the notch is nearly 70 percent larger than the estimated excess mass below it, i.e. the mass missing from above the notch substantially exceeds the mass appearing below it. The table also shows that the estimates are not reconciled by allowing for a discontinuity at the notch or by estimating separate polynomials on each side of the notch. These findings caution against constraining estimation to require that the excess mass equal the reduced mass, as this equality need not hold in the presence of extensive-margin responses, notch-related attrition, or serial dependence. Kopczuk and Munroe (2015) compare the excess and reduced mass as a diagnostic to test for extensive-margin responses. The dynamic bunching design allows me to quantify such responses and demonstrate their empirical importance in the charity application.

I demonstrate benefits of dynamic OLS estimation by examining effects on long-run income growth in the charity application. A notch is less distortionary if agents only respond temporarily, such as in a one-time misreport or retiming of income, than if their income is affected for many years. As one measure of long-run effects, consider the binary outcome cross it defined in Section 3.1, the indicator for growth that exceeds the level required to put the agent above the notch at time t + 1. Corresponding indicators can be defined for any horizon h to estimate effects of approaching the notch at time t + 1 on crossing it at time t + h.[15] Table 6 displays the results of regressions with h varying from 1 to 12.[16] For charities in the bins approaching the notch in year t + 1, the estimated counterfactual is that 40 percent should have income greater than the notch at t + 1, and 75 percent should have income above the notch in year t + 10. The estimates in the table indicate that bunching reduces these probabilities in both the short- and long-run. I find the largest effect, a 5.3 percentage point reduction, in the year that the notch is first approached. I then find effects of around 1.5 to 2 percentage points at all other horizons, indicating that the notch permanently reduced the growth of a significant number of charities, a result that cannot be shown with static estimation.

Table 6:

Dynamic OLS estimates for charities.

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Near notch −0.053*** −0.021*** −0.018** −0.015** −0.016** −0.017*** −0.016** −0.015** −0.017*** −0.022*** −0.012* −0.014***
(0.009) (0.007) (0.007) (0.007) (0.007) (0.006) (0.008) (0.006) (0.005) (0.006) (0.007) (0.005)
N 307,526 260,209 261,771 256,548 252,669 247,364 245,228 240,193 234,728 231,303 225,570 221,296
Notes: The table shows the results of regressing a dummy for crossing the level of growth corresponding to the notch (“Cross” as defined in the text) after h years on the “Near Notch” dummy for bins straddling the notch in the next year, controlling for growth-rate bins of width 0.1 and a quadratic in current receipts. The coefficients show a significant reduction in the probability of crossing the notch at all horizons. The sample includes charities within one log point of the notch in any starting year from 1990 to 1997 and growing by 0–1 log points. Standard errors are clustered by state. ***p < 0.01, **p < 0.05, *p < 0.1.
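The horizon-specific outcome used in these regressions can be constructed in a few lines; this sketch uses a hypothetical panel array and notch value, with column 0 as the base year:

```python
import numpy as np

def cross_at_horizons(log_income, notch_log, horizons):
    """cross[h][i] = 1 if agent i's log income h years after the base year
    is at or above the notch; log_income has shape (n_agents, n_years)."""
    return {h: (log_income[:, h] >= notch_log).astype(int) for h in horizons}

panel = np.array([[10.5, 10.7, 10.4],
                  [10.2, 10.3, 10.8]])
cross = cross_at_horizons(panel, 10.6, horizons=[1, 2])
```

Each `cross[h]` then serves as the dependent variable in a separate regression on the near-notch dummy, producing one column of Table 6 per horizon.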

The dynamic design can also enhance heterogeneity analysis. Researchers have studied heterogeneity with the static estimator by splitting the sample, an approach that is not well suited to variables that are continuous or that vary over time, particularly those whose value may be affected by bunching. Note, however, that base-year characteristics should not be affected by nearing a notch in the next year, and hence interactions with base-year characteristics can be used to describe heterogeneity in the dynamic design. For example, a researcher using the dynamic design can test whether the extent of bunching varies with the level of base-year income rather than simply estimating the average extent. Using such interactions, Marx (2018) finds that the responses of charities to the reporting notch are related to both size and staffing, providing evidence of a fixed cost of compliance.

For the dynamic MLE, I use functions similar to those used in the simulation. Details appear in Supplementary Material C. The latent distribution is similar to the PSID distribution, but attrition is more common in the charity data. Attrition could be due to late filing, earning income below the level at which filing is required, shutting down, merging, or simply failing to comply with the reporting requirement. I allow attrition to vary according to a quadratic function of r it on either side of the threshold and to exhibit policy-related excess attrition among charities with latent growth that would raise their earnings above the notch.

Table 7 displays the maximum likelihood dynamic bunching estimates. The top panel of Table 7 shows estimates of parameters governing bunching and systematic attrition. Note that the parameters in this top panel cannot be estimated using the static approach. The first parameter gives the bunching propensity among charities that have base-year receipts below the notch. An estimated 2.6 percent of the charities that should enter the reduced range from below are instead bunching. The second row shows this bunching propensity is lower for those with current receipts above the notch (who have already filed Form 990 in the base year), suggesting that part of the cost of filing is a one-time adjustment. The third and fourth rows show excess attrition among charities that should be crossing the notch from below. This is estimated separately for those that would have entered the reduced range and those that would have grown to a point further above the notch, though these estimates turn out to be similar. Comparing the attrition and bunching propensities, the number of charities that avoid filing by bunching is dwarfed by the number leaving the data.

Table 7:

MLE estimates for charities.

Static Dynamic
1991–2007 1991–2007
Share bunching from below notch 0.026***
(0.003)
Share bunching from above notch 0.005***
(0.001)
Attrition of those crossing to reduced range 0.080***
(0.004)
Attrition of those crossing to higher incomes 0.093***
(0.005)
Excess mass just below the notch (×100) 0.194*** 0.103***
(0.017) (0.007)
Reduction in mass in reduced range (×100) 0.293*** 0.354***
(0.019) (0.012)
Bunching ratio 753*** 403***
(67) (65)
N 2,196,564 2,815,026
Notes: The table shows the results of maximum likelihood dynamic bunching estimation along with static estimates for a comparable sample. The latter includes all charities appearing in the base year, while the former excludes charities that were missing or far above the notch in the next year. Estimates in the top panel describe effects of approaching the notch from below. Rows 1 and 2 show that charities bunch, and rows 3 and 4 show that a significant share of those that should grow to an income level above the notch instead exit from the sample. The lower panel shows that the static approach overestimates the excess number of organizations just below the notch and underestimates the number that should be just above it. Standard errors for dynamic estimates are calculated numerically. ***p < 0.01, **p < 0.05, *p < 0.1.

The lower panel of Table 7 reveals the estimated excess share of charities below the notch and the reduction in the share above it. The excess and reduction are found by aggregating the bunching and attrition propensities across all observations in the base year according to their counterfactual probability of moving to the reduced region in the next year. The dynamic estimate of the share of charities that appear in the bunching range exceeds by 0.103 percentage points (or roughly 200 charities per year) the share that would have been found in the absence of a response. Consistent with the difference in estimated magnitudes of the propensities to bunch or leave the sample, the reduction of mass in the range just above the notch is significantly greater, at 0.354 percentage points. The static estimates of the excess and reduced mass appear to be biased towards each other, and both differ from the dynamic estimates by several times their standard errors. Finally, the bunching ratio compares the number of bunchers with the mass at the threshold, providing an estimate of the average dollar-value reduction in reported income among bunchers. The static estimate of the bunching ratio is roughly twice the size of the dynamic estimate.

Figure 4 shows the difference between dynamic and static approaches to estimation. The figure plots the density of receipts in the next year and estimated counterfactual densities. The MLE counterfactual is akin to what Kleven and Waseem (2013) refer to as “a ‘partial’ counterfactual stripped of intensive-margin responses only.” Due to the notch-related attrition, the counterfactual is not continuous at the notch. Thus, the smooth curve estimated with the static design underestimates the density below the notch and therefore overestimates the excess mass in the bunching region.

Figure 4: 
Non-smooth counterfactual distribution of charities’ income. Notes: The dynamic bunching estimation fits growth rates from each base year of data to the next year, and the figure shows the density of log gross receipts in the next year. Details of the dynamic estimates are provided in Section 3.3 and Appendix D. Both the dynamic and static estimates of the counterfactual fit the observed distribution fairly closely away from the notch. The counterfactual estimated using the dynamic MLE strategy is not smooth around the notch because it allows for notch-related attrition. The two estimation strategies imply different counterfactual distributions within the omitted range around the notch and therefore give different estimates of the amount of bunching. N = 2,815,026.

6 Conclusions

This paper proposes new tools for analyzing bunching. Dynamic bunching estimation offers an identification strategy based on assumptions that are likely to hold more broadly than the assumptions required for static estimates. Theoretical examples and simulations demonstrate the robustness of dynamic estimators to factors that bias the static estimator. The differences between static and dynamic estimates can be quite large, as seen in an application to bunching at a reporting threshold for charities. Dynamic estimation also provides new opportunities to describe behavior by estimating notch-related attrition, extensive-margin responses, additional forms of preference heterogeneity, long-run effects of approaching a notch, and the effect of bunching in one year on income in subsequent years.

There are a number of diagnostics that researchers can use to test for bias in static bunching estimates. Researchers studying notches can, even with only a single cross section of data, perform a statistical test of whether the excess mass to one side of the notch equals the reduction in mass on the other side. This test can be generalized by estimating the counterfactual distributions separately on each side of the notch (Kopczuk and Munroe 2015). With repeated cross sections, the researcher can examine whether bunching or donut-RD estimates vary by year, and in particular whether there is accumulation of mass or growth in discontinuities over time. With panel data, the researcher can compare the static and dynamic OLS estimates to assess whether estimation with dynamic MLE appears warranted. Finally, with a notch that is moved or removed, the researcher can estimate serial dependence in the running variable.
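The first diagnostic, a test of whether the excess mass equals the reduced mass, can be implemented as a simple z-test on the two static estimates (a sketch; the covariance term defaults to zero, which assumes independent estimates):

```python
import math

def excess_equals_reduced_test(excess, se_excess, reduced, se_reduced, cov=0.0):
    """Two-sided z-test of the diagnostic null that the excess mass below a
    notch equals the reduced mass above it; rejection suggests
    extensive-margin responses, notch-related attrition, or serial dependence."""
    diff = excess - reduced
    se = math.sqrt(se_excess ** 2 + se_reduced ** 2 - 2.0 * cov)
    z = diff / se
    # Two-sided p-value from the normal CDF, via the stdlib error function.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

Applied to the static estimates in Table 7 (excess 0.194 with standard error 0.017, reduced 0.293 with standard error 0.019), the test rejects equality decisively, consistent with the attrition documented in the charity application.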

In addition to relaxing identifying assumptions, dynamic bunching estimation can be used to learn more about agents’ responses to thresholds in policies and budget sets. For example, estimates of serial dependence can provide evidence about whether bunching reflects a reduction in income, misreporting, or retiming. Future research might apply these methods to questions such as the nature of optimization frictions or the degree to which individuals are forward looking. Another avenue for future research would be to develop methods for exploiting panel data, as this paper has done for notches, to analyze bunching at kinks.


Corresponding author: Benjamin M. Marx, Department of Economics, University of Illinois at Urbana-Champaign, 1407 W. Gregory, Urbana, IL 61801, USA, E-mail:

I am especially grateful to Bruce Kogut, Wojciech Kopczuk, Brendan O’Flaherty, Bernard Salanié, and Miguel Urquiola for invaluable advice. I also thank Raj Chetty, Julie Berry Cullen, Francois Gerard, Daniel Hungerman, Robert McMillan, Abigail Payne, Petra Persson, Jonah Rockoff, Ugo Troiano, Lesley Turner, Reed Walker, Caroline Weber, and seminar participants at universities and IIPF, U.S. Treasury OTA, NTA, APET, and MEA. I gratefully acknowledge the financial support of a Nonprofit Research Fellowship from the National Bureau of Economic Research and the Outstanding Dissertation Award from the National Tax Association.


References

Aronsson, Thomas, Katharina Jenderny, and Gauthier Lanot. 2022. “The Quality of the Estimators of the ETI.” Journal of Public Economics 212: 104679. https://doi.org/10.1016/j.jpubeco.2022.104679.

Bertanha, Marinho, Andrew McCallum, and Nathan Seegert. 2023. “Better Bunching, Nicer Notching.” Journal of Econometrics 237 (2): 105512. https://doi.org/10.1016/j.jeconom.2023.105512.

Best, Michael, and Henrik Kleven. 2018. “Housing Market Responses to Transaction Taxes: Evidence from Notches and Stimulus in the UK.” The Review of Economic Studies 85: 157–93. https://doi.org/10.1093/restud/rdx032.

Best, Michael Carlos, Anne Brockmeyer, Henrik Jacobsen Kleven, Johannes Spinnewijn, and Mazhar Waseem. 2015. “Production versus Revenue Efficiency with Limited Tax Capacity: Theory and Evidence from Pakistan.” Journal of Political Economy 123 (6): 1311–55. https://doi.org/10.1086/683849.

Blinder, Alan S., and Harvey S. Rosen. 1985. “Notches.” The American Economic Review 75 (4): 736–47.

Blomquist, Sören, Whitney K. Newey, Anil Kumar, and Che-Yuan Liang. 2021. “On Bunching and Identification of the Taxable Income Elasticity.” Journal of Political Economy 129 (8): 2320–43. https://doi.org/10.1086/714446.

Brülhart, Marius, Jonathan Gruber, Matthias Krapf, and Kurt Schmidheiny. 2016. “Taxing Wealth: Evidence from Switzerland.” NBER Working Paper No. 22376. https://doi.org/10.3386/w22376.

Caetano, Carolina, Gregorio Caetano, Hao Fe, and Eric Nielsen. 2021. “Dummy Test Identification in Models with Bunching.” Finance and Economics Discussion Series 2021-068. https://doi.org/10.17016/feds.2021.068.

Chen, Zhao, Zhikuo Liu, Juan Carlos Suárez Serrato, and Daniel Yi Xu. 2018. “Notching R&D Investment with Corporate Income Tax Cuts in China.” NBER Working Paper No. 24749. https://doi.org/10.3386/w24749.

Chetty, Raj, John N. Friedman, and Emmanuel Saez. 2013. “Using Differences in Knowledge Across Neighborhoods to Uncover the Impacts of the EITC on Earnings.” The American Economic Review 103 (7): 2683–721. https://doi.org/10.1257/aer.103.7.2683.

Chetty, Raj, Tore Olsen, Luigi Pistaferri, and John N. Friedman. 2011. “Adjustment Costs, Firm Responses, and Micro vs. Macro Labor Supply Elasticities: Evidence from Danish Tax Records.” Quarterly Journal of Economics 126: 749–804. https://doi.org/10.1093/qje/qjr013.

Diamond, Rebecca, and Petra Persson. 2016. “The Long-Term Consequences of Teacher Discretion in Grading of High-Stakes Tests.” NBER Working Paper No. 22207. https://doi.org/10.3386/w22207.

Einav, Liran, Amy Finkelstein, and Paul Schrimpf. 2017. “Bunching at the Kink: Implications for Spending Responses to Health Insurance Contracts.” Journal of Public Economics 146: 27–40. https://doi.org/10.1016/j.jpubeco.2016.11.011.

Gelber, Alexander, Damon Jones, and Dan Sacks. 2020. “Estimating Earnings Adjustment Frictions: Method and Evidence from the Social Security Earnings Test.” American Economic Journal: Applied Economics 12 (1): 1–31. https://doi.org/10.1257/app.20170717.

Gelber, Alexander M., Damon Jones, Daniel W. Sacks, and Jae Song. 2021. “Using Non-Linear Budget Sets to Estimate Extensive Margin Responses: Method and Evidence from the Earnings Test.” American Economic Journal: Applied Economics 13 (4): 150–93. https://doi.org/10.1257/app.20180811.

Giertz, Seth. 2008. “Panel Data Techniques and the Elasticity of Taxable Income.” CBO Working Paper 2008-20. Washington: Congressional Budget Office.

Gruber, Jonathan, and Emmanuel Saez. 2002. “The Elasticity of Taxable Income: Evidence and Implications.” Journal of Public Economics 84 (1): 1–32. https://doi.org/10.1016/s0047-2727(01)00085-8.

Kleven, Henrik. 2016. “Bunching.” Annual Review of Economics 8: 435–64. https://doi.org/10.1146/annurev-economics-080315-015234.

Kleven, Henrik J., and Mazhar Waseem. 2013. “Using Notches to Uncover Optimization Frictions and Structural Elasticities: Theory and Evidence from Pakistan.” Quarterly Journal of Economics 128 (2): 669–723. https://doi.org/10.1093/qje/qjt004.

Kopczuk, Wojciech. 2005. “Tax Bases, Tax Rates and the Elasticity of Reported Income.” Journal of Public Economics 89 (11–12): 2093–119. https://doi.org/10.1016/j.jpubeco.2004.12.005.

Kopczuk, Wojciech, and David Munroe. 2015. “Mansion Tax: The Effect of Transfer Taxes on the Residential Real Estate Market.” American Economic Journal: Economic Policy 7 (2): 214–57. https://doi.org/10.1257/pol.20130361.

Kotz, Samuel, Tomasz J. Kozubowski, and Krzysztof Podgórski. 2001. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Basel, Switzerland: Birkhäuser. https://doi.org/10.1007/978-1-4612-0173-1.

Kozubowski, Tomasz J., and Saralees Nadarajah. 2010. “Multitude of Laplace Distributions.” Statistical Papers 51 (1): 127–48. https://doi.org/10.1007/s00362-008-0127-2.

Marx, Benjamin M. 2012. “Regulatory Hurdles and Growth of Charitable Organizations: Evidence from a Dynamic Bunching Design.” http://www.columbia.edu/~bmm2126/papers/BMM-Dynamic_Bunching.pdf.

Marx, Benjamin M. 2015. “Dynamic Bunching Estimation and the Cost of a Regulatory Hurdle for Charities.” https://tinyurl.com/y5cjmo96.

Marx, Benjamin M. 2018. “The Cost of Requiring Charities to Report Financial Information.” MPRA Paper No. 88660.

Mortenson, Jacob, and Andrew Whitten. 2020. “Bunching to Maximize Tax Credits: Evidence from Kinks in the U.S. Tax Schedule.” American Economic Journal: Economic Policy 12 (3): 402–32. https://doi.org/10.1257/pol.20180054.

Saez, Emmanuel. 2010. “Do Taxpayers Bunch at Kink Points?” American Economic Journal: Economic Policy 2 (3): 180–212. https://doi.org/10.1257/pol.2.3.180.

Salamon, Lester M., and Chelsea L. Newhouse. 2019. “The 2019 Nonprofit Employment Report.” Nonprofit Economic Data Bulletin No. 47. Baltimore: Johns Hopkins Center for Civil Society Studies, ccss.jhu.edu.

Seim, David. 2017. “Behavioral Responses to Wealth Taxes: Evidence from Sweden.” American Economic Journal: Economic Policy 9 (4): 395–421. https://doi.org/10.1257/pol.20150290.

Slemrod, Joel. 2010. “Buenas Notches: Lines and Notches in Tax System Design.” Mimeo, University of Michigan.

St. Clair, Travis. 2016. “How Do Nonprofits Respond to Regulatory Thresholds: Evidence from New York’s Audit Requirements.” Journal of Policy Analysis and Management 35: 772–90. https://doi.org/10.1002/pam.21931.

Weber, Caroline. 2014. “Toward Obtaining a Consistent Estimate of the Elasticity of Taxable Income Using Difference-In-Differences.” Journal of Public Economics 117: 90–103. https://doi.org/10.1016/j.jpubeco.2014.05.004.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/jem-2022-0031).


Received: 2022-10-06
Accepted: 2024-03-21
Published Online: 2024-05-28

© 2024 Walter de Gruyter GmbH, Berlin/Boston
