
Clustering standard errors





Clustering standard errors is a technique used in regression analysis, especially with panel data, when the assumption that the error terms are independent across some dimension or subgroup is thought or known to be violated. Clustering means that the variance-covariance matrix of the errors is not assumed to be diagonal but block-diagonal, with the non-zero off-diagonal elements within each block estimated from the data.
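
In matrix form, the block-diagonal structure and the usual cluster-robust ("sandwich") estimator can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from the cited sources: the observations are split into G clusters, u is the error vector, X_g and \hat{u}_g are the regressors and OLS residuals of cluster g, and small-sample corrections are omitted.

```latex
\Omega = \operatorname{E}[u u' \mid X] =
\begin{pmatrix}
\Omega_1 & 0        & \cdots & 0        \\
0        & \Omega_2 & \cdots & 0        \\
\vdots   & \vdots   & \ddots & \vdots   \\
0        & 0        & \cdots & \Omega_G
\end{pmatrix},
\qquad
\widehat{V}(\hat{\beta}) = (X'X)^{-1}
\left( \sum_{g=1}^{G} X_g' \hat{u}_g \hat{u}_g' X_g \right)
(X'X)^{-1}
```

Each block \Omega_g is left unrestricted, while all covariances between observations in different clusters are set to zero.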

The need for clustering most commonly arises when explanatory variables are measured at a higher level of aggregation than the dependent variable. Conventional regression estimates in this case produce standard errors that are biased downwards, leading to erroneous conclusions that the coefficients are statistically significant.[1]
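
The downward bias can be illustrated with a small simulation. The sketch below is not taken from the cited paper; it assumes a single regressor that varies only across clusters and an error term with a shared within-cluster component, and all function names and parameter values are hypothetical. It uses only the Python standard library and omits small-sample corrections.

```python
import math
import random


def slope_standard_errors(x, y, cluster_ids):
    """Return (slope, conventional SE, cluster-robust SE) for y = a + b*x + u."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    x_dev = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * (yi - y_mean) for d, yi in zip(x_dev, y)) / sxx
    intercept = y_mean - slope * x_mean
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional OLS variance: assumes independent, homoskedastic errors.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(sigma2 / sxx)

    # Cluster-robust variance: scores (x - mean) * residual are summed within
    # each cluster before squaring, so within-cluster correlation is allowed.
    scores = {}
    for d, e, g in zip(x_dev, resid, cluster_ids):
        scores[g] = scores.get(g, 0.0) + d * e
    se_clustered = math.sqrt(sum(s * s for s in scores.values())) / sxx

    return slope, se_conventional, se_clustered


def simulate(n_clusters=50, cluster_size=20, seed=0):
    """Hypothetical data: x and part of the error are constant within a cluster."""
    rng = random.Random(seed)
    x, y, ids = [], [], []
    for g in range(n_clusters):
        x_g = rng.gauss(0.0, 1.0)      # regressor varies only across clusters
        shock_g = rng.gauss(0.0, 1.0)  # error component shared within a cluster
        for _ in range(cluster_size):
            x.append(x_g)
            ids.append(g)
            y.append(1.0 + 0.5 * x_g + shock_g + rng.gauss(0.0, 1.0))
    return x, y, ids


x, y, ids = simulate()
slope, se_conv, se_clu = slope_standard_errors(x, y, ids)
print(f"slope={slope:.3f}  conventional SE={se_conv:.3f}  clustered SE={se_clu:.3f}")
```

Under these assumptions the clustered standard error should come out well above the conventional one, because the 20 observations in each cluster carry much less independent information than the conventional formula assumes.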

Most statistical packages make it easy to cluster standard errors,[2] but the researcher needs to choose the level, dimension, or subset of data for clustering.

The general rules for clustering standard errors correctly are:

  • If explanatory variables are at a higher level of aggregation, standard errors need to be clustered at that same level of aggregation.[1] For example, if the analysis is conducted at the firm-year level and an explanatory variable of interest is at the industry-year level, the correct clustering is at the industry-year level.
  • If there is a suspicion that errors might be correlated across observations for some other reason, clustering at a more aggregated level might be warranted. In the example above, if one believes that errors may be correlated across time, the appropriate clustering would be at the industry level. This type of clustering also provides an informal check for correlation in the errors. If standard errors do not increase when the clustering level is made more aggregated, this can be taken as an indication that the errors are independent across this dimension and that clustering along it is not needed.
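
The comparison described in the second rule can be sketched in code. The example below is a minimal illustration under assumed data, not a procedure from the cited sources: it simulates a hypothetical firm-year panel with an industry-year regressor, once without and once with an industry component that persists across years, and compares standard errors clustered at the industry-year and industry levels. All names and parameter values are made up for the illustration, and small-sample corrections are omitted.

```python
import math
import random


def clustered_se(x, y, cluster_ids):
    """Cluster-robust SE of the slope in y = a + b*x + u (no small-sample correction)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    x_dev = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * (yi - y_mean) for d, yi in zip(x_dev, y)) / sxx
    resid = [(yi - y_mean) - slope * d for d, yi in zip(x_dev, y)]
    scores = {}
    for d, e, g in zip(x_dev, resid, cluster_ids):
        scores[g] = scores.get(g, 0.0) + d * e
    return math.sqrt(sum(s * s for s in scores.values())) / sxx


def simulate_panel(persistent, n_industries=60, n_years=10, firms=3, seed=1):
    """Hypothetical firm-year panel with a regressor measured at industry-year level.

    If `persistent` is True, the regressor and the error both contain an industry
    component that is constant over years, so errors are correlated across time.
    """
    rng = random.Random(seed)
    rows = []
    for ind in range(n_industries):
        x_ind, u_ind = rng.gauss(0, 1), rng.gauss(0, 1)
        for year in range(n_years):
            if persistent:
                x_it = x_ind + 0.5 * rng.gauss(0, 1)
                u_it = u_ind + 0.5 * rng.gauss(0, 1)
            else:
                x_it, u_it = rng.gauss(0, 1), rng.gauss(0, 1)
            for _ in range(firms):
                y = 0.5 * x_it + u_it + rng.gauss(0, 1)
                rows.append((x_it, y, (ind, year), ind))
    return rows


results = {}
for persistent in (False, True):
    rows = simulate_panel(persistent)
    x = [r[0] for r in rows]
    y = [r[1] for r in rows]
    se_industry_year = clustered_se(x, y, [r[2] for r in rows])
    se_industry = clustered_se(x, y, [r[3] for r in rows])
    results[persistent] = (se_industry_year, se_industry)
    print(f"persistent={persistent}: industry-year SE={se_industry_year:.4f}, "
          f"industry SE={se_industry:.4f}")
```

With the persistent industry component switched on, the industry-level standard error should be noticeably larger than the industry-year one; without it, the two should be of similar size, which is the pattern the rule above suggests looking for.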

The more aggregated the clustering level, the more conservative it is. Continuing with the previous example, firm-level clustering (which allows correlation across years within a firm) assumes more zeros in the variance-covariance matrix and is therefore less conservative than industry-level clustering.

Double-clustering, or clustering along more than one dimension, was developed for data with a multidimensional structure.[3][4] This is useful if correlation is suspected in multiple dimensions or if explanatory variables of interest are aggregated in different dimensions. In our example, if some explanatory variables of interest are aggregated at industry level and others aggregated by firm over time, appropriate clustering is at (industry, year) level. This double-clustering is more conservative than industry-year clustering.
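
One common way to compute two-way clustered variances is to add the two one-way cluster-robust variances and subtract the variance clustered on the intersection of the two dimensions. The sketch below applies that combination to the slope of a simple regression on hypothetical industry-by-year data. It omits small-sample corrections, the function names are made up for this example, and flooring a negative variance at zero is an assumption made here for simplicity rather than a recommendation from the cited papers.

```python
import math
import random


def slope_and_scores(x, y):
    """OLS slope of y = a + b*x + u, plus per-observation scores (x - mean) * residual."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    x_dev = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * (yi - y_mean) for d, yi in zip(x_dev, y)) / sxx
    scores = [d * ((yi - y_mean) - slope * d) for d, yi in zip(x_dev, y)]
    return slope, scores, sxx


def one_way_variance(scores, sxx, ids):
    """Cluster-robust variance of the slope for one set of cluster labels."""
    totals = {}
    for s, g in zip(scores, ids):
        totals[g] = totals.get(g, 0.0) + s
    return sum(t * t for t in totals.values()) / (sxx * sxx)


def two_way_se(x, y, ids_a, ids_b):
    """Two-way clustered SE of the slope: V_a + V_b - V_(a and b)."""
    _, scores, sxx = slope_and_scores(x, y)
    v_a = one_way_variance(scores, sxx, ids_a)
    v_b = one_way_variance(scores, sxx, ids_b)
    v_ab = one_way_variance(scores, sxx, list(zip(ids_a, ids_b)))
    # The combination can be negative in small samples; it is floored at zero
    # here purely as an illustrative safeguard.
    return math.sqrt(max(v_a + v_b - v_ab, 0.0))


# Hypothetical panel: one observation per (industry, year) cell, with both
# industry-specific and year-specific components in the regressor and the error.
rng = random.Random(2)
n_industries, n_years = 30, 10
ind_x = [rng.gauss(0, 1) for _ in range(n_industries)]
ind_u = [rng.gauss(0, 1) for _ in range(n_industries)]
year_x = [rng.gauss(0, 1) for _ in range(n_years)]
year_u = [rng.gauss(0, 1) for _ in range(n_years)]

x, y, industry, year = [], [], [], []
for i in range(n_industries):
    for t in range(n_years):
        x_it = ind_x[i] + year_x[t] + rng.gauss(0, 1)
        u_it = ind_u[i] + year_u[t] + rng.gauss(0, 1)
        x.append(x_it)
        y.append(0.5 * x_it + u_it)
        industry.append(i)
        year.append(t)

_, scores, sxx = slope_and_scores(x, y)
se_industry_year = math.sqrt(one_way_variance(scores, sxx, list(zip(industry, year))))
se_two_way = two_way_se(x, y, industry, year)
print(f"industry-year SE={se_industry_year:.4f}  two-way (industry, year) SE={se_two_way:.4f}")
```

The intersection term is subtracted because observations that share both an industry and a year would otherwise be counted twice, once in each one-way variance.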

References

  1. Moulton, Brent R. (1990). "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units." The Review of Economics and Statistics, MIT Press, 72(2), 334-338.
  2. "vce_options – variance estimators" (PDF). Stata.com. Retrieved November 24, 2018.
  3. Cameron, A., Gelbach, J., & Miller, D. (2011). "Robust Inference With Multiway Clustering." Journal of Business & Economic Statistics, 29(2), 238-249. Retrieved from https://www.jstor.org/stable/25800796
  4. Thompson, Samuel B. (2011). "Simple formulas for standard errors that cluster by both firm and time." Journal of Financial Economics, 99(1), 1-10.


This article "Clustering standard errors" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Clustering standard errors. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.