1.1 Background and Motivation

Impact evaluation studies are designed to estimate the impact of a policy or treatment. The aim is often to assess the viability or success of an intervention. These studies are also useful in cost-benefit analyses for gauging the size of the benefit attributable to an intervention. It is therefore important for an estimated treatment effect to be as free from bias as possible. The object of interest is often the Average Treatment Effect on the Treated (ATT), which focuses explicitly on the effect of treatment on those who actually receive it. To estimate the ATT, an estimation strategy must solve a missing data problem (Rosenbaum & Rubin, 1983). The central question in this regard is: what would the outcome of the treated observations have been had they not been treated? This is known as the counterfactual outcome. Since the counterfactual outcome is not observable, the estimation process must predict it as accurately as possible from a control sample. For a control sample to predict the counterfactual outcome accurately, it must be a good match for the treatment sample. This requires comparability in the covariates of the treatment and control groups, which is referred to as the balancing condition. The ideal way to achieve balance is through a Randomized Controlled Trial (RCT). With randomization, treated and untreated units are drawn at random from the same population, so the treatment and control samples have the same covariate distributions in expectation (that is, they are balanced in expectation), for both observed and unobserved covariates. The control sample in this scenario therefore provides the appropriate counterfactual for the treatment group (provided there are no attrition or compliance problems). Randomization is, however, not always possible, so in most cases estimation is based on an observational study or quasi-experiment.
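The missing data problem described above can be written in the standard potential-outcomes notation. As a minimal sketch (the symbols D, Y(1) and Y(0) are introduced here for illustration and may differ from the notation used later in this thesis), let D = 1 denote treatment and let Y(1) and Y(0) be the potential outcomes with and without treatment. The second line shows why a naive treated-versus-control comparison recovers the ATT only when the control group is a good match for the treated group:

```latex
\text{ATT} = E[\,Y(1) - Y(0) \mid D = 1\,]
           = E[\,Y(1) \mid D = 1\,] - \underbrace{E[\,Y(0) \mid D = 1\,]}_{\text{counterfactual (unobserved)}}

E[\,Y \mid D = 1\,] - E[\,Y \mid D = 0\,]
  = \text{ATT} + \underbrace{E[\,Y(0) \mid D = 1\,] - E[\,Y(0) \mid D = 0\,]}_{\text{selection bias}}
```

Under randomization, D is independent of Y(0), so the selection-bias term is zero and the control-group mean is a valid stand-in for the counterfactual mean of the treated.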
The key challenge for observational studies, therefore, is to replicate the kind of result one would expect from a randomized experiment. For this to succeed, a balanced sample is key, because it guarantees that like is compared with like in observables, which, by extension, suggests that the same is true for unobservables. This extension is justified only to the extent that the unobservables are correlated with the observables (Imai et al., 2008). It is important to note that balance (as achieved under randomization) should hold for the whole distribution and not just for some moments, such as the mean and variance. When a control group that balances the distribution of covariates in the treatment group is used in evaluation, the treatment effect estimate will be unbiased and robust across econometric methods, as one would expect from randomized data. The implication is that inference is based solely on the data and does not rely on modelling assumptions (or model specification). Broadly speaking, imbalance refers to any difference in the distribution of covariates across treatment arms. However, the term is commonly used to refer to differences in averages (Gelman & Hill, 2007). The evaluation literature in statistics and economics uses terms such as lack of support, lack of complete overlap, violation of common support, and imbalance in different ways. For example, Gelman & Hill (2007; chapter 10) treat imbalance and lack of complete overlap as two distinct departures from comparability in the distribution of covariates. They note that imbalance does not necessarily imply lack of complete overlap, and vice versa. Hill & Su (2013), on the other hand, note that failure to satisfy the common support condition can lead to unresolved imbalance (for matching methods), suggesting that lack of support can be thought of as a form of imbalance.
Furthermore, Imbens & Rubin (2009; chapter 15) refer to lack of support as an extreme case of imbalance. In this thesis, I use the term imbalance to refer to any difference in covariate distributions across treatment arms. This includes differences in means, variances, or moments beyond the first and second, as well as differences in support. In finite samples, differences in support may manifest as thin or no support problems (Lechner & Strittmatter, 2009). Note that a thin or no support problem may not become evident when only the means, or the means and variances, of the distributions are compared.
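The last point can be made concrete with a small simulation in which a mean-and-variance comparison suggests balance while the covariate distributions differ sharply. This is a hypothetical illustration assuming Python with NumPy: the data are simulated, the helper function names are illustrative, and the diagnostics used (standardized mean difference, variance ratio, a two-sample Kolmogorov-Smirnov statistic, and a caliper-based share of treated units without a nearby control) are common choices, not the information-theoretic balance measure developed in this thesis.

```python
import numpy as np

# Hypothetical simulated covariate; not data from the thesis.
rng = np.random.default_rng(seed=0)
n = 2000

# Treated covariate: standard normal (mean 0, variance 1).
x_treated = rng.normal(loc=0.0, scale=1.0, size=n)

# Control covariate: two tight clusters near -1 and +1.
# Its mean is 0 and its variance is (1 - s**2) + s**2 = 1, matching the treated
# group, yet almost no control units fall near 0, where treated units are densest.
s = 0.1
signs = rng.choice([-1.0, 1.0], size=n)
x_control = signs * np.sqrt(1.0 - s**2) + rng.normal(loc=0.0, scale=s, size=n)


def standardized_mean_difference(x_t, x_c):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2.0)
    return float((x_t.mean() - x_c.mean()) / pooled_sd)


def variance_ratio(x_t, x_c):
    """Ratio of sample variances (treated over control)."""
    return float(x_t.var(ddof=1) / x_c.var(ddof=1))


def ks_statistic(x_t, x_c):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    grid = np.concatenate([x_t, x_c])
    cdf_t = np.searchsorted(np.sort(x_t), grid, side="right") / x_t.size
    cdf_c = np.searchsorted(np.sort(x_c), grid, side="right") / x_c.size
    return float(np.max(np.abs(cdf_t - cdf_c)))


def share_without_nearby_control(x_t, x_c, caliper):
    """Share of treated units with no control unit within +/- caliper (crude thin-support check)."""
    x_c_sorted = np.sort(x_c)
    lo = np.searchsorted(x_c_sorted, x_t - caliper, side="left")
    hi = np.searchsorted(x_c_sorted, x_t + caliper, side="right")
    return float(np.mean(hi == lo))  # hi == lo means no control in the window


diagnostics = {
    "smd": standardized_mean_difference(x_treated, x_control),
    "variance_ratio": variance_ratio(x_treated, x_control),
    "ks": ks_statistic(x_treated, x_control),
    "share_without_control": share_without_nearby_control(x_treated, x_control, caliper=0.1),
}
for name, value in diagnostics.items():
    print(f"{name}: {value:.3f}")
```

In this setup the standardized mean difference should be close to zero and the variance ratio close to one, so a moment-based check would report balance. The KS statistic and the caliper share, by contrast, should reveal that treated units concentrated around zero have thin or no control support. The caliper of 0.1 is an arbitrary illustrative value.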
Africa, P. & Oyenubi, A (2021). Quantifying balance for causal inference: An information theoretic perspective. Afribary. Retrieved from https://afribary.com/works/quantifying-balance-for-causal-inference-an-information-theoretic-perspective