Robust estimation and inference in single-index varying coefficient regression models with responses missing at random

Otlaadisa, Masego 119 PAGES (37241 WORDS) Statistics Paper

Abstract:

Nowadays, missing data are almost unavoidable in data collection, and the reasons for the missingness are very often beyond the control of the investigator. In regression settings, data can be missing in the response space, in the covariate space, or in both. Such missingness can arise under different missing data mechanisms. The mechanisms most commonly encountered in the literature are missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The handling of missing data in data analysis has attracted much attention in the statistical community. The classical approach is complete case analysis, which discards every observation with missing information. This approach has been shown to yield biased and/or less efficient estimates, especially when the proportion of missing data is substantial. It is therefore important to develop methodologies for handling missing data that lead to better statistical inference.
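A small simulation shows the bias of complete case analysis under MAR. The following is a minimal Python sketch built on an assumed, hypothetical data-generating model: X ~ N(0, 1), Y = 1 + 2X + e, and a logistic probability of observing Y that depends only on X. The sketch compares the complete-case mean of Y with an inverse probability weighted (IPW) mean that uses the known observation probabilities. IPW is a standard remedy chosen here only for illustration and is not presented as the method of this thesis. The function name `simulate_mar_mean` is hypothetical.

```python
import math
import random


def simulate_mar_mean(n=50_000, seed=1):
    """Compare complete-case and IPW estimates of E[Y] when Y is MAR given X.

    Hypothetical setup for illustration only: X ~ N(0, 1) and
    Y = 1 + 2X + e with e ~ N(0, 1), so the true mean of Y is 1.
    """
    rng = random.Random(seed)
    cc_sum, cc_count = 0.0, 0        # complete-case accumulators
    ipw_num, ipw_den = 0.0, 0.0      # Hajek-type (ratio) IPW accumulators
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        # MAR: the probability that Y is observed depends only on the observed X.
        p_obs = 1.0 / (1.0 + math.exp(-(0.5 + 1.5 * x)))
        if rng.random() < p_obs:
            cc_sum += y
            cc_count += 1
            # Weight each observed response by 1 / P(observed | X). The
            # probability is treated as known here; in practice it is estimated.
            ipw_num += y / p_obs
            ipw_den += 1.0 / p_obs
    return cc_sum / cc_count, ipw_num / ipw_den


cc_mean, ipw_mean = simulate_mar_mean()
print(f"complete-case mean: {cc_mean:.3f}")  # pulled upward: large X is observed more often
print(f"IPW mean:           {ipw_mean:.3f}")  # close to the true value 1
```

Units with large X are more likely to have Y observed, so the complete-case mean overshoots the true value of 1 by a wide margin. The weighted mean stays close to 1.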

The main objective of this thesis is to derive robust and efficient estimators, and to make inferences, in a single-index varying coefficient regression model (SIVCM) and a special case of it, when some responses are missing at random. The SIVCM has gained popularity in recent years because of its flexibility, its interpretability, and its ability to overcome the curse of dimensionality. It has been used to capture and model changing patterns in many areas, such as ecology, medical science, epidemiology, economics, finance, and politics.
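For reference, the SIVCM is commonly written in the literature in the form below. The notation, dimensions and identifiability conditions used in the thesis itself may differ, so this should be read as an assumed illustration. Y is the response. The covariate vector X ∈ R^p enters only through the single index βᵀX. Z ∈ R^q is a covariate vector with unknown coefficient functions g = (g_1, …, g_q)ᵀ. δ = 1 if Y is observed and δ = 0 otherwise. The MAR assumption says that, given the covariates, the probability of observing Y does not depend on Y itself.

```latex
\begin{aligned}
Y &= \mathbf{g}^{\top}\!\left(\boldsymbol{\beta}^{\top}\mathbf{X}\right)\mathbf{Z} + \varepsilon,
  \qquad \|\boldsymbol{\beta}\| = 1,\ \beta_1 > 0, \\
P(\delta = 1 \mid Y, \mathbf{X}, \mathbf{Z}) &= P(\delta = 1 \mid \mathbf{X}, \mathbf{Z}).
\end{aligned}
```

The norm and sign constraints on β are one common way to make the index identifiable. Without them, β and the functions g could be rescaled together without changing the model.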

This thesis considers two approaches to point and interval estimation of the parameters of the SIVCM with responses missing at random: a rank-based normal approximation approach and a rank-based empirical likelihood approach. Under certain mild regularity conditions, the consistency and asymptotic normality of the rank-based normal approximation estimators are established. Under the same conditions, the rank-based empirical likelihood ratio functions are shown to have asymptotic chi-square distributions. Furthermore, robust confidence regions/intervals for the true model parameters are derived. Monte Carlo simulation studies show that the rank-based normal approximation approach yields robust and more efficient estimators than the least squares and least absolute deviations methods in two situations: when the model error distribution is heavy-tailed or contaminated, and/or when the data contain gross outliers in the response space. The same experiments show that, in general, the proposed empirical likelihood approaches to interval estimation perform better than their normal approximation counterparts.
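Rank-based fitting resists response outliers, and the simplest case shows why. The following minimal Python sketch fits a straight line y = a + bx rather than the SIVCM. It assumes Wilcoxon scores, a common choice in rank-based regression; the score function used in the thesis is not stated here. The sketch compares the rank-based slope with the least squares slope on hypothetical data that contain two gross response outliers. The names `wilcoxon_fit` and `least_squares_fit` are illustrative and belong to no library.

```python
import random
import statistics


def wilcoxon_fit(x, y):
    """Rank-based (Wilcoxon-score) fit of y = a + b*x.

    With Wilcoxon scores, Jaeckel's dispersion is proportional to
    sum_{i<j} |e_i - e_j|. For one covariate this equals
    sum_{i<j} |x_i - x_j| * |s_ij - b|, where s_ij is the pairwise slope,
    so the minimising b is a weighted median of the s_ij with weights
    |x_i - x_j|. The dispersion does not involve the intercept, which is
    therefore taken as the median residual.
    """
    pairs = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[j] - x[i]
            if dx != 0:
                pairs.append(((y[j] - y[i]) / dx, abs(dx)))
    pairs.sort()
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for slope, weight in pairs:
        cumulative += weight
        if cumulative >= half:  # first pairwise slope reaching half the total weight
            break
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return intercept, slope


def least_squares_fit(x, y):
    """Ordinary least squares fit of y = a + b*x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: true line y = 2 + 0.5x with small normal errors.
rng = random.Random(0)
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 0.3) for xi in x]
y[3] -= 30.0   # gross outliers in the response space
y[16] += 30.0

ls_intercept, ls_slope = least_squares_fit(x, y)
rank_intercept, rank_slope = wilcoxon_fit(x, y)
print(f"least squares slope: {ls_slope:.3f}")
print(f"Wilcoxon rank slope: {rank_slope:.3f}")  # the true slope is 0.5
```

The two outliers pull the least squares slope well away from the true value of 0.5, while the rank-based slope stays close to it. The reason is that the Wilcoxon dispersion grows only linearly, not quadratically, in a large residual. This bounded influence applies to the response space only; Wilcoxon rank estimates are not robust to high-leverage outliers in the covariate space.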