Small area estimation methods under complex sampling designs

  1. Guadarrama Sanz, María
Supervised by:
  1. Isabel Molina Peralta Supervisor

Defended at: Universidad Carlos III de Madrid

Defense date: 4 December 2017

Examining committee:
  1. María José Lombardía Chair
  2. Juan Miguel Marín Díazaraque Secretary
  3. Carolina Franco Member

Type: Thesis

Abstract

The aim of this thesis is to study small area estimation methods under outcome-dependent sampling designs, that is, designs in which the selection of units into the sample depends on their values of the variable of interest. More precisely, we consider two types of informative sampling designs: a first type, in which the inclusion probabilities are strictly positive for all population units, and cut-off sampling, in which a grouping variable related to the variable of interest divides the population into two strata, one of which is deliberately excluded from selection, that is, its units have zero inclusion probabilities. We are especially interested in the estimation of general nonlinear parameters, including poverty indicators, in areas or domains of the population with small sample sizes. Because the area sample sizes are small, we use model-based methods, which borrow strength from all the domains by assuming models with parameters common to all the domains. First, we review the main model-based small area estimation methods for general nonlinear parameters, focusing for illustration on particular poverty indicators. We describe direct estimation, which uses only the data from the area of interest; the empirical best linear unbiased predictor (EBLUP) under the area-level Fay-Herriot model (Fay and Herriot, 1979); and three methods based on unit-level models, namely the method of Elbers et al. (2003), traditionally used by the World Bank, the empirical best/Bayes (EB) method of Molina and Rao (2010), and the hierarchical Bayes proposal of Molina et al. (2014). We take the point of view of a practitioner and discuss, as objectively as possible, the benefits and drawbacks of each method, illustrating some of them through simulation studies and an application to real data.
In one of these simulation experiments, we study the performance of the considered estimators under informative sampling. Under informative selection, individuals with certain outcome values appear more often in the sample and, as a consequence, the usual inference based on the actual sample without appropriate weighting may be strongly biased. In this dissertation, we propose an extension of the EB method, called the pseudo EB (PEB) method, for the estimation of general nonlinear parameters in small areas, which handles informative selection by incorporating the sampling weights. We analyze the properties of this method under complex sampling designs, including informative selection. The results confirm that the PEB estimators significantly reduce the bias of the unweighted EB estimators under informative sampling and compare favorably under non-informative selection. We illustrate the procedure through an application to poverty mapping in a Mexican state. Additionally, we study small area estimation methods under cut-off sampling. This sampling technique excludes a set of units from selection because of the difficulty of obtaining information from them. In that situation, naïve estimators, obtained by ignoring the cut-off sampling, may be severely design-biased. Calibration estimators using auxiliary information have been proposed to reduce this design bias. However, the resulting estimators may have large variances when estimating in small domains. Like calibration, model-based small area estimation methods may also help to decrease this bias if the assumed model holds for the whole population. At the same time, these methods provide more efficient estimators than calibration when estimating in small domains. We compare the performance of calibration estimators with that of the EBLUP and EB predictors for estimation in small domains under cut-off sampling through simulation studies and a real data application.
Our results confirm that the EBLUP under simple random sampling without replacement applied to the non-excluded units helps to reduce the bias due to cut-off sampling. The EBLUP also performs significantly better than naïve direct and calibration estimators in terms of mean squared error. Our results with real data suggest similar conclusions for the EB estimators of nonlinear domain parameters.
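
The bias caused by informative selection, and the way sampling weights counteract it, can be illustrated with a minimal sketch. This is not the PEB method of the thesis; it only compares an unweighted sample mean with a weighted (Hájek) mean under a toy design. The population model (log-normal outcome), the inclusion probabilities and the function names `hajek_mean` and `simulate` are all assumptions made for illustration.

```python
import random


def hajek_mean(values, weights):
    """Weighted (Hajek) mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


def simulate(pop_size=20000, seed=0):
    """Toy informative (outcome-dependent) Poisson sampling experiment.

    Returns (true population mean, unweighted sample mean, weighted mean).
    All settings are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Hypothetical population: a right-skewed, income-like outcome.
    population = [rng.lognormvariate(0.0, 0.5) for _ in range(pop_size)]
    y_max = max(population)
    # Informative design: the inclusion probability increases with y,
    # and is strictly positive for every unit (no cut-off here).
    probs = [0.02 + 0.3 * y / y_max for y in population]
    # Poisson sampling: each unit is selected independently.
    sample = [(y, 1.0 / p) for y, p in zip(population, probs)
              if rng.random() < p]
    ys = [y for y, _ in sample]
    ws = [w for _, w in sample]  # sampling weights = 1 / inclusion prob.
    true_mean = sum(population) / pop_size
    unweighted = sum(ys) / len(ys)
    weighted = hajek_mean(ys, ws)
    return true_mean, unweighted, weighted


if __name__ == "__main__":
    true_mean, unweighted, weighted = simulate()
    print(f"true mean:       {true_mean:.3f}")
    print(f"unweighted mean: {unweighted:.3f}")
    print(f"weighted mean:   {weighted:.3f}")
```

Because units with large outcome values are over-represented in the sample, the unweighted mean overestimates the population mean in this setting, while weighting by the inverse inclusion probabilities brings the estimate back close to the true value. Under cut-off sampling this correction is not available for the excluded stratum, since its inclusion probabilities are zero.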