Similar literature
20 similar documents found.
1.
To estimate model parameters from complex sample data, we apply maximum likelihood techniques to the complex sample data from the finite population, which is treated as a sample from an infinite superpopulation. General asymptotic distribution theory is developed and then applied to both logistic regression and discrete proportional hazards models. Data from the Lipid Research Clinics Program are used to illustrate each model, demonstrating the effects on inference of neglecting the sampling design during parameter estimation. These empirical results also shed light on the issue of model-based vs. design-based inferences.

2.
The problem of sample selection, when a one-stage superpopulation model-based approach is used to predict individual variate values for each unit in a finite population based on a sample of only some of the units, is investigated. The model framework is discussed and a sample selection scheme based on the model is derived. The sample selection scheme is evaluated using actual data. Future research topics including multiple predictions per unit are suggested.

3.
Suppose that a finite population consists of N distinct units. Associated with the ith unit is a polychotomous response vector, d_i, and a vector of auxiliary variables, x_i. The x_i are known for the entire population, but the d_i are known only for the units selected in the sample. The problem is to estimate the finite population proportion vector P. One of the fundamental questions in finite population sampling is how to make use of the complete auxiliary information effectively at the estimation stage. In this article a predictive estimator is proposed which incorporates the auxiliary information at the estimation stage by invoking a superpopulation model. However, the use of such estimators is often criticized since the working superpopulation model may not be correct. To protect the predictive estimator from possible model failure, a nonparametric regression model is considered in the superpopulation. The asymptotic properties of the proposed estimator are derived, and a bootstrap-based hybrid re-sampling method for estimating the variance of the proposed estimator is developed. Results of a simulation study are reported on the performance of the predictive estimator and its re-sampling-based variance estimator from the model-based viewpoint. Finally, survey data on the opinions of 686 individuals about the cause of addiction are used in an empirical study to investigate the performance of the nonparametric predictive estimator from the design-based viewpoint.

4.
Exact and approximate Bayesian inference is developed for the prediction problem in finite populations under a linear functional superpopulation model. The models considered are the usual regression models involving two variables, X and Y, where the independent variable X is measured with error. The approach is based on the conditional distribution of Y given X and our predictor is the posterior mean of the quantity of interest (population total and population variance) given the observed data. Empirical investigations about optimal purposive samples and possible model misspecifications based on comparisons with the corresponding models where X is measured without error are also reported.

5.
In this article, we consider Bayes prediction in a finite population under the simple location error-in-variables superpopulation model. The Bayes predictor of the finite population mean under Zellner's balanced loss function and the corresponding relative losses and relative savings loss are derived. The unknown location parameter of the model is assumed to have a non-normal prior distribution belonging to the class of Edgeworth series distributions. The effects of non-normality of the “true” prior distribution and of a possible misspecification of the loss function on the Bayes predictor are illustrated for a hypothetical population.

6.
This paper considers the effects of informative two-stage cluster sampling on estimation and prediction. The aims of this article are twofold: first, to estimate the parameters of the superpopulation model for two-stage cluster sampling from a finite population, when the sampling design for both stages is informative, using maximum likelihood estimation methods based on the sample-likelihood function; second, to predict the finite population total and to predict the cluster-specific effects and the cluster totals for clusters in the sample and for clusters not in the sample. To achieve this we derive the sample and sample-complement distributions and the moments of the first and second stage measurements. We also derive the conditional sample and conditional sample-complement distributions and the moments of the cluster-specific effects given the cluster measurements. It should be noted that classical design-based inference that consists of weighting the sample observations by the inverse of sample selection probabilities cannot be applied for the prediction of the cluster-specific effects for clusters not in the sample. We also give an alternative justification of the Royall [1976. The linear least squares prediction approach to two-stage sampling. Journal of the American Statistical Association 71, 657–664] predictor of the finite population total under two-stage cluster population. Furthermore, small-area models are studied under informative sampling.

7.
This paper considers residuals for time series regression. Despite much literature on visual diagnostics for uncorrelated data, there is little on the autocorrelated case. To examine various aspects of the fitted time series regression model, three residuals are considered. The fitted regression model can be checked using orthogonal residuals; the time series error model can be analysed using marginal residuals; and the white noise error component can be tested using conditional residuals. When used together, these residuals allow identification of outliers, model mis‐specification and mean shifts. Due to the sensitivity of conditional residuals to model mis‐specification, it is suggested that the orthogonal and marginal residuals be examined first.

8.
When the finite population ‘totals’ are estimated for individual areas, they do not necessarily add up to the known ‘total’ for all areas. Benchmarking (BM) is a technique used to ensure that the totals for all areas match the grand total, which can be obtained from an independent source. BM is desirable to practitioners of survey sampling. BM shifts the small-area estimators to accommodate the constraint. In doing so, it can provide increased precision to the small-area estimators of the finite population means or totals. The Scott–Smith model is used to benchmark the finite population means of small areas. This is a one-way random effects model for a superpopulation, and it is computationally convenient to use a Bayesian approach. We illustrate our method by estimating body mass index using data in the third National Health and Nutrition Examination Survey. Several properties of the benchmarked small-area estimators are obtained using a simulation study.
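The benchmarking constraint described in this abstract can be illustrated with a deliberately simple ratio adjustment. This is only a sketch of the constraint itself, not the Bayesian Scott–Smith method of the paper; the function name `ratio_benchmark` and the numbers are hypothetical.

```python
def ratio_benchmark(area_totals, grand_total):
    """Scale small-area total estimates so that they sum to a known grand total.

    Assumption: a plain ratio adjustment, used here only to show the
    benchmarking constraint; it is not the model-based method in the abstract.
    """
    current = sum(area_totals)
    if current == 0:
        raise ValueError("cannot ratio-benchmark estimates that sum to zero")
    factor = grand_total / current
    return [t * factor for t in area_totals]


# Hypothetical small-area estimates that sum to 250, benchmarked to 300.
estimates = [120.0, 80.0, 50.0]
benchmarked = ratio_benchmark(estimates, 300.0)
```

After the adjustment, every area estimate is shifted by the same multiplicative factor, so the relative sizes of the areas are preserved while the total matches the external figure.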

9.
Linear mixed models are widely used when multiple correlated measurements are made on each unit of interest. In many applications, the units may form several distinct clusters, and such heterogeneity can be more appropriately modelled by a finite mixture linear mixed model. The classical estimation approach, in which both the random effects and the error parts are assumed to follow a normal distribution, is sensitive to outliers, and failure to accommodate outliers may greatly jeopardize the model estimation and inference. We propose a new mixture linear mixed model using the multivariate t distribution. For each mixture component, we assume the response and the random effects jointly follow a multivariate t distribution, to conveniently robustify the estimation procedure. An efficient expectation conditional maximization algorithm is developed for conducting maximum likelihood estimation. The degrees of freedom parameters of the t distributions are chosen data-adaptively, to achieve a flexible trade-off between estimation robustness and efficiency. Simulation studies and an application to lung growth longitudinal data showcase the efficacy of the proposed approach.

10.
This article considers optimal prediction of the finite population distribution function under Gaussian superpopulation models, which allows auxiliary prior information to be incorporated into the estimation process. Large sample approximations for the variance of the optimal predictors are derived in some important special cases. A small-scale Monte Carlo study illustrates comparisons between the optimal predictor and some others which are proposed in the literature. The conclusion is that the optimal predictor can be considerably more efficient in situations where the normal superpopulation model is adequate.

11.
This paper examines strategies for estimating the mean of a finite population in the following situation: A linear regression model is assumed to describe the population scatter. Various estimators β̂ for the vector of regression parameters β are considered. Several ways of transforming each estimator β̂ into a model-based estimator for the population mean are considered. Some estimators constructed in this way become sensitive to the correctness of the assumed model. The estimators favoured in this paper are the ones in which the observations are weighted to reflect the sampling design, so that asymptotic design unbiasedness is achieved. For these estimators, the randomization distribution gives protection against model breakdown.
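One way to picture an estimator in which "the observations are weighted to reflect the sampling design" is a regression estimator of the mean whose slope is fitted with inverse-inclusion-probability weights. The sketch below assumes a single auxiliary variable with a known population mean; the function name and the inputs are hypothetical, and it is not claimed to be any specific estimator studied in the paper.

```python
def weighted_regression_mean(y, x, incl_prob, x_pop_mean):
    """Design-weighted regression estimator of a finite population mean.

    Assumptions (not from the abstract): one auxiliary variable x whose
    population mean is known, and first-order inclusion probabilities
    incl_prob for the sampled units.
    """
    if not (len(y) == len(x) == len(incl_prob)):
        raise ValueError("y, x and incl_prob must have the same length")
    w = [1.0 / p for p in incl_prob]          # design weights 1/pi_i
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x has no weighted variation in the sample")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx                         # weighted least squares slope
    # Shift the weighted sample mean by the regression adjustment.
    return y_bar + slope * (x_pop_mean - x_bar)


# Hypothetical sample in which y = 2x + 1 holds exactly.
estimate = weighted_regression_mean(
    y=[3.0, 5.0, 7.0], x=[1.0, 2.0, 3.0],
    incl_prob=[0.5, 0.25, 0.25], x_pop_mean=4.0,
)
```

When the linear model fits exactly, as in the toy data, the estimate equals the fitted line evaluated at the known population mean of x, whatever the weights are; the weights matter when the model is only approximate.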

12.
Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Recently, Stanek and Singer [Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 119–130] developed best linear unbiased predictors (BLUP) under a finite population mixed model that outperform BLUPs from mixed models and superpopulation models. Their setup, however, does not allow for unequally sized clusters. To overcome this drawback, we consider an expanded finite population mixed model based on a larger set of random variables that span a higher dimensional space than those typically applied to such problems. We show that BLUPs for linear combinations of the realized cluster means derived under such a model have considerably smaller mean squared error (MSE) than those obtained from mixed models, superpopulation models, and finite population mixed models. We motivate our general approach by an example developed for two-stage cluster sampling and show that it faithfully captures the stochastic aspects of sampling in the problem. We also consider simulation studies to illustrate the increased accuracy of the BLUP obtained under the expanded finite population mixed model.

13.
The generalized regression estimator (GREG) of a finite population mean or total has been shown to be asymptotically optimal when the working linear regression model upon which it is based includes variables related to the sampling design. In this paper a regression estimator assisted by a linear mixed superpopulation model is proposed. It accounts for the extra information coming from the design in the random component of the model and saves degrees of freedom in finite sample estimation. This procedure combines the larger asymptotic efficiency of the optimal estimator and the greater finite sample stability of the GREG. Design-based properties of the proposed estimator are discussed and a small simulation study is conducted to explore its finite sample performance.

14.
The goodness-of-fit of the distribution of random effects in a generalized linear mixed model is assessed using a conditional simulation of the random effects conditional on the observations. Provided that the specified joint model for random effects and observations is correct, the marginal distribution of the simulated random effects coincides with the assumed random effects distribution. In practice, the specified model depends on some unknown parameter which is replaced by an estimate. We obtain a correction for this by deriving the asymptotic distribution of the empirical distribution function obtained from the conditional sample of the random effects. The approach is illustrated by simulation studies and data examples.

15.
Kernel density estimation has been used with great success with data that may be assumed to be generated from independent and identically distributed (iid) random variables. The methods and theoretical results for iid data, however, do not directly apply to data from stratified multistage samples. We present finite-sample and asymptotic properties of a modified density estimator introduced in Buskirk (Proceedings of the Survey Research Methods Section, American Statistical Association (1998), pp. 799–801) and Bellhouse and Stafford (Statist. Sin. 9 (1999) 407–424); this estimator incorporates both the sampling weights and the kernel weights. We present regularity conditions which lead the sample estimator to be consistent and asymptotically normal under various modes of inference used with sample survey data. We also introduce a superpopulation structure for model-based inference that allows the population model to reflect naturally occurring clustering. The estimator, and confidence bands derived from the sampling design, are illustrated using data from the US National Crime Victimization Survey and the US National Health and Nutrition Examination Survey.
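A density estimator that "incorporates both the sampling weights and the kernel weights" can be sketched as a kernel density estimate in which each observation's kernel contribution is multiplied by its sampling weight. The Gaussian kernel, the normalization by the sum of the weights, and the function name below are assumptions made for illustration; the exact form used in the cited papers may differ.

```python
import math


def weighted_kde(x, data, weights, bandwidth):
    """Sampling-weighted Gaussian kernel density estimate at the point x.

    Assumed form: f(x) = sum_i w_i * K((x - x_i) / h) / (h * sum_i w_i),
    with K the standard normal density. This is an illustrative sketch.
    """
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    norm_const = 1.0 / math.sqrt(2.0 * math.pi)
    acc = 0.0
    for xi, wi in zip(data, weights):
        u = (x - xi) / bandwidth
        acc += wi * norm_const * math.exp(-0.5 * u * u)
    return acc / (bandwidth * total_weight)


# Hypothetical observations with unequal sampling weights.
density_at_zero = weighted_kde(0.0, [-1.0, 0.0, 2.0], [10.0, 30.0, 60.0], 0.8)
```

With equal weights the expression reduces to the usual iid kernel density estimate, which is a convenient sanity check on the weighted form.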

16.
The main goal in small area estimation is to use models to ‘borrow strength’ from the ensemble because the direct estimates of small area parameters are generally unreliable. However, model-based estimates from the small areas do not usually match the value of the single estimate for the large area. Benchmarking is done by applying a constraint, internally or externally, to ensure that the ‘total’ of the small areas matches the ‘grand total’. This is particularly useful because it is difficult to check model assumptions owing to the sparseness of the data. We use a Bayesian nested error regression model, which incorporates unit-level covariates and sampling weights, to develop a method to internally benchmark the finite population means of small areas. We use two examples to illustrate our method. We also perform a simulation study to further assess the properties of our method.

17.
In this article, utilizing a scale mixture of skew-normal distributions in which the mixing random variable is assumed to follow a mixture model with varying weights for each observation, we introduce a generalization of the skew-normal linear regression model that aims to provide resistant results. This model, which also includes the skew-slash distribution as a particular case, allows us to accommodate and detect outlying observations under the skew-normal linear regression model. Inferences about the model are carried out through the empirical Bayes approach. The conditions for propriety of the posterior and for existence of posterior moments are given under the standard noninformative priors for the regression and scale parameters as well as a proper prior for the skewness parameter. Then, for Bayesian inference, a Markov chain Monte Carlo method is described. Since posterior results depend on the prior hyperparameters, we estimate them by adopting the empirical Bayes method as well as by using a Monte Carlo EM algorithm. Furthermore, to identify possible outliers, we also apply the Bayes factor obtained through the generalized Savage-Dickey density ratio. Applied to simulated and real data, the proposed approach is found not only to provide satisfactory parameter estimates but also to identify outliers well.

18.
Conventional approaches for inference about efficiency in parametric stochastic frontier (PSF) models are based on percentiles of the estimated distribution of the one-sided error term, conditional on the composite error. When used as prediction intervals, coverage is poor when the signal-to-noise ratio is low, but improves slowly as sample size increases. We show that prediction intervals estimated by bagging yield much better coverages than the conventional approach, even with low signal-to-noise ratios. We also present a bootstrap method that gives confidence interval estimates for (conditional) expectations of efficiency, and which have good coverage properties that improve with sample size. In addition, researchers who estimate PSF models typically reject models, samples, or both when residuals have skewness in the “wrong” direction, i.e., in a direction that would seem to indicate absence of inefficiency. We show that correctly specified models can generate samples with “wrongly” skewed residuals, even when the variance of the inefficiency process is nonzero. Both our bagging and bootstrap methods provide useful information about inefficiency and model parameters irrespective of whether residuals have skewness in the desired direction.

19.
Count data are routinely assumed to have a Poisson distribution, especially when there are no straightforward diagnostic procedures for checking this assumption. We reanalyse two data sets from crossover trials of treatments for angina pectoris, in which the outcomes are counts of anginal attacks. Standard analyses focus on treatment effects, averaged over subjects; we are also interested in the dispersion of these effects (treatment heterogeneity). We set up a log-Poisson model with random coefficients to estimate the distribution of the treatment effects and show that the analysis is very sensitive to the distributional assumption; the population variance of the treatment effects is confounded with the (variance) function that relates the conditional variance of the outcomes, given the subject's rate of attacks, to the conditional mean. Diagnostic model checks based on resampling from the fitted distribution indicate that the default choice of the Poisson distribution for the analysed data sets is poorly supported. We propose to augment the data sets with observations of the counts, made possibly outside the clinical setting, so that the conditional distribution of the counts could be established.

20.
We propose alternative approaches to analyze residuals in binary regression models based on random effect components. Our preferred model does not depend upon any tuning parameter, being completely automatic. Although the focus is mainly on accommodation of outliers, the proposed methodology is also able to detect them. Our approach consists of evaluating the posterior distribution of random effects included in the linear predictor. The evaluation of the posterior distributions of interest involves cumbersome integration, which is easily dealt with through stochastic simulation methods. We also discuss different specifications of prior distributions for the random effects. The potential of these strategies is compared in a real data set. The main finding is that the inclusion of extra variability accommodates the outliers, improving the adjustment of the model substantially, besides correctly indicating the possible outliers.
