Similar literature
 20 similar articles found (search time: 31 ms)
1.
In outcome-dependent sampling, the continuous or binary outcome variable in a regression model is available in advance to guide selection of a sample on which explanatory variables are then measured. Selection probabilities may either be a smooth function of the outcome variable or be based on a stratification of the outcome. In many cases, only data from the final sample are accessible to the analyst. A maximum likelihood approach for this data configuration is developed here for the first time. The likelihood for fully general outcome-dependent designs is stated, then the special case of Poisson sampling is examined in more detail. The maximum likelihood estimator differs from the well-known maximum sample likelihood estimator, and an information bound result shows that the former is asymptotically more efficient. A simulation study suggests that the efficiency difference is generally small. Maximum sample likelihood estimation is therefore recommended in practice when only sample data are available. Some new smooth sample designs show considerable promise.

2.
The generalized regression estimator (GREG) of a finite population mean or total has been shown to be asymptotically optimal when the working linear regression model upon which it is based includes variables related to the sampling design. In this paper a regression estimator assisted by a linear mixed superpopulation model is proposed. It accounts for the extra information coming from the design in the random component of the model and saves degrees of freedom in finite sample estimation. This procedure combines the larger asymptotic efficiency of the optimal estimator and the greater finite sample stability of the GREG. Design-based properties of the proposed estimator are discussed and a small simulation study is conducted to explore its finite sample performance.

3.
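With the continued development of big data and the internet, web surveys are becoming increasingly widespread. Most web survey samples are non-probability samples, to which traditional sampling inference theory is difficult to apply, so solving the inference problem for web survey samples is an urgent need for the development of web surveys in the big data setting. This paper proposes, for the first time from a modeling perspective, three basic approaches to the problem. The first is model-based inference on the inclusion probabilities: a propensity score model based on machine learning and variable selection can be built to estimate the inclusion probabilities and make inferences about the population. The second is model-based inference on the target variable: a parametric, nonparametric, or semiparametric superpopulation model can be fitted directly to the target variable. The third is dual modeling of both the inclusion probabilities and the target variable: weighted estimation and hybrid inference that combine the propensity score model with the superpopulation model can be considered. Finally, inclusion-probability modeling based on the generalized boosted model is used as an example to demonstrate a concrete solution.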

4.
Under the notion of superpopulation models, the concept of minimum expected variance is adopted as an optimality criterion for design-unbiased estimators, i.e. estimators that are unbiased under repeated sampling. In this article, it is shown that the Horvitz-Thompson estimator is optimal among such estimators if and only if it is model-unbiased, i.e. unbiased under the model. The family of linear models is considered and a sample design is suggested to preserve the model-unbiasedness (and hence the optimality) of the Horvitz-Thompson estimator. It is also shown that under these models the Horvitz-Thompson estimator together with the suggested sample design is optimal among design-unbiased estimators with any sample design (of fixed size n) having non-zero probabilities of inclusion for all population units.
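
As a rough illustration of the design-unbiasedness mentioned in this abstract, the sketch below computes a Horvitz-Thompson total (each sampled value divided by its inclusion probability) and averages it over every possible sample of a toy design. The function name, the four-unit population, and the choice of simple random sampling of size 2 (inclusion probability 0.5 per unit) are assumptions made for the example; none of them come from the article.

```python
from itertools import combinations


def horvitz_thompson_total(y, pi):
    """Estimate a population total from sampled values y and their
    first-order inclusion probabilities pi (hypothetical helper)."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each sampled unit stands in for 1 / pi_i units of the population.
    return sum(yi / p for yi, p in zip(y, pi))


# Toy population (assumed values) with true total 10.
population = [1.0, 2.0, 3.0, 4.0]

# Simple random sampling without replacement, n = 2 out of N = 4:
# every unit has inclusion probability n / N = 0.5.
samples = list(combinations(population, 2))
estimates = [horvitz_thompson_total(s, [0.5, 0.5]) for s in samples]

# Averaging over all equally likely samples recovers the true total,
# which is what "unbiased under repeated sampling" means here.
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate, sum(population))
```

The enumeration only checks unbiasedness for this one small design; it says nothing about the optimality result the abstract describes.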

5.
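容越彦, 陈光慧. 《统计研究》 (Statistical Research), 2015, 32(12): 88-94
Building on a review of existing model-assisted estimation methods, this paper constructs a semiparametric superpopulation model and combines it with the idea of generalized difference estimation to propose a new model-assisted estimator. Compared with traditional nonparametric and semiparametric regression estimators, the new estimator uses less, and more readily available, auxiliary information: it needs only the same auxiliary information as the generalized regression estimator, yet it generally achieves higher precision than the generalized regression estimator. It is proved that the estimator is asymptotically design-unbiased and design-consistent, and that its asymptotic design mean squared error equals the variance of the generalized difference estimator. Simulation results show that it is at least as good as the generalized regression estimator; that the less linear the superpopulation model, the more pronounced its gain in precision over the generalized regression estimator; and that, in the simulations of this paper, suitable values of the smoothing parameter between 0.04 and 0.12 give relatively good estimation performance.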

6.
Modeling survey data often requires knowledge of design and weighting variables. With public-use survey data, some of these variables may not be available for confidentiality reasons. The proposed approach can be used in this situation, as long as calibrated weights and variables specifying the strata and primary sampling units are available. It gives consistent point estimation and a pivotal statistic for testing and confidence intervals. The proposed approach does not rely on with-replacement sampling, single-stage designs, negligible sampling fractions, or noninformative sampling. Adjustments based on design effects, eigenvalues, joint inclusion probabilities, or the bootstrap are not needed. The inclusion probabilities and auxiliary variables do not have to be known. Multistage designs with unequal selection of primary sampling units are considered. Nonresponse can be easily accommodated if the calibrated weights include a reweighting adjustment for nonresponse. We use an unconditional approach, where the variables and sample are random variables. The design can be informative.

7.
When auxiliary information is available at the design stage, samples may be selected by means of balanced sampling. The variance of the Horvitz-Thompson estimator is then reduced, since it is approximately given by that of the residuals of the variable of interest on the balancing variables. In this paper, a method for computing optimal inclusion probabilities for balanced sampling on given auxiliary variables is studied. We show that the method formerly suggested by Tillé and Favre (2005) enables the computation of inclusion probabilities that lead to a decrease in variance under some conditions on the set of balancing variables. A disadvantage is that the target optimal inclusion probabilities depend on the variable of interest. If the needed quantities are unknown at the design stage, we propose to use estimates instead (e.g., arising from a previous wave of the survey). A limited simulation study suggests that, under some conditions, our method performs better than the method of Tillé and Favre (2005).

8.
This paper considers the problem of estimating a nonlinear statistical model subject to stochastic linear constraints among unknown parameters. These constraints represent prior information which originates from a previous estimation of the same model using an alternative database. One feature of this specification allows for the design matrix of stochastic linear restrictions to be estimated. The mixed regression technique and the maximum likelihood approach are used to derive the estimator for both the model coefficients and the unknown elements of this design matrix. The proposed estimator, whose asymptotic properties are studied, contains as a special case the conventional mixed regression estimator based on a fixed design matrix. A new test of compatibility between prior and sample information is also introduced. The suggested estimator is tested empirically with both simulated and actual marketing data.

9.
This paper considers the problem of estimating a nonlinear statistical model subject to stochastic linear constraints among unknown parameters. These constraints represent prior information which originates from a previous estimation of the same model using an alternative database. One feature of this specification allows for the design matrix of stochastic linear restrictions to be estimated. The mixed regression technique and the maximum likelihood approach are used to derive the estimator for both the model coefficients and the unknown elements of this design matrix. The proposed estimator, whose asymptotic properties are studied, contains as a special case the conventional mixed regression estimator based on a fixed design matrix. A new test of compatibility between prior and sample information is also introduced. The suggested estimator is tested empirically with both simulated and actual marketing data.

10.
Suppose that the conditional density of a response variable given a vector of explanatory variables is parametrically modelled, and that data are collected by a two-phase sampling design. First, a simple random sample is drawn from the population. The stratum membership in a finite number of strata of the response and explanatory variables is recorded for each unit. Second, a subsample is drawn from the phase-one sample such that the selection probability is determined by the stratum membership. The response and explanatory variables are fully measured at this phase. We synthesize existing results on nonparametric likelihood estimation and present a streamlined approach for the computation and the large sample theory of profile likelihood in four different situations. The amount of information in terms of data and assumptions varies depending on whether the phase-one data are retained, the selection probabilities are known, and/or the stratum probabilities are known. We establish and illustrate numerically the order of efficiency among the maximum likelihood estimators, according to the amount of information utilized, in the four situations.

11.
Kernel density estimation has been used with great success with data that may be assumed to be generated from independent and identically distributed (iid) random variables. The methods and theoretical results for iid data, however, do not directly apply to data from stratified multistage samples. We present finite-sample and asymptotic properties of a modified density estimator introduced in Buskirk (Proceedings of the Survey Research Methods Section, American Statistical Association (1998), pp. 799–801) and Bellhouse and Stafford (Statist. Sin. 9 (1999) 407–424); this estimator incorporates both the sampling weights and the kernel weights. We present regularity conditions under which the sample estimator is consistent and asymptotically normal under various modes of inference used with sample survey data. We also introduce a superpopulation structure for model-based inference that allows the population model to reflect naturally occurring clustering. The estimator, and confidence bands derived from the sampling design, are illustrated using data from the US National Crime Victimization Survey and the US National Health and Nutrition Examination Survey.
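
One common way to combine sampling weights with kernel weights is to replace the equal 1/n weights of the iid estimator with normalized survey weights. The sketch below assumes that form with a Gaussian kernel; it is not necessarily the exact estimator studied in the article, and the function name, data, weights, and bandwidth are all hypothetical.

```python
import math


def weighted_kde(x, data, weights, bandwidth):
    """Survey-weighted Gaussian kernel density estimate at point x.

    Assumed form: sum_i w_i * K((x - x_i) / h) / (h * sum_i w_i).
    """
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    total_weight = sum(weights)

    def gaussian_kernel(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    kernel_sum = sum(
        w * gaussian_kernel((x - xi) / bandwidth)
        for xi, w in zip(data, weights)
    )
    return kernel_sum / (bandwidth * total_weight)


# Hypothetical sample: the third observation carries the largest weight,
# so the estimated density is pulled toward x = 5.
data = [0.0, 1.0, 5.0]
weights = [1.0, 2.0, 3.0]
print(weighted_kde(5.0, data, weights, bandwidth=0.5))
print(weighted_kde(0.0, data, weights, bandwidth=0.5))
```

Because the weights are normalized by their sum, the estimate still integrates to one, whatever scale the survey weights are on.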

12.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 3278-3300
Under complex survey sampling, in particular when selection probabilities depend on the response variable (informative sampling), the sample and population distributions are different, possibly resulting in selection bias. This article addresses this problem by fitting two statistical models for one-way analysis of variance, namely the variance components model (a two-stage model) and the fixed effects model (a single-stage model), under a complex survey design, for example two-stage sampling, stratification, or unequal probability of selection. Classical theory underlying the use of the two-stage model involves simple random sampling for each of the two stages. In such cases the model in the sample, after sample selection, is the same as the model for the population before sample selection. When the selection probabilities are related to the values of the response variable, standard estimates of the population model parameters may be severely biased, possibly leading to false inference. The idea behind the approach is to extract the model holding for the sample data as a function of the model in the population and of the first-order inclusion probabilities, and then to fit the sample model using analysis of variance, maximum likelihood, and pseudo maximum likelihood methods of estimation. The main feature of the proposed techniques is related to their behavior in terms of the informativeness parameter. We also show that using the population model while ignoring the informative sampling design yields biased model fitting.

13.
Data from large surveys are often supplemented with sampling weights that are designed to reflect unequal probabilities of response and selection inherent in complex survey sampling methods. We propose two methods for Bayesian estimation of parametric models in a setting where the survey data and the weights are available, but where information on how the weights were constructed is unavailable. The first approach is to simply replace the likelihood with the pseudo likelihood in the formulation of Bayes' theorem. This is proven to lead to a consistent estimator but also leads to credible intervals that suffer from systematic undercoverage. Our second approach involves using the weights to generate a representative sample which is integrated into a Markov chain Monte Carlo (MCMC) or other simulation algorithm designed to estimate the parameters of the model. In extensive simulation studies, the latter methodology is shown to achieve performance comparable to the standard frequentist solution of pseudo maximum likelihood, with the added advantage of being applicable to models that require inference via MCMC. The methodology is demonstrated further by fitting a mixture of gamma densities to a sample of Australian household income.

14.
A means for utilizing auxiliary information in surveys is to sample with inclusion probabilities proportional to given size values, that is, to use a πps design, preferably with fixed sample size. A novel candidate in that context is Pareto πps. This sampling scheme was derived by limit considerations and it works with a degree of approximation for finite samples. Desired and factual inclusion probabilities do not agree exactly, which in turn leads to some estimator bias. The central topic in this paper is to derive conditions for the bias to be negligible. Practically useful information on small sample behavior of Pareto πps can, to the best of our understanding, be gained only by numerical studies. Earlier investigations to that end have been too limited to allow general conclusions, while this paper reports on findings from an extensive numerical study. The chief conclusion is that the estimator bias is negligible in almost all situations met in survey practice.
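
For readers unfamiliar with the scheme, the following sketch shows the usual textbook description of Pareto πps: each unit gets a target inclusion probability λ_i proportional to its size, a ranking variable Q_i = [U_i / (1 − U_i)] / [λ_i / (1 − λ_i)] is computed from an independent uniform draw U_i, and the n units with the smallest Q_i are selected. The function name, the size values, and the seed are hypothetical, and the code is only a minimal sketch of the selection step, not of the bias analysis in the article.

```python
import random


def pareto_pips_sample(sizes, n, rng=None):
    """Draw a fixed-size Pareto pi-ps sample; returns sorted unit indices."""
    rng = rng or random.Random()
    total = sum(sizes)
    # Target (desired) inclusion probabilities, proportional to size.
    lam = [n * s / total for s in sizes]
    if any(l <= 0 or l >= 1 for l in lam):
        raise ValueError("target inclusion probabilities must lie in (0, 1)")
    ranked = []
    for i, l in enumerate(lam):
        u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
        q = (u / (1.0 - u)) / (l / (1.0 - l))
        ranked.append((q, i))
    ranked.sort()
    # Keep the n units with the smallest ranking variables.
    return sorted(i for _, i in ranked[:n])


# Hypothetical size measures; targets are 0.2, 0.4, 0.6 and 0.8 for n = 2.
sizes = [1.0, 2.0, 3.0, 4.0]
print(pareto_pips_sample(sizes, 2, random.Random(0)))
```

The sample size is always exactly n, but the realized inclusion probabilities only approximate the targets, which is the source of the estimator bias the abstract discusses.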

15.
Suppose that a finite population consists of N distinct units. Associated with the ith unit is a polychotomous response vector, d_i, and a vector of auxiliary variables, x_i. The x_i are known for the entire population, but the d_i are known only for the units selected in the sample. The problem is to estimate the finite population proportion vector P. One of the fundamental questions in finite population sampling is how to make use of the complete auxiliary information effectively at the estimation stage. In this article a predictive estimator is proposed which incorporates the auxiliary information at the estimation stage by invoking a superpopulation model. However, the use of such estimators is often criticized since the working superpopulation model may not be correct. To protect the predictive estimator from possible model failure, a nonparametric regression model is considered in the superpopulation. The asymptotic properties of the proposed estimator are derived, and a bootstrap-based hybrid re-sampling method for estimating the variance of the proposed estimator is developed. Results of a simulation study are reported on the performance of the predictive estimator and its re-sampling-based variance estimator from the model-based viewpoint. Finally, survey data on the opinions of 686 individuals about the cause of addiction are used in an empirical study to investigate the performance of the nonparametric predictive estimator from the design-based viewpoint.

16.
The problem of determining minimum sample size for the estimation of a binomial parameter with prescribed margin of error and confidence level is considered. It is assumed that available auxiliary information allows the parameter space to be restricted to some interval whose left boundary is above zero. A range-preserving estimator resulting from the conditional maximization of the likelihood function is considered. A method for exact computation of minimum sample size controlling for the relative error is proposed. Several tables of minimum sample sizes for typical situations are also presented. The range-preserving estimator achieves the same precision and confidence level as the unrestricted maximum likelihood estimator but with a smaller sample.

17.
The weighted least squares (WLS) estimator is often employed in linear regression using complex survey data to deal with the bias in ordinary least squares (OLS) arising from informative sampling. In this paper a 'quasi-Aitken WLS' (QWLS) estimator is proposed. QWLS modifies WLS in the same way that Cragg's quasi-Aitken estimator modifies OLS. It weights by the usual inverse sample inclusion probability weights multiplied by a parameterized function of covariates, where the parameters are chosen to minimize a variance criterion. The resulting estimator is consistent for the superpopulation regression coefficient under fairly mild conditions and has a smaller asymptotic variance than WLS.
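
The baseline that QWLS modifies can be sketched briefly: ordinary WLS with weights equal to the inverse inclusion probabilities. The code below fits a single-covariate regression with an intercept in closed form. It does not implement the QWLS adjustment (the parameterized function of covariates), and the function name and data values are hypothetical.

```python
def survey_wls(x, y, pi):
    """Fit y = intercept + slope * x by least squares with weights 1 / pi_i.

    Returns (intercept, slope). Plain probability-weighted WLS only.
    """
    if not (len(x) == len(y) == len(pi)):
        raise ValueError("x, y and pi must have the same length")
    w = [1.0 / p for p in pi]
    sw = sum(w)
    # Weighted means of the covariate and the response.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x has no weighted variation")
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical sample in which units with large y were over-sampled,
# so they receive small weights 1 / pi_i.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.2, 2.9, 5.1, 7.3]
pi = [0.1, 0.2, 0.5, 0.9]
print(survey_wls(x, y, pi))
```

Setting all inclusion probabilities equal reduces this to OLS, which makes it easy to compare the weighted and unweighted fits on the same data.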

18.
A two-phase approach for sampling with unequal inclusion probabilities and fixed sample size is presented. The expansion estimator using target inclusion probabilities is suggested for estimation of population parameters. As an alternative, the estimator for two-phase sampling can be used for estimation. Inclusion probabilities are shown to be asymptotically equivalent to the targeted inclusion probabilities. By means of simulation, the associated estimators are shown to work well with respect to bias and precision.

19.
We consider the problem of the effect of sample designs on discriminant analysis. The selection of the learning sample is assumed to depend on the population values of auxiliary variables. Under a superpopulation model with a multivariate normal distribution, unbiasedness and consistency are examined for the conventional estimators (derived under the assumptions of simple random sampling), maximum likelihood estimators, probability-weighted estimators and conditionally unbiased estimators of parameters. Four corresponding sampled linear discriminant functions are examined. The rates of misclassification of these four discriminant functions and the effect of sample design on these four rates of misclassification are discussed. The performances of these four discriminant functions are assessed in a simulation study.

20.
In this paper, we extend the focused information criterion (FIC) to copula models. Copulas are often used for applications where the joint tail behavior of the variables is of particular interest, and selecting a copula that captures this well is then essential. Traditional model selection methods such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) aim at finding the overall best-fitting model, which is not necessarily the one best suited for the application at hand. The FIC, on the other hand, evaluates and ranks candidate models based on the precision of their point estimates of a context-given focus parameter. This could be any quantity of particular interest, for example, the mean, a correlation, conditional probabilities, or measures of tail dependence. We derive FIC formulae for the maximum likelihood estimator, the two-stage maximum likelihood estimator, and the so-called pseudo-maximum-likelihood (PML) estimator combined with parametric margins. Furthermore, we confirm the validity of the AIC formula for the PML estimator combined with parametric margins. To study the numerical behavior of FIC, we have carried out a simulation study, and we have also analyzed a multivariate data set pertaining to abalones. The results from the study show that the FIC successfully ranks candidate models in terms of their performance, defined as how well they estimate the focus parameter. In terms of estimation precision, FIC clearly outperforms AIC, especially when the focus parameter relates to only a specific part of the model, such as the conditional upper-tail probability.
