Similar Documents
20 similar documents found (search time: 437 ms)
1.
In this paper we propose a new nonparametric estimator of the conditional distribution function under a semiparametric censorship model. We establish an asymptotic representation of the estimator as a sum of iid random variables weighted by kernel weights. This representation is used to obtain large-sample results such as the rate of uniform convergence of the estimator and its limiting distribution. We prove that the new estimator outperforms the conditional Kaplan–Meier estimator for censored data, in the sense that it exhibits lower asymptotic variance. An illustration through real data analysis is provided.
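As a point of reference for the comparison above, here is a minimal sketch of the conditional Kaplan–Meier (Beran) estimator that the paper benchmarks against, using Gaussian kernel weights to localize the product-limit estimator at a covariate value. The kernel, bandwidth, and data layout are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def beran_estimator(t, x0, X, Y, delta, h=0.5):
    """Conditional Kaplan-Meier (Beran) estimate of S(t | x0).
    X: 1-d covariates; Y: observed (possibly censored) times;
    delta: 1 if the event was observed, 0 if censored."""
    k = np.exp(-0.5 * ((X - x0) / h) ** 2)   # Gaussian kernel centred at x0
    w = k / k.sum()                          # Nadaraya-Watson kernel weights
    order = np.argsort(Y)
    Y, delta, w = Y[order], delta[order], w[order]
    # weight mass still "at risk" just before each ordered time
    at_risk = 1.0 - np.concatenate(([0.0], np.cumsum(w)[:-1]))
    factors = np.where((Y <= t) & (delta == 1), 1.0 - w / at_risk, 1.0)
    return float(np.prod(factors))
```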

2.
In this article, we extend smoothing splines to model the regression mean structure when data are sampled through a complex survey. Smoothing splines are evaluated both with and without sample weights and are compared with the local linear estimator. Simulation studies find that nonparametric estimators perform better when sample weights are incorporated rather than treated as if the data were iid. They also find that smoothing splines outperform the local linear estimator under completely data-driven bandwidth selection.
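A hedged illustration of the weighted-versus-unweighted contrast using SciPy's UnivariateSpline; the data and weights below are synthetic stand-ins. Note that UnivariateSpline squares its `w` argument inside the residual sum, so survey weights enter via their square roots.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)
w = rng.uniform(0.5, 2.0, 200)   # stand-in survey weights (inverse inclusion probs)

fit_iid = UnivariateSpline(x, y, k=3)                # treats the data as iid
fit_wtd = UnivariateSpline(x, y, w=np.sqrt(w), k=3)  # weights each residual by w

grid = np.linspace(0, 1, 5)
print(fit_iid(grid))
print(fit_wtd(grid))
```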

3.
金勇进, 刘展. 《统计研究》, 2016, 33(3): 11–17.
When sampling from big data, constructing a sampling frame is often difficult, so the resulting sample is a non-probability sample, and traditional sampling-inference theory is hard to apply. How to make valid statistical inferences from non-probability samples is therefore a serious challenge for survey sampling in the big-data era. This paper proposes a basic strategy for the problem. First, sampling methods: sample selection based on sample matching, link-tracing sampling, and similar methods can make the non-probability sample approximate a probability sample, so that probability-sampling inference theory applies. Second, construction and adjustment of weights: pseudo-design-based, model-based, and propensity-score methods can yield base weights analogous to those of a probability sample. Third, estimation: mixed probability estimation based on pseudo-designs, models, and Bayesian methods can be considered. Finally, sample selection based on sample matching is used as an example to work through a concrete solution.
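One common construction of propensity-score base weights for a non-probability sample is sketched below, under illustrative assumptions: a reference probability sample with design weights is available, and a logistic model for the participation propensity is adequate. This is one of several options the article discusses, not its exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_weights(X_np, X_ref, d_ref):
    """Inverse-odds pseudo-weights for the units of a non-probability
    sample. X_np: covariates of the non-probability sample; X_ref,
    d_ref: covariates and design weights of a reference probability
    sample representing the population."""
    X = np.vstack([X_np, X_ref])
    z = np.r_[np.ones(len(X_np)), np.zeros(len(X_ref))]
    sw = np.r_[np.ones(len(X_np)), d_ref]   # reference units stand in for the population
    model = LogisticRegression(max_iter=1000).fit(X, z, sample_weight=sw)
    p = model.predict_proba(X_np)[:, 1]     # estimated participation propensity
    return (1.0 - p) / p                    # inverse-odds base weights
```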

4.
Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two 'similar' established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450,000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data.

5.
Data from large surveys are often supplemented with sampling weights that are designed to reflect unequal probabilities of response and selection inherent in complex survey sampling methods. We propose two methods for Bayesian estimation of parametric models in a setting where the survey data and the weights are available, but information on how the weights were constructed is not. The first approach simply replaces the likelihood with the pseudo-likelihood in the formulation of Bayes' theorem. This is proven to lead to a consistent estimator, but also to credible intervals that suffer from systematic undercoverage. Our second approach uses the weights to generate a representative sample, which is integrated into a Markov chain Monte Carlo (MCMC) or other simulation algorithm designed to estimate the parameters of the model. In extensive simulation studies, the latter methodology achieves performance comparable to the standard frequentist solution of pseudo-maximum likelihood, with the added advantage of being applicable to models that require inference via MCMC. The methodology is demonstrated further by fitting a mixture of gamma densities to a sample of Australian household income.
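A minimal sketch of the first approach, plugging the survey-weighted pseudo log-likelihood into a random-walk Metropolis sampler for a normal mean with known unit variance. The model, prior, and tuning are illustrative assumptions; recall the abstract's caveat that the resulting credible intervals can undercover.

```python
import numpy as np

def pseudo_posterior_mean(y, w, n_iter=20000, seed=1):
    """Random-walk Metropolis for a normal mean, where the likelihood
    in Bayes' theorem is replaced by the survey-weighted
    pseudo-likelihood: sum_i w_i * log f(y_i | mu)."""
    rng = np.random.default_rng(seed)

    def log_pseudo_post(mu):
        # weighted normal log-likelihood plus a vague N(0, 100) prior
        return -0.5 * np.sum(w * (y - mu) ** 2) - mu ** 2 / 200.0

    mu = y.mean()
    lp = log_pseudo_post(mu)
    draws = []
    for _ in range(n_iter):
        prop = mu + rng.normal(0.0, 0.1)
        lp_prop = log_pseudo_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept step
            mu, lp = prop, lp_prop
        draws.append(mu)
    return float(np.mean(draws[n_iter // 2:]))     # discard burn-in
```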

6.
With a growing interest in using non-representative samples to train prediction models for numerous outcomes, it is necessary to account for the sampling design that gives rise to the data in order to assess the generalized predictive utility of a proposed prediction rule. After learning a prediction rule based on a non-uniform sample, it is of interest to estimate the rule's error rate when applied to unobserved members of the population. Efron (1986) proposed a general class of covariance-penalty-inflated prediction error estimators that assume the available training data are representative of the target population to which the prediction rule is to be applied. We extend Efron's estimator to the complex-sample context by incorporating Horvitz–Thompson sampling weights and show that it is consistent for the true generalization error rate when applied to the underlying superpopulation. The resulting Horvitz–Thompson–Efron estimator is equivalent to dAIC, a recent extension of Akaike's information criterion to survey sampling data, but is more widely applicable. The proposed methodology is assessed with simulations and is applied to models predicting renal function obtained from the large-scale National Health and Nutrition Examination Survey (NHANES). The Canadian Journal of Statistics 48: 204–221; 2020 © 2019 Statistical Society of Canada
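A hedged sketch of the general idea, not the authors' exact estimator: a Horvitz–Thompson-weighted training error plus an Efron-style covariance penalty estimated by parametric bootstrap. The `fit` callback, noise level, and weighting scheme are assumptions introduced for illustration.

```python
import numpy as np

def ht_efron_error(X, y, w, fit, n_boot=200, sigma=1.0, seed=0):
    """HT-weighted training error plus a bootstrap covariance penalty.
    fit(X, y) must return fitted values; w are Horvitz-Thompson weights;
    sigma is an assumed residual noise level."""
    rng = np.random.default_rng(seed)
    mu = fit(X, y)
    train_err = np.sum(w * (y - mu) ** 2) / np.sum(w)
    # parametric-bootstrap estimate of cov(mu_hat_i, y_i), the per-point optimism
    Yb = mu + sigma * rng.normal(size=(n_boot, len(y)))
    Mb = np.array([fit(X, yb) for yb in Yb])
    cov = ((Mb - Mb.mean(0)) * (Yb - Yb.mean(0))).mean(0)
    return train_err + 2.0 * np.sum(w * cov) / np.sum(w)

# e.g. for linear regression:
# fit = lambda X, y: X @ np.linalg.lstsq(X, y, rcond=None)[0]
```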

7.
Outliers that commonly occur in business sample surveys can have large impacts on domain estimates. The authors consider an outlier-robust design and smooth estimation approach, which can be related to the so-called “surprise stratum” technique [Kish, Survey Sampling, Wiley, New York (1965)]. The sampling design uses a threshold sample consisting of previously observed outliers that are selected with probability one, together with stratified simple random sampling from the rest of the population. The domain predictor extends the Winsorization-based estimator proposed by Rivest and Hidiroglou [Outlier treatment for disaggregated estimates, in Proceedings of the Section on Survey Research Methods, American Statistical Association (2004), pp. 4248–4256] and is similar to the estimator for skewed populations suggested by Fuller [Statistica Sinica 1991;1:137–158]. It uses a domain Winsorized sample mean plus a domain-specific adjustment based on the estimated overall mean of the excess values above the threshold. The methods are studied theoretically from a design-based perspective and by simulations based on the Norwegian Research and Development Survey data. Guidelines for choosing the threshold values are provided. The Canadian Journal of Statistics 39: 147–164; 2011 © 2010 Statistical Society of Canada
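A rough sketch of the flavour of such a predictor, thresholding the domain values and adding back a weighted overall estimate of the excess; the exact Rivest–Hidiroglou form and its domain-specific adjustment differ.

```python
import numpy as np

def winsorized_domain_mean(y_d, y_all, w_all, t):
    """Domain mean of values Winsorized at threshold t, plus a weighted
    overall estimate of the mean excess above t added back on top."""
    base = np.minimum(y_d, t).mean()
    excess = np.sum(w_all * np.clip(y_all - t, 0.0, None)) / np.sum(w_all)
    return base + excess
```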

8.
When sampling from a continuous population (or distribution), we often want a rather small sample due to some cost attached to processing the sample or to collecting information in the field. Moreover, a probability sample that allows for design‐based statistical inference is often desired. Given these requirements, we want to reduce the sampling variance of the Horvitz–Thompson estimator as much as possible. To achieve this, we introduce different approaches to using the local pivotal method for selecting well‐spread samples from multidimensional continuous populations. The results of a simulation study clearly indicate that we succeed in selecting spatially balanced samples and improve the efficiency of the Horvitz–Thompson estimator.
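A sketch of an LPM2-style local pivotal method, in which a randomly chosen unit and its nearest undecided neighbour repeatedly "fight" over their inclusion probabilities until every unit is decided; the distance metric and tolerances are illustrative choices.

```python
import numpy as np

def local_pivotal(coords, pi, seed=0):
    """coords: (N, d) array of spatial/auxiliary coordinates; pi:
    inclusion probabilities summing to the expected sample size.
    Returns the indices of the selected units."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, float).copy()
    eps = 1e-12
    while True:
        idx = np.where((pi > eps) & (pi < 1 - eps))[0]   # undecided units
        if len(idx) <= 1:
            break
        i = rng.choice(idx)
        others = idx[idx != i]
        j = others[np.argmin(np.sum((coords[others] - coords[i]) ** 2, axis=1))]
        s = pi[i] + pi[j]
        if s < 1:    # one of the two drops to probability 0
            if rng.uniform() < pi[j] / s:
                pi[i], pi[j] = 0.0, s
            else:
                pi[i], pi[j] = s, 0.0
        else:        # one of the two is promoted to probability 1
            if rng.uniform() < (1 - pi[j]) / (2 - s):
                pi[i], pi[j] = 1.0, s - 1
            else:
                pi[i], pi[j] = s - 1, 1.0
    if len(idx) == 1:  # settle a possible last undecided unit by a coin flip
        pi[idx[0]] = 1.0 if rng.uniform() < pi[idx[0]] else 0.0
    return np.where(pi > 0.5)[0]
```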

9.
Sarjinder Singh. Statistics, 2013, 47(3): 566–574.
In this note, a dual problem to the calibration of design weights in the Deville and Särndal [Calibration estimators in survey sampling, J. Amer. Statist. Assoc. 87 (1992), pp. 376–382] method is considered. We conclude that the chi-squared distance between the design weights and the calibrated weights equals the square of the standardized Z-score obtained by dividing the difference between the known population total of the auxiliary variable and its Horvitz–Thompson [A generalization of sampling without replacement from a finite universe, J. Amer. Statist. Assoc. 47 (1952), pp. 663–685] estimator by the sample standard deviation of the auxiliary variable; this correspondence yields the linear regression estimator in survey sampling.
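The calibration step behind this result has a closed form: minimizing the chi-squared distance subject to the auxiliary-total constraint gives calibrated weights whose weighted total of a study variable is the linear regression (GREG) estimator. A minimal sketch:

```python
import numpy as np

def calibrate(d, x, X_total):
    """Chi-square-distance calibration on one auxiliary total:
    minimize sum_i (w_i - d_i)^2 / d_i  subject to  sum_i w_i x_i = X_total.
    The solution is w_i = d_i * (1 + lambda * x_i)."""
    lam = (X_total - np.sum(d * x)) / np.sum(d * x ** 2)
    return d * (1 + lam * x)
```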

10.
Multilevel modelling is sometimes used for data from complex surveys involving multistage sampling, unequal sampling probabilities and stratification. We consider generalized linear mixed models, and particularly the case of dichotomous responses. A pseudo-likelihood approach for accommodating inverse probability weights in multilevel models with an arbitrary number of levels is implemented by using adaptive quadrature. A sandwich estimator is used to obtain standard errors that account for stratification and clustering. When level 1 weights are used that vary between elementary units in clusters, the scaling of the weights becomes important. We point out that not only variance components but also regression coefficients can be severely biased when the response is dichotomous. The pseudo-likelihood methodology is applied to complex survey data on reading proficiency from the American sample of the Programme for International Student Assessment (PISA) 2000 study, using the Stata program gllamm, which can estimate a wide range of multilevel and latent variable models. The performance of pseudo-maximum likelihood with different methods for handling level 1 weights is investigated in a Monte Carlo experiment. Pseudo-maximum-likelihood estimators of (conditional) regression coefficients perform well for large cluster sizes but are biased for small cluster sizes. In contrast, estimators of marginal effects perform well in both situations. We conclude that caution must be exercised in pseudo-maximum-likelihood estimation for small cluster sizes when level 1 weights are used.
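The two level-1 weight rescalings most often discussed in this literature (Pfeffermann et al., 1998) are easy to state in code; which of them is appropriate is exactly the small-cluster question the abstract raises.

```python
import numpy as np

def scale_level1_weights(w, method="size"):
    """Rescale the level-1 weights of one cluster.
    'size':      scaled weights sum to the cluster sample size;
    'effective': scaled weights sum to the effective sample size."""
    w = np.asarray(w, float)
    if method == "size":
        return w * len(w) / w.sum()
    return w * w.sum() / np.sum(w ** 2)
```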

11.
We present a multi-level rotation sampling design that includes most of the existing rotation designs as special cases. When an estimator is defined under this sampling design, its variance and bias remain the same over survey months, which is not the case under other existing rotation designs. Using the properties of this multi-level rotation design, we derive the mean squared error (MSE) of the generalized composite estimator (GCE), incorporating the two types of correlation arising from rotating sample units. We show that the MSEs of other existing composite estimators currently in use can be expressed as special cases of the GCE. Furthermore, since the coefficients of the GCE are unknown and difficult to determine, we present the minimum risk window estimator (MRWE) as an alternative. The MRWE has the smallest MSE under this rotation design and yet is easy to calculate. It is unbiased for monthly and yearly changes and preserves internal consistency in totals. Our numerical study shows that the MRWE is as efficient as the GCE and more efficient than the existing composite estimators, and that it does not suffer from the drift problem [Fuller, W.A. and Rao, J.N.K., A regression composite estimator with application to the Canadian Labour Force Survey, Surv. Methodol. 27 (2001), pp. 45–51], unlike the regression composite estimators.

12.
How to draw valid statistical inferences from fixed-panel web-access surveys is a serious challenge for web surveys in the big-data era. To address this problem, we propose combining the web-access panel sample with a probability sample, constructing pseudo-weights via inverse propensity-score weighting and weighting-class adjustment to estimate the target population. Variances are then estimated by the Vwr method based on with-replacement probability sampling, the Vgreg method based on generalized regression estimation, and the jackknife method, and the performance of the different estimators is compared. The results show that the proposed population-mean estimators perform well whether the probability sample is large or small, and that among the variance estimators the jackknife performs best.
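For the variance-estimation step, a minimal delete-one jackknife for a (pseudo-)weighted mean looks as follows; this is a generic sketch, not the specific Vwr or Vgreg formulas.

```python
import numpy as np

def jackknife_variance(y, w):
    """Delete-one jackknife variance of the weighted mean."""
    n = len(y)
    loo = np.array([np.sum(np.delete(w, i) * np.delete(y, i)) /
                    np.sum(np.delete(w, i)) for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
```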

13.
Ranked set sampling (RSS) is a cost-efficient technique for data collection when the units in a population can be easily judgment ranked by any cheap method other than actual measurements. Using auxiliary information in developing statistical procedures for inference about different population characteristics is a well-known approach. In this work, we deal with quantile estimation from a population with known mean when data are obtained according to RSS scheme. Through the simple device of mean-correction (subtract off the sample mean and add on the known population mean), a modified estimator is constructed from the standard quantile estimator. Asymptotic normality of the new estimator and its asymptotic efficiency relative to the original estimator are derived. Simulation results for several underlying distributions show that the proposed estimator is more efficient than the traditional one.
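The mean-correction device itself is a one-liner; a sketch treating the pooled RSS measurements as a single sample for illustration:

```python
import numpy as np

def mean_corrected_quantile(y, mu_known, p):
    """Subtract the sample mean, add the known population mean,
    then take the ordinary sample quantile of the shifted data."""
    return np.quantile(y - y.mean() + mu_known, p)
```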

14.
In this paper we consider logspline density estimation for data that may be left-truncated or right-censored. For randomly left-truncated and right-censored data, the product-limit estimator is known to be a consistent estimator of the survivor function with a faster rate of convergence than many density estimators. The product-limit estimator and B-splines are used to construct the logspline density estimate for possibly censored or truncated data. Rates of convergence are established when the log-density function is assumed to lie in a Besov space. An algorithm involving a procedure similar to maximum likelihood, stepwise knot addition, and stepwise knot deletion is proposed for estimating the density function from sample data. Numerical examples show the finite-sample performance of inference based on the logspline density estimate.
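The product-limit (Kaplan–Meier) estimator used as the building block can be sketched as follows, assuming distinct observation times for simplicity:

```python
import numpy as np

def product_limit(times, events):
    """Kaplan-Meier product-limit survivor function for right-censored
    data with distinct times; events[i] = 1 if observed, 0 if censored."""
    order = np.argsort(times)
    times, events = times[order], events[order]
    at_risk = len(times) - np.arange(len(times))       # risk-set sizes
    factors = np.where(events == 1, 1.0 - 1.0 / at_risk, 1.0)
    return times, np.cumprod(factors)                  # S(t) at each time
```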

15.
The theoretical literature on quantile and distribution function estimation in infinite populations is very rich, and invariance plays an important role in these studies. This is not the case for the commonly occurring problem of estimating quantiles in finite populations. The latter is more complicated and interesting because an optimal strategy consists not only of an estimator but also of a sampling design, and the estimator may depend on the design and on the labels of sampled individuals, whereas in iid sampling, design issues and labels do not exist. We study the estimation of finite population quantiles, with emphasis on estimators that are invariant under the group of monotone transformations of the data, together with suitable invariant loss functions. Invariance under the finite group of permutations of the sample is also considered. We discuss nonrandomized and randomized estimators, best invariant and minimax estimators, and sampling strategies relative to different classes. Invariant loss functions and estimators in finite population sampling have a nonparametric flavor, and various natural combinatorial questions and tools arise as a result.

16.
The weighted least squares (WLS) estimator is often employed in linear regression with complex survey data to deal with the bias in ordinary least squares (OLS) arising from informative sampling. In this paper a 'quasi-Aitken WLS' (QWLS) estimator is proposed. QWLS modifies WLS in the same way that Cragg's quasi-Aitken estimator modifies OLS: it weights by the usual inverse sample-inclusion-probability weights multiplied by a parameterized function of covariates, where the parameters are chosen to minimize a variance criterion. The resulting estimator is consistent for the superpopulation regression coefficient under fairly mild conditions and has a smaller asymptotic variance than WLS.
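The baseline WLS that QWLS refines can be illustrated with a small informative-sampling simulation; the data-generating process and selection model below are toy assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
N = 20000
x = rng.normal(size=N)
y = 1 + 2 * x + rng.normal(size=N)
pi = 1 / (1 + np.exp(-(y - 1)))        # selection depends on y: informative
take = rng.uniform(size=N) < pi

X = sm.add_constant(x[take])
ols = sm.OLS(y[take], X).fit()                        # biased under informative sampling
wls = sm.WLS(y[take], X, weights=1 / pi[take]).fit()  # inverse-probability weighted
print(ols.params)   # slope drifts away from 2
print(wls.params)   # approximately (1, 2)
```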

17.
金勇进, 张喆. 《统计研究》, 2014, 31(9): 79–84.
Weights play a crucial role when inferring about a population from sample data. Using weights not only scales the sample up to the population but also adjusts the sample structure to match the population structure, so the correct use of weights is the foundation of statistical inference. This article systematically describes how weights are obtained in survey analysis and how the initial weights are subsequently adjusted. Because weights are a double-edged sword (they improve precision but may also inflate the error of an estimator), we propose methods for evaluating weights and discuss how to keep them under control. Finally, an empirical analysis of data from the Chinese General Social Survey (CGSS) shows that the proposed methods not only improve estimation precision but also reduce the weighting effect in sampling inference.
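A standard diagnostic for the double-edged weight effect mentioned above is Kish's design effect due to unequal weighting, deff_w = 1 + cv(w)^2; a minimal sketch:

```python
import numpy as np

def kish_weighting_effect(w):
    """Kish's design effect due to unequal weighting:
    deff_w = n * sum(w^2) / (sum w)^2 = 1 + cv(w)^2."""
    w = np.asarray(w, float)
    return len(w) * np.sum(w ** 2) / np.sum(w) ** 2
```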

18.
This article analyses diffusion-type processes from a new point of view. Consider two statistical hypotheses about a diffusion process. We do not use a classical Neyman–Pearson test to reject or accept one hypothesis, nor do we adopt a Bayesian approach. As an alternative, we propose using the likelihood paradigm to characterize the statistical evidence in support of these hypotheses. The method is based on the evidential inference introduced by Royall [Statistical evidence: a likelihood paradigm. London: Chapman and Hall; 1997]. In this paper, we extend Royall's theory to the case where the data are observations from a diffusion-type process rather than iid observations. The empirical distribution of the likelihood ratio is used to formulate the probabilities of strong, misleading, and weak evidence. Since the strength of evidence can be affected by the sampling characteristics, we present a simulation study that demonstrates these effects, and we attempt to control misleading evidence and reduce it by adjusting these characteristics. As an illustration, we apply the method to Microsoft stock prices.
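The probability of misleading evidence can be approximated by simulation; the sketch below uses iid normal data rather than a diffusion, and the threshold k = 8 and model are illustrative.

```python
import numpy as np

def prob_misleading_evidence(theta0=0.0, theta1=1.0, n=30, k=8,
                             n_sim=100000, seed=2):
    """Monte Carlo probability that the likelihood ratio favours theta1
    by a factor of at least k when the data actually come from
    N(theta0, 1): misleading evidence in Royall's sense."""
    rng = np.random.default_rng(seed)
    y = rng.normal(theta0, 1.0, size=(n_sim, n))
    loglr = np.sum(0.5 * (y - theta0) ** 2 - 0.5 * (y - theta1) ** 2, axis=1)
    return float(np.mean(loglr >= np.log(k)))
```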

19.
We give a formal definition of a representative sample; roughly speaking, it is a scaled-down version of the population that captures its characteristics. New methods for selecting representative probability samples in the presence of auxiliary variables are introduced. Representative samples are needed for multipurpose surveys in which several target variables are of interest. Such samples also enable estimation of parameters in subspaces and improved estimation of target-variable distributions. We describe how two recently proposed sampling designs can be used to produce representative samples. Both designs use distances between population units when producing a sample. We propose a distance function that can calculate distances between units in general auxiliary spaces, as well as a variance estimator for the commonly used Horvitz–Thompson estimator. Real data and illustrative examples show that representative samples are obtained and that the variance of the Horvitz–Thompson estimator is reduced compared with simple random sampling.

20.
Calibration on the available auxiliary variables is widely used to increase the precision of parameter estimates. Singh and Sedory [Two-step calibration of design weights in survey sampling. Commun. Stat. Theory Methods. 2016;45(12):3510–3523] considered the problem of two-step calibration of design weights for a single auxiliary variable. In the first step, for a given sample, the design weights and calibrated weights are set proportional to each other; in the second step, the value of the proportionality constant is determined by the investigator's objective, for example minimum mean squared error or bias reduction. In this paper, we suggest using two auxiliary variables in the two-step calibration of design weights and compare the results with the single-auxiliary-variable case for different sample sizes, based on simulated and real-life data sets. The results show that the two-step calibration estimator based on two auxiliary variables outperforms the estimator based on a single auxiliary variable in terms of minimum mean squared error.
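The linear-calibration core of such an estimator extends to two (or more) auxiliary totals by solving a small linear system. The sketch below is plain chi-square calibration, without the papers' two-step proportionality constant:

```python
import numpy as np

def calibrate_multi(d, X_aux, totals):
    """Chi-square-distance calibration on several auxiliary totals.
    d: design weights (n,); X_aux: auxiliary matrix (n, p);
    totals: known population totals (p,). Solves for lambda in
    (X' diag(d) X) lambda = totals - X'd, then w = d * (1 + X lambda)."""
    A = X_aux.T @ (d[:, None] * X_aux)
    lam = np.linalg.solve(A, totals - d @ X_aux)
    return d * (1 + X_aux @ lam)
```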
