Similar literature
20 similar documents retrieved.
1.
The author compares aspects of voluntary and involuntary sample surveys in West Germany. "The German microcensus, as a non-voluntary survey, draws a random sample from the total population, which includes persons who would also respond in a voluntary survey (respondents) and persons who would not (non-respondents). The population of a voluntary survey, however, includes only respondents. Hence, statistical inference from a voluntary sample survey is valid for the total population only if the population of respondents does not differ from the total population. This null hypothesis must be rejected on the basis of comparisons between data from the German microcensus of 1985, 1986 and 1987 and corresponding voluntary test sample surveys. The discrepancies are large for central demographic and socio-economic variables such as region of residence, community size, age, marital status, income and social security."

2.
Summary. Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two 'similar' established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data.

3.
4.
Liu Zhan et al. 《统计研究》(Statistical Research), 2021, 38(11): 130-140
With the rapid development of big data and Internet technology, web surveys are used ever more widely. This paper proposes a random forest propensity score model for inference from web survey samples: a random forest composed of a number of classification trees is built to estimate the propensity scores of the web survey sample units, and these estimates are then used to draw inferences about the population. Simulation and empirical results show that the population mean estimator based on the random forest propensity score model has smaller relative bias, variance and mean squared error than the estimator based on the logistic propensity score model, so the proposed method performs better.
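A minimal sketch of the idea, assuming the volunteer web sample is stacked with a probability-based reference sample so that sample membership can be modeled; the simulated data, variable names, and the particular pseudo-weight construction (1 − p)/p are illustrative assumptions, not the paper's specification.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative data: covariates X, outcome y observed only in the web sample.
n_web, n_ref = 2000, 1000
X_web = rng.normal(0.5, 1.0, size=(n_web, 3))   # volunteer web sample (biased)
X_ref = rng.normal(0.0, 1.0, size=(n_ref, 3))   # probability reference sample
y_web = X_web @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n_web)

# Stack the two samples; z = 1 indicates membership in the web sample.
X = np.vstack([X_web, X_ref])
z = np.r_[np.ones(n_web), np.zeros(n_ref)]

# Random forest in place of a logistic model for the propensity of web-sample membership.
rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=25, random_state=0)
rf.fit(X, z)
p = np.clip(rf.predict_proba(X_web)[:, 1], 0.01, 0.99)  # estimated propensity scores

# Inverse-propensity-style pseudo-weights and the weighted estimate of the population mean.
w = (1 - p) / p
mean_hat = np.sum(w * y_web) / np.sum(w)
print(round(mean_hat, 3))
```

The unweighted web-sample mean would reflect the covariate shift in the volunteer sample; the propensity weighting pulls the estimate back toward the reference population.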

5.
This paper is essentially an explanation of how the fundamental concepts of ancillarity and conditionality apply to the data of sample surveys that are carried out by the method of interpenetrating samples. The resulting theory is illustrated by an interesting example on deriving exact statements of conditional probability limits for a linear estimand with data from an agricultural sample survey. The paper also suggests that the study of modern statistical inference may profit from an appreciation of the Syadvada logic of ancient India.

6.
A difference-based variance estimator is proposed for nonparametric regression in complex surveys. By using a combined inference framework, the estimator is shown to be asymptotically normal and to converge to the true variance at a parametric rate. Simulation studies show that the proposed variance estimator works well for complex survey data and also reveal some finite-sample properties of the estimator.
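The abstract does not give the estimator's form; below is a minimal unweighted sketch of the classical first-order difference-based variance estimator for nonparametric regression, the building block that a survey-weighted version would generalize. The data are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated nonparametric regression data: y_i = m(x_i) + e_i with Var(e_i) = sigma^2.
n = 500
x = np.sort(rng.uniform(0, 1, n))
sigma = 0.3
y = np.sin(2 * np.pi * x) + rng.normal(0, sigma, n)

# First-order difference-based estimator: with x sorted, successive differences of y
# largely cancel the smooth mean function, leaving mostly the noise.
d = np.diff(y)
sigma2_hat = np.sum(d**2) / (2 * (n - 1))

print(round(sigma2_hat, 4), round(sigma**2, 4))  # estimate vs true variance
```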

7.
ABSTRACT

Such is the grip of formal methods of statistical inference—that is, frequentist methods for generalizing from sample to population in enumerative studies—in the drawing of scientific inferences that the two are routinely deemed equivalent in the social, management, and biomedical sciences. This, despite the fact that legitimate employment of said methods is difficult to implement on practical grounds alone. But supposing the adoption of these procedures were simple does not get us far; crucially, methods of formal statistical inference are ill-suited to the analysis of much scientific data. Even findings from randomized controlled trials, the claimed gold standard for such analysis, can be problematic.

Scientific inference is a far broader concept than statistical inference. Its authority derives from the accumulation, over an extensive period of time, of both theoretical and empirical knowledge that has won the (provisional) acceptance of the scholarly community. A major focus of scientific inference can be viewed as the pursuit of significant sameness, meaning replicable and empirically generalizable results among phenomena. Regrettably, the obsession of users of statistical inference with reporting significant differences in data sets actively thwarts cumulative knowledge development.

The manifold problems surrounding the implementation and usefulness of formal methods of statistical inference in advancing science do not speak well of much teaching in methods/statistics classes. Serious reflection on statistics' role in producing viable knowledge is needed. Commendably, the American Statistical Association is committed to addressing this challenge, as further witnessed in this special online, open access issue of The American Statistician.

8.
There are generally two frameworks for inference from complex samples: traditional statistical inference based on randomization theory, and model-based statistical inference. Traditional sampling theory rests on randomization theory: population values are regarded as fixed, randomness enters only through the selection of the sample, and inference about the population depends on the sampling design. This approach yields robust estimators in large samples, but breaks down with small samples, missing data, and similar complications. Model-based inference instead treats the finite population as a random sample drawn from a superpopulation model, so inference about the population depends on how the model is specified, but under non-ignorable sampling designs the resulting estimators are biased. Building on an analysis of these two approaches, the paper proposes design-assisted model-based inference and argues that it has important applications in complex sampling.
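To make the contrast concrete, here is a small sketch (not taken from the paper) of the two routes to estimating a population total: the design-based Horvitz–Thompson estimator, which weights by inverse inclusion probabilities, and a model-based prediction estimator under a simple working regression-through-the-origin superpopulation model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Finite population with an auxiliary variable x known for every unit.
N = 10_000
x = rng.gamma(2.0, 2.0, N)
y = 3.0 * x + rng.normal(0, 2.0, N)      # superpopulation-style relationship

# Unequal-probability (Poisson) sample: inclusion probability proportional to x.
n = 400
pi = n * x / x.sum()
sample = rng.random(N) < pi
ys, xs, pis = y[sample], x[sample], pi[sample]

# Design-based: Horvitz-Thompson estimator of the total, sum of y_i / pi_i.
t_ht = np.sum(ys / pis)

# Model-based: fit the working model y = beta * x + e and predict the unsampled units.
beta_hat = np.sum(xs * ys) / np.sum(xs**2)
t_model = ys.sum() + beta_hat * x[~sample].sum()

print(round(t_ht), round(t_model), round(y.sum()))  # both vs the true total
```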

9.
With the continued growth of big data and the Internet, web surveys have become increasingly widespread. Most web survey samples are non-probability samples and are difficult to handle with traditional sampling inference theory, so solving the inference problem for web survey samples is an urgent need for the development of web surveys in the big data era. This paper is the first to propose a modeling-based framework for the problem: (1) modeling the inclusion probabilities, for example by building a propensity score model based on machine learning and variable selection to estimate the inclusion probabilities and infer population quantities; (2) modeling the target variable, for example by fitting a parametric, nonparametric, or semiparametric superpopulation model directly to the target variable; and (3) dual modeling of both the inclusion probabilities and the target variable, combining the propensity score model and the superpopulation model through weighted estimation and hybrid inference. Finally, inclusion-probability modeling based on a generalized Boosted model is used as an example to demonstrate the approach.
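A hedged sketch of route (3), the dual-modeling idea: a boosted propensity model for the inclusion probability is combined with a superpopulation regression model for the target variable in a doubly-robust-style estimator. The data, the pseudo-weight form, and the use of scikit-learn's GradientBoostingClassifier as a stand-in for the "generalized Boosted model" are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Non-probability web sample (z = 1) and probability reference sample (z = 0).
n_web, n_ref = 1500, 1500
X_web = rng.normal(0.4, 1.0, size=(n_web, 4))
X_ref = rng.normal(0.0, 1.0, size=(n_ref, 4))
y_web = 2.0 + X_web @ np.array([1.0, -0.5, 0.25, 0.0]) + rng.normal(size=n_web)

X = np.vstack([X_web, X_ref])
z = np.r_[np.ones(n_web), np.zeros(n_ref)]

# (1) Inclusion-probability model: boosted classifier for web-sample membership.
gbm = GradientBoostingClassifier(random_state=0).fit(X, z)
p = np.clip(gbm.predict_proba(X_web)[:, 1], 0.01, 0.99)
w = (1 - p) / p                                   # pseudo-weights for web units

# (2) Superpopulation model for the target variable, fitted on the web sample.
outcome = LinearRegression().fit(X_web, y_web)
m_ref = outcome.predict(X_ref)                    # predictions on the reference sample
m_web = outcome.predict(X_web)

# (3) Dual modeling: model prediction plus a weighted residual correction.
mu_dr = m_ref.mean() + np.sum(w * (y_web - m_web)) / np.sum(w)
print(round(mu_dr, 3))
```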

10.
How to draw statistical inferences from web access panel surveys is a serious challenge facing web surveys in the big data era. To address this problem, the paper proposes combining the web access panel sample with a probability sample, constructing pseudo-weights through inverse propensity score weighting and weighting-class adjustment to estimate the target population, and then estimating variances with the Vwr method based on with-replacement probability sampling, the Vgreg method based on generalized regression estimation, and the Jackknife method, comparing the performance of the different estimators. The results show that the proposed population mean estimator performs well whether the probability sample is large or small, and that among the variance estimators the Jackknife method performs best.
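A minimal sketch of the delete-one jackknife variance estimator for a pseudo-weighted mean; the pseudo-weights are taken as given here (e.g., produced by inverse propensity weighting as in the sketches above), and the data are simulated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

# Pseudo-weighted panel sample: outcomes y with pseudo-weights w.
n = 800
y = rng.normal(10.0, 3.0, n)
w = rng.uniform(0.5, 2.5, n)          # stand-in for estimated pseudo-weights

def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)

theta_hat = weighted_mean(y, w)

# Delete-one jackknife: recompute the estimator with each unit removed in turn.
theta_j = np.array([
    weighted_mean(np.delete(y, i), np.delete(w, i)) for i in range(n)
])
v_jack = (n - 1) / n * np.sum((theta_j - theta_j.mean()) ** 2)

print(round(theta_hat, 3), round(np.sqrt(v_jack), 4))  # estimate and jackknife SE
```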

11.
Data from large surveys are often supplemented with sampling weights that are designed to reflect unequal probabilities of response and selection inherent in complex survey sampling methods. We propose two methods for Bayesian estimation of parametric models in a setting where the survey data and the weights are available, but where information on how the weights were constructed is unavailable. The first approach is to simply replace the likelihood with the pseudo likelihood in the formulation of Bayes theorem. This is proven to lead to a consistent estimator but also leads to credible intervals that suffer from systematic undercoverage. Our second approach involves using the weights to generate a representative sample which is integrated into Markov chain Monte Carlo (MCMC) or other simulation algorithms designed to estimate the parameters of the model. In extensive simulation studies, the latter methodology is shown to achieve performance comparable to the standard frequentist solution of pseudo maximum likelihood, with the added advantage of being applicable to models that require inference via MCMC. The methodology is demonstrated further by fitting a mixture of gamma densities to a sample of Australian household income.
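A hedged sketch of the first approach described above: the likelihood in Bayes' theorem is replaced by the pseudo-likelihood (the weighted log-likelihood) inside a random-walk Metropolis sampler. The normal mean model, flat prior, and simulated data are illustrative choices, not the paper's gamma-mixture application.

```python
import numpy as np

rng = np.random.default_rng(5)

# Survey data with sampling weights (how the weights were built is unknown).
n = 300
y = rng.normal(5.0, 2.0, n)
w = rng.uniform(0.5, 3.0, n)
w = w * n / w.sum()                      # normalize weights to sum to n

def log_pseudo_post(mu, sigma=2.0):
    # Weighted (pseudo) log-likelihood of N(mu, sigma^2), flat prior on mu.
    return np.sum(w * (-0.5 * ((y - mu) / sigma) ** 2))

# Random-walk Metropolis on mu using the pseudo-posterior.
draws, mu = [], y.mean()
for _ in range(5000):
    prop = mu + rng.normal(0, 0.2)
    if np.log(rng.random()) < log_pseudo_post(prop) - log_pseudo_post(mu):
        mu = prop
    draws.append(mu)

draws = np.array(draws[1000:])           # drop burn-in
print(round(draws.mean(), 3), round(draws.std(), 3))
```

Consistent with the abstract, the point estimate behaves well, but the posterior spread from this plug-in pseudo-likelihood should not be trusted as a calibrated credible interval.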

12.
This paper reviews methods for determining sample sizes when inference is required on the difference between two proportions from independent samples. The sample sizes for Fisher's exact test are generally regarded as the desired values, but they are very tedious to calculate, so many approximations have been introduced. These are described and their relative merits assessed.
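One of the simplest approximations in this literature is the normal-approximation sample size per arm for testing equality of two proportions. The sketch below uses that standard formula as an illustration; it is not the paper's exact set of formulas or its Fisher-exact benchmark.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided test of p1 vs p2."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((num / (p1 - p2)) ** 2)

# Example: detect a rise from 30% to 45% with 80% power at the 5% level.
print(n_per_arm(0.30, 0.45))
```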

13.
In non‐randomized biomedical studies using the proportional hazards model, the data often constitute an unrepresentative sample of the underlying target population, which results in biased regression coefficients. The bias can be avoided by weighting included subjects by the inverse of their respective selection probabilities, as proposed by Horvitz & Thompson (1952) and extended to the proportional hazards setting for use in surveys by Binder (1992) and Lin (2000). In practice, the weights are often estimated and must be treated as such in order for the resulting inference to be accurate. The authors propose a two‐stage weighted proportional hazards model in which, at the first stage, weights are estimated through a logistic regression model fitted to a representative sample from the target population. At the second stage, a weighted Cox model is fitted to the biased sample. The authors propose estimators for the regression parameter and cumulative baseline hazard. They derive the asymptotic properties of the parameter estimators, accounting for the difference in the variance introduced by the randomness of the weights. They evaluate the accuracy of the asymptotic approximations in finite samples through simulation. They illustrate their approach in an analysis of renal transplant patients using data obtained from the Scientific Registry of Transplant Recipients.
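A rough two-stage sketch of this setup: stage 1 fits a logistic selection model on a representative sample to estimate selection probabilities, and stage 2 fits an inverse-probability-weighted Cox model to the biased sample. The data are simulated, and the use of scikit-learn and lifelines (with its weights_col and robust options) is an assumption about tooling, not the authors' software; the robust variance here does not reproduce their correction for estimated weights.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)

# Representative sample from the target population: covariate x and selection indicator s.
N = 5000
x = rng.normal(size=N)
p_sel = 1 / (1 + np.exp(-(-1.0 + 1.2 * x)))        # selection depends on x
s = rng.binomial(1, p_sel)

# Stage 1: logistic model for selection, fitted to the representative sample.
sel_model = LogisticRegression().fit(x.reshape(-1, 1), s)

# Biased sample = the selected subjects, with survival times depending on x.
xb = x[s == 1]
T = rng.exponential(1 / np.exp(0.5 * xb))           # true log-hazard ratio 0.5
E = (T < 3.0).astype(int)                           # administrative censoring at t = 3
T = np.minimum(T, 3.0)

# Stage 2: weighted Cox model with weights = 1 / estimated selection probability.
pi_hat = sel_model.predict_proba(xb.reshape(-1, 1))[:, 1]
df = pd.DataFrame({"T": T, "E": E, "x": xb, "w": 1 / pi_hat})
cph = CoxPHFitter().fit(df, duration_col="T", event_col="E", weights_col="w", robust=True)
cph.print_summary()
```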

14.
The logistic regression model has been widely used in the social and natural sciences and results from studies using this model can have significant policy impacts. Thus, confidence in the reliability of inferences drawn from these models is essential. The robustness of such inferences is dependent on sample size. The purpose of this article is to examine the impact of alternative data sets on the mean estimated bias and efficiency of parameter estimation and inference for the logistic regression model with observational data. A number of simulations are conducted examining the impact of sample size, nonlinear predictors, and multicollinearity on substantive inferences (e.g. odds ratios, marginal effects) when using logistic regression models. Findings suggest that small sample size can negatively affect the quality of parameter estimates and inferences in the presence of rare events, multicollinearity, and nonlinear predictor functions, but marginal effects estimates are relatively more robust to sample size.
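A toy version of the kind of simulation described: repeatedly draw small samples with a rare event and track the bias of the estimated log-odds coefficient as the sample size grows. The sample sizes, event rate, and true coefficient are illustrative, not the article's design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
beta0, beta1 = -3.0, 1.0           # intercept -3 makes events fairly rare

def mean_estimate(n, reps=500):
    est = []
    for _ in range(reps):
        x = rng.normal(size=n)
        p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
        y = rng.binomial(1, p)
        if y.sum() < 2:            # skip degenerate samples with (almost) no events
            continue
        # Very large C makes the fit essentially unpenalized maximum likelihood.
        fit = LogisticRegression(C=1e6, max_iter=1000).fit(x.reshape(-1, 1), y)
        est.append(fit.coef_[0, 0])
    return np.mean(est)

for n in (50, 200, 1000):
    print(n, round(mean_estimate(n) - beta1, 3))   # bias of the slope estimate
```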

15.
The paper provides theoretical and experimental results concerning the distribution of the sample lead. These apply to the sample lead of one party over another in political opinion polls, one product over another in market research surveys, the market share of one company over another in industrial studies, one class over another in social investigations, one programme over another in TV viewing surveys, etc. The theoretical results include the derivation of the moment and cumulant generating functions. It is shown formally that the sampling distribution approaches normality as sample size increases. Experimental results provide guidelines for the choice of sample size to ensure adequacy of the normal approximation. Calculations based on one of the Central Limit Theorems support the results of the simulation.
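A small sketch, not taken from the paper, comparing the simulated distribution of the sample lead (the difference between two sample proportions from a single multinomial sample) with its normal approximation; the poll shares and sample size are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)

# Three-party poll: true shares p1, p2, and "other".
p = np.array([0.42, 0.38, 0.20])
n = 1000

# Simulate the sample lead p1_hat - p2_hat over many polls of size n.
counts = rng.multinomial(n, p, size=20_000)
lead = (counts[:, 0] - counts[:, 1]) / n

# Normal approximation: mean p1 - p2, variance (p1(1-p1) + p2(1-p2) + 2*p1*p2) / n,
# the extra covariance term arising because the proportions come from one multinomial sample.
mu = p[0] - p[1]
var = (p[0] * (1 - p[0]) + p[1] * (1 - p[1]) + 2 * p[0] * p[1]) / n

# Compare simulated and approximate probabilities that the observed lead is positive.
print(round(np.mean(lead > 0), 3), round(1 - norm.cdf(0, mu, np.sqrt(var)), 3))
```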

16.
A new algorithm is presented for exact simulation from the conditional distribution of the genealogical history of a sample, given the composition of the sample, for population genetics models with general diploid selection. The method applies to the usual diffusion approximation of evolution at a single locus, in a randomly mating population of constant size, for mutation models in which the distribution of the type of a mutant does not depend on the type of the progenitor allele; this includes any model with only two alleles. The new method is applied to ancestral inference for the two‐allele case, both with genic selection and heterozygote advantage and disadvantage, where one of the alleles is assumed to have resulted from a unique mutation event. The paper describes how the method could be used for inference when data are also available at neutral markers linked to the locus under selection. It also informally describes and constructs the non‐neutral Fleming–Viot measure‐valued diffusion.

17.
Asymptotic inference results for the coefficients of variation of normal populations are presented in this article. This includes formulas for test statistics, power, confidence intervals, and simultaneous inference. The results are based on the asymptotic normality of the sample coefficient of variation as derived by Miller (1991). An example which compares the homogeneity of bone test samples produced from two different methods is presented.
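A hedged sketch of an asymptotic confidence interval for a normal coefficient of variation, using the delta-method variance k^2(0.5 + k^2)/n that underlies Miller-type asymptotics; the simulated data and nominal level are illustrative, and the exact formulas in the article may differ.

```python
import numpy as np
from scipy.stats import norm

def cv_ci(x, level=0.95):
    """Asymptotic confidence interval for the coefficient of variation of a normal sample."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = x.std(ddof=1) / x.mean()                 # sample coefficient of variation
    se = np.sqrt(k**2 * (0.5 + k**2) / n)        # delta-method standard error
    z = norm.ppf(0.5 + level / 2)
    return k, (k - z * se, k + z * se)

rng = np.random.default_rng(9)
x = rng.normal(100.0, 15.0, 200)                 # true CV = 0.15
k, (lo, hi) = cv_ci(x)
print(round(k, 3), round(lo, 3), round(hi, 3))
```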

18.
Summary.  The first British National Survey of Sexual Attitudes and Lifestyles (NATSAL) was conducted in 1990–1991 and the second in 1999–2001. When surveys are repeated, the changes in population parameters are of interest and are generally estimated from a comparison of the data between surveys. However, since all surveys may be subject to bias, such comparisons may partly reflect a change in bias. Typically limited external data are available to estimate the change in bias directly. However, one approach, which is often possible, is to define in each survey a sample of participants who are eligible for both surveys, and then to compare the reporting of selected events that occurred before the earlier survey time point. A difference in reporting suggests a change in overall survey bias between time points, although other explanations are possible. In NATSAL, changes in bias are likely to be similar for groups of sexual experiences. The grouping of experiences allows the information that is derived from the selected events to be incorporated into inference concerning population changes in other sexual experiences. We use generalized estimating equations, which incorporate weighting for differential probabilities of sampling and non-response in a relatively straightforward manner. The results, combined with estimates of the change in reporting, are used to derive minimum established population changes, based on NATSAL data. For some key population parameters, the change in reporting is seen to be consistent with a change in bias alone. Recommendations are made for the design of future surveys.

19.
This paper surveys the fundamental principles of subjective Bayesian inference in econometrics and the implementation of those principles using posterior simulation methods. The emphasis is on the combination of models and the development of predictive distributions. Moving beyond conditioning on a fixed number of completely specified models, the paper introduces subjective Bayesian tools for formal comparison of these models with as yet incompletely specified models. The paper then shows how posterior simulators can facilitate communication between investigators (for example, econometricians) on the one hand and remote clients (for example, decision makers) on the other, enabling clients to vary the prior distributions and functions of interest employed by investigators. A theme of the paper is the practicality of subjective Bayesian methods. To this end, the paper describes publicly available software for Bayesian inference, model development, and communication and provides illustrations using two simple econometric models.

20.
Nonresponse is very common in surveys; it complicates the analysis and can lead to inappropriate inferences. To counteract the adverse effects of this incompleteness, fresh imputation techniques are proposed that use multi-auxiliary variates for the estimation of the population mean on successive waves. Properties of the proposed estimators are derived, and they are compared with the work of Priyanka et al. (2015). A detailed simulation study is carried out to substantiate the empirical and theoretical results, and several possible cases in which nonresponse can occur are addressed.
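The abstract gives no formulas; as a generic illustration of imputation with auxiliary information (not the authors' estimators), here is a simple regression imputation of missing responses using two auxiliary variables, followed by the mean of the completed data. All variables and the nonresponse mechanism are simulated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)

# Auxiliaries x1, x2 observed for everyone; study variable y missing for nonrespondents.
n = 1000
x1 = rng.normal(50, 10, n)
x2 = rng.normal(20, 5, n)
y = 5 + 0.8 * x1 + 1.5 * x2 + rng.normal(0, 4, n)
respond = rng.random(n) < 0.7                        # roughly 30% nonresponse

# Fit the imputation model on respondents using the multi-auxiliary information.
X = np.column_stack([x1, x2])
model = LinearRegression().fit(X[respond], y[respond])

# Impute nonrespondents and estimate the mean from the completed data set.
y_imp = y.copy()
y_imp[~respond] = model.predict(X[~respond])
print(round(y_imp.mean(), 2), round(y[respond].mean(), 2))  # imputed vs respondent-only mean
```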
