Similar Articles
16 similar articles found
1.
With the continued development of big data and the internet, web surveys are becoming ever more widespread. Most web-survey samples are non-probability samples, to which classical sampling-inference theory is hard to apply, so solving the inference problem for web-survey samples is an urgent need for the development of web surveys in the big-data era. This paper is the first to propose a modeling framework for the problem, along three routes: first, modeling the inclusion probabilities, for instance by building a propensity-score model based on machine learning and variable selection to estimate the inclusion probabilities and infer the population; second, modeling the target variable, for instance by fitting a parametric, nonparametric, or semiparametric superpopulation model to the target variable directly; and third, dual modeling of both inclusion probabilities and the target variable, for instance weighted estimation and hybrid inference that combine the propensity-score model with the superpopulation model. Finally, inclusion-probability modeling based on the generalized boosted model is used as a worked example.
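As a rough illustration of the first route (and of the generalized-boosted-model demonstration), the sketch below fits a boosted propensity model by pooling the nonprobability sample with a reference probability sample and then forms inverse-propensity pseudo-weights. It is a minimal sketch, assuming a reference sample with known design weights is available; the names (`X_np`, `y_np`, `X_ref`, `ref_weights`) are hypothetical and the paper's own estimator may differ in detail.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def gbm_pseudo_weighted_mean(X_np, y_np, X_ref, ref_weights):
    """Estimate a population mean from a nonprobability web sample by
    modeling the inclusion propensity with a generalized boosted model."""
    # Pool the two samples; z = 1 marks nonprobability units.
    X = np.vstack([X_np, X_ref])
    z = np.concatenate([np.ones(len(X_np)), np.zeros(len(X_ref))])
    # Reference units carry their design weights so that the z = 0 side
    # stands in for the whole population.
    w_fit = np.concatenate([np.ones(len(X_np)), ref_weights])

    gbm = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    gbm.fit(X, z, sample_weight=w_fit)

    p = gbm.predict_proba(X_np)[:, 1]      # estimated inclusion propensity
    ipw = (1.0 - p) / p                    # inverse-propensity pseudo-weights
    return np.sum(ipw * y_np) / np.sum(ipw)
```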

2.
How to draw statistical inferences from web access-panel surveys is a serious challenge facing web surveys in the big-data era. This paper proposes combining the access-panel sample with a probability sample, constructing pseudo-weights through propensity-score inverse weighting and weighting-class adjustment to estimate the target population, and then estimating the variance by three methods: the Vwr method based on with-replacement probability sampling, the Vgreg method based on generalized regression estimation, and the jackknife. The study shows that the proposed population-mean estimators perform well whether the probability sample is large or small, and that among the variance estimators the jackknife performs best.
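A delete-one jackknife for a pseudo-weighted mean, the kind of variance estimator the entry found best, can be sketched as follows. This is the generic textbook jackknife under the assumption of independent units, not necessarily the exact Vwr/Vgreg/jackknife variants compared in the paper.

```python
import numpy as np

def jackknife_variance(y, w):
    """Delete-one jackknife variance of the pseudo-weighted mean sum(w*y)/sum(w)."""
    n = len(y)
    theta_del = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i               # drop unit i
        theta_del[i] = np.sum(w[mask] * y[mask]) / np.sum(w[mask])
    # Classical jackknife variance: (n-1)/n * sum of squared deviations.
    return (n - 1) / n * np.sum((theta_del - theta_del.mean()) ** 2)
```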

3.
Inference about population parameters from sample survey data generally follows one of two routes: design-based inference or model-based inference. Design-based inference rests on randomization theory; inference depends on the sampling design, and in large samples the estimators are unbiased and consistent, but efficiency is low when the sample is small or nonsampling errors are present. Model-based inference regards the finite population as a random sample from an infinite superpopulation, so inference depends on the model assumptions; superpopulation modeling is highly flexible and helps exploit auxiliary information about the population to raise estimation precision, but it incurs estimation error when the model is misspecified or the sample-selection process is informative. How to combine the two routes, preserving the sample's representativeness of the population while guaranteeing efficiency and good estimator properties, remains to be studied. Weights play a central role in design-based inference: they capture the effect of the sampling design on the sample and scale the sample back up to the population. Introducing weights into model-based inference makes the model-based results representative of the population, better exploits the combined strengths of the two systems, and weakens the influence of the model assumptions on the results. Starting from the effect of weights on model inference, and focusing on causal inference, the paper proposes introducing the weights into the modeling of both the propensity-score model and the prediction model to construct a doubly robust estimator, and verifies the method by simulation. The results show that estimating treatment effects in this way makes full use of the weights and yields more accurate and more robust estimates. The empirical part analyzes the 2017 CGSS survey data, further showing that the effect of the sampling design should be fully considered when model-based inference is carried out on survey data, and providing a reference for researchers doing causal inference and other survey-based studies.
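A minimal sketch of the kind of doubly robust estimator described above, with the design weights `w` entering the propensity-score fit, the two outcome regressions, and the final average. The model choices (logistic and linear), the clipping threshold, and the variable names are illustrative assumptions, not the paper's specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def weighted_dr_ate(X, t, y, w):
    """Doubly robust ATE estimate; t is a 0/1 treatment indicator and
    w are survey design weights used in every fitting step."""
    ps = LogisticRegression(max_iter=1000).fit(X, t, sample_weight=w)
    e = np.clip(ps.predict_proba(X)[:, 1], 0.01, 0.99)   # avoid extreme scores

    m1 = LinearRegression().fit(X[t == 1], y[t == 1], sample_weight=w[t == 1])
    m0 = LinearRegression().fit(X[t == 0], y[t == 0], sample_weight=w[t == 0])
    mu1, mu0 = m1.predict(X), m0.predict(X)

    # AIPW score: outcome-model contrast plus propensity-weighted residuals.
    psi = (mu1 - mu0
           + t * (y - mu1) / e
           - (1 - t) * (y - mu0) / (1 - e))
    return np.sum(w * psi) / np.sum(w)      # design-weighted average
```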

4.
There are two general systems for inference from complex samples: classical design-based (randomization) inference and model-based inference. Classical sampling theory is grounded in randomization: the population values are regarded as fixed, randomness enters only through sample selection, and inference about the population depends on the sampling design. The method gives robust estimators in large samples but fails with small samples, missing data, and similar situations. Model-based inference regards the finite population as a random sample drawn from a superpopulation model, so inference about the population rests on the model that is built, but under a nonignorable sampling design the estimators are biased. Building on an analysis of the two systems, this paper proposes design-assisted model inference and argues that it has substantial practical value in complex sampling.
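For reference, the two systems can be contrasted through two textbook estimators of a population total: the design-based Horvitz-Thompson (HT) estimator and the model-assisted GREG estimator, where $\hat{y}_i$ are fitted values from a superpopulation working model. These are standard forms, not taken from the entry itself:

$$
\hat{t}_{HT}=\sum_{i\in s}\frac{y_i}{\pi_i},
\qquad
\hat{t}_{GREG}=\sum_{i\in U}\hat{y}_i+\sum_{i\in s}\frac{y_i-\hat{y}_i}{\pi_i}.
$$

The GREG form shows the design-assisted idea: the predictions carry the model's efficiency, while the $\pi_i$-weighted residual term keeps the estimator approximately design-unbiased even when the model is wrong.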

5.
For the problem of statistical inference from non-probability samples, this paper proposes the following approach: first select the sample by propensity-score matching, then apply three weighting adjustments (propensity-score inverse weighting, weighting-class adjustment, and post-stratification) to the matched sample to estimate the target population, and compare the performance of the different estimators. Monte Carlo simulation and an empirical study show that when the ratio of the web access-panel size to the target sample size is below 3, all three weighting adjustments estimate better than the unweighted matched sample; when the ratio is 3 or more, propensity-score post-stratification and the unweighted matched sample perform better.
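One of the three adjustments, post-stratification on the estimated propensity score, can be sketched as below. Quintile strata and the variable names are assumptions for illustration, as is the use of an unweighted reference sample to stand in for the population; the paper's exact settings may differ.

```python
import numpy as np

def poststratified_mean(y, ps, ps_pop, n_strata=5):
    """Post-stratified mean: strata are quantile groups of the estimated
    propensity score, with stratum shares taken from a reference sample
    assumed to represent the target population."""
    # Interior cut points from the population-side score distribution.
    cuts = np.quantile(ps_pop, np.linspace(0, 1, n_strata + 1))[1:-1]
    strata = np.digitize(ps, cuts)           # stratum index 0..n_strata-1
    strata_pop = np.digitize(ps_pop, cuts)
    est = 0.0
    for k in range(n_strata):
        share = np.mean(strata_pop == k)     # population share of stratum k
        if np.any(strata == k):              # empty strata are simply skipped;
            est += share * y[strata == k].mean()  # a real version would collapse them
    return est
```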

6.
From the perspective of unequal selection probabilities, two common types of weight adjustment and the corresponding methods are proposed: scale adjustment, which makes the sample weights sum to the population size, and structure adjustment, which makes the sample structure agree with the population structure. A design-effect model for weighting adjustment is constructed and applied to complex sample designs. A case study shows that weighting adjustment often inflates the design effect, a negative side effect, whereas calibration adjustment can lower the design effect and improve estimation precision.
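The entry's own design-effect model is not reproduced here, but the classical Kish approximation for the design effect due to unequal weighting, $1+\mathrm{cv}^2(w)$, gives the flavor of how weight adjustment inflates variance:

```python
import numpy as np

def kish_deff(w):
    """Kish's design effect from unequal weighting:
    n * sum(w^2) / (sum w)^2, i.e. 1 + cv^2 of the weights."""
    w = np.asarray(w, dtype=float)
    return len(w) * np.sum(w ** 2) / np.sum(w) ** 2
```

Equal weights give a deff of 1; the more variable the adjusted weights, the larger the deff, which is the negative effect of weighting the entry refers to and which calibration can partly undo.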

7.
The idea of model-assisted methods is to obtain efficient inference about population parameters with the help of a superpopulation model while staying within a design-based framework. A sample whose HT estimates of the auxiliary variables equal the true population totals is called a balanced sample. For balanced samples, if the heteroscedasticity of the superpopulation model can be explained by the auxiliary variables, the optimal strategy follows: a balanced sampling design combined with the HT estimator, with inclusion probabilities proportional to the standard deviation of the model residuals.
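In symbols (standard notation, consistent with the entry but not quoted from it): a sample $s$ with inclusion probabilities $\pi_i$ is balanced on auxiliaries $\mathbf{x}_i$ when the HT estimates reproduce the known totals, and the stated optimal strategy takes $\pi_i$ proportional to the residual standard deviation $\sigma_i$ of the superpopulation model:

$$
\sum_{i\in s}\frac{\mathbf{x}_i}{\pi_i}=\sum_{i\in U}\mathbf{x}_i,
\qquad
\hat{t}_{HT}=\sum_{i\in s}\frac{y_i}{\pi_i},
\qquad
\pi_i\propto\sigma_i .
$$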

8.
金勇进  张喆 《统计研究》2014,31(9):79-84
Weights matter greatly when sample data are used to infer about the population. With weights one can not only scale the sample back to the population but also adjust the sample structure to agree with the population structure, so using weights correctly is the foundation of statistical inference. This paper sets out systematically how weights are obtained in survey analysis and how the initial weights are adjusted afterwards. Because weights are a double-edged sword (while improving precision they may also inflate the estimators' error), the paper proposes methods for evaluating weights and discusses how to control them. Finally, an empirical analysis of data from the Chinese General Social Survey (CGSS) shows that the proposed approach not only improves estimation precision but also reduces the weighting effect in sampling inference.
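One common way to control weights, in the entry's sense of trading a little bias for a variance reduction, is to trim extreme weights and restore the original total. The cap of 3.5 times the mean weight below is a conventional rule of thumb, not a value taken from the paper:

```python
import numpy as np

def trim_weights(w, cap=3.5):
    """Trim weights at cap * mean(w), then rescale so the total is unchanged."""
    w = np.asarray(w, dtype=float)
    w_t = np.minimum(w, cap * w.mean())
    return w_t * (w.sum() / w_t.sum())   # restore the original weight total
```

Comparing a weighting design effect before and after trimming shows the precision gain that such control buys.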

9.
Ranked set sampling is an efficient way of collecting data with the help of auxiliary information, and statistical inference based on it has attracted growing attention. Existing results, however, consider only the efficiency of the inference and ignore the survey cost. Taking both estimation precision and survey cost into account, this paper constructs an estimator of the population mean from a ranked set sample, proves that the estimator reduces the survey cost at a given level of estimation precision, and illustrates the merits of the sampling scheme with a real example.
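A minimal simulation sketch of ranked set sampling: ranking is done on a cheap auxiliary variable, and the expensive measurement is taken on only one unit per ranked set, which is where the cost saving the entry analyzes comes from. Set size, cycle count, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rss_mean(pop_y, pop_x, m=3, cycles=10):
    """Ranked-set-sampling mean estimate.

    Each cycle draws m sets of m units, ranks every set on the cheap
    auxiliary x, and measures y only on the i-th ranked unit of the
    i-th set: m measurements per cycle instead of m*m."""
    N = len(pop_y)
    measured = []
    for _ in range(cycles):
        for i in range(m):
            idx = rng.choice(N, size=m, replace=False)
            ranked = idx[np.argsort(pop_x[idx])]      # rank set on x
            measured.append(pop_y[ranked[i]])         # measure i-th order statistic
    return np.mean(measured)
```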

10.
Propensity-score matching is a common method for estimating average treatment effects, but the classical bootstrap cannot be applied directly to the matching estimator when the number of matches is fixed. Treating the number of times each unit is matched as an observed value resolves the problem that matched counts in a resample are not consistent estimates of those in the original sample. On this basis, two bootstrap procedures for propensity-score matching inference are proposed: the first extends the weighted bootstrap for Euclidean-distance matching to propensity-score matching; the second is simpler still and applies the classical bootstrap directly. Both procedures correctly estimate the variance and confidence intervals of the average-treatment-effect matching estimator, and they also make the asymptotic variance formula for propensity-score matching estimates easier to implement. Simulations show that, as the sample size grows, both bootstraps move ever closer to the sample sum of squared errors and to the asymptotic results of Abadie and Imbens. Finally, the method is applied to the 2016 Chinese General Social Survey data to estimate the average treatment effects of gender, marital status, and health status on residents' income.
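A stylized reading of the second, simpler procedure: write the ATT matching estimator through the matched counts `K` (with `K[j]` counting how often control `j` serves as a match), treat `(y, t, K)` as the observed data, and apply the classical bootstrap to those triples. This is only a sketch of the idea under a one-match-per-treated-unit assumption; the paper's procedures and their validity conditions are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def att_from_counts(y, t, K):
    """ATT via matched counts, assuming one match per treated unit:
    K[j] = number of times control j is used as a match (0 for treated)."""
    n1 = t.sum()
    return (y[t == 1].sum() - (K * y)[t == 0].sum()) / n1

def bootstrap_se(y, t, K, B=1000):
    """Classical bootstrap over the (y, t, K) triples, holding each
    unit's matched count fixed as part of its observed data."""
    n = len(y)
    stats = []
    while len(stats) < B:
        idx = rng.integers(0, n, size=n)
        if t[idx].sum() == 0:                 # skip degenerate resamples
            continue
        stats.append(att_from_counts(y[idx], t[idx], K[idx]))
    return np.std(stats, ddof=1)
```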

11.
In most surveys, inference for domains poses a difficult problem because of data shortage. This paper presents a probability sampling theory approach to some common types of statistical analysis for domains of a surveyed population. Simple and multiple regression analysis, and analysis of ratios are considered. Two new methods are constructed and explored which can improve substantially over the common method based on sample-weighted sums of squares and products. These new methods use auxiliary variables whose importance depends on the extent to which they succeed in explaining certain patterns in the regression residuals. The theoretical conclusions are supported by empirical results from Monte Carlo experiments.
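The common method the entry improves on, sample-weighted sums of squares and products, amounts to design-weighted least squares run within each domain. A minimal numpy sketch follows (the paper's auxiliary-variable refinements are not reproduced, and the function names are hypothetical):

```python
import numpy as np

def weighted_ols(X, y, w):
    """Design-weighted least squares: solve (X' W X) b = X' W y."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def domain_regressions(X, y, w, domain):
    """Fit the weighted regression separately within each domain."""
    return {d: weighted_ols(X[domain == d], y[domain == d], w[domain == d])
            for d in np.unique(domain)}
```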

12.
Data from large surveys are often supplemented with sampling weights that are designed to reflect unequal probabilities of response and selection inherent in complex survey sampling methods. We propose two methods for Bayesian estimation of parametric models in a setting where the survey data and the weights are available, but where information on how the weights were constructed is unavailable. The first approach is simply to replace the likelihood with the pseudo likelihood in the formulation of Bayes' theorem; this is proven to lead to a consistent estimator but also to credible intervals that suffer from systematic undercoverage. Our second approach uses the weights to generate a representative sample which is integrated into a Markov chain Monte Carlo (MCMC) or other simulation algorithm designed to estimate the parameters of the model. In extensive simulation studies, the latter methodology achieves performance comparable to the standard frequentist solution of pseudo maximum likelihood, with the added advantage of being applicable to models that require inference via MCMC. The methodology is demonstrated further by fitting a mixture of gamma densities to a sample of Australian household income.
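The first approach, plugging the pseudo (weighted) likelihood into Bayes' theorem, can be sketched for a toy normal-mean model with a flat prior; recall the entry's warning that credible intervals from this route systematically undercover. The Metropolis settings and the fixed-sigma simplification are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_posterior_mean(y, w, draws=5000, step=0.2):
    """Metropolis sampler whose log-likelihood is the weighted (pseudo)
    log-likelihood of a normal-mean model; flat prior on mu, sigma fixed
    at the weighted sample SD for simplicity."""
    mu_hat = np.sum(w * y) / np.sum(w)
    sigma = np.sqrt(np.sum(w * (y - mu_hat) ** 2) / np.sum(w))

    def pseudo_loglik(mu):
        return -0.5 * np.sum(w * ((y - mu) / sigma) ** 2)

    mu, chain = mu_hat, []
    for _ in range(draws):
        prop = mu + step * rng.normal()           # random-walk proposal
        if np.log(rng.uniform()) < pseudo_loglik(prop) - pseudo_loglik(mu):
            mu = prop                             # accept
        chain.append(mu)
    return np.array(chain)                        # pseudo-posterior draws of mu
```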

13.
Supplementing an existing sample is a practical need in survey sampling and a difficult technical problem. Starting from the meaning of probability sampling, this paper defines the relevant basic concepts, sets out methods for sample supplementation and the corresponding estimation methods, and explores in general terms how to resolve this technical difficulty.

14.
Communications in Statistics: Theory and Methods, 2012, 41(16-17): 3278-3300
Under complex survey sampling, in particular when selection probabilities depend on the response variable (informative sampling), the sample and population distributions differ, possibly resulting in selection bias. This article addresses the problem by fitting two statistical models, namely the variance components model (a two-stage model) and the fixed effects model (a single-stage model) for one-way analysis of variance, under complex survey designs such as two-stage sampling, stratification, and unequal selection probabilities. Classical theory underlying the use of the two-stage model involves simple random sampling at each of the two stages; in such cases the model holding for the sample, after selection, is the same as the model for the population before selection. When the selection probabilities are related to the values of the response variable, standard estimates of the population model parameters may be severely biased, leading possibly to false inference. The idea behind the approach is to extract the model holding for the sample data as a function of the model in the population and of the first-order inclusion probabilities, and then to fit the sample model using analysis of variance, maximum likelihood, and pseudo maximum likelihood estimation. The main feature of the proposed techniques is their behavior in terms of the informativeness parameter. We also show that using the population model while ignoring the informative sampling design yields biased model fitting.
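The extraction step described here is usually written (in standard notation for informative sampling, not quoted from the article) as the sample pdf obtained by reweighting the population pdf with the conditional expectation of the inclusion probabilities:

$$
f_s(y_i\mid \mathbf{x}_i)=\frac{E_p(\pi_i\mid y_i,\mathbf{x}_i)\,f_p(y_i\mid \mathbf{x}_i)}{E_p(\pi_i\mid \mathbf{x}_i)} .
$$

When $\pi_i$ does not depend on $y_i$ given $\mathbf{x}_i$ (noninformative selection), the ratio equals 1 and the sample and population models coincide, which is the classical simple-random-sampling case mentioned above.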

15.
Such is the grip of formal methods of statistical inference—that is, frequentist methods for generalizing from sample to population in enumerative studies—in the drawing of scientific inferences that the two are routinely deemed equivalent in the social, management, and biomedical sciences. This, despite the fact that legitimate employment of said methods is difficult to implement on practical grounds alone. But supposing the adoption of these procedures were simple does not get us far; crucially, methods of formal statistical inference are ill-suited to the analysis of much scientific data. Even findings from the claimed gold standard for examination by the latter, randomized controlled trials, can be problematic.

Scientific inference is a far broader concept than statistical inference. Its authority derives from the accumulation, over an extensive period of time, of both theoretical and empirical knowledge that has won the (provisional) acceptance of the scholarly community. A major focus of scientific inference can be viewed as the pursuit of significant sameness, meaning replicable and empirically generalizable results among phenomena. Regrettably, the obsession of users of statistical inference with reporting significant differences in data sets actively thwarts cumulative knowledge development.

The manifold problems surrounding the implementation and usefulness of formal methods of statistical inference in advancing science do not speak well of much teaching in methods/statistics classes. Serious reflection on statistics' role in producing viable knowledge is needed. Commendably, the American Statistical Association is committed to addressing this challenge, as further witnessed in this special online, open access issue of The American Statistician.

16.
In many engineering problems it is necessary to draw statistical inferences on the mean of a lognormal distribution based on a complete sample of observations. Statistical demonstration of mean time to repair (MTTR) is one example. Although optimum confidence intervals and hypothesis tests for the lognormal mean have been developed, they are difficult to use, requiring extensive tables and/or a computer. In this paper, simplified conservative methods for calculating confidence intervals or hypothesis tests for the lognormal mean are presented. Here, "conservative" refers to confidence intervals (hypothesis tests) whose infimum coverage probability (supremum probability of rejecting the null hypothesis taken over parameter values under the null hypothesis) equals the nominal level. The term "conservative" has obvious implications for confidence intervals (they are "wider" in some sense than their optimum or exact counterparts). Applying the term "conservative" to hypothesis tests should not be confusing if it is remembered that this implies that their equivalent confidence intervals are conservative. No implication of optimality is intended for these conservative procedures. It is emphasized that these are direct statistical inference methods for the lognormal mean, as opposed to the already well-known methods for the parameters of the underlying normal distribution. The method currently employed in MIL-STD-471A for statistical demonstration of MTTR is analyzed and compared to the new method in terms of asymptotic relative efficiency. The new methods are also compared to the optimum methods derived by Land (1971, 1973).
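For orientation, a standard approximate (not conservative) interval for the lognormal mean is the Cox method on the log scale. This is a common textbook alternative, plainly a swap-in, not the simplified conservative procedure of the entry or Land's optimal method:

```python
import numpy as np
from scipy import stats

def lognormal_mean_ci(x, conf=0.95):
    """Cox approximate CI for E[X] = exp(mu + sigma^2/2) of a lognormal sample."""
    logx = np.log(x)
    n = len(logx)
    ybar, s2 = logx.mean(), logx.var(ddof=1)
    est = ybar + s2 / 2                              # estimate on the log scale
    se = np.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))   # Cox standard error
    z = stats.norm.ppf(0.5 + conf / 2)
    return np.exp(est - z * se), np.exp(est + z * se)
```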
