
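Research on the Modeling Inference of Web Survey Samples in the Context of Big Data: Taking Propensity Score Inference of Generalized Boosted Model as an Example
Cite this article: Liu Zhan, Pan Yingli. Research on the Modeling Inference of Web Survey Samples in the Context of Big Data: Taking Propensity Score Inference of Generalized Boosted Model as an Example[J]. Statistical Research (统计研究), 2019, 36(9): 93. DOI: 10.19343/j.cnki.11-1302/c.2019.09.008
Authors: Liu Zhan (刘展), Pan Yingli (潘莹丽)
Abstract:
As big data and the internet continue to develop, web surveys are used ever more widely. Most web survey samples are non-probability samples, which are hard to analyze with traditional sampling inference theory, so solving the inference problem for web survey samples is an urgent need for the development of web surveys in the big data setting. This paper is the first to set out, from a modeling perspective, a basic approach to the problem. The first route is model-based inference on inclusion probabilities: a propensity score model built on machine learning and variable selection can be used to estimate inclusion probabilities and infer the population. The second route is model-based inference on the target variable: a parametric, nonparametric, or semiparametric superpopulation model can be fitted directly to the target variable for estimation. The third route is dual modeling of both inclusion probabilities and the target variable: weighted estimation and hybrid inference that combine the propensity score model with the superpopulation model can be considered. Finally, inclusion-probability modeling based on the generalized boosted model is used as an example to demonstrate a concrete solution.

Keywords: big data; web survey samples; inclusion probability; target variable; modeling inference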

Research on the Modeling Inference of Web Survey Samples In the Context of Big Data: Taking Propensity Score Inference of Generalized Boosted Model as an Example
Liu Zhan, Pan Yingli. Research on the Modeling Inference of Web Survey Samples In the Context of Big Data: Taking Propensity Score Inference of Generalized Boosted Model as an Example[J]. Statistical Research, 2019, 36(9): 93. DOI: 10.19343/j.cnki.11-1302/c.2019.09.008
Authors: Liu Zhan & Pan Yingli
Abstract:
With the development of big data and the internet, web surveys are becoming increasingly widespread. However, most web survey samples are non-probability samples, to which the traditional inference theory of probability sampling is difficult to apply. Solving the inference problem for web survey samples is therefore an urgent need for the development of web surveys in the context of big data. This paper proposes, for the first time, basic approaches to the problem from a modeling perspective. First, inclusion probabilities can be modeled: propensity score models based on machine learning and variable selection can be constructed to estimate inclusion probabilities, which are then used to make inferences about the population. Second, the target variable can be modeled: parametric, nonparametric, or semiparametric superpopulation models can be fitted directly to the target variable to estimate population quantities. Third, both can be modeled together: weighted estimation and hybrid inference that combine propensity score models with superpopulation models can be considered. Finally, inclusion-probability modeling based on the generalized boosted model is taken as an example to illustrate a concrete solution to the inference problem for web survey samples.
Keywords: Big Data; Web Survey Samples; Inclusion Probability; Target Variables; Modeling Inference
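
The first approach in the abstract, estimating inclusion probabilities with a boosted propensity score model and then weighting the web sample, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it hand-rolls a simplified gradient boosting of decision stumps on the Bernoulli deviance instead of using a full generalized boosted model package, it runs on simulated data, and it assumes the covariate `x` is available for every population unit (in practice a reference probability sample usually plays that role). All names, the simulated selection mechanism, and the tuning values are assumptions made for the example.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_boosted_propensity(x, z, n_trees=200, shrinkage=0.1, n_bins=10):
    """Boost stumps on the Bernoulli deviance; return estimated P(z=1 | x) per unit.

    Simplified sketch: one covariate, assumed to lie in [0, 1], with split
    points restricted to a fixed grid of bin edges.
    """
    n = len(x)
    bins = [min(int(xi * n_bins), n_bins - 1) for xi in x]
    p0 = sum(z) / n
    f = [math.log(p0 / (1.0 - p0))] * n  # start from the marginal log-odds
    for _ in range(n_trees):
        p = [sigmoid(fi) for fi in f]
        # Per-bin sums of gradients (z - p), Newton weights p(1 - p), and counts.
        g = [0.0] * n_bins
        h = [0.0] * n_bins
        c = [0] * n_bins
        for b, zi, pi in zip(bins, z, p):
            g[b] += zi - pi
            h[b] += pi * (1.0 - pi)
            c[b] += 1
        # Pick the split that best fits the gradients in the least-squares sense.
        best = None
        for s in range(1, n_bins):
            gl, gr = sum(g[:s]), sum(g[s:])
            cl, cr = sum(c[:s]), sum(c[s:])
            if cl == 0 or cr == 0:
                continue
            gain = gl * gl / cl + gr * gr / cr
            if best is None or gain > best[0]:
                # Leaf values are one-step Newton updates on the log-odds scale.
                best = (gain, s, gl / sum(h[:s]), gr / sum(h[s:]))
        if best is None:
            break
        _, s, left, right = best
        f = [fi + shrinkage * (left if b < s else right) for fi, b in zip(f, bins)]
    return [sigmoid(fi) for fi in f]


# Simulated population (assumption): participation in the web survey rises with x,
# and the target variable y also rises with x, so the raw web sample is biased.
rng = random.Random(2019)
N = 2000
x = [rng.random() for _ in range(N)]
y = [10.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
z = [1 if rng.random() < 0.1 + 0.6 * xi else 0 for xi in x]

pi_hat = fit_boosted_propensity(x, z)

sample = [i for i in range(N) if z[i] == 1]
true_mean = sum(y) / N
naive_mean = sum(y[i] for i in sample) / len(sample)

# Inverse-propensity (Hajek-type) weighted mean over the web sample only.
w = [1.0 / pi_hat[i] for i in sample]
ipw_mean = sum(wi * y[i] for wi, i in zip(w, sample)) / sum(w)

print("population mean:", round(true_mean, 3))
print("unweighted web-sample mean:", round(naive_mean, 3))
print("propensity-weighted mean:", round(ipw_mean, 3))
```

In this toy setting the unweighted web-sample mean overstates the population mean because high-`x` units are over-represented, and weighting by the inverse of the estimated inclusion probability pulls the estimate back toward the population value. The number of boosting iterations and the shrinkage rate are illustrative only; in applied work they would normally be tuned, for example by cross-validation or a covariate balance criterion.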