首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 193 毫秒
1.
秦磊  熊巍  田茂再 《统计研究》2016,(8):101-105
大数据以其巨大的样本容量或超高的变量维度使得直接计算变得不再可能,如何有效地抽取一个合适的计算样本是值得思考的问题.本文借鉴Leverage重要性抽样的思想,提出了两种稳健的改进抽样算法,不仅有效地抽取了代表性高的计算样本进行回归估计,还规避了方差大和异质性导致协方差矩阵估计不准的问题.模拟数据的分析显示,相比于Ma (2015)的方法,本文提出的方法具有更为优良的估计结果.  相似文献   

2.
李向杰等 《统计研究》2018,35(7):115-124
经典的充分降维方法对解释变量存在异常值或者当其是厚尾分布时效果较差,为此,经过对充分降维理论中加权与累积切片的分析研究,本文提出了一种将两者有机结合的稳健降维方法:累积加权切片逆回归法(CWSIR)。该方法对自变量存在异常值以及小样本情况下表现比较稳健,并且有效避免了对切片数目的选择。数值模拟结果显示CWSIR要优于传统的切片逆回归(SIR)、累积切片估计(CUME)、基于等高线的切片逆回归估计(CPSIR)、加权典则相关估计(WCANCOR)、切片逆中位数估计(SIME)、加权逆回归估计(WIRE)等方法。最后,我们通过对某视频网站真实数据的分析也验证了CWSIR的有效性。  相似文献   

3.
金勇进  刘展 《统计研究》2016,33(3):11-17
利用大数据进行抽样,很多情况下抽样框的构造比较困难,使得抽取的样本属于非概率样本,难以将传统的抽样推断理论应用到非概率样本中,如何解决非概率抽样的统计推断问题,是大数据背景下抽样调查面临的严重挑战。本文提出了解决非概率抽样统计推断问题的基本思路:一是抽样方法,可以考虑基于样本匹配的样本选择、链接跟踪抽样方法等,使得到的非概率样本近似于概率样本,从而可采用概率样本的统计推断理论;二是权数的构造与调整,可以考虑基于伪设计、模型和倾向得分等方法得到类似于概率样本的基础权数;三是估计,可以考虑基于伪设计、模型和贝叶斯的混合概率估计。最后,以基于样本匹配的样本选择为例探讨了具体解决方法。  相似文献   

4.
在高维空间中进行统计建模通常会碰到"维数祸根"问题,解决办法之一是降维,充分降维是一种有效的降维方法。针对多维响应降维子空间提出一类矩生成函数估计方法及其改进估计量,并给出该类方法估计量的大样本性质:相合性、渐近正态性。通过随机模拟与实例分析,表明改进估计量估计效果有较大幅度提高。  相似文献   

5.
如何解决网络访问固定样本调查的统计推断问题,是大数据背景下网络调查面临的严重挑战。针对此问题,提出将网络访问固定样本的调查样本与概率样本结合,利用倾向得分逆加权和加权组调整构造伪权数来估计目标总体,进一步采用基于有放回概率抽样的Vwr方法、基于广义回归估计的Vgreg方法与Jackknife方法来估计方差,并比较不同方法估计的效果。研究表明:无论概率样本的样本量较大还是较小,本研究所提出的总体均值估计方法效果较好,并且在方差估计中Jackknife方法的估计效果最好。  相似文献   

6.
巩红禹  陈雅 《统计研究》2018,35(12):113-122
本文主要讨论样本代表性的改进和多目标调查两个问题。一,本文提出了一种新的改进样本代表性多目标抽样方法,增加样本量与调整样本结构相结合的方法-追加样本的平衡设计,即通过追加样本,使得补充的样本与原来的样本组合生成新的平衡样本,相对于初始样本,减少样本与总体的结构性偏差。平衡样本是指辅助变量总量的霍维茨汤普森估计量等于总体总量真值。二,平衡样本通过选择与多个目标参数相关的辅助变量,使得一套样本对不同的目标参数而言都具有良好的代表性,进而完成多目标调查。结合2010年第六次人口分县普查数据,通过选择多个目标参数,对追加样本后的平衡样本作事后评估结果表明,追加平衡设计能够有效改进样本结构,使得样本结构与总体结构相近,降低目标估计的误差;同时也说明平衡抽样设计能够实现多目标调查,提高样本的使用效率。  相似文献   

7.
秦磊等 《统计研究》2018,35(6):109-116
针对具有多个来源的异质性数据,文献中通常提出复杂程度较高的模型用于描述每个数据子总体的特征,而本文着眼于刻画不同数据子总体的共性进而建立一个简单的模型。在参数估计方面,本文借鉴了普通线性模型的Maximin估计思想,提出了适用于广义线性模型的Maximin似然比估计方法及稀疏结构下的惩罚估计。该方法通过最大化所有子总体中似然比统计量的最小值,构建成一个简单而保守的模型,以减少数据来源较多而呈现的复杂性。所提方法适用于因变量服从正态分布、两点分布、泊松分布等指数族分布的情形,丰富了前人的研究成果,具有更好的实践意义。模拟分析显示,相比于经典的估计方法,Maximin似然比估计方法不仅能够有效地探寻子总体的共性,而且具有较高的样本外预测精度。本文提出的方法也适用于政府统计和经济统计中具有异质性的大型数据集。  相似文献   

8.
文章考虑了大样本下线性回归中同时进行快速估计和变量选择的问题,即针对一个存在稀疏解的大样本线性模型,根据重要性抽样分布从全数据集抽取少量子样本,对该子样本进行自适应Lasso估计。通过随机模拟研究,将该算法分别应用在几种不同的数据集中,并从模型预测精度和可解释性两个方面比较了四种子抽样方法在该算法下的表现。模拟结果表明,所提出的算法具有良好表现,在计算开销上也具有一定优势。  相似文献   

9.
米子川 《统计研究》2015,32(6):99-104
滚雪球抽样是一种非概率抽样方法,一般采用线索触发的方式进行抽样,通过同伴推荐和再推荐来逐次抽取并组织样本。本文引入捕获再捕获抽样估计方法对并发式多样本滚雪球抽样的总体规模进行估计,比较了Peterson、Chapman等的估计方法,提出了以并发式加权线性模型去估计总体规模的基本思路,在本文设计的三种估计量中,共权估计量方差最小,离散性质局部最优。最后,对一个真实的滚雪球抽样调查过程进行了实证分析。  相似文献   

10.
正态总体下参数的优化极大似然估计方法   总被引:1,自引:0,他引:1  
文章讨论了一种新的抽样方法,基于这一抽样方法提出了样本参数的优化极大似然估计,并进一步与简单随机抽样下的参数的极大似然估计结果作比较,从估计渐进效率的角度说明了该方法的优良性。  相似文献   

11.
随着计算机的飞速发展,极大地便利了数据的获取和存储,很多企业积累了大量的数据,同时数据的维度也越来越高,噪声变量越来越多,因此在建模分析时面临的重要问题之一就是从高维的变量中筛选出少数的重要变量。针对因变量取值为(0,1)区间的比例数据提出了正则化Beta回归,研究了在LASSO、SCAD和MCP三种惩罚方法下的极大似然估计及其渐进性质。统计模拟表明MCP的方法会优于SCAD和LASSO,并且随着样本量的增大,SCAD的方法也将优于LASSO。最后,将该方法应用到中国上市公司股息率的影响因素研究中。  相似文献   

12.

Pairwise likelihood is a limited information estimation method that has also been used for estimating the parameters of latent variable and structural equation models. Pairwise likelihood is a special case of composite likelihood methods that uses lower-order conditional or marginal log-likelihoods instead of the full log-likelihood. The composite likelihood to be maximized is a weighted sum of marginal or conditional log-likelihoods. Weighting has been proposed for increasing efficiency, but the choice of weights is not straightforward in most applications. Furthermore, the importance of leaving out higher-order scores to avoid duplicating lower-order marginal information has been pointed out. In this paper, we approach the problem of weighting from a sampling perspective. More specifically, we propose a sampling method for selecting pairs based on their contribution to the total variance from all pairs. The sampling approach does not aim to increase efficiency but to decrease the estimation time, especially in models with a large number of observed categorical variables. We demonstrate the performance of the proposed methodology using simulated examples and a real application.

  相似文献   

13.
Estimation of a general multi-index model comprises determining the number of linear combinations of predictors (structural dimension) that are related to the response, estimating the loadings of each index vector, selecting the active predictors and estimating the underlying link function. These objectives are often achieved sequentially at different stages of the estimation process. In this study, we propose a unified estimation approach under a semi-parametric model framework to attain these estimation goals simultaneously. The proposed estimation method is more efficient and stable than many existing methods where the estimation error in the structural dimension may propagate to the estimation of the index vectors and variable selection stages. A detailed algorithm is provided to implement the proposed method. Comprehensive simulations and a real data analysis illustrate the effectiveness of the proposed method.  相似文献   

14.
A new estimation method for the dimension of a regression at the outset of an analysis is proposed. A linear subspace spanned by projections of the regressor vector X , which contains part or all of the modelling information for the regression of a vector Y on X , and its dimension are estimated via the means of parametric inverse regression. Smooth parametric curves are fitted to the p inverse regressions via a multivariate linear model. No restrictions are placed on the distribution of the regressors. The estimate of the dimension of the regression is based on optimal estimation procedures. A simulation study shows the method to be more powerful than sliced inverse regression in some situations.  相似文献   

15.
In a longitudinal study, an individual is followed up over a period of time. Repeated measurements on the response and some time-dependent covariates are taken at a series of sampling times. The sampling times are often irregular and depend on covariates. In this paper, we propose a sampling adjusted procedure for the estimation of the proportional mean model without having to specify a sampling model. Unlike existing procedures, the proposed method is robust to model misspecification of the sampling times. Large sample properties are investigated for the estimators of both regression coefficients and the baseline function. We show that the proposed estimation procedure is more efficient than the existing procedures. Large sample confidence intervals for the baseline function are also constructed by perturbing the estimation equations. A simulation study is conducted to examine the finite sample properties of the proposed estimators and to compare with some of the existing procedures. The method is illustrated with a data set from a recurrent bladder cancer study.  相似文献   

16.
Two-phase sampling is a cost-effective method of data collection using outcome-dependent sampling for the second-phase sample. In order to make efficient use of auxiliary information and to improve domain estimation, mass imputation can be used in two-phase sampling. Rao and Sitter (1995) introduce mass imputation for two-phase sampling and its variance estimation under simple random sampling in both phases. In this paper, we extend the Rao–Sitter method to general sampling design. The proposed method is further extended to mass imputation for categorical data. A limited simulation study is performed to examine the performance of the proposed methods.  相似文献   

17.
Traditionally, time series analysis involves building an appropriate model and using either parametric or nonparametric methods to make inference about the model parameters. Motivated by recent developments for dimension reduction in time series, an empirical application of sufficient dimension reduction (SDR) to nonlinear time series modelling is shown in this article. Here, we use time series central subspace as a tool for SDR and estimate it using mutual information index. Especially, in order to reduce the computational complexity in time series, we propose an efficient estimation method of minimal dimension and lag using a modified Schwarz–Bayesian criterion, when either of the dimensions and the lags is unknown. Through simulations and real data analysis, the approach presented in this article performs well in autoregression and volatility estimation.  相似文献   

18.
Length‐biased sampling data are often encountered in the studies of economics, industrial reliability, epidemiology, genetics and cancer screening. The complication of this type of data is due to the fact that the observed lifetimes suffer from left truncation and right censoring, where the left truncation variable has a uniform distribution. In the Cox proportional hazards model, Huang & Qin (Journal of the American Statistical Association, 107, 2012, p. 107) proposed a composite partial likelihood method which not only has the simplicity of the popular partial likelihood estimator, but also can be easily performed by the standard statistical software. The accelerated failure time model has become a useful alternative to the Cox proportional hazards model. In this paper, by using the composite partial likelihood technique, we study this model with length‐biased sampling data. The proposed method has a very simple form and is robust when the assumption that the censoring time is independent of the covariate is violated. To ease the difficulty of calculations when solving the non‐smooth estimating equation, we use a kernel smoothed estimation method (Heller; Journal of the American Statistical Association, 102, 2007, p. 552). Large sample results and a re‐sampling method for the variance estimation are discussed. Some simulation studies are conducted to compare the performance of the proposed method with other existing methods. A real data set is used for illustration.  相似文献   

19.
高维数据给传统的协方差阵估计方法带来了巨大的挑战,数据维度和噪声的影响使传统的CCCGARCH模型估计起来较为困难。将主成分和门限方法有效结合,应用到CCC-GARCH模型的估计中,提出基于主成分正交补门限方法的CCC-GARCH模型(PTCCC-GARCH)。PTCCC模型主要通过前K个最优主成分来刻画大维协方差阵的信息,并通过门限函数以剔除噪声的影响。通过模拟和实证研究发现:较CCCGARCH模型而言,PTCCC-GARCH模型明显提高了高维协方差阵的估计和预测效率;并且将其应用在投资组合时,投资者获得了更高的投资收益和经济福利。  相似文献   

20.
刘丽萍等 《统计研究》2015,32(6):105-112
大维数据给传统的协方差阵估计方法带来了巨大的挑战,数据维度和噪声的影响不容忽视。本文将主成分和门限方法有效的结合,应用到DCC模型的估计中,提出了基于主成分正交补门限方法的DCC模型(poetDCC)。poetDCC模型主要通过前K个主成分来刻画高维动态条件协方差阵的信息,然后将门限函数应用在矩阵的正交补中,有效的降低了数据的维度并剔除了噪声的影响。通过模拟和实证研究发现:较DCC模型而言,poetDCC模型明显提高了高维协方差阵的估计和预测效率;并且将其应用在投资组合时,投资者获得了更高的投资收益和经济福利。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号