Similar literature
19 similar articles found.
1.
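Research on dimension reduction and variable selection methods for high-dimensional panel data   Total citations: 1 (self-citations: 0, citations by others: 1)
Starting from the general characteristics of high-dimensional panel data, this paper summarizes the different types of high-dimensional panel data encountered in practice together with the associated theory and methods, focusing on factor models and mixed-effects models for such data. It reviews research progress on high-dimensional covariance matrices in the random effects and marginal effects of mixed-effects models, and on the multi-indicator, large-dimensional data that arise in economics. It closes with an analysis of, and outlook on, future directions for high-dimensional panel data and key open problems in theory and application.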

2.
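Sun Yifan et al. Statistical Research (《统计研究》), 2021, 38(5): 136-146
With the development of information technology, high-dimensional data have become increasingly abundant. In practice, many high-dimensional data sets are formed by merging several data sets that cover different subjects. Accurately identifying the similarities and differences among high-dimensional data sets has become one of the goals of big-data analysis. This paper proposes an integrative analysis method for high-dimensional data under a varying-coefficient model. The method performs variable selection and coefficient estimation on multiple data sets simultaneously and automatically identifies which variable coefficients are shared and which differ across data sets. Simulation results show that it clearly outperforms the comparison methods in identifying similarities and differences, variable selection, coefficient estimation, and prediction. In an application to identifying lung cancer genes, the method identifies disease-related genes with a biological interpretation and reveals similarities and differences between two subtypes.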

3.
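Liu Liping et al. Statistical Research (《统计研究》), 2015, 32(6): 105-112
Large-dimensional data pose a major challenge to traditional covariance matrix estimation, and the effects of dimensionality and noise cannot be ignored. This paper combines principal components with thresholding and applies them to the estimation of the DCC model, proposing a DCC model based on the principal orthogonal complement thresholding method (poetDCC). The poetDCC model captures the information in the high-dimensional dynamic conditional covariance matrix mainly through the first K principal components and then applies a thresholding function to the orthogonal complement of the matrix, which reduces the dimension of the data and removes the influence of noise. Simulation and empirical studies show that, compared with the DCC model, poetDCC markedly improves the efficiency of estimating and forecasting high-dimensional covariance matrices; when it is applied to portfolio selection, investors obtain higher returns and greater economic welfare.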
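
The static building block named in this abstract, principal orthogonal complement thresholding, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' poetDCC estimator: it ignores the DCC dynamics, uses plain power iteration for the eigenpairs, applies soft thresholding with a single constant `tau`, and all function names are hypothetical.

```python
import math


def top_eigenpair(S, n_iter=500):
    """Approximate the leading eigenpair of a symmetric matrix by power iteration."""
    p = len(S)
    # Non-uniform start vector; assumed not orthogonal to the leading eigenvector.
    v = [float(i + 1) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) iterate.
    eigenvalue = sum(v[i] * S[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become zero."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def poet_style_covariance(S, k, tau):
    """Keep the first k principal components of S and threshold the remainder.

    S   : sample covariance matrix as a list of lists (symmetric)
    k   : number of principal components kept in the low-rank part
    tau : threshold applied to off-diagonal entries of the remainder
    """
    p = len(S)
    low_rank = [[0.0] * p for _ in range(p)]
    remainder = [row[:] for row in S]  # becomes the orthogonal complement part
    for _ in range(k):
        eigenvalue, v = top_eigenpair(remainder)
        for i in range(p):
            for j in range(p):
                part = eigenvalue * v[i] * v[j]
                low_rank[i][j] += part
                remainder[i][j] -= part
    # The diagonal of the remainder is kept; off-diagonal entries are thresholded.
    return [
        [
            low_rank[i][j]
            + (remainder[i][j] if i == j else soft_threshold(remainder[i][j], tau))
            for j in range(p)
        ]
        for i in range(p)
    ]
```

With `tau = 0` the function returns the input matrix unchanged (up to rounding), and larger values of `tau` push the estimate toward "low-rank part plus diagonal".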

4.
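Missing data are common in sample surveys, the social sciences, epidemiology, and other fields, and the problem is more pronounced in high dimensions. The massive, complex, heterogeneous, and incomplete information that accompanies high-dimensional data poses great challenges for both the theory and the application of high-dimensional missing-data methods, and building a robust and efficient imputation method for high-dimensional missing data has become a focus of current research. To address this, the paper combines augmented inverse probability weighting (IPW) with the additive model, estimates the missingness probability with the covariate balancing propensity score (CBPS), and proposes an additive covariate balancing propensity score imputation method (CBPS-AM) for high-dimensional missing data. CBPS-AM is multiply robust, which avoids the serious risk of model misspecification, and it also circumvents the failure of traditional imputation methods when high-dimensional missing data are heavy-tailed; it achieves a twofold dimension reduction and makes modeling flexible and widely applicable. Next, drawing on the generalized method of moments and the backfitting algorithm, the paper gives an estimation algorithm for CBPS that is simple and effective and improves data utilization and imputation accuracy; it also studies the theoretical properties of the estimator and compares the proposed method with traditional methods in numerical simulations. Finally, CBPS-AM is applied to HIV clinical trial data with missing values and to data on the COVID-19 epidemic in China, establishing a scientific comprehensive evaluation and targeting...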
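
For orientation, the textbook estimators that this abstract builds on can be written as follows. The notation is an assumption introduced here, not taken from the paper: R_i indicates whether Y_i is observed, \hat{\pi}_i estimates P(R_i = 1 | X_i), and \hat{m}(X_i) is an outcome-model prediction. The last line is the generic covariate balancing moment condition for a missing-data mean; the paper's exact CBPS-AM construction may differ.

```latex
% Inverse probability weighted (IPW) and augmented IPW estimators of E[Y]
\hat{\mu}_{\mathrm{IPW}} = \frac{1}{n}\sum_{i=1}^{n} \frac{R_i Y_i}{\hat{\pi}_i},
\qquad
\hat{\mu}_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}
  \left[ \frac{R_i Y_i}{\hat{\pi}_i}
       - \frac{R_i - \hat{\pi}_i}{\hat{\pi}_i}\,\hat{m}(X_i) \right]

% Covariate balancing condition used to estimate the propensity score
\frac{1}{n}\sum_{i=1}^{n} \left( \frac{R_i}{\hat{\pi}_i} - 1 \right) X_i = 0
```

The augmented estimator stays consistent if either the propensity model or the outcome model is correctly specified, which is the basic form of the robustness the abstract refers to.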

5.
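High-dimensional data pose a major challenge to traditional covariance matrix estimation, and the effects of dimensionality and noise make the traditional CCC-GARCH model difficult to estimate. This paper combines principal components with thresholding and applies them to the estimation of the CCC-GARCH model, proposing a CCC-GARCH model based on the principal orthogonal complement thresholding method (PTCCC-GARCH). The PTCCC-GARCH model captures the information in the large-dimensional covariance matrix mainly through the first K optimal principal components and uses a thresholding function to remove the influence of noise. Simulation and empirical studies show that, compared with the CCC-GARCH model, PTCCC-GARCH markedly improves the efficiency of estimating and forecasting high-dimensional covariance matrices; when it is applied to portfolio selection, investors obtain higher returns and greater economic welfare.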

6.
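Han Meng, Bai Zhonglin. Statistical Research (《统计研究》), 2021, 38(8): 121-131
The threshold factor model specifies loadings with a threshold-type regime-switching structure, so it can capture both the comovement and the regime-switching features of high-dimensional time series. For the high-dimensional threshold factor model, this paper proposes a consistent model selection procedure based on the adaptive group LASSO. The procedure brings the choice of the number of factors and inference on the threshold effect into a unified framework; it not only resolves the consistency of model selection but also controls the model selection error in a unified way, which is very important for high-dimensional threshold factor models. Theoretical results and Monte Carlo simulations show that the procedure has good large-sample properties and finite-sample performance. Finally, the threshold factor model is applied to an analysis of China's financial markets, and the empirical results further support the theory.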

7.
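To deal with the high-dimensional complexity of clustering panel data, this article uses linear projection to transform the problem into a linear clustering problem on projected feature vectors, so that high-dimensional samples can be clustered in a low-dimensional space. An empirical analysis then confirms the feasibility and effectiveness of the projection pursuit model for clustering panel data.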

8.
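With the great progress of financial technology, machine learning is being applied ever more deeply in financial risk control. The credit scorecard model, the most widely used risk assessment model, is limited in the big-data era by its inability to analyze high-dimensional, complex, nonlinear personal credit data comprehensively. Starting from the actual state of internet finance in China, this paper proposes an internet finance risk control model based on the XGBoost machine learning algorithm, compares it experimentally with the traditional statistical scorecard model, and gives a way to convert the machine learning model's predictions into a traditional credit score. The results show that the machine learning model predicts personal credit risk better, making it possible to build a more effective risk control system.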
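
The abstract does not say how predictions are converted into a credit score. One common convention, shown here purely as an assumption, is the scorecard scaling in which a reference odds level maps to a base score and each doubling of the odds adds a fixed number of points. The function name and the default values (`base_score`, `base_odds`, `pdo`) are hypothetical.

```python
import math


def probability_to_score(p_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a predicted default probability to a scorecard-style score.

    Assumed convention (not taken from the paper): a borrower whose
    good:bad odds equal `base_odds` receives `base_score`, and every
    doubling of the odds adds `pdo` points.
    """
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must lie strictly between 0 and 1")
    factor = pdo / math.log(2)                       # points per unit of log-odds
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_default) / p_default        # good:bad odds
    return offset + factor * math.log(odds_good)
```

Because the mapping is monotone in the predicted probability, it preserves the ranking produced by the underlying model, whether that model is a logistic scorecard or a gradient-boosted classifier.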

9.
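This paper uses a self-organizing map (SOM) network to process 1999-2003 data on the disposable income of urban residents and the net income of rural residents in each region of China, displays the resulting clusters on a two-dimensional map, and gives a brief analysis and comparison. In research on the ability of residents in China's regions to pay for higher education, the SOM's capacity to handle high-dimensional, complex data has still broader applications.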

10.
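The difficulty of estimating nonparametric additive models limits their range of application. This paper therefore proposes first extracting the effective components of high-dimensional data with sliced inverse regression (SIR) and then estimating the model iteratively with the backfitting algorithm. In an empirical study, the model is applied to forecasting China's foreign trade cargo throughput, with good results.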

11.
High dimensional models are receiving much attention from diverse research fields that involve very many parameters with a moderate amount of data. Model selection is an important issue in such high dimensional data analysis. Recent literature on the theoretical understanding of high dimensional models covers a wide range of penalized methods, including LASSO and SCAD. This paper presents a systematic overview of recent developments in high dimensional statistical models. We provide a brief review of the recent development of theory, methods, and guidelines on applications of several penalized methods. The review includes appropriate settings for implementation and limitations, along with potential solutions, for each of the reviewed methods. In particular, we provide a systematic review of the statistical theory of the high dimensional methods by considering a unified high-dimensional modeling framework together with high level conditions. This framework includes (generalized) linear regression and quantile regression as special cases. We hope our review helps researchers in this field to better understand the area and provides useful information for future study.
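
As a concrete instance of the penalized methods this review covers, the LASSO can be fitted by cyclic coordinate descent with soft thresholding. The sketch below is a minimal pure-Python illustration, not code from the paper; it assumes the objective (1/(2n))·||y − Xb||² + lam·||b||₁, no intercept, and hypothetical function names.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows with p entries each; y is a list of n responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm_sq = sum(v * v for v in col) / n
            if norm_sq == 0.0:
                continue  # an all-zero column carries no information
            # (1/n) * x_j^T (partial residual that excludes coordinate j)
            rho = sum(c * r for c, r in zip(col, residual)) / n + norm_sq * beta[j]
            new_beta = soft_threshold(rho, lam) / norm_sq
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new_beta
    return beta
```

Coefficients whose correlation with the partial residual stays below `lam` are set exactly to zero, which is how the L1 penalty performs variable selection and estimation at the same time.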

12.
We develop a novel estimation algorithm for a dynamic factor model (DFM) applied to panel data with a short time dimension and a large cross sectional dimension. Current DFMs usually require panels with a minimum of 20 years of quarterly data (80 time observations per panel). In contrast, the application we consider includes panels with a median of 8 annual observations. As a result, the time dimension in our paper is substantially shorter than in previous work in the DFM literature. This difference increases the computational challenges of the estimation process, which we address by developing the “Two-Cycle Conditional Expectation-Maximization” (2CCEM) algorithm, a variant of the EM algorithm and its extensions. We analyze the conditions under which our model is identified and provide simulation results demonstrating consistency of our 2CCEM estimator. We apply the DFM to a dataset of 802 water and sanitation utilities from 43 countries and use the 2CCEM algorithm to estimate dynamic performance trajectories for each utility.

13.
Principal component analysis is a popular dimension reduction technique often used to visualize high-dimensional data structures. In genomics, this can involve millions of variables, but only tens to hundreds of observations. Theoretically, such extreme high dimensionality will cause biased or inconsistent eigenvector estimates, but in practice, the principal component scores are used for visualization with great success. In this paper, we explore when and why the classical principal component scores can be used to visualize structures in high-dimensional data, even when there are few observations compared with the number of variables. Our argument is twofold: First, we argue that eigenvectors related to pervasive signals will have eigenvalues scaling linearly with the number of variables. Second, we prove that for linearly increasing eigenvalues, the sample component scores will be scaled and rotated versions of the population scores, asymptotically. Thus, the visual information of the sample scores will be unchanged, even though the sample eigenvectors are biased. In the case of pervasive signals, the principal component scores can be used to visualize the population structures, even in extremely high-dimensional situations.

14.
Recently, mixture distributions have become increasingly popular in many scientific fields. Statistical computation and analysis of mixture models, however, are extremely complex due to the large number of parameters involved. Both EM algorithms for likelihood inference and MCMC procedures for Bayesian analysis have various difficulties in dealing with mixtures with an unknown number of components. In this paper, we propose a direct sampling approach to the computation of Bayesian finite mixture models with a varying number of components. This approach requires only knowledge of the density function up to a multiplicative constant. It is easy to implement, numerically efficient, and very practical in real applications. A simulation study shows that it performs quite satisfactorily on relatively high dimensional distributions. A well-known genetic data set is used to demonstrate the simplicity of this method and its power for the computation of high dimensional Bayesian mixture models.

15.
In this paper we review some recent developments in high dimensional data analysis, especially the estimation of covariance and precision matrices, asymptotic results on the eigenstructure in principal components analysis, and some relevant issues such as tests for the equality of two covariance matrices, determination of the number of principal components, and detection of hubs in a complex network.

16.
This paper examines the high dimensional asymptotics of the naive Hotelling T² statistic. Naive Bayes has been utilized in high dimensional pattern recognition as a method to avoid singularities in the estimated covariance matrix. The naive Hotelling T² statistic, which is equivalent to the estimator of the naive canonical correlation, is a statistically important quantity in naive Bayes, and its high dimensional behavior has been studied under several conditions. In this paper, asymptotic normality of the naive Hotelling T² statistic under a high dimension low sample size setting is developed using the central limit theorem for a martingale difference sequence.

17.
Fan J, Lv J. Statistica Sinica, 2010, 20(1): 101-148
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.

18.
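Qin Lei, Wang Yidan, Su Zhi. Statistical Research (《统计研究》), 2020, 37(3): 114-128
With the rapid development of information technology, large-scale data can be collected and stored in a short time, providing a wealth of information for analysis and decision-making but also making statistical modeling harder. For data with a large sample size and few variables, leverage-based importance sampling is a simple and practical method. This paper finds that the leverage score used to measure the importance of each observation does not depend on the response variable and, when the dimension is large, does not discriminate between observations, which leads to poor estimates. To account for both the response variable and the dimension, the paper proposes a leverage importance sampling method based on sufficient dimension reduction. Without losing information, the method recomputes the leverage scores in the sufficient dimension reduction space so that the sample is more representative. Simulations show that, for complex data with a large sample size, the proposed method reduces the mean squared error of estimation relative to the original leverage importance sampling method. Three real data sets also confirm its feasibility and effectiveness.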
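
The observation that leverage scores ignore the response can be seen directly in the simplest case. The sketch below assumes a simple linear regression with an intercept and one predictor, where the leverage of observation i is 1/n + (x_i − x̄)² / Σ(x_j − x̄)²; it illustrates the original leverage sampling weights only, not the sufficient-dimension-reduction variant proposed in the paper. Function names are hypothetical.

```python
def leverage_scores(x):
    """Leverage scores for simple linear regression with an intercept.

    h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2.
    Only the predictor x enters; the response y never appears.
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("x must contain at least two distinct values")
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def sampling_probabilities(x):
    """Normalise the leverage scores into importance-sampling probabilities."""
    h = leverage_scores(x)
    total = sum(h)  # equals the number of regression coefficients (here 2)
    return [hi / total for hi in h]
```

Observations far from the mean of `x` receive larger sampling probabilities regardless of their response values, which is the limitation the abstract points out.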

19.
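Clustering by partitioning on data distribution density is one of the main approaches to clustering in data mining. To address the computational complexity and low efficiency of traditional density-partition clustering algorithms, this paper designs a multi-partition clustering algorithm based on stepwise high-dimensional projection. Using the density of the high-dimensional distribution's projections as the criterion, the algorithm partitions the data set repeatedly to produce a space of sub-clusters, then merges sub-clusters to form the desired clustering. Experiments show that the algorithm is computationally simple and efficient.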
