
1.

《统计与信息论坛》2018,(4):6-12

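For ultrahigh-dimensional variable screening, a new robust rank-based conditional feature screening method, abbreviated RRCSIS, is proposed. The method does not depend on a model specification and can handle both conditional and unconditional feature screening. Numerical simulations show that RRCSIS remains robust when the response or the predictors are heavy-tailed or contain outliers, and that it clearly outperforms other screening methods. In addition, to identify variables that are jointly correlated but marginally uncorrelated with the response, an iterative screening procedure, IRRCSIS, is also proposed. Finally, a real-data analysis illustrates the effectiveness of the method.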

2.

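Feature screening for ultrahigh-dimensional models with responses missing at random

来鹏 季静雯 刘一鸣 《统计与决策》2017,(13):20-23

This article studies feature screening for ultrahigh-dimensional data when the response is missing at random. The Kolmogorov filter is used to screen the important covariates for building the propensity score function, and on that basis the inverse probability weighting technique is extended to construct a marginal feature screening procedure for responses missing at random. Large-sample theory shows that the proposed procedure has the sure screening property under some regular conditions. Its finite-sample properties are studied by Monte Carlo simulation, and it is applied to a real data problem to assess its practical value.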

3.

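Ultrahigh-dimensional feature screening based on rank energy distance

何胜美 李高荣 许王莉 《统计研究》2020,(8):117-128

Feature screening is a fast dimension-reduction method commonly used in ultrahigh-dimensional data analysis. This paper first proposes, based on the rank energy distance, a new feature screening method for ultrahigh-dimensional discriminant analysis (RED-SIS). The method requires neither an assumed model structure nor finite moment conditions, and it is fairly robust to heavy-tailed covariate data. Second, the paper studies the theoretical properties of the method and, under several fairly mild regularity conditions, establishes the sure screening property and ranking consistency. The results show that RED-SIS can effectively handle feature screening for ultrahigh-dimensional discriminant analysis when the dimension p and the sample size n satisfy log p = O(n^α), and that, as the sample size increases, the probability that the selected feature set contains all truly important features tends to 1. Finally, Monte Carlo simulations examine the finite-sample properties of the method and compare it with existing ultrahigh-dimensional feature screening methods. The simulation results show that the method has a clear advantage for heavy-tailed data, and a real-data analysis also demonstrates the effectiveness of RED-SIS.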

4.

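Variable selection for partially linear additive quantile regression models with ultrahigh-dimensional data

白永昕 钱曼玲 田茂再 《统计与决策》2024,(9):43-48

In ultrahigh-dimensional data, on the one hand, the dimension of the covariates may be far larger than the sample size and may even grow exponentially with it; on the other hand, such data are usually heterogeneous: the effect of covariates on the center of the conditional distribution may differ greatly from their effect on the tails, and heavy tails and outliers may also arise. This article studies variable selection and robust estimation for partially linear additive quantile regression models when the covariate dimension diverges and is ultrahigh. First, to achieve both model sparsity and nonparametric smoothness, a nonconvex Atan double penalty is introduced, and a quantile iterative coordinate descent algorithm is used to solve the resulting optimization problem. With suitably chosen regularization parameters, the theoretical properties of the proposed double-penalized estimator are established. Second, simulation studies verify the performance of the proposed method. The results show that it performs better than other penalized methods, especially when the data are heavy-tailed. Finally, an empirical study based on blood sample data from cancer-screening patients verifies the practicality of the proposed method.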
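As background for the quantile regression setting in this abstract, the following minimal sketch shows the standard check loss that quantile regression minimizes. It is not the article's penalized estimator: the Atan double penalty and the coordinate descent algorithm are omitted, and the function names and the toy sample are illustrative assumptions.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(q, sample, tau):
    """Average check loss when q is used as the tau-th quantile of sample."""
    return sum(check_loss(v - q, tau) for v in sample) / len(sample)


# Toy sample with one gross outlier (hypothetical data).
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

# Search the sample points for the minimizer of the empirical risk.
median_fit = min(sample, key=lambda q: empirical_risk(q, sample, 0.5))
lower_fit = min(sample, key=lambda q: empirical_risk(q, sample, 0.25))
mean_fit = sum(sample) / len(sample)
```

In this toy sample the check-loss minimizer at tau = 0.5 stays at the sample median while the mean is pulled toward the outlier, which is the kind of robustness to heavy tails that motivates quantile-based methods.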

5.

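Dimension reduction and variable selection methods for high-dimensional panel data (total citations: 2; self-citations: 1; citations by others: 2)

张波 方国斌 《统计与信息论坛》2012,27(6):21-28

Starting from the general features of high-dimensional panel data, the paper summarizes the different types of high-dimensional panel data encountered in practice and the associated theory and methods, focusing on factor models and mixed-effects models for high-dimensional panel data. It reviews research progress on the high-dimensional covariance matrices in the random and marginal effects of mixed-effects models and on the multi-indicator large-dimensional data that arise in economics, and it analyzes and looks ahead to future directions for high-dimensional panel data and to key unresolved problems in theory and application.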

6.

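Dimension reduction for ultrahigh-dimensional data and logistic generalized linear model fitting

王桂芝 宋迎曦 来鹏 陈纪波 《统计与决策》2016,(7):38-41

Taking breast cancer survey data from Wisconsin, USA, as an example, the article applies the SIS and TCS algorithms to reduce the dimension of the high-dimensional data and then fits an improved logistic generalized linear model to the retained variables. Compared with the traditional general linear model and the logistic generalized linear model, the logistic generalized linear model fitted after algorithmic dimension reduction has a smaller prediction error, and the generalized linear model fitted after TCS-based reduction clearly outperforms the one fitted after SIS-based reduction.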

7.

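GSIS variable selection for ultrahigh-dimensional data

《统计与信息论坛》2015,(8):16-19

Variable selection is very important in ultrahigh-dimensional statistical models. Fan and Lv proposed sure independence screening (SIS) based on the simple correlation coefficient, but SIS fails when the predictors are divided into groups, because it can select only individual variables, not groups of variables. To address this, group sure independence screening (GSIS) is proposed based on marginal group regression. The method is effective for grouped variables, for individual variables, and for a mixture of the two. Monte Carlo simulation results show that GSIS outperforms SIS.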
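The correlation-based SIS idea that this abstract builds on can be sketched as follows: rank every predictor by the absolute value of its marginal Pearson correlation with the response and keep the top d. This is a minimal illustration only; the function names, the simulated data, and the choice d = 10 are assumptions, and the grouped extension (GSIS) is not implemented here.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis(columns, y, d):
    """Indices of the d predictors with the largest |marginal correlation|."""
    scores = [abs(pearson(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d]


# Simulated example (hypothetical): p = 500 predictors, n = 100 observations,
# and only predictors 0 and 1 enter the linear model.
rng = random.Random(0)
n, p = 100, 500
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * columns[0][i] - 3 * columns[1][i] + rng.gauss(0, 1) for i in range(n)]
selected = sis(columns, y, d=10)
```

Because each predictor is scored on its own, a single pass costs one correlation per column, which is what makes this kind of screening usable when p is far larger than n.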

8.

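A variable selection method based on mutual information

周生彬 黄叶金 《统计与决策》2020,(1):20-23

Based on the mutual information between the explanatory variables and the response variable, the article proposes a new variable selection method, MI-SIS. The method can handle ultrahigh-dimensional problems in which the number of explanatory variables p is far larger than the sample size n, that is, p = O(exp(n^ε)) with ε > 0. Moreover, the method does not depend on model assumptions. Numerical simulations and an empirical study show that MI-SIS can effectively detect weak signals in small samples.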
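A rough sketch of screening by mutual information is given below. The abstract does not say how MI-SIS estimates mutual information, so the equal-frequency binning with k = 4 bins and the plug-in estimator used here are assumptions, as are all function names and the simulated data.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k roughly equal-sized bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(a, b):
    """Plug-in mutual information (in nats) of two discrete sequences."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (ca[u] * cb[v]))
               for (u, v), c in joint.items())


def mi_screen(columns, y, d, k=4):
    """Indices of the d predictors with the largest binned mutual information."""
    yb = equal_frequency_bins(y, k)
    scores = [mutual_information(equal_frequency_bins(col, k), yb)
              for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d]


# Simulated example (hypothetical): the response depends on predictor 0 only
# through its square, so the linear correlation with predictor 0 is near zero.
rng = random.Random(1)
n, p = 300, 100
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [columns[0][i] ** 2 + 0.1 * rng.gauss(0, 1) for i in range(n)]
selected = mi_screen(columns, y, d=5)
```

The quadratic signal is chosen because a model-free utility such as mutual information can pick it up even though a correlation-based ranking would tend to miss it.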

9.

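Variable selection for semiparametric logistic models with longitudinal data

高仙立 姜玉英 《统计与决策》2017,(20):26-29

The article studies estimation for semiparametric logistic regression models with longitudinal data, gives estimation methods for the unknown parameters and unknown functions in the model, discusses variable selection for the parametric part, and compares different variable selection methods. The simulation results show that the proposed methods estimate well.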

10.

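Profile analysis for multivariate ordinal data based on a latent variable model

《统计与信息论坛》2019,(5):3-9

A profile analysis method suitable for multivariate ordinal data is proposed. Because ordinal data cannot satisfy the normality that profile analysis requires, a latent variable model is used to assign values to the ordinal variables, and the Bootstrap is used to reconstruct the sample so that the reconstructed data are normal and have the same population mean as the original sample; profile analysis can then be applied to comparing mean vectors of ordinal data. The paper discusses the equal-levels hypothesis in the one-sample case and the parallelism, equal-levels and flatness hypotheses in the two-sample and multi-sample cases, and gives the corresponding test statistics and rejection regions. Finally, random simulations check the soundness of the method and lead to the following conclusion: when sample quality is high, the method performs very well in controlling the type I error and improving the power of the test; for ordinary samples, the actual type I error is somewhat larger than the nominal level, which can be addressed by increasing the original sample size, lowering the nominal type I error and conducting repeated trials.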

11.

Variable screening for ultrahigh dimensional censored quantile regression

Jing Pan Shucong Zhang Yong Zhou 《Journal of Statistical Computation and Simulation》2019,89(3):395-413

Quantile regression is a flexible approach to assessing covariate effects on failure time, which has attracted considerable interest in survival analysis. When the dimension of covariates is much larger than the sample size, feature screening and variable selection become extremely important and indispensable. In this article, we introduce a new feature screening method for ultrahigh dimensional censored quantile regression. The proposed method can work for a general class of survival models, allows for heterogeneity of data and enjoys desirable properties including the sure screening property and the ranking consistency property. Moreover, an iterative version of the screening algorithm is also proposed to accommodate more complex situations. Monte Carlo simulation studies are designed to evaluate the finite sample performance under different model settings. We also illustrate the proposed methods through an empirical analysis.

12.

Model-Free Feature Screening for Ultrahigh Dimensional Data

Zhu L Li L Li R Zhu L 《Journal of the American Statistical Association》2011,106(496):1464-1475

With the recent explosion of scientific data of unprecedented size and complexity, feature ranking and screening are playing an increasingly important role in many scientific studies. In this article, we propose a novel feature screening procedure under a unified model framework, which covers a wide variety of commonly used parametric and semiparametric models. The new method does not require imposing a specific model structure on regression functions, and thus is particularly appealing to ultrahigh-dimensional regressions, where there are a huge number of candidate predictors but little information about the actual model forms. We demonstrate that, with the number of predictors growing at an exponential rate of the sample size, the proposed procedure possesses consistency in ranking, which is both useful in its own right and can lead to consistency in selection. The new procedure is computationally efficient and simple, and exhibits a competent empirical performance in our intensive simulations and real data analysis.

13.

New Robust Variable Selection Methods for Linear Regression Models

Ziqi Chen Man‐Lai Tang Wei Gao Ning‐Zhong Shi 《Scandinavian Journal of Statistics》2014,41(3):725-741

Motivated by an entropy inequality, we propose for the first time a penalized profile likelihood method for simultaneously selecting significant variables and estimating unknown coefficients in multiple linear regression models in this article. The new method is robust to outliers or errors with heavy tails and works well even for errors with infinite variance. Our proposed approach outperforms the adaptive lasso in both theory and practice. It is observed from the simulation studies that (i) the new approach possesses higher probability of correctly selecting the exact model than the least absolute deviation lasso and the adaptively penalized composite quantile regression approach and (ii) exact model selection via our proposed approach is robust regardless of the error distribution. An application to a real dataset is also provided.

14.

Robust feature screening for high-dimensional survival data

Meiling Hao Xianhui Liu Wenlu Tang 《Journal of Applied Statistics》2019,46(6):979-994

Ultra-high dimensional data arise in many fields of modern science, such as medical science, economics, genomics and image processing, and pose unprecedented challenges for statistical analysis. With the rapidly growing size of scientific data in various disciplines, feature screening becomes a primary step to reduce the high dimensionality to a moderate scale that can be handled by the existing penalized methods. In this paper, we introduce a simple and robust feature screening method without any model assumption to tackle high dimensional censored data. The proposed method is model-free and hence applicable to a general class of survival models. The sure screening and ranking consistency properties without any finite moment condition of the predictors and the response are established. The computation of the proposed method is rather straightforward. Finite sample performance of the newly proposed method is examined via extensive simulation studies. An application is illustrated with the gene association study of the mantle cell lymphoma.

15.

Feature Screening for Ultrahigh Dimensional Categorical Data With Applications

Danyang Huang Runze Li Hansheng Wang 《Journal of Business & Economic Statistics》2014,32(2):237-244

Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies and illustrate the proposed method by two empirical datasets.
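To make the chi-square idea concrete, the sketch below ranks categorical covariates by the ordinary Pearson chi-square statistic of their contingency table with a categorical response. It is an illustration under assumptions, not necessarily the exact statistic or cutoff rule of the article: the function names and the simulated data are hypothetical, and raw statistics are ranked with no adjustment for covariates that have different numbers of levels.

```python
import random
from collections import Counter


def chi_square_stat(x, y):
    """Pearson chi-square statistic of the contingency table of x and y."""
    n = len(x)
    joint, cx, cy = Counter(zip(x, y)), Counter(x), Counter(y)
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_screen(columns, y, d):
    """Indices of the d covariates with the largest chi-square statistic."""
    scores = [chi_square_stat(col, y) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:d]


# Simulated example (hypothetical): a binary response, one covariate that
# copies the response with 10% of labels flipped, and 199 three-level
# covariates generated independently of the response.
rng = random.Random(2)
n, p = 200, 200
y = [rng.randint(0, 1) for _ in range(n)]
signal = [v if rng.random() > 0.1 else 1 - v for v in y]
noise = [[rng.randint(0, 2) for _ in range(n)] for _ in range(p - 1)]
columns = [signal] + noise
selected = chi_square_screen(columns, y, d=5)
```

The same function could in principle be applied to a covariate formed by pairing two columns, which is one way to read the abstract's remark about detecting interaction effects; that usage is not shown here.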

16.

On correlation rank screening for ultra-high dimensional competing risks data

Xiaolin Chen Chenguang Li Tao Zhang Zhenlong Gao 《Journal of Applied Statistics》2022,49(7):1848

In recent years, numerous feature screening schemes have been developed for ultra-high dimensional standard survival data with only one failure event. Nevertheless, existing literature pays little attention to related investigations for competing risks data, in which subjects suffer from multiple mutually exclusive failures. In this article, we develop a new marginal feature screening procedure for ultra-high dimensional time-to-event data to allow for competing risks. The proposed procedure is model-free, and robust against heavy-tailed distributions and potential outliers for time to the type of failure of interest. Apart from this, it is invariant to any monotone transformation of event time of interest. Under rather mild assumptions, it is shown that the newly suggested approach possesses the ranking consistency and sure independence screening properties. Some numerical studies are conducted to evaluate the finite-sample performance of our method and make a comparison with its competitor, while an application to a real data set is provided to serve as an illustration.
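The invariance to monotone transformations mentioned in this abstract is a general property of rank-based statistics, and a small sketch may help show why. The example below uses the plain Spearman rank correlation on fully observed times; it ignores censoring and competing risks entirely, so it is not the article's screening statistic, and the function names and the five data points are illustrative assumptions.

```python
import math


def ranks(values):
    """Ranks 1..n of the values (assumes all values are distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    """Spearman rank correlation for sequences without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical covariate values and event times; the time 30.0 is an outlier.
x = [0.3, 1.2, -0.5, 2.0, 0.9]
t = [1.1, 1.9, 0.4, 30.0, 2.5]

rho_raw = spearman(x, t)
rho_log = spearman(x, [math.log(v) for v in t])  # monotone transform of t
```

Because the statistic depends on the event times only through their ranks, taking logarithms leaves it unchanged, and the single extreme time has no more influence than any other largest observation.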