Similar literature
20 similar documents found; search took 618 ms
1.
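Sun Yifan et al. 《统计研究》 (Statistical Research), 2019, 36(3): 124-128
Identifying disease-causing genes among a large number of genes is an important high-dimensional statistical problem in the big-data setting. Because genes are linked by a network structure, disease-gene identification has expanded from identifying single genes to identifying gene modules. Extracting gene modules from a gene network is the community detection (or node clustering) problem. Most community detection methods use only the network structure and ignore the information carried by the nodes themselves. In 2016, Newman and Clauset proposed a community detection method based on statistical inference that combines the two (the NC method). Taking the NC method as a case study, this paper describes the application of statistical methods to real gene networks and the results obtained, and proposes improvements from a statistical perspective. The analysis of the NC method shows that, for unstructured data such as gene networks, statistical ideas and principles remain central to data analysis, while the corresponding statistical methods need to be adjusted and optimized for the characteristics of the data and the questions of interest.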

2.
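Multi-indicator panel data provide fairly comprehensive information about the objects under study and their data characteristics, but the complex data structure also makes cluster analysis difficult. To address this, the article proposes a feature-extraction-based clustering method for multi-indicator panel data. The method introduces features that characterize the dynamics of panel data, namely the "absolute level", "volatility", "skewness", "kurtosis" and "trend" features, into a dynamic clustering algorithm. This avoids the limitations of earlier clustering based on Euclidean distance, can handle panel data with missing values, greatly improves clustering efficiency, and preserves time-dimension information as far as possible. The method is used to analyse the imbalance in road traffic accidents across Chinese provinces from 2001 to 2013, and the empirical analysis shows that it can solve the multi-indicator panel data clustering problem.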
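
The five features named in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact feature definitions, so the choices below (mean for the level, standard deviation for volatility, moment-based skewness and excess kurtosis, least-squares slope for the trend) and the function name `series_features` are assumptions made for illustration only. Missing observations are encoded as `None` and skipped.

```python
import math

def series_features(values):
    """Summarise one time series by five illustrative features.

    `values` is a list ordered by time; None marks a missing observation.
    The feature definitions are assumptions, not the paper's own.
    """
    # Keep (time index, value) pairs for the observed points only.
    pairs = [(t, v) for t, v in enumerate(values) if v is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two observed values")
    mean = sum(v for _, v in pairs) / n
    # Population central moments of the observed values.
    m2 = sum((v - mean) ** 2 for _, v in pairs) / n
    m3 = sum((v - mean) ** 3 for _, v in pairs) / n
    m4 = sum((v - mean) ** 4 for _, v in pairs) / n
    std = math.sqrt(m2)
    skew = m3 / std ** 3 if std > 0 else 0.0
    kurt = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    # Trend: least-squares slope of value on time index.
    t_mean = sum(t for t, _ in pairs) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in pairs)
    sxy = sum((t - t_mean) * (v - mean) for t, v in pairs)
    return {
        "level": mean,
        "volatility": std,
        "skewness": skew,
        "kurtosis": kurt,
        "trend": sxy / sxx,
    }
```

Each unit in the panel would then be represented by one such feature vector per indicator, and any clustering algorithm could be run on those vectors instead of on the raw series.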

3.
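Building on traditional statistical regression, this paper constructs a new modelling method, feature sample resampling regression (FSR). The method repeatedly draws samples by machine sampling according to the characteristics of the variables, forming multiple feature samples; it then estimates the parameters on each sample to obtain a sampling distribution of the parameters; finally, based on that sampling distribution, it builds an optimization model under multiple optimization objectives. FSR can serve as a general modelling method in social science research.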
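
The resample-then-estimate step described in this abstract can be sketched as follows. The abstract does not say how FSR draws its feature samples, so this sketch substitutes an ordinary bootstrap (sampling rows with replacement) and a simple one-variable least-squares slope; the function names and defaults are hypothetical and the final multi-objective optimization step is not shown.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x, or None if x has no variation."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

def resampled_slopes(xs, ys, n_resamples=200, seed=0):
    """Slope estimates from repeated resamples of the rows.

    Assumption: plain bootstrap resampling stands in for the
    feature-based sampling scheme of the paper.
    """
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        slope = ols_slope([xs[i] for i in idx], [ys[i] for i in idx])
        if slope is not None:  # skip degenerate resamples
            slopes.append(slope)
    return slopes
```

The returned list is an empirical sampling distribution of the slope; summaries such as its mean, spread or quantiles could then feed whatever selection criterion is applied afterwards.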

4.
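Biostatistics is an applied discipline that aims to solve scientific problems in biology, medicine, public health, agronomy and related fields, and it has developed rapidly in recent years against the background of precision medicine. At the same time, the data encountered in biostatistical research have big-data characteristics (massive volume, complexity and heterogeneity), which pose new challenges for both theoretical and applied researchers. This paper discusses epidemiological research, clinical trial design, survival data analysis and genetic data analysis in biostatistics, introducing the basic ideas and then surveying the latest challenges and frontier directions.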

5.
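Computing and analysing a real estate price index requires full consideration of the effect of changes in housing quality characteristics; these quality factors should be removed so that the index reflects the "pure" price change of housing. The article introduces interaction variables that reflect quality characteristics to analyse interaction effects, and compares the results with the traditional method of computing price indices. It also tentatively uses the Chow test to test for structural change, in order to verify the effect of housing quality characteristics on the real estate price index.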

6.
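Variable-weight combination forecasting substantially improves forecast accuracy over conventional fixed-weight forecasting. After analysing the characteristics of seasonal data, the article proposes a forecasting method for such data with dynamically adjusted variable weights. The method retains the advantage of variable weights in substantially reducing forecast error and, drawing on the characteristics of seasonal data, uses the idea of dynamic programming to adjust the forecasts, reducing the error further.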

7.
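In multiple hypothesis testing, the number of true null hypotheses, m0, is unknown but has an important influence, so it has received much attention in the recent statistical literature. The article reviews three main estimation methods: the lowest slope method, the cubic spline method and the mean estimation method. It then combines the three to propose a new estimator, the mean cubic spline method, and mainly studies its application to microarray data. Extensive simulation studies show that the new estimator has smaller bias and standard deviation than the other methods. Finally, real data are used to evaluate the estimators and to identify differentially expressed genes. Simulation and real data show that the method is a significant improvement.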
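
To make the idea of estimating m0 concrete, here is a minimal sketch of a simple threshold-based estimator. It is not the lowest slope, cubic spline or mean cubic spline method from the abstract; it is the widely used estimate #{p > λ} / (1 - λ), which relies on p-values of true nulls being roughly uniform on (0, 1), averaged over a small grid of λ values. The grid and the function name `estimate_m0` are assumptions.

```python
def estimate_m0(p_values, lambdas=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Estimate the number of true null hypotheses from p-values.

    For each threshold lam, p-values above lam are assumed to come
    almost entirely from true nulls, so count / (1 - lam) estimates m0.
    The estimates are capped at the number of tests and averaged.
    """
    m = len(p_values)
    estimates = []
    for lam in lambdas:
        count = sum(1 for p in p_values if p > lam)
        estimates.append(min(m, count / (1.0 - lam)))
    return sum(estimates) / len(estimates)
```

Dividing the result by the number of tests gives an estimate of the proportion of true nulls, which is the quantity that adaptive false discovery rate procedures typically plug in.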

8.
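The article addresses a problem often met in multiple linear regression: small samples, many independent variables and the resulting multicollinearity. It proposes a multiple linear regression method in which projection pursuit (PP-RAGA) extracts composite feature factors from the independent variables. When there are too many independent variables, the method uses projection pursuit to assign variables that share common characteristics to the composite feature factor they belong to, and then uses the resulting composite factors as new independent variables in a multiple regression. A worked example shows that the method can handle an excess of independent variables and multicollinearity in small samples and makes the regression model more meaningful for research; it is a new multiple regression method worth drawing on.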

9.
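The article applies a new method, nonparametric kernel density estimation, to study the distribution function of returns on the Hang Seng index (恒生大盘指数). The method describes nonlinear features of the return distribution, such as its sharp peak and fat tails, well; it also captures market risk characteristics better than the usual normal distribution and yields more accurate conclusions.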
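
A kernel density estimate of the kind this abstract refers to can be sketched in a few lines. The abstract does not state which kernel or bandwidth the article uses, so the Gaussian kernel and the normal-reference rule of thumb (1.06 times the sample standard deviation times n to the power -1/5) below are assumptions, as is the function name.

```python
import math

def gaussian_kde(sample, bandwidth=None):
    """Return a function x -> estimated density at x.

    Assumptions: Gaussian kernel; if no bandwidth is given, the
    normal-reference rule of thumb is used.
    """
    n = len(sample)
    if bandwidth is None:
        mean = sum(sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1.0 / 5.0)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density
```

Applied to a series of returns, comparing this estimate with a fitted normal density near the centre and in the tails is one way to see the sharp peak and fat tails the abstract mentions.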

10.
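Computing and analysing a housing price index requires full consideration of the effect of changes in housing quality characteristics; these quality factors should be removed so that the index reflects the "pure" price change of housing. However, the methods currently used in China to compute housing price indices largely ignore systematic adjustment for housing quality characteristics. The article uses the Hedonic model, which is standard in developed countries, to compute a housing price index and compares it with the traditional method of computing price indices. It further introduces interaction variables for quality characteristics to analyse interaction effects, and tentatively applies the Chow test to test the Hedonic model for structural change. The study shows that the effect of changes in housing quality characteristics can indeed be separated out of the housing price index.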
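
The Chow test mentioned in this abstract compares one regression fitted to all the data with separate regressions fitted to two subsamples. A minimal sketch of the standard F statistic is given below; it assumes the three residual sums of squares have already been obtained from fitted regressions, and the function and argument names are hypothetical.

```python
def chow_f_statistic(rss_pooled, rss_1, rss_2, n_1, n_2, k):
    """Chow test F statistic for a structural break between two groups.

    rss_pooled: residual sum of squares of the single pooled regression
    rss_1, rss_2: residual sums of squares of the two separate regressions
    n_1, n_2: number of observations in each group
    k: number of estimated coefficients per regression (incl. intercept)
    """
    rss_split = rss_1 + rss_2
    df_denominator = n_1 + n_2 - 2 * k
    if df_denominator <= 0 or rss_split <= 0:
        raise ValueError("not enough data or degenerate fit")
    return ((rss_pooled - rss_split) / k) / (rss_split / df_denominator)
```

Under the null hypothesis of no structural change and the usual linear-model assumptions, the statistic is compared with an F distribution with k and n_1 + n_2 - 2k degrees of freedom.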

11.
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. In this paper, we propose a flexible rank-based nonparametric procedure for gene selection from microarray data. In the method we propose a statistic for testing whether the area under the receiver operating characteristic curve (AUC) for each gene is equal to 0.5, allowing a different variance for each gene. The contribution to this “single gene” statistic is the studentization of the empirical AUC, which takes into account the variances associated with each gene in the experiment. DeLong et al. proposed a nonparametric procedure for calculating a consistent variance estimator of the AUC. We use their variance estimation technique to get a test statistic, and we focus on the primary step in the gene selection process, namely, the ranking of genes with respect to a statistical measure of differential expression. Two real datasets are analyzed to illustrate the methods and a simulation study is carried out to assess the relative performance of different statistical gene ranking measures. The work includes how to use the variance information to produce a list of significant targets and assess differential gene expressions under two conditions. The proposed method does not involve complicated formulas and does not require advanced programming skills. We conclude that the proposed methods offer useful analytical tools for identifying differentially expressed genes for further biological and clinical analysis.
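
The studentized AUC described in this abstract can be sketched for a single gene as follows. The sketch uses the empirical AUC and the DeLong-style variance estimate built from per-observation placement values; how the paper handles ties, small samples or multiplicity is not stated in the abstract, so those details and the function name are assumptions.

```python
import math

def auc_z_statistic(cases, controls):
    """Empirical AUC, DeLong-style variance estimate, and z = (AUC - 0.5) / se.

    cases, controls: expression values of one gene in the two groups
    (each group needs at least two observations).
    """
    m, n = len(cases), len(controls)

    def psi(x, y):
        # 1 if the case outranks the control, 0.5 for a tie, else 0.
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    # Placement values: one per case and one per control.
    v10 = [sum(psi(x, y) for y in controls) / n for x in cases]
    v01 = [sum(psi(x, y) for x in cases) / m for y in controls]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    var = s10 / m + s01 / n
    if var > 0:
        z = (auc - 0.5) / math.sqrt(var)
    elif auc == 0.5:
        z = 0.0
    else:
        # Perfect separation: the variance estimate collapses to zero.
        z = math.copysign(float("inf"), auc - 0.5)
    return auc, var, z
```

Ranking genes by the absolute value of z, rather than by the raw AUC, is what lets genes with the same AUC but different variability end up in different positions.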

12.
The paper presents a new approach to interrelated two-way clustering of gene expression data. Clustering of genes has been effected using entropy and a correlation measure, whereas the samples have been clustered using the fuzzy C-means. The efficiency of this approach has been tested on two well-known data sets: the colon cancer data set and the leukemia data set. Using this approach, we were able to identify the important co-regulated genes and group the samples efficiently at the same time.
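
The fuzzy C-means step named in this abstract assigns each sample a degree of membership in every cluster rather than a single label. A minimal sketch of the standard membership update is shown below for one-dimensional points with fixed cluster centres; the restriction to one dimension, the fuzzifier value of 2 and the function name are assumptions made to keep the example short.

```python
def fcm_memberships(point, centers, fuzzifier=2.0):
    """Fuzzy C-means membership of one point in each cluster.

    Uses the standard update u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the distance to centre i and m > 1 is the fuzzifier.
    """
    dists = [abs(point - c) for c in centers]
    if any(d == 0 for d in dists):
        # The point coincides with a centre: split membership among those.
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzzifier - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** power for d_j in dists)
        for d_i in dists
    ]
```

In the full algorithm this update alternates with recomputing each centre as a membership-weighted mean until the memberships stop changing.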

13.
A Bayesian mixture model for differential gene expression   Total citations: 3 (self-citations: 0; citations by others: 3)
Summary.  We propose model-based inference for differential gene expression, using a nonparametric Bayesian probability model for the distribution of gene intensities under various conditions. The probability model is a mixture of normal distributions. The resulting inference is similar to a popular empirical Bayes approach that is used for the same inference problem. The use of fully model-based inference mitigates some of the necessary limitations of the empirical Bayes method. We argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture-of-normal models. The approach proposed is motivated by a microarray experiment that was carried out to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples. Additionally, we carried out a small simulation study to verify the methods proposed. In the motivating case-studies we show how the nonparametric Bayes approach facilitates the evaluation of posterior expected false discovery rates. We also show how inference can proceed even in the absence of a null sample of known non-differentially expressed scores. This highlights the difference from alternative empirical Bayes approaches that are based on plug-in estimates.

14.
15.
ABSTRACT

In many longitudinal studies, there may exist informative observation times and a dependent terminal event that stops the follow-up. In this paper, we propose a joint model for analysis of longitudinal data with informative observation times and a dependent terminal event via two latent variables. Estimation procedures are developed for parameter estimation, and asymptotic properties of the proposed estimators are derived. Simulation studies demonstrate that the proposed method performs well for practical settings. An application to a bladder cancer study is illustrated.

16.
The aim of this study is to determine the effect of informative priors for variables with missing values and to compare Bayesian Cox regression with Cox regression analysis. For this purpose, simulated data sets with different sample sizes and different missing rates were first generated, and each data set was analysed by Cox regression and by Bayesian Cox regression with an informative prior. Second, a lung cancer data set was analysed as a real-data example. Consequently, using informative priors for variables with missing values solved the missing data problem.

17.
In the analysis of time-to-event data with multiple causes using a competing risks Cox model, often the cause of failure is unknown for some of the cases. The probability of a missing cause is typically assumed to be independent of the cause given the time of the event and covariates measured before the event occurred. In practice, however, the underlying missing-at-random assumption does not necessarily hold. Motivated by colorectal cancer molecular pathological epidemiology analysis, we develop a method to conduct valid analysis when additional auxiliary variables are available for cases only. We consider a weaker missing-at-random assumption, with missing pattern depending on the observed quantities, which include the auxiliary covariates. We use an informative likelihood approach that will yield consistent estimates even when the underlying model for missing cause of failure is misspecified. The superiority of our method over naive methods in finite samples is demonstrated by simulation study results. We illustrate the use of our method in an analysis of colorectal cancer data from the Nurses’ Health Study cohort, where, apparently, the traditional missing-at-random assumption fails to hold.

18.
Summary.  Statistical methods of ecological analysis that attempt to reduce ecological bias are empirically evaluated to determine in which circumstances each method might be practicable. The method that is most successful at reducing ecological bias is stratified ecological regression. It allows individual level covariate information to be incorporated into a stratified ecological analysis, as well as the combination of disease and risk factor information from two separate data sources, e.g. outcomes from a cancer registry and risk factor information from the census sample of anonymized records data set. The aggregated individual level model compares favourably with this model but has convergence problems. In addition, it is shown that the large areas that are covered by local authority districts seem to reduce between-area variability and may therefore not be as informative as conducting a ward level analysis. This has policy implications because access to ward level data is restricted.

19.
Penalised likelihood methods, such as the least absolute shrinkage and selection operator (Lasso) and the smoothly clipped absolute deviation penalty, have become widely used for variable selection in recent years. These methods impose penalties on regression coefficients to shrink a subset of them towards zero to achieve parameter estimation and model selection simultaneously. The amount of shrinkage is controlled by the regularisation parameter. Popular approaches for choosing the regularisation parameter include cross‐validation, various information criteria and bootstrapping methods that are based on mean square error. In this paper, a new data‐driven method for choosing the regularisation parameter is proposed and the consistency of the method is established. It holds not only for the usual fixed‐dimensional case but also for the divergent setting. Simulation results show that the new method outperforms other popular approaches. An application of the proposed method to motif discovery in gene expression analysis is included in this paper.
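
How the regularisation parameter controls shrinkage in the Lasso can be seen in a minimal coordinate-descent sketch. It minimises (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 for a fixed, user-supplied `lam`; the data-driven choice of the regularisation parameter proposed in the paper is not reproduced here, and the function names and the fixed iteration count are assumptions.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Lasso coefficients by cyclic coordinate descent (no intercept).

    X: list of rows (each a list of p numbers); y: list of n responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta
```

Larger values of `lam` set more coefficients exactly to zero, which is why choosing it amounts to choosing the model.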

20.
We propose a novel interpretation for a recently proposed Box–Cox transformation cure model, which leads to a natural extension of the cure model. Based on the extended model, we consider an important issue of model selection between the mixture cure model and the bounded cumulative hazard cure model via the likelihood ratio test, score test and Akaike’s Information Criterion (AIC). Our empirical study shows that AIC is informative and both the score test and the likelihood ratio test have adequate power to differentiate between the mixture cure model and the bounded cumulative hazard cure model when the sample size is large. We apply the tests and AIC methods to leukemia and colon cancer data to examine the appropriateness of the cure models considered for them in the literature.
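
Two of the model-selection quantities named in this abstract are simple functions of maximised log-likelihoods, as the sketch below shows. It assumes the log-likelihoods have already been obtained by fitting the competing models; fitting the cure models themselves is outside its scope, and the function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

def likelihood_ratio_statistic(loglik_null, loglik_alt):
    """Twice the gain in maximised log-likelihood of the larger model."""
    return 2 * (loglik_alt - loglik_null)
```

The reference distribution used to turn the likelihood ratio statistic into a p-value depends on the models being compared, so it is left out of the sketch.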

