Similar Articles (20 results found)
1.
The estimation of Bayesian networks given high-dimensional data, in particular gene expression data, has been the focus of much recent research. While several methods are available for the estimation of such networks, these typically assume that the data consist of independent and identically distributed samples. It is often the case, however, that the available data have a more complex mean structure, plus additional components of variance, which must then be accounted for in the estimation of a Bayesian network. In this paper, score metrics that take account of such complexities are proposed for use in conjunction with score-based methods for the estimation of Bayesian networks. We propose, first, a fully Bayesian score metric and, second, a metric inspired by the notion of restricted maximum likelihood. We demonstrate the performance of these new metrics for the estimation of Bayesian networks using simulated data with known complex mean structures. We then present an analysis of the expression levels of grape-berry genes, adjusting for exogenous variables believed to affect those expression levels. Demonstrable biological effects can be inferred from the estimated conditional independence relationships and correlations amongst the grape-berry genes.

2.
A joint estimation approach for multiple high-dimensional Gaussian copula graphical models is proposed, which achieves estimation robustness by exploiting non-parametric rank-based correlation coefficient estimators. Although we focus on continuous data in this paper, the proposed method can be extended to deal with binary or mixed data. Based on a weighted minimisation problem, the estimators can be obtained by implementing second-order cone programming. Theoretical properties of the procedure are investigated. We show that the proposed joint estimation procedure leads to a faster convergence rate than estimating the graphs individually. It is also shown that the proposed procedure achieves exact graph structure recovery with probability tending to 1 under certain regularity conditions. Beyond the theoretical analysis, we conduct numerical simulations to compare the estimation and graph recovery performance of several state-of-the-art methods, including both joint estimation methods and methods that estimate each graph individually. The proposed method is then applied to a gene expression data set, which illustrates its practical usefulness.
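The paper's joint estimator is built on second-order cone programming, which is beyond a short illustration; the sketch below shows only the robust rank-based correlation step for a single graph, plugging Kendall's tau (with the sine transform that is consistent for the latent Gaussian correlation under the copula) into an off-the-shelf graphical lasso. The data, tuning value, and variable names are all hypothetical.

    # Rank-based estimation of one Gaussian copula graphical model.
    # The paper's joint multi-graph estimator (second-order cone programming)
    # is NOT reproduced; this only illustrates the rank-based correlation idea.
    import numpy as np
    from scipy.stats import kendalltau
    from sklearn.covariance import graphical_lasso

    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.standard_normal((n, p))          # stand-in for expression data

    # Kendall's tau with the sine transform, consistent for the latent
    # Gaussian correlation under the copula model.
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, k])
            S[j, k] = S[k, j] = np.sin(np.pi * tau / 2)

    # Sparse precision (graph) estimate from the rank-based correlation matrix.
    cov_est, prec_est = graphical_lasso(S, alpha=0.2)
    edges = np.abs(prec_est) > 1e-6
    np.fill_diagonal(edges, False)
    print("estimated number of edges:", edges.sum() // 2)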

3.
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. In this paper, we propose a flexible rank-based nonparametric procedure for gene selection from microarray data. The method tests, for each gene, whether the area under the receiver operating characteristic curve (AUC) equals 0.5, allowing a different variance for each gene. The novelty of this "single-gene" statistic is the studentization of the empirical AUC, which takes into account the variance associated with each gene in the experiment. DeLong et al. proposed a nonparametric procedure for calculating a consistent variance estimator of the AUC. We use their variance estimation technique to obtain a test statistic, and we focus on the primary step in the gene selection process, namely, the ranking of genes with respect to a statistical measure of differential expression. Two real datasets are analyzed to illustrate the methods, and a simulation study is carried out to assess the relative performance of different statistical gene ranking measures. The work includes how to use the variance information to produce a list of significant targets and to assess differential gene expression under two conditions. The proposed method does not involve complicated formulas and does not require advanced programming skills. We conclude that the proposed methods offer useful analytical tools for identifying differentially expressed genes for further biological and clinical analysis.
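As a concrete illustration, here is a minimal sketch of the studentized-AUC statistic for one gene: the empirical AUC is paired with DeLong et al.'s structural-components variance estimator and tested against the null value 0.5. The function name and toy data are hypothetical.

    # Studentized-AUC test for a single gene, with DeLong's variance estimator.
    import numpy as np
    from scipy.stats import norm

    def delong_auc_test(x, y):
        """x: expression values in group 1, y: in group 2 (one gene)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        m, n = len(x), len(y)
        # Heaviside kernel with ties counted as 1/2.
        psi = (x[:, None] > y[None, :]).astype(float) \
            + 0.5 * (x[:, None] == y[None, :])
        auc = psi.mean()
        v10 = psi.mean(axis=1)                 # structural components over x
        v01 = psi.mean(axis=0)                 # structural components over y
        var = v10.var(ddof=1) / m + v01.var(ddof=1) / n
        z = (auc - 0.5) / np.sqrt(var)         # studentized statistic
        return auc, z, 2 * norm.sf(abs(z))     # two-sided p-value

    rng = np.random.default_rng(1)
    auc, z, p = delong_auc_test(rng.normal(1.0, 1, 30), rng.normal(0.0, 1, 25))
    print(f"AUC={auc:.3f}  z={z:.2f}  p={p:.4f}")

Applied gene by gene, the absolute z values provide the ranking of genes by differential expression.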

4.
This paper presents a new Bayesian clustering approach, based on an infinite mixture model and specifically designed for time-course microarray data. The problem is to group together genes that have "similar" expression profiles, given a set of noisy measurements of their expression levels over a specific time interval. In order to capture the temporal variation of each curve, a non-parametric regression approach is used: each expression profile is expanded over a set of basis functions, and the coefficient vectors are then modeled through a Bayesian infinite mixture of Gaussian distributions. The task of finding clusters of genes with similar expression profiles is thereby reduced to the problem of grouping together genes whose coefficients are sampled from the same mixture component. A Dirichlet process prior is natural in this kind of model, since it automatically handles the uncertainty about the number of clusters. Posterior inference is carried out by a split-and-merge MCMC sampling scheme that integrates out the parameters of the component distributions and updates only the latent vector of cluster memberships. The final configuration is obtained via the maximum a posteriori estimator. The performance of the method is studied using synthetic and real microarray data and is compared with that of competing techniques.
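A rough stand-in for this pipeline, under stated simplifications: expand each profile over a low-order polynomial basis (the paper uses a richer non-parametric basis) and cluster the coefficient vectors with a variational Dirichlet process Gaussian mixture in place of the paper's split-and-merge MCMC sampler. All names and settings are illustrative.

    # Basis expansion + Dirichlet process mixture clustering (variational
    # approximation substituted for the paper's split-and-merge MCMC).
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(2)
    t = np.linspace(0, 1, 12)                       # 12 time points
    # Two true groups of genes with different temporal profiles, plus noise.
    g1 = np.sin(2 * np.pi * t) + rng.normal(0, .3, (40, 12))
    g2 = 2 * t - 1 + rng.normal(0, .3, (40, 12))
    profiles = np.vstack([g1, g2])

    # Basis expansion: cubic polynomial coefficients per gene.
    coefs = np.array([np.polyfit(t, y, deg=3) for y in profiles])

    dp = BayesianGaussianMixture(
        n_components=10,                            # truncation level
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    ).fit(coefs)
    labels = dp.predict(coefs)
    print("occupied clusters:", np.unique(labels))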

5.
A multistage variable selection method is introduced for detecting association signals in structured brain-wide and genome-wide association studies (brain-GWAS). Compared to conventional methods that link one voxel to one single nucleotide polymorphism (SNP), our approach is more efficient and powerful in selecting important signals, because it integrates the anatomic grouping structure of the brain and the gene grouping structure of the genome. It avoids resorting to a large number of multiple comparisons while effectively controlling the false discoveries. The validity of the proposed approach is demonstrated by both theoretical investigation and numerical simulations. We apply the proposed method to a brain-GWAS using positron emission tomography (PET) imaging and genomic data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We confirm previously reported association signals and also uncover several novel SNPs and genes that are either associated with brain glucose metabolism or have their association significantly modified by Alzheimer's disease status.

6.
In clinical trials, missing data commonly arise through nonadherence to the randomized treatment or to study procedures. For trials in which recurrent event endpoints are of interest, conventional analyses using the proportional intensity model or the count model assume that the data are missing at random, an assumption that cannot be tested using the observed data alone. Sensitivity analyses are therefore recommended. We implement control-based multiple imputation as a sensitivity analysis for recurrent event data. We model the recurrent events using a piecewise exponential proportional intensity model with frailty and sample the parameters from the posterior distribution. We impute the number of events after dropout and correct the variance estimation using a bootstrap procedure. We apply the method to data from a sitagliptin study.
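A deliberately simplified sketch of the control-based idea, assuming a constant event rate: a treated subject's unobserved post-dropout events are imputed from a Poisson distribution with the rate estimated in the control arm ("jump to reference"). The paper instead draws rates from the posterior of a piecewise exponential frailty model and corrects the variance by bootstrap; everything below is hypothetical toy data.

    # Control-based imputation of post-dropout recurrent events (toy version).
    import numpy as np

    rng = np.random.default_rng(3)

    # Toy data: total events and person-years observed in the control arm.
    control_events, control_time = 120, 400.0
    lam_control = control_events / control_time    # control event rate

    # A treated subject who dropped out with `remaining` unobserved follow-up:
    observed_events, remaining = 2, 1.5

    M = 1000                                        # number of imputations
    imputed_totals = observed_events + rng.poisson(lam_control * remaining, M)
    print("mean imputed total events:", imputed_totals.mean())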

7.
The posterior predictive p value (ppp) was invented as a Bayesian counterpart to classical p values. The methodology can be applied to discrepancy measures involving both data and parameters and can, hence, be targeted to check various modeling assumptions. The interpretation can, however, be difficult, since the distribution of the ppp value under the modeling assumptions varies substantially between cases. A calibration procedure has been suggested that treats the ppp value as a test statistic in a prior predictive test. In this paper, we suggest that a prior predictive test may instead be based on the expected posterior discrepancy, which is somewhat simpler, both conceptually and computationally. Since both of these methods require the simulation of a large posterior parameter sample for each of an equally large prior predictive data sample, we further suggest looking for ways to match the given discrepancy by a computation-saving conflict measure. This approach is also based on simulations but only requires sampling from two different distributions representing two contrasting information sources about a model parameter. The conflict measure methodology is also more flexible in that it handles non-informative priors without difficulty. We compare the different approaches theoretically in some simple models and in a more complex applied example.
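For concreteness, a minimal sketch of a simulated ppp value in a conjugate normal model with known unit variance, using the sample maximum as the discrepancy; the prior settings and names are illustrative.

    # Posterior predictive p-value for a normal mean (conjugate prior).
    import numpy as np

    rng = np.random.default_rng(4)
    y = rng.normal(0.5, 1.0, size=50)               # observed data

    # Conjugate posterior for the mean: N(mu_n, tau_n^2), prior N(0, tau0^2).
    tau0, n = 10.0, len(y)
    tau_n2 = 1.0 / (1.0 / tau0**2 + n)
    mu_n = tau_n2 * y.sum()

    S = 5000
    theta = rng.normal(mu_n, np.sqrt(tau_n2), S)    # posterior draws
    y_rep = rng.normal(theta[:, None], 1.0, (S, n)) # posterior predictive data
    ppp = np.mean(y_rep.max(axis=1) >= y.max())     # discrepancy: sample max
    print(f"posterior predictive p-value: {ppp:.3f}")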

8.
Clustering algorithms are used in the analysis of gene expression data to identify groups of genes with similar expression patterns. These algorithms group genes with respect to a predefined dissimilarity measure without using any prior classification of the data. Most clustering algorithms require the number of clusters as input, and all the objects in the dataset are usually assigned to one of the clusters. We propose a clustering algorithm that finds clusters sequentially and allows for sporadic objects, that is, objects that are not assigned to any cluster. The proposed sequential clustering algorithm has two steps. First, it finds candidates for cluster centers; multiple candidates are used to make the search for clusters more efficient. Second, it conducts a local search around each candidate center to find the set of objects that defines a cluster. The candidate clusters are compared using a predefined score, the best cluster is removed from the data, and the procedure is repeated (see the sketch below). We investigate the performance of this algorithm using simulated data, and we apply the method to analyze gene expression profiles in a study of the plasticity of dendritic cells.
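A generic sketch of this two-step scheme, with deliberately crude stand-ins for the paper's choices: candidate centers are random draws, the local search keeps all points within a fixed radius, and the score is simply cluster size. Points never absorbed into a kept cluster remain sporadic (label -1).

    # Sequential cluster extraction with sporadic objects (generic sketch).
    import numpy as np

    def sequential_clusters(X, radius=1.0, min_size=10, n_candidates=20, seed=0):
        rng = np.random.default_rng(seed)
        remaining = np.arange(len(X))
        labels = np.full(len(X), -1)               # -1 marks sporadic objects
        cluster_id = 0
        while len(remaining) >= min_size:
            cand = rng.choice(remaining, size=min(n_candidates, len(remaining)),
                              replace=False)
            best = None
            for c in cand:                         # local search per candidate
                d = np.linalg.norm(X[remaining] - X[c], axis=1)
                members = remaining[d <= radius]
                if best is None or len(members) > len(best):
                    best = members
            if len(best) < min_size:               # nothing worth keeping
                break
            labels[best] = cluster_id              # remove best cluster
            remaining = np.setdiff1d(remaining, best)
            cluster_id += 1
        return labels

    rng = np.random.default_rng(9)
    X = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(3, .3, (50, 2)),
                   rng.uniform(-2, 5, (20, 2))])   # two clusters plus noise
    labels = sequential_clusters(X)
    print("clusters found:", labels.max() + 1, "| sporadic:", (labels == -1).sum())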

9.
The Lasso has sparked interest in the use of penalization of the log-likelihood for variable selection, as well as for shrinkage. We are particularly interested in the more-variables-than-observations case, which is characteristic of modern data. We adopt the Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression coefficients given independent double-exponential prior distributions. Generalizing this prior provides a family of hyper-Lasso penalty functions, which includes the quasi-Cauchy distribution of Johnstone and Silverman as a special case. The properties of this approach, including the oracle property, are explored, and an EM algorithm for inference in regression problems is described. The posterior is multi-modal, and we suggest a strategy of using a set of perfectly fitting random starting values to explore modes in different regions of the parameter space. Simulations show that our procedure provides significant improvements over a range of established procedures, and we provide an example from chemometrics.
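The baseline Bayesian-Lasso correspondence can be made concrete: with independent Laplace (double-exponential) priors of rate b and Gaussian noise of variance sigma^2, the MAP estimate coincides with a Lasso fit whose penalty is sigma^2 * b / n under sklearn's 1/(2n) loss scaling. The sketch below shows only this special case in a p > n problem, not the hyper-Lasso generalization or the EM algorithm; all settings are illustrative.

    # Lasso as the MAP estimate under a double-exponential prior, p > n.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(5)
    n, p = 50, 200                          # more variables than observations
    X = rng.standard_normal((n, p))
    beta = np.zeros(p); beta[:5] = 2.0      # sparse truth
    sigma = 1.0
    y = X @ beta + rng.normal(0, sigma, n)

    b = 5.0                                 # Laplace prior rate
    lam = sigma**2 * b / n                  # MAP <-> Lasso penalty conversion
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X, y)
    print("nonzero coefficients:", np.flatnonzero(fit.coef_))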

10.
In RNA-Seq experiments, the number of mapped reads for a given gene is proportional to its expression level and its length. Because longer genes contribute more sequenceable fragments than shorter ones, a longer gene is expected to accumulate more total reads even when two genes have the same expression level. This characteristic creates a length bias, such that the proportion of significant genes increases with gene length. However, longer genes are not more biologically meaningful than shorter ones, so the length bias should be properly corrected to obtain an accurate list of significant genes in RNA-Seq. For this purpose, we propose two multiple-testing procedures, based on a weighted-FDR and a separate-FDR approach, respectively. Both methods use prior information on gene length while keeping the false-discovery rate (FDR) controlled at α. In the weighted-FDR controlling procedure, we incorporate prior weights based on the length of each gene; these weights increase power for short genes and decrease it for long genes. In the separate-FDR controlling procedure, we order all genes by length, split them into two subgroups of short and long genes, and perform the adaptive Benjamini–Hochberg procedure separately within each subgroup. The proposed procedures were compared with existing methods and evaluated in two numerical examples and one simulation study. We conclude that the weighted p-value procedure properly reduces the length bias of RNA-Seq.
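A hedged sketch of the weighted idea: divide each p-value by a weight that decreases with gene length, normalized to average 1 so that the weighted Benjamini-Hochberg procedure retains FDR control at level α. The inverse-length weight below is an illustrative choice, not the paper's weight function.

    # Weighted Benjamini-Hochberg with length-based weights (illustrative).
    import numpy as np

    def weighted_bh(pvals, weights, alpha=0.05):
        """Weighted BH: step-up procedure applied to p_i / w_i."""
        w = np.asarray(weights, float)
        w = w * len(w) / w.sum()                    # normalize to mean 1
        q = np.asarray(pvals, float) / w
        order = np.argsort(q)
        m = len(q)
        thresh = alpha * np.arange(1, m + 1) / m
        below = q[order] <= thresh
        k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
        reject = np.zeros(m, bool)
        reject[order[:k]] = True
        return reject

    rng = np.random.default_rng(6)
    lengths = rng.integers(500, 10_000, size=2000)
    pvals = rng.uniform(size=2000)                  # toy null p-values
    weights = 1.0 / lengths                         # up-weight short genes
    print("rejections:", weighted_bh(pvals, weights).sum())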

11.
The Quermass-interaction model generalizes the classical germ-grain Boolean model by adding a morphological interaction between the grains. It makes it possible to model random structures with specific morphologies that are unlikely to be generated by a Boolean model. The Quermass-interaction model depends in particular on an intensity parameter, which is impossible to estimate by classical likelihood or pseudo-likelihood approaches because the number of points is not observable from a germ-grain set. In this paper, we present a procedure based on the Takacs–Fiksel method that is able to estimate all parameters of the Quermass-interaction model, including the intensity. An intensive simulation study is conducted to assess the efficiency of the procedure and to provide practical recommendations. It also illustrates that the estimation of the intensity parameter is crucial for identifying the model. The Quermass-interaction model is finally fitted by our method to P. Diggle's heather data set.

12.
Time-to-event data have been extensively studied in many areas. Although multiple time scales are often observed, commonly used methods are based on a single time scale. Analysing time-to-event data on two time scales can offer more extensive insight into the phenomenon. We introduce a non-parametric Bayesian intensity model to analyse a two-dimensional point process on the Lexis diagram. After a simple discretization of the two-dimensional process, we model the intensity by one-dimensional piecewise constant hazard functions parametrized by change points and the corresponding hazard levels. Our prior distribution incorporates a built-in smoothing feature in two dimensions. We implement posterior simulation using the reversible jump Metropolis–Hastings algorithm and demonstrate the applicability of the method using both simulated and empirical survival data. Our approach outperforms commonly applied models by borrowing strength in two dimensions.
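The discretization step can be illustrated directly: tabulate event counts and exposure times over a two-dimensional grid (here age by calendar time), under which the log-likelihood for cell-wise constant hazards is of Poisson form. The reversible jump sampler over change points is not reproduced; the grid, the crude exposure assignment, and all names below are hypothetical.

    # Discretized Lexis diagram: events and exposure per cell, Poisson loglik.
    import numpy as np

    rng = np.random.default_rng(7)
    N = 500
    entry_age = rng.uniform(30, 60, N)              # age at entry
    entry_year = rng.uniform(1990, 2005, N)         # calendar time at entry
    follow = rng.exponential(5.0, N)                # follow-up duration
    event = rng.uniform(size=N) < 0.7               # 1 = event, 0 = censored

    age_grid = np.arange(30, 71, 5)                 # 5-year age bands
    year_grid = np.arange(1990, 2011, 5)            # 5-year periods
    J, K = len(age_grid) - 1, len(year_grid) - 1
    events = np.zeros((J, K)); exposure = np.zeros((J, K))

    # Crude tabulation: assign each subject's whole follow-up to the cell of
    # the exit point (a fuller version would split exposure across cells).
    exit_age, exit_year = entry_age + follow, entry_year + follow
    j = np.clip(np.searchsorted(age_grid, exit_age) - 1, 0, J - 1)
    k = np.clip(np.searchsorted(year_grid, exit_year) - 1, 0, K - 1)
    for i in range(N):
        exposure[j[i], k[i]] += follow[i]
        events[j[i], k[i]] += event[i]

    lam = np.full((J, K), 0.15)                     # candidate hazard levels
    loglik = np.sum(events * np.log(lam) - lam * exposure)
    print(f"Poisson log-likelihood at constant hazard: {loglik:.1f}")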

13.
The case-cohort design has been demonstrated to be an economical and efficient approach in large cohort studies when the measurement of some covariates on all individuals is expensive. Various methods have been proposed for case-cohort data when the dimension of the covariates is smaller than the sample size. However, limited work has been done for high-dimensional case-cohort data, which are frequently collected in large epidemiological studies. In this paper, we propose a variable screening method for ultrahigh-dimensional case-cohort data under the framework of the proportional hazards model, which allows the covariate dimension to increase with the sample size at an exponential rate. Our procedure enjoys the sure screening property and ranking consistency under some mild regularity conditions. We further extend this method to an iterative version to handle scenarios in which some covariates are jointly important but are marginally unrelated or only weakly correlated to the response. The finite sample performance of the proposed procedure is evaluated via both simulation studies and an application to real data from a breast cancer study.
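A toy version of the marginal screening step, ignoring the case-cohort sampling weights that the paper's utility accounts for: fit a univariate Cox model per covariate (via the lifelines package) and keep the d covariates with the largest absolute coefficients. Data and names are illustrative.

    # Marginal (sure independence) screening for survival data, toy version.
    # Requires `pip install lifelines`.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(8)
    n, p, d = 200, 50, 10
    X = rng.standard_normal((n, p))
    risk = X[:, 0] - X[:, 1]                       # two truly active covariates
    T = rng.exponential(np.exp(-risk))             # event times
    C = rng.exponential(2.0, n)                    # censoring times
    df_base = pd.DataFrame({"T": np.minimum(T, C), "E": (T <= C).astype(int)})

    scores = np.empty(p)
    for j in range(p):                             # one marginal Cox fit each
        df = df_base.assign(x=X[:, j])
        cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
        scores[j] = abs(cph.params_["x"])

    keep = np.argsort(scores)[::-1][:d]            # top-d screened covariates
    print("screened covariate indices:", sorted(keep.tolist()))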

14.
In this paper, we present an innovative method for constructing proper priors for the skewness (shape) parameter in the skew-symmetric family of distributions. The proposed method is based on assigning a prior distribution to the perturbation effect of the shape parameter, which is quantified in terms of the total variation distance. We discuss strategies to translate prior beliefs about the asymmetry of the data into an informative prior distribution of this class. We show via a Monte Carlo simulation study that our non-informative priors induce posterior distributions with good frequentist properties, similar to those of the Jeffreys prior. Our informative priors yield better results than their competitors from the literature. We also propose a scale-invariant and location-invariant prior structure for models with unknown location and scale parameters and provide sufficient conditions for the propriety of the corresponding posterior distribution. Illustrative examples are presented using simulated and real data.

15.
Subgroup detection has received increasing attention recently in fields such as clinical trials, public management, and market segmentation analysis. In these fields, one often faces time-to-event data, which are commonly subject to right censoring. This paper proposes a semiparametric Logistic-Cox mixture model for subgroup analysis when the outcome of interest is an event time subject to right censoring. The proposed method centers on a likelihood-ratio-based procedure for testing the existence of subgroups. An expectation-maximization iteration is applied to improve the testing power, and a model-based bootstrap approach is developed to implement the testing procedure. When subgroups exist, the proposed model can also be used to estimate the subgroup effect and to construct predictive scores for subgroup membership. The large sample properties of the proposed method are studied, and its finite sample performance is assessed by simulation studies. A real data example is also provided for illustration.

16.
The authors propose a procedure for determining the unknown number of components in mixtures by generalizing a Bayesian testing method proposed by Mengersen & Robert (1996). The testing criterion they propose involves a Kullback-Leibler distance, which may be weighted or not. They give explicit formulas for the weighted distance for a number of mixture distributions and propose a stepwise testing procedure to select the minimum number of components adequate for the data. Their procedure, which is implemented using the BUGS software, exploits a fast collapsing approach that accelerates the search for the minimum number of components by avoiding a full refit at each step. The performance of their method is compared, using both distances, to the Bayes factor approach.

17.
We study the Bayesian solution of a linear inverse problem in a separable Hilbert space setting with Gaussian prior and noise distributions. Our contribution is to propose a new Bayes estimator that is linear and continuous on the whole space and is stronger than the mean of the exact Gaussian posterior distribution, which is only defined as a measurable linear transformation. Our estimator is the mean of a slightly modified posterior distribution, called the regularized posterior distribution. Frequentist consistency of our estimator and of the regularized posterior distribution is proved. A Monte Carlo study and an application to real data confirm the good small-sample properties of our procedure.
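In finite dimensions, the Gaussian-prior Bayes estimator this paper builds on reduces to the familiar Tikhonov-regularized solution, as the sketch below shows; the paper's contribution concerns the infinite-dimensional Hilbert space case, where the exact posterior mean must itself be regularized. All settings below are illustrative.

    # Finite-dimensional Gaussian Bayes estimator for y = Kx + noise.
    import numpy as np

    rng = np.random.default_rng(10)
    m, p = 40, 60                                   # ill-posed: m < p
    K = rng.standard_normal((m, p))
    x_true = np.zeros(p); x_true[:3] = 1.0
    s, a = 0.1, 1.0                                 # noise sd, prior precision
    y = K @ x_true + rng.normal(0, s, m)

    # Posterior mean = (K'K / s^2 + a I)^{-1} K'y / s^2  (Tikhonov form).
    A = K.T @ K / s**2 + a * np.eye(p)
    x_hat = np.linalg.solve(A, K.T @ y / s**2)
    print("reconstruction error:", np.linalg.norm(x_hat - x_true).round(3))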

18.
The conditional score approach is proposed for the analysis of errors-in-variables current status data under the proportional odds model. Distinct from conditional scores in other applications, the proposed conditional score involves a high-dimensional nuisance parameter, causing challenges in both asymptotic theory and computation. We propose a composite algorithm combining the Newton–Raphson and self-consistency algorithms for computation, and we develop an efficient conditional score, analogous to the efficient score from a typical semiparametric likelihood, for building an asymptotic linear expansion and hence the asymptotic distribution of the conditional-score estimator of the regression parameter. Our proposal is shown to perform well in simulation studies and is applied to zebrafish basal cell carcinoma data involving measurement errors in gene expression levels.

19.
The authors examine several aspects of cross-validation for Bayesian models. In particular, they propose a computational scheme which does not require a separate posterior sample for each training sample.

20.
It is well known that many industrial experiments have split-plot structures. Compared to completely randomised experiments, split-plot designs are more economical and have thus received much attention among researchers. Much work has been done on two-level split-plot designs. In this article, we consider split-plot designs with factors at three, more than three, or mixed levels, and with both qualitative and quantitative factors. We show that if two designs with both qualitative and quantitative factors are geometrically isomorphic, then their generalised wordlength patterns are identical. Three design scenarios are considered for optimal designs; the corresponding wordlength patterns are defined, and the minimum aberration mixed-level split-plot designs with 18 and 36 runs are tabulated.
