Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
A Bayesian mixture model for differential gene expression   Cited by: 3 (self-citations: 0, citations by others: 3)
Summary. We propose model-based inference for differential gene expression, using a nonparametric Bayesian probability model for the distribution of gene intensities under various conditions. The probability model is a mixture of normal distributions. The resulting inference is similar to a popular empirical Bayes approach that is used for the same inference problem. The use of fully model-based inference mitigates some of the necessary limitations of the empirical Bayes method. We argue that inference is no more difficult than posterior simulation in traditional nonparametric mixture-of-normal models. The approach proposed is motivated by a microarray experiment that was carried out to identify genes that are differentially expressed between normal tissue and colon cancer tissue samples. Additionally, we carried out a small simulation study to verify the methods proposed. In the motivating case-studies we show how the nonparametric Bayes approach facilitates the evaluation of posterior expected false discovery rates. We also show how inference can proceed even in the absence of a null sample of known non-differentially expressed scores. This highlights the difference from alternative empirical Bayes approaches that are based on plug-in estimates.

2.
For the nonparametric estimation of multivariate finite mixture models with the conditional independence assumption, we propose a new formulation of the objective function in terms of penalised smoothed Kullback–Leibler distance. The nonlinearly smoothed majorisation-minimisation (NSMM) algorithm is derived from this perspective. An elegant representation of the NSMM algorithm is obtained using a novel projection-multiplication operator, a more precise monotonicity property of the algorithm is discovered, and the existence of a solution to the main optimisation problem is proved for the first time.

3.
It is well known that the log-likelihood function for samples coming from normal mixture distributions may present spurious maxima and singularities. For this reason, we reformulate some of Hathaway's results and propose two constrained estimation procedures for multivariate normal mixture modelling according to the likelihood approach. Their performance is illustrated by numerical simulations based on the EM algorithm. A comparison between multivariate normal mixtures and the hot-deck approach in missing data imputation is also considered. Salvatore Ingrassia carried out the research as part of the project Metodi Statistici e Reti Neuronali per l'Analisi di Dati Complessi (PRIN 2000, resp. G. Lunetta).

4.
A mixture of the MANOVA and GMANOVA models is presented. The expected value of the response matrix in this model is the sum of two matrix components. The first component represents the GMANOVA portion and the second component represents the MANOVA portion. Maximum likelihood estimators are derived for the parameters in this model, and goodness-of-fit tests are constructed for fuller models via the likelihood ratio criterion. Finally, likelihood ratio tests for general linear hypotheses are developed and a numerical example is presented.

5.
A nonparametric mixture model specifies that observations arise from a mixture distribution, ∫ f(x, θ) dG(θ), where the mixing distribution G is completely unspecified. A number of algorithms have been developed to obtain unconstrained maximum-likelihood estimates of G, but none of these algorithms lead to estimates when functional constraints are present. In many cases, there is a natural interest in functionals φ(G) of the mixing distribution, such as the mean and variance, and profile likelihoods and confidence intervals for φ(G) are desired. In this paper we develop a penalized generalization of the ISDM algorithm of Kalbfleisch and Lesperance (1992) that can be used to solve the problem of constrained estimation. We also discuss its use in various applications. Convergence results and numerical examples are given for the generalized ISDM algorithm, and asymptotic results are developed for the likelihood-ratio test statistics in the multinomial case.

6.
In this article, we propose a new penalized-likelihood method to conduct model selection for finite mixtures of regression models. The penalties are imposed on mixing proportions and regression coefficients, and hence order selection of the mixture and variable selection in each component can be conducted simultaneously. The consistency of order selection and the consistency of variable selection are investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations are conducted to demonstrate the finite sample performance of the estimation procedure. The proposed methodology is further illustrated via real data analysis.

7.
We propose frailty regression models in mixture distributions and assume that the frailty follows a gamma, positive stable, or power variance function distribution. We consider the Weibull mixture as an example. There are some interesting situations, such as survival times in genetic epidemiology, dental implants of patients, and twin births (both monozygotic and dizygotic), where the genetic behavior of patients (which is unknown and random) follows a known frailty distribution. These are the situations that motivate the study of this particular model.

8.
Parsimonious Gaussian mixture models   Cited by: 3 (self-citations: 0, citations by others: 3)
Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models which are based on the mixtures of factor analyzers model is introduced, and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class of models includes parsimonious models that have not previously been developed. These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.

9.
Summary. We propose a mixture of binomial and beta–binomial distributions for estimating the size of closed populations. The new mixture model is applied to several real capture–recapture data sets and is shown to provide a convenient, objective framework for model selection. The new model is compared with three alternative models in a simulation study, and the results shed light on the general performance of models in this area. The new model provides a robust flexible analysis, which automatically deals with small capture probabilities.

10.
In a Bayesian analysis of finite mixture models, parameter estimation and clustering are sometimes less straightforward than might be expected. In particular, the common practice of estimating parameters by their posterior mean, and summarizing joint posterior distributions by marginal distributions, often leads to nonsensical answers. This is due to the so-called 'label switching' problem, which is caused by symmetry in the likelihood of the model parameters. A frequent response to this problem is to remove the symmetry by using artificial identifiability constraints. We demonstrate that this fails in general to solve the problem, and we describe an alternative class of approaches, relabelling algorithms, which arise from attempting to minimize the posterior expected loss under a class of loss functions. We describe in detail one particularly simple and general relabelling algorithm and illustrate its success in dealing with the label switching problem on two examples.
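
The likelihood symmetry behind label switching can be checked numerically. The following is a minimal sketch, not the relabelling algorithm described in the abstract: the function names, the toy data, and the parameter values are all illustrative assumptions. It evaluates a two-component normal mixture log-likelihood under both labellings of the components.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of a finite normal mixture (hypothetical helper)."""
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, s)
                      for w, m, s in zip(weights, means, sds))
        total += math.log(density)
    return total

# Toy data and parameter values, chosen only for illustration.
data = [-2.1, -1.7, -2.4, 3.0, 2.6, 3.3, 2.9]
ll_original = mixture_loglik(data, [0.4, 0.6], [-2.0, 3.0], [0.5, 0.7])
# Same parameters with the two component labels exchanged.
ll_swapped = mixture_loglik(data, [0.6, 0.4], [3.0, -2.0], [0.7, 0.5])
print(ll_original, ll_swapped)
```

Because both labellings give the same likelihood, a posterior built on a label-symmetric prior has mirrored modes, and averaging a component's parameter across those modes can land between them.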

11.
A mixture model for random graphs   Cited by: 1 (self-citations: 0, citations by others: 1)
The Erdős–Rényi model of a network is simple and possesses many explicit expressions for average and asymptotic properties, but it does not fit well to real-world networks. The vertices of those networks are often structured in unknown classes (functionally related proteins or social communities) with different connectivity properties. The stochastic block structures model was proposed for this purpose in the context of social sciences, using a Bayesian approach. We consider the same model in a frequentist statistical framework. We give the degree distribution and the clustering coefficient associated with this model, a variational method to estimate its parameters and a model selection criterion to select the number of classes. This estimation procedure allows us to deal with large networks containing thousands of vertices. The method is used to uncover the modular structure of a network of enzymatic reactions.

12.
Summary. We establish asymptotic theory for both the maximum likelihood and the maximum modified likelihood estimators in mixture regression models. Moreover, under specific and reasonable conditions, we show that the optimal convergence rate of n^{-1/4} for estimating the mixing distribution is achievable for both the maximum likelihood and the maximum modified likelihood estimators. We also derive the asymptotic distributions of two log-likelihood ratio test statistics for testing homogeneity and we propose a resampling procedure for approximating the p-value. Simulation studies are conducted to investigate the empirical performance of the two test statistics. Finally, two real data sets are analysed to illustrate the application of our theoretical results.

13.
The World Health Organization (WHO) diagnostic criteria for diabetes mellitus were determined in part by evidence that in some populations the plasma glucose level 2 h after an oral glucose load is a mixture of two distinct distributions. We present a finite mixture model that allows the two component densities to be generalized linear models and the mixture probability to be a logistic regression model. The model allows us to estimate the prevalence of diabetes and sensitivity and specificity of the diagnostic criteria as a function of covariates and to estimate them in the absence of an external standard. Sensitivity is the probability that a test indicates disease conditionally on disease being present. Specificity is the probability that a test indicates no disease conditionally on no disease being present. We obtained maximum likelihood estimates via the EM algorithm and derived the standard errors from the information matrix and by the bootstrap. In the application to data from the diabetes in Egypt project a two-component mixture model fits well and the two components are interpreted as normal and diabetes. The means and variances are similar to results found in other populations. The minimum misclassification cutpoints decrease with age, are lower in urban areas and are higher in rural areas than the 200 mg dl⁻¹ cutpoint recommended by the WHO. These differences are modest and our results generally support the WHO criterion. Our methods allow the direct inclusion of concomitant data whereas past analyses were based on partitioning the data.
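
The definitions of sensitivity and specificity above can be made concrete for a two-component mixture. The sketch below assumes, purely for illustration, that both components are normal distributions with known parameters; the abstract's model uses generalized linear models with covariates, and the means, standard deviations, and function names here are hypothetical values rather than estimates from the study.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def sensitivity(cutpoint, mean_diseased, sd_diseased):
    """P(value > cutpoint | disease present)."""
    return 1.0 - normal_cdf(cutpoint, mean_diseased, sd_diseased)

def specificity(cutpoint, mean_healthy, sd_healthy):
    """P(value <= cutpoint | no disease present)."""
    return normal_cdf(cutpoint, mean_healthy, sd_healthy)

# Hypothetical component parameters (mg/dl), for illustration only.
cutpoint = 200.0
sens = sensitivity(cutpoint, mean_diseased=300.0, sd_diseased=80.0)
spec = specificity(cutpoint, mean_healthy=110.0, sd_healthy=25.0)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Raising the cutpoint trades sensitivity for specificity, which is why a minimum misclassification cutpoint depends on the component parameters and the mixing probability.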

14.
In the analysis of competing risks data, the cumulative incidence function is a useful summary of the overall crude risk for a failure type of interest. Mixture regression modeling has served as a natural approach to performing covariate analysis based on this quantity. However, existing mixture regression methods with competing risks data either impose parametric assumptions on the conditional risks or require stringent censoring assumptions. In this article, we propose a new semiparametric regression approach for competing risks data under the usual conditional independent censoring mechanism. We establish the consistency and asymptotic normality of the resulting estimators. A simple resampling method is proposed to approximate the distribution of the estimated parameters and that of the predicted cumulative incidence functions. Simulation studies and an analysis of a breast cancer dataset demonstrate that our method performs well with realistic sample sizes and is appropriate for practical use.

15.
In this paper, we consider a unified approach to stochastic comparisons of random vectors corresponding to two general multivariate mixture models. These stochastic comparisons are made with respect to multivariate hazard rate, reversed hazard rate and likelihood ratio orders. As an application, results are presented for stochastic comparisons of generalized multivariate frailty models.

16.
17.
We propose a mixture integer-valued ARCH model for modeling integer-valued time series with overdispersion. The model consists of a mixture of K stationary or non-stationary integer-valued ARCH components. The advantages of the mixture model over the single-component model include the ability to handle multimodality and non-stationary components. The necessary and sufficient first- and second-order stationarity conditions, the necessary arbitrary-order stationarity conditions, and the autocorrelation function are derived. The estimation of parameters is done through an EM algorithm, and the model is selected by three information criteria, whose performances are studied via simulations. Finally, the model is applied to a real dataset.

18.
An empirical Bayes problem has an unknown prior to be estimated from data. The predictive recursion (PR) algorithm provides fast nonparametric estimation of mixing distributions and is ideally suited for empirical Bayes applications. This article presents a general notion of empirical Bayes asymptotic optimality, and it is shown that PR-based procedures satisfy this property under certain conditions. As an application, the problem of in-season prediction of baseball batting averages is considered. There the PR-based empirical Bayes rule performs well in terms of prediction error and ability to capture the distribution of the latent features.
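
For orientation, a single pass of predictive recursion on a discrete grid might look like the sketch below. The setup is an assumption made for illustration and is not taken from the article: a normal kernel with unit variance, a uniform initial guess, weights w_i = (i + 1)^(-gamma) with gamma = 0.67, and toy data. The function name `predictive_recursion` is likewise hypothetical.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def predictive_recursion(data, grid, gamma=0.67):
    """One pass over `data`; returns estimated mixing masses on `grid`."""
    masses = [1.0 / len(grid)] * len(grid)  # uniform initial guess
    for i, x in enumerate(data, start=1):
        weight = (i + 1) ** (-gamma)
        likelihood = [normal_pdf(x, u) for u in grid]
        # Marginal density of x under the current mixing estimate.
        marginal = sum(l * m for l, m in zip(likelihood, masses))
        # Convex combination of the current estimate and its Bayes update.
        masses = [(1.0 - weight) * m + weight * l * m / marginal
                  for l, m in zip(likelihood, masses)]
    return masses

grid = [-4.0 + 0.5 * k for k in range(17)]   # grid points from -4 to 4
data = [1.8, 2.2, 2.0, 1.9, 2.1, 2.3, 1.7]   # toy observations near 2
masses = predictive_recursion(data, grid)
print(max(zip(masses, grid)))
```

Each observation is processed exactly once, which is what makes the procedure fast; the result depends on the order of the data, so in practice it is sometimes averaged over permutations.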

19.
A multicollinearity diagnostic is discussed for parametric models fit to censored data. The models considered include the Weibull, exponential and lognormal models as well as the Cox proportional hazards model. This diagnostic is an extension of the diagnostic proposed by Belsley, Kuh, and Welsch (1980). The diagnostic is based on the condition indices and variance proportions of the variance-covariance matrix. Its use and properties are studied through a series of examples. The effect of centering variables included in the model is also discussed.

20.
We investigate the use of a dynamic form of the EM algorithm to estimate proportions in finite mixtures of known distributions. We prove a consistency result for this algorithm, which employs only a single EM update for each new observation. Our aim is to demonstrate that the slow convergence rate of the EM algorithm in many applications is of little practical consequence in a situation when data is frequently being updated.
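
One possible reading of "a single EM update for each new observation" is sketched below: each time an observation arrives, one EM step for the mixing proportions is run over all data seen so far, starting from the previous estimate. This is an illustrative interpretation, not the authors' exact algorithm; the component densities, the simulated stream, and the function names are assumptions.

```python
import math
import random

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def em_update(proportions, seen, component_pdfs):
    """One EM step for the mixing proportions using all data seen so far."""
    totals = [0.0] * len(proportions)
    for x in seen:
        joint = [p * pdf(x) for p, pdf in zip(proportions, component_pdfs)]
        norm = sum(joint)
        for j, value in enumerate(joint):
            totals[j] += value / norm  # posterior responsibility of component j
    return [t / len(seen) for t in totals]

def dynamic_em(stream, component_pdfs, initial):
    """Apply exactly one EM update per incoming observation."""
    proportions = list(initial)
    seen = []
    for x in stream:
        seen.append(x)
        proportions = em_update(proportions, seen, component_pdfs)
    return proportions

# Simulated stream: 30% from N(0, 1), 70% from N(4, 1) (illustrative values).
rng = random.Random(0)
stream = [rng.gauss(0.0, 1.0) if rng.random() < 0.3 else rng.gauss(4.0, 1.0)
          for _ in range(500)]
pdfs = [lambda x: normal_pdf(x, 0.0), lambda x: normal_pdf(x, 4.0)]
estimate = dynamic_em(stream, pdfs, initial=[0.5, 0.5])
print(estimate)
```

The sketch revisits all stored data at every step, so its cost grows quadratically with the stream length; it is meant to show the update schedule, not an efficient implementation.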
