Similar literature
20 similar articles found (search time: 359 ms)
1.
Summary.  The importance of incorporating existing biological knowledge, such as gene functional annotations in gene ontology, in analysing high throughput genomic and proteomic data is being increasingly recognized. In the context of detecting differential gene expression, however, the current practice of using gene annotations is limited primarily to validations. Here we take a direct approach to incorporating gene annotations into mixture models for analysis. First, in contrast with a standard mixture model assuming that each gene of the genome has the same distribution, we study stratified mixture models allowing genes with different annotations to have different distributions, such as prior probabilities. Second, rather than treating parameters in stratified mixture models independently, we propose a hierarchical model to take advantage of the hierarchical structure of most gene annotation systems, such as gene ontology. We consider a simplified implementation for the proof of concept. An application to a mouse microarray data set and a simulation study demonstrate the improvement of the two new approaches over the standard mixture model.  相似文献   
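
The stratified idea in this abstract can be illustrated with a minimal Python sketch. It assumes a two-component normal mixture on a per-gene statistic z in which only the prior probability of differential expression depends on the annotation stratum. The stratum names, the prior values, and the null and alternative parameters are all hypothetical and are not taken from the paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_de(z, prior_de, null=(0.0, 1.0), alt=(2.0, 1.0)):
    """Posterior probability that a gene with statistic z is differentially expressed.

    prior_de is the stratum-specific prior probability; null and alt are
    (mean, sd) pairs for the two mixture components (illustrative values).
    """
    f0 = normal_pdf(z, *null)
    f1 = normal_pdf(z, *alt)
    return prior_de * f1 / ((1.0 - prior_de) * f0 + prior_de * f1)


# Hypothetical strata: genes with a relevant annotation get a larger prior.
stratum_priors = {"annotated": 0.30, "unannotated": 0.05}
for name, prior in stratum_priors.items():
    print(name, round(posterior_de(2.5, prior), 3))
```

With the same observed statistic, the gene in the stratum with the larger prior receives the larger posterior probability, which is the effect that a standard mixture model with one shared prior cannot produce.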

2.
There is growing interest in fully MR-based radiotherapy. The most important development still needed is improved bone tissue estimation, since the existing model-based methods perform poorly on bone tissue. This paper aims to improve bone tissue estimation. A skew-Gaussian mixture model and a Gaussian mixture model are proposed for estimating CT images from MR images by partitioning the data into two major tissue types. The performance of the proposed models was evaluated by leave-one-out cross-validation on real data. Compared with the existing model-based approaches, the model-based partitioning approach performed better in bone tissue estimation, especially for dense bone tissue.

3.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.  相似文献   

4.
It is well known that there exist multiple roots of the likelihood equations for finite normal mixture models. Selecting a consistent root for finite normal mixture models has long been a challenging problem. Simply using the root with the largest likelihood will not work because of spurious roots. In addition, the likelihood of normal mixture models with unequal variances is unbounded, and thus the maximum likelihood estimate (MLE) is not well defined. In this paper, we propose a simple root selection method for univariate normal mixture models that incorporates the idea of a goodness-of-fit test. Our new method inherits both the consistency properties of distance estimators and the efficiency of the MLE. The new method is simple to use, and its computation can easily be done using existing R packages for mixture models. In addition, the proposed root selection method is very general and can also be applied to other univariate mixture models. We demonstrate the effectiveness of the proposed method and compare it with some other existing methods through simulation studies and a real data application.
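
The unboundedness mentioned in this abstract can be seen numerically. The sketch below is an illustration only, with made-up data values: it places the mean of one component exactly on an observation and lets that component's standard deviation shrink, so the log-likelihood of the two-component normal mixture keeps growing.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weight, mean1, sd1, mean2, sd2):
    """Log-likelihood of a two-component normal mixture with unequal variances."""
    return sum(
        math.log(weight * normal_pdf(x, mean1, sd1)
                 + (1.0 - weight) * normal_pdf(x, mean2, sd2))
        for x in data
    )


# Made-up sample; the first component is centred on the first observation.
data = [-1.2, -0.4, 0.1, 0.7, 1.5, 2.3]
shrinking_sds = (1.0, 1e-2, 1e-4, 1e-6)
logliks = [mixture_loglik(data, 0.5, data[0], sd, 0.5, 1.0) for sd in shrinking_sds]
for sd, value in zip(shrinking_sds, logliks):
    print(sd, value)
```

Each reduction of the standard deviation raises the log-likelihood further, which is why the root with the largest likelihood cannot simply be selected.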

5.
In most of the existing specialized literature, the monitoring of regression models is treated as a special case of profile monitoring. However, not every regression model appropriately represents a profile data structure. This is clearly the case for the Weibull regression model (WRM) with common shape parameter γ. Although it might be thought that existing methodologies for monitoring generalized linear profiles, especially likelihood-ratio-test (LRT)-based methods, can also be applied successfully to regression models with a time-to-event response, this paper shows that those methodologies work acceptably only for data structures with roughly 1000 observations or more. Corrections often referred to as Bartlett's adjustments are needed to improve the accuracy of the asymptotic distributional properties of the LRT statistic when monitoring the WRM with small and moderately sized datasets. Simulation studies suggest that with these corrections the resulting charts work acceptably when the available data structures contain at least 30 observations. The detection ability of the proposed schemes improves as the dataset size increases.

6.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.  相似文献   
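
For readers unfamiliar with the transformation named in this abstract, the following minimal Python sketch shows the standard one-parameter Box-Cox transformation for positive data. It covers only the transformation itself, not the t mixture model or the Expectation-Maximization algorithm described by the authors, and the function name `box_cox` is chosen here for illustration.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transformation of a positive value y.

    Returns log(y) when lam is 0 and (y**lam - 1) / lam otherwise.
    """
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# lam = 1 only shifts the data; smaller lam pulls in a long right tail.
for lam in (1.0, 0.5, 0.0):
    print(lam, [round(box_cox(y, lam), 3) for y in (1.0, 4.0, 9.0)])
```

Selecting the parameter within each mixture component is what lets a symmetric component density describe a skewed cluster on the original scale.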

7.
Abstract. Generalized autoregressive conditional heteroscedastic (GARCH) models have been widely used for analyzing financial time series with time-varying volatilities. To overcome the defect of the Gaussian quasi-maximum likelihood estimator (QMLE) when the innovations follow either heavy-tailed or skewed distributions, Berkes & Horváth (Ann. Statist., 32, 633, 2004) and Lee & Lee (Scand. J. Statist. 36, 157, 2009) considered likelihood methods that use two-sided exponential, Cauchy and normal mixture distributions. In this paper, we extend their methods to the Box–Cox transformed threshold GARCH model by allowing the distributions used in the construction of the likelihood functions to include parameters and by employing the estimated quasi-likelihood estimators (QELE) to handle those parameters. We also demonstrate that the proposed QMLE and QELE are consistent and asymptotically normal under regularity conditions. Simulation results are provided for illustration.

8.
It is generally assumed that the likelihood ratio statistic for testing the null hypothesis that data arise from a homoscedastic normal mixture distribution against the alternative hypothesis that data arise from a heteroscedastic normal mixture distribution has, under some regularity conditions, an asymptotic χ² reference distribution with degrees of freedom equal to the difference in the number of parameters estimated under the alternative and null models. Simulations show that, when the restrictions suggested by Hathaway (Ann. Stat. 13:795–800, 1985) are imposed on the component variances to ensure that the likelihood is bounded under the alternative distribution, the χ² reference distribution gives a reasonable approximation for the likelihood ratio test only when the sample size is 2000 or more and the mixture components are well separated. For small and medium sample sizes, parametric bootstrap tests appear to work well for determining whether data arise from a normal mixture with equal variances or a normal mixture with unequal variances.

9.
Partially linear regression models are semiparametric models that contain both linear and nonlinear components. They are extensively used in many scientific fields for their flexibility and convenient interpretability. In such analyses, testing the significance of the regression coefficients in the linear component is typically a key focus. Under the high-dimensional setting, i.e., “large p, small n,” the conventional F-test strategy does not apply because the coefficients need to be estimated through regularization techniques. In this article, we develop a new test using a U-statistic of order two, relying on a pseudo-estimate of the nonlinear component from the classical kernel method. Using the martingale central limit theorem, we prove the asymptotic normality of the proposed test statistic under some regularity conditions. We further demonstrate our proposed test's finite-sample performance by simulation studies and by analyzing some breast cancer gene expression data.  相似文献   

10.
Large-scale simultaneous hypothesis testing appears in many areas. A well-known inference method is to control the false discovery rate. One popular approach is to model the z-scores derived from the individual t-tests and then use this model to control the false discovery rate. We propose a heteroscedastic contaminated normal mixture to describe the distribution of z-scores and design an EM-test for testing homogeneity in this class of mixture models. The proposed EM-test can be used to investigate whether a collection of z-scores has arisen from a single normal distribution or whether a heteroscedastic contaminated normal mixture is more appropriate. We show that the EM-test statistic has a shifted mixture of chi-squared limiting distribution. Simulation results show that the proposed testing procedure has accurate type-I error and significantly larger power than its competitors under a variety of model specifications. A real-data example is analysed to exemplify the application of the proposed method.  相似文献   

11.
Contamination models are often used to describe or generate so-called outliers in univariate statistical data. These models assume that k out of n independent random variables are shifted or multiplied by some constant, whereas the other observations still come i.i.d. from some common target distribution. Of course, these contaminants do not necessarily stand out as the extremes of the sample. Moreover, it is the amount and magnitude of ‘contamination’ that determine the number of obvious outliers. Using the concept of Davies and Gather (1993) to formalize the outlier notion, we quantify the amount of contamination needed to produce a prespecified expected number of ‘genuine’ outliers. In particular, we demonstrate that for samples of moderate size from a normal target distribution, a rather large shift of the contaminants is necessary to yield a certain expected number of outliers. Such an insight is of interest when designing simulation studies in which outliers should occur, as well as in theoretical investigations of outliers.
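
A shift-contamination experiment of the kind described in this abstract can be sketched in Python as follows. The sketch assumes a standard normal target distribution and flags an observation as an outlier when its absolute value exceeds a normal quantile whose level is adjusted for the sample size. That cutoff rule, the function name `count_outliers` and all parameter values are assumptions made for illustration and may differ from the exact definition used by Davies and Gather (1993).

```python
import random
from statistics import NormalDist


def count_outliers(n, k, shift, alpha=0.05, seed=0):
    """Simulate n standard normal values, shift k of them, and count flagged outliers.

    An observation is flagged when |x| exceeds the cutoff chosen so that a clean
    sample of size n contains no flagged value with probability 1 - alpha.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for i in range(k):
        sample[i] += shift  # the k contaminants
    alpha_n = 1.0 - (1.0 - alpha) ** (1.0 / n)
    cutoff = NormalDist().inv_cdf(1.0 - alpha_n / 2.0)
    return sum(abs(x) > cutoff for x in sample)


# Number of flagged observations for increasing shifts of 5 contaminants out of 50.
for shift in (0.0, 2.0, 4.0, 6.0):
    print(shift, count_outliers(n=50, k=5, shift=shift))
```

Running the loop with several seeds gives a rough feel for how large the shift must be before most contaminants fall outside the cutoff.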

12.
Consider data (x_1, y_1), ..., (x_n, y_n), where each x_i may be vector valued, and the distribution of y_i given x_i is a mixture of linear regressions. This provides a generalization of mixture models which do not include covariates in the mixture formulation. This mixture of linear regressions formulation has appeared in the computer science literature under the name Hierarchical Mixtures of Experts model. This model has been considered from both frequentist and Bayesian viewpoints. We focus on the Bayesian formulation. Previously, estimation of the mixture of linear regressions model has been done through straightforward Gibbs sampling with latent variables. This paper contributes to this field in three major areas. First, we provide a theoretical underpinning to the Bayesian implementation by demonstrating consistency of the posterior distribution. This demonstration is done by extending results in Barron, Schervish and Wasserman (Annals of Statistics 27: 536–561, 1999) on bracketing entropy to the regression setting. Second, we demonstrate through examples that straightforward Gibbs sampling may fail to effectively explore the posterior distribution and provide alternative algorithms that are more accurate. Third, we demonstrate the usefulness of the mixture of linear regressions framework in Bayesian robust regression. The methods described in the paper are applied to two examples.

13.
The majority of the existing literature on model-based clustering deals with symmetric components. In some cases, especially when dealing with skewed subpopulations, the estimate of the number of groups can be misleading; if symmetric components are assumed we need more than one component to describe an asymmetric group. Existing mixture models, based on multivariate normal distributions and multivariate t distributions, try to fit symmetric distributions, i.e. they fit symmetric clusters. In the present paper, we propose the use of finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions). Such finite mixture models start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties. We examine both the univariate case, to gain insight, and the multivariate case, which is more useful in real applications. EM type algorithms are described for fitting the models. Real data examples are used to demonstrate the potential of the new model in comparison with existing ones.  相似文献   

14.
In this paper, we study robust estimation of the order of a hidden Markov model (HMM) based on a penalized minimum density power divergence estimator, which is obtained by utilizing the finite mixture marginal distribution of the HMM. For this task, we adopt the locally conic parametrization method used in [D. Dacunha-Castelle and E. Gassiat, Testing in locally conic models and application to mixture models. ESAIM Probab. Stat. (1997), pp. 285–317; D. Dacunha-Castelle and E. Gassiat, Testing the order of a model using locally conic parametrization: population mixtures and stationary arma processes, Ann. Statist. 27 (1999), pp. 1178–1209; T. Lee and S. Lee, Robust and consistent estimation of the order of finite mixture models based on the minimizing a density power divergence estimator, Metrika 68 (2008), pp. 365–390] to avoid the difficulties that arise in handling mixture marginal models, such as the non-identifiability of the parameter space and the singularity problem with the asymptotic variance. We verify that the estimated order is consistent, and simulation results are provided for illustration.

15.
We present an algorithm for multivariate robust Bayesian linear regression with missing data. The iterative algorithm computes an approximate posterior for the model parameters based on the variational Bayes (VB) method. Compared to the EM algorithm, the VB method has the advantage that the variance of the model parameters is also computed directly by the algorithm. We consider three families of Gaussian scale mixture models for the measurements, which include as special cases the multivariate t distribution, the multivariate Laplace distribution, and the contaminated normal model. The observations can contain missing values, assuming that the missing data mechanism can be ignored. A Matlab/Octave implementation of the algorithm is presented and applied to solve three reference examples from the literature.

16.
A nonparametric test for detecting changing conditional variances in stationary AR(p) time series is proposed in this paper. For AR(1) models, the test statistic is a Kolmogorov-Smirnov type statistic and the asymptotic theory is developed under both the null and the alternative hypotheses. For AR(p) models (p ≥ 2), an approximate test procedure is proposed. The empirical upper percentage points for our test are tabulated for both the p = 1 and p = 2 cases, and a bootstrap procedure is suggested for the p ≥ 3 case. Monte Carlo simulations demonstrate that the test has very good power for finite samples under both normal and non-normal errors.

17.
Summary.  As biological knowledge accumulates rapidly, gene networks encoding genomewide gene–gene interactions have been constructed. As an improvement over the standard mixture model, which treats all the genes as identically and independently distributed a priori, Wei and co-workers have proposed modelling a gene network as a discrete or Gaussian Markov random field (MRF) in a mixture model to analyse genomic data. However, how these methods compare in practical applications is not well understood, and this comparison is the aim here. We also propose two novel constraints in prior specifications for the Gaussian MRF model and a fully Bayesian approach to the discrete MRF model. We assess the accuracy of estimating the false discovery rate by posterior probabilities in the context of MRF models. Applications to a chromatin immuno-precipitation–chip data set and simulated data show that the modified Gaussian MRF models have superior performance compared with other models, and both MRF-based mixture models, with reasonable robustness to misspecified gene networks, outperform the standard mixture model.

18.
Summary. The paper develops mixture models for spatially indexed data. We confine attention to the case of finite, typically irregular, patterns of points or regions with prescribed spatial relationships, and to problems where it is only the weights in the mixture that vary from one location to another. Our specific focus is on Poisson-distributed data, and applications in disease mapping. We work in a Bayesian framework, with the Poisson parameters drawn from gamma priors, and an unknown number of components. We propose two alternative models for spatially dependent weights, based on transformations of autoregressive Gaussian processes: in one (the logistic normal model), the mixture component labels are exchangeable; in the other (the grouped continuous model), they are ordered. Reversible jump Markov chain Monte Carlo algorithms for posterior inference are developed. Finally, the performances of both of these formulations are examined on synthetic data and real data on mortality from a rare disease.  相似文献   

19.
Extending previous work on hedge fund return predictability, this paper introduces the idea of modelling the conditional distribution of hedge fund returns using Student's t full-factor multivariate GARCH models. This class of models takes into account the stylized facts of hedge fund return series, that is, heteroskedasticity, fat tails and deviations from normality. For the proposed class of multivariate predictive regression models, we derive analytic expressions for the score and the Hessian matrix, which can be used within classical and Bayesian inferential procedures to estimate the model parameters, as well as to compare different predictive regression models. We propose a Bayesian approach to model comparison which provides posterior probabilities for various predictive models that can be used for model averaging. Our empirical application indicates that accounting for fat tails and time-varying covariances/correlations provides a more appropriate modelling approach of the underlying dynamics of financial series and improves our ability to predict hedge fund returns.  相似文献   

20.
Estimators derived from the expectation‐maximization (EM) algorithm are not robust since they are based on the maximization of the likelihood function. We propose an iterative proximal‐point algorithm based on the EM algorithm to minimize a divergence criterion between a mixture model and the unknown distribution that generates the data. The algorithm estimates in each iteration the proportions and the parameters of the mixture components in two separate steps. Resulting estimators are generally robust against outliers and misspecification of the model. Convergence properties of our algorithm are studied. The convergence of the introduced algorithm is discussed on a two‐component Weibull mixture entailing a condition on the initialization of the EM algorithm in order for the latter to converge. Simulations on Gaussian and Weibull mixture models using different statistical divergences are provided to confirm the validity of our work and the robustness of the resulting estimators against outliers in comparison to the EM algorithm. An application to a dataset of velocities of galaxies is also presented. The Canadian Journal of Statistics 47: 392–408; 2019 © 2019 Statistical Society of Canada  相似文献   
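
As a point of reference for this abstract, here is a minimal Python sketch of the plain EM algorithm for a two-component Gaussian mixture, that is, the likelihood-based baseline that the authors describe as not robust. It is not their proximal-point algorithm. The starting values, the variance floor and the simulated data are assumptions made for illustration. The M-step is written as two separate updates, first the proportion and then the component parameters, to mirror the two-step structure mentioned above.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, iterations=100):
    """Plain EM for a two-component univariate Gaussian mixture (illustrative baseline)."""
    ordered = sorted(data)
    n = len(data)
    # Assumed starting values: quartiles for the means, overall spread for both sds.
    mean1, mean2 = ordered[n // 4], ordered[(3 * n) // 4]
    centre = sum(data) / n
    sd1 = sd2 = math.sqrt(sum((x - centre) ** 2 for x in data) / n)
    weight = 0.5
    for _ in range(iterations):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        for x in data:
            a = weight * normal_pdf(x, mean1, sd1)
            b = (1.0 - weight) * normal_pdf(x, mean2, sd2)
            resp.append(a / (a + b))
        # M-step, part 1: update the mixing proportion.
        n1 = sum(resp)
        n2 = n - n1
        weight = n1 / n
        # M-step, part 2: update the component means and standard deviations.
        mean1 = sum(r * x for r, x in zip(resp, data)) / n1
        mean2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        var1 = sum(r * (x - mean1) ** 2 for r, x in zip(resp, data)) / n1
        var2 = sum((1.0 - r) * (x - mean2) ** 2 for r, x in zip(resp, data)) / n2
        # A small floor keeps a component from collapsing onto a single point.
        sd1 = max(math.sqrt(var1), 1e-3)
        sd2 = max(math.sqrt(var2), 1e-3)
    return weight, (mean1, sd1), (mean2, sd2)


# Simulated data: two well-separated normal components (illustrative values).
rng = random.Random(1)
sample = ([rng.gauss(0.0, 1.0) for _ in range(300)]
          + [rng.gauss(5.0, 1.0) for _ in range(300)])
weight, comp1, comp2 = em_two_gaussians(sample)
print(weight, comp1, comp2)
```

Adding a few extreme values to `sample` and rerunning is a quick way to see how strongly the fitted means and standard deviations of this baseline react to outliers.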
