Similar Documents
 20 similar documents retrieved (search time: 46 ms)
1.
One of the fundamental issues in analyzing microarray data is to determine which genes are expressed and which are not for a given group of subjects. In datasets where many genes are expressed and many are not (i.e., underexpressed), the gene expression levels often follow a bimodal distribution, where one mode represents the expressed genes and the other represents the underexpressed genes. To model this bimodality, we propose a new class of mixture models that uses a random threshold value to accommodate bimodality in the gene expression distribution. Theoretical properties of the proposed model are carefully examined. We use this new model to examine the problem of differential gene expression between two groups of subjects, develop prior distributions, and derive a new criterion for determining which genes are differentially expressed between the two groups. Prior elicitation is carried out using empirical Bayes methodology in order to estimate the threshold value as well as to elicit the hyperparameters for the two-component mixture model. Several simulations demonstrate that the new gene selection criterion has excellent false-positive and false-negative rate properties. A gastric cancer dataset is used to motivate and illustrate the proposed methodology.
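A minimal sketch of the two-component idea: a plain EM fit of a two-component normal mixture to simulated log-expression levels. The paper's model uses a random threshold and empirical Bayes priors; the data, initialisation, and Gaussian component form here are our own illustrative choices.

```python
import numpy as np

def em_two_normals(x, n_iter=200):
    """EM for a two-component Gaussian mixture (a generic stand-in for the
    paper's random-threshold mixture; initialisation is a crude median split)."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    mu = np.array([x[x <= m].mean(), x[x > m].mean()])
    sd = np.array([x[x <= m].std() + 1e-6, x[x > m].std() + 1e-6])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.stack([pi[k] / (sd[k] * np.sqrt(2 * np.pi))
                         * np.exp(-0.5 * ((x - mu[k]) / sd[k]) ** 2)
                         for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step: update weights, means, and standard deviations
        nk = resp.sum(axis=1)
        pi = nk / len(x)
        mu = (resp * x).sum(axis=1) / nk
        sd = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk) + 1e-9
    return pi, mu, sd

rng = np.random.default_rng(0)
# simulated log-expression: "underexpressed" mode near 0, "expressed" near 4
x = np.concatenate([rng.normal(0.0, 0.5, 300), rng.normal(4.0, 0.8, 200)])
pi, mu, sd = em_two_normals(x)
```

The fitted component means recover the two modes; in the paper's setting, a gene's posterior component membership would then feed the differential-expression criterion.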

2.
Over the years, many papers have used parametric distributions to model crop yields, such as the normal (N), beta, log-normal, and skew-normal (SN). These models are well defined, both mathematically and computationally, but they do not accommodate bimodality. It is therefore necessary to study more flexible distributions, since most crop yield data in Brazil show evidence of asymmetry or bimodality. The aim of this study was to model and forecast soybean yields for municipalities in the State of Paraná over the period from 1980 to 2014, using the odd log-logistic normal (OLLN) distribution for the bimodal data and the beta, SN, and skew-t distributions for the symmetric and asymmetric series. The OLLN model provided the best fit to the data. The results are discussed in the context of crop insurance pricing.
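The model-comparison step can be sketched with off-the-shelf maximum likelihood fits and AIC. The study compares the OLLN against beta, SN, and skew-t candidates; here, for illustration only, we compare just a normal and a skew-normal fit on simulated right-skewed "yield" data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# illustrative right-skewed data (not the Paraná yield series)
y = stats.skewnorm.rvs(a=5, loc=2.0, scale=1.0, size=400, random_state=rng)

candidates = {"normal": stats.norm, "skew-normal": stats.skewnorm}
aic = {}
for name, dist in candidates.items():
    params = dist.fit(y)                      # maximum likelihood fit
    loglik = dist.logpdf(y, *params).sum()
    aic[name] = 2 * len(params) - 2 * loglik  # Akaike information criterion

best = min(aic, key=aic.get)
```

With clearly skewed data, the extra shape parameter of the skew-normal pays for its AIC penalty; the same comparison extends to bimodal candidates such as the OLLN.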

3.
In this paper, we propose a new semiparametric heteroscedastic regression model that allows for positive and negative skewness and bimodal shapes, using a B-spline basis for the nonlinear effects. The proposed distribution is embedded in the generalized additive models for location, scale and shape framework, so that any or all parameters of the distribution can be modelled using parametric linear and/or nonparametric smooth functions of explanatory variables. We motivate the new model by means of Monte Carlo simulations, which show that ignoring the skewness and bimodality of the random errors in semiparametric regression models may introduce biases in the parameter estimates and/or in the estimation of the associated variability measures. An iterative estimation process and some diagnostic methods are investigated. Applications to two real data sets are presented, and the method is compared to the usual regression methods.
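The B-spline ingredient of such a model can be illustrated with a plain least-squares spline fit of a nonlinear mean function; no GAMLSS machinery is involved, and the knot layout and data below are illustrative.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)  # nonlinear signal + noise

# cubic least-squares B-spline: clamped knot vector with 6 interior knots
k = 3
interior = np.linspace(0, 1, 8)[1:-1]
t = np.r_[(0.0,) * (k + 1), interior, (1.0,) * (k + 1)]
spl = make_lsq_spline(x, y, t, k=k)

fitted = spl(x)
resid_sd = np.std(y - fitted)  # should approach the noise level (0.2)
```

In the paper's framework, a basis like this would enter the linear predictor of any distribution parameter (location, scale, skewness), not just the mean.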

4.
We extend the standard multivariate mixed model by incorporating a smooth time effect and relaxing distributional assumptions. We propose a semiparametric Bayesian approach to multivariate longitudinal data using a mixture of Polya trees prior distribution. Usually, the distribution of random effects in a longitudinal data model is assumed to be Gaussian. However, the normality assumption may be suspect, particularly if the estimated longitudinal trajectory parameters exhibit multimodality and skewness. In this paper we propose a mixture of Polya trees prior density to address the limitations of the parametric random effects distribution. We illustrate the methodology by analyzing data from a recent HIV-AIDS study.

5.
This article proposes nonparametric Bayesian approaches to monotone function estimation. The approach uses a hierarchical Bayes framework and a characterization of the stick-breaking process that allows unconstrained estimation of the monotone function. In order to avoid the limitations of parametric modeling, a general class of prior distributions, called stick-breaking priors, is considered. It accommodates much more flexible forms and can easily deal with skewness, multimodality, and other features of the response variable. The proposed approach is applied to model the catch ratio based on automatic weather station (AWS) data.
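The stick-breaking construction underlying these priors is easy to sketch: draw v_k ~ Beta(1, alpha), set w_k = v_k * prod_{j<k}(1 - v_j), and the weights sum to one. The truncation level and concentration below are illustrative.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking weights: v_k ~ Beta(1, alpha),
    w_k = v_k * prod_{j<k} (1 - v_j)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    # remaining stick length before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    w = v * np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return w

rng = np.random.default_rng(3)
w = stick_breaking(alpha=2.0, n_atoms=500, rng=rng)
```

With 500 atoms the unallocated stick length is negligible, so the truncated weights behave like a draw from the full process.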

6.
This article combines the best of both objective and subjective Bayesian inference in specifying priors for inequality- and equality-constrained analysis of variance models. Objectivity can be found in the use of training data to specify a prior distribution; subjectivity can be found in the restrictions on the prior used to formulate the models. The aim of this article is to find the best model in a set of models specified using inequality and equality constraints on the model parameters. For the evaluation of the models, an encompassing prior approach is used. The advantage of this approach is that only a prior for the unconstrained encompassing model needs to be specified; the priors for all constrained models can be derived from this encompassing prior. Different choices for this encompassing prior are considered and evaluated.
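The encompassing-prior idea can be sketched for a single inequality constraint: the Bayes factor of the constrained model against the encompassing model is the posterior proportion of draws satisfying the constraint divided by the prior proportion. The conjugate normal model and data below are our own illustrative stand-in, not the authors' ANOVA setup.

```python
import numpy as np

rng = np.random.default_rng(4)

# two group means, known unit sampling variance; encompassing prior N(0, 10^2)
y1 = rng.normal(1.0, 1.0, 50)
y2 = rng.normal(0.2, 1.0, 50)

def posterior_draws(y, prior_var=100.0, n=100_000):
    # conjugate normal update with known sampling variance 1
    post_var = 1.0 / (1.0 / prior_var + len(y))
    post_mean = post_var * y.sum()
    return rng.normal(post_mean, np.sqrt(post_var), n)

m1, m2 = posterior_draws(y1), posterior_draws(y2)
prior1 = rng.normal(0, 10, 100_000)
prior2 = rng.normal(0, 10, 100_000)

# Bayes factor of the constrained model (mu1 > mu2) vs. the encompassing model
bf = np.mean(m1 > m2) / np.mean(prior1 > prior2)
```

The prior proportion is about 1/2 for a single symmetric inequality, so the Bayes factor approaches 2 when the data strongly support the constraint.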

7.
Posterior distributions for mixture models often have multiple modes, particularly near the boundaries of the parameter space where the component variances are small. This multimodality results in predictive densities that are extremely rough. The authors propose an adjustment of the standard normal-inverse-gamma prior structure that directly controls the ratio of the largest component variance to the smallest component variance. The prior adjustment smooths out modes near the boundary of the parameter space, producing more reasonable estimates of the predictive density.

8.
This paper focuses on the development of a new extension of the generalized skew-normal distribution introduced in Gómez et al. [Generalized skew-normal models: properties and inference. Statistics. 2006;40(6):495–505]. To produce the generalization, a new parameter is introduced, the sign of which has the flexibility of yielding unimodal as well as bimodal distributions. We study its properties, derive a stochastic representation, and state some expressions that facilitate the derivation of moments. Maximum likelihood estimation is implemented via the EM algorithm, which is based on the stochastic representation derived. We show that the Fisher information matrix is singular and discuss ways of getting round this problem. An illustration using real data reveals that the model can capture well special data features such as bimodality and asymmetry.

9.
Bayesian nonparametric approaches have recently attracted much attention in survival studies. Because of multimodality in survival data, mixture models are very common. We introduce a Bayesian nonparametric mixture model with the Burr distribution (Burr type XII) as the kernel. Since the Burr distribution shares the good properties of distributions commonly used in survival analysis, it is more flexible than other choices of kernel. By applying this model to simulated and real failure-time datasets, we demonstrate its advantages and compare it with Dirichlet process mixture models with different kernels. Markov chain Monte Carlo (MCMC) methods are used to compute the posterior distribution.
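A finite mixture with Burr XII kernels is easy to write down with scipy's `burr12`; the weights and shape parameters below are illustrative, and we check numerically that the mixture density integrates to one over its positive support.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# two-component Burr XII mixture (weights and parameters are illustrative)
weights = [0.6, 0.4]
comps = [stats.burr12(c=2.0, d=3.0, scale=1.0),
         stats.burr12(c=6.0, d=1.5, scale=4.0)]

def mix_pdf(t):
    """Density of the two-component Burr XII mixture."""
    return sum(w * comp.pdf(t) for w, comp in zip(weights, comps))

# both components have essentially all their mass well below t = 40
t = np.linspace(0, 40, 20001)
area = trapezoid(mix_pdf(t), t)
```

In the paper's nonparametric setting, the finite set of components is replaced by a Dirichlet process over the Burr kernel parameters, but the kernel evaluation is the same.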

10.
Advances in understanding the biological underpinnings of many cancers have led increasingly to the use of molecularly targeted anticancer therapies. Because the platelet-derived growth factor receptor (PDGFR) has been implicated in the progression of prostate cancer bone metastases, it is of great interest to examine possible relationships between PDGFR inhibition and therapeutic outcomes. We analyse the association between change in activated PDGFR (phosphorylated PDGFR) and progression-free survival time based on large within-patient samples of cell-specific phosphorylated PDGFR values taken before and after treatment from each of 88 prostate cancer patients. To utilize these paired samples as covariate data in a regression model for progression-free survival time, and because the phosphorylated PDGFR distributions are bimodal, we first employ a Bayesian hierarchical mixture model to obtain a deconvolution of the pretreatment and post-treatment within-patient phosphorylated PDGFR distributions. We evaluate fits of the mixture model and a non-mixture model that ignores the bimodality by using a sup-norm metric to compare the empirical distribution of each phosphorylated PDGFR data set with the corresponding fitted distribution under each model. Our results show that first using the mixture model to account for the bimodality of the within-patient phosphorylated PDGFR distributions, and then using the resulting posterior within-patient component mean changes in phosphorylated PDGFR as covariates in the regression model for progression-free survival time, provides improved estimation.
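The sup-norm comparison between an empirical distribution and a fitted one is the Kolmogorov–Smirnov statistic evaluated at the jump points of the empirical CDF. The sketch below contrasts a single-normal fit with a two-component mixture CDF on simulated bimodal data; for simplicity the mixture uses the simulation's own components rather than a full Bayesian deconvolution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# bimodal stand-in for a within-patient phosphorylated-PDGFR sample
x = np.concatenate([rng.normal(-2, 1, 400), rng.normal(3, 1, 600)])

def supnorm(x, cdf):
    """Sup-norm distance between the empirical CDF and a fitted CDF
    (the KS statistic, taking both one-sided gaps at each jump)."""
    xs = np.sort(x)
    n = len(xs)
    F = cdf(xs)
    return max(np.abs(np.arange(1, n + 1) / n - F).max(),
               np.abs(np.arange(0, n) / n - F).max())

# non-mixture fit: a single normal
mu, sd = x.mean(), x.std()
d_single = supnorm(x, lambda t: stats.norm.cdf(t, mu, sd))

# mixture CDF with the (here, known) component structure
d_mix = supnorm(x, lambda t: 0.4 * stats.norm.cdf(t, -2, 1)
                             + 0.6 * stats.norm.cdf(t, 3, 1))
```

The single-normal fit shows a large sup-norm gap around the antimode, while the mixture tracks the empirical distribution closely.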

11.
We propose zero-inflated statistical models based on the generalized Hermite distribution for simultaneously modelling excess zeros, over/underdispersion, and multimodality. These new models are parsimonious yet remarkably flexible, allowing the covariates to be introduced directly through the mean, dispersion, and zero-inflation parameters. To accommodate the interval inequality constraint on the dispersion parameter, we present a new link function for the covariate-dependent dispersion regression model. We derive score tests for zero inflation in both covariate-free and covariate-dependent models. Both the score test and the likelihood-ratio test are conducted to examine the validity of zero inflation; the score test provides a useful tool when computing the likelihood-ratio statistic proves difficult. We analyse several hotel booking cancellation datasets extracted from two recently published real datasets from a resort hotel and a city hotel. These extracted cancellation datasets simultaneously exhibit excess zeros, over/underdispersion, and multimodality, making them difficult to analyse with existing approaches. The application of the proposed methods to the cancellation datasets illustrates the usefulness and flexibility of the models.
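A generic interval-constrained link of the kind described can be built from a scaled logistic. Whether this matches the paper's exact link function is an assumption; the point is to show how a dispersion parameter restricted to an interval (lo, hi) can be regressed on covariates through an unconstrained linear predictor.

```python
import numpy as np

def interval_link(eta, lo, hi):
    """Map a linear predictor eta in (-inf, inf) into (lo, hi) via a scaled
    logistic; a generic interval link, not necessarily the paper's choice."""
    return lo + (hi - lo) / (1.0 + np.exp(-eta))

def interval_link_inv(theta, lo, hi):
    """Inverse link: map theta in (lo, hi) back to the linear-predictor scale."""
    p = (theta - lo) / (hi - lo)
    return np.log(p / (1.0 - p))

# example: a dispersion parameter constrained to the interval (1, 4)
eta = np.linspace(-5, 5, 101)
d = interval_link(eta, lo=1.0, hi=4.0)
```

Any real-valued linear predictor beta0 + beta1 * x then yields a dispersion value strictly inside the admissible interval.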

12.
Categorical data frequently arise in applications in the Social Sciences. In such applications, the class of log-linear models, based on either a Poisson or (product) multinomial response distribution, is a flexible model class for inference and prediction. In this paper we consider the Bayesian analysis of both Poisson and multinomial log-linear models. It is often convenient to model multinomial or product multinomial data as observations of independent Poisson variables. For multinomial data, Lindley (1964) [20] showed that this approach leads to valid Bayesian posterior inferences when the prior density for the Poisson cell means factorises in a particular way. We develop this result to provide a general framework for the analysis of multinomial or product multinomial data using a Poisson log-linear model. Valid finite population inferences are also available, which can be particularly important in modelling social data. We then focus particular attention on multivariate normal prior distributions for the log-linear model parameters. Here, an improper prior distribution for certain Poisson model parameters is required for valid multinomial analysis, and we derive conditions under which the resulting posterior distribution is proper. We also consider the construction of prior distributions across models, and for model parameters, when uncertainty exists about the appropriate form of the model. We present classes of Poisson and multinomial models, invariant under certain natural groups of permutations of the cells. We demonstrate that, if prior belief concerning the model parameters is also invariant, as is the case in a ‘reference’ analysis, then the choice of prior distribution is considerably restricted. The analysis of multivariate categorical data in the form of a contingency table is considered in detail. We illustrate the methods with two examples.

13.
This article discusses two asymmetrization methods, Azzalini's representation and beta generation, for generating asymmetric bimodal models, including two novel beta-generated models. The practical utility of these models is assessed with nine data sets from different fields of applied science. Besides this tutorial assessment, some methodological contributions are made: a random number generator for the asymmetric Rathie–Swamee model is developed (generators for the other models are already known and briefly described), and a new likelihood ratio test of unimodality is compared via simulations with other available tests. Several tools have been used to quantify and test for bimodality and to assess goodness of fit, including the Bayesian information criterion, measures of agreement with the empirical distribution, and the Kolmogorov–Smirnov test. In the nine case studies, the results favoured models derived from Azzalini's asymmetrization, but no single model provided the best fit across all the applications considered; in only two cases was the normal mixture selected as the best model. Parameter estimation has been done by likelihood maximization. Numerical optimization must be performed with care, since local optima are often present. We conclude that the models considered are flexible enough to fit different bimodal shapes and that the tools studied should be used with care and attention to detail.

14.
In this paper, we discuss the class of generalized Birnbaum–Saunders distributions, a very flexible family suitable for modeling lifetime data, as it allows for different degrees of kurtosis and asymmetry as well as both unimodality and bimodality. We describe the theoretical developments on this model, including properties, transformations and related distributions, lifetime analysis, and shape analysis. We also discuss methods of inference based on uncensored and censored data, diagnostic methods, goodness-of-fit tests, and random number generation algorithms for the generalized Birnbaum–Saunders model. Finally, we present some illustrative examples and show that this distribution fits the data better than the classical Birnbaum–Saunders model.
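For the classical Birnbaum–Saunders case, random number generation follows directly from the normal representation T = beta * (alpha*Z/2 + sqrt((alpha*Z/2)^2 + 1))^2 with Z ~ N(0, 1); the generalized family replaces the normal kernel by other symmetric distributions, which this sketch does not cover.

```python
import numpy as np

def rbs(alpha, beta, size, rng):
    """Classical Birnbaum-Saunders draws via the normal representation:
    T = beta * (alpha*Z/2 + sqrt((alpha*Z/2)**2 + 1))**2, Z ~ N(0, 1)."""
    z = rng.standard_normal(size)
    w = alpha * z / 2.0
    return beta * (w + np.sqrt(w ** 2 + 1.0)) ** 2

rng = np.random.default_rng(6)
t = rbs(alpha=0.5, beta=1.0, size=200_000, rng=rng)
# theoretical mean: E[T] = beta * (1 + alpha**2 / 2) = 1.125 for these values
```

The sample mean of the draws matches the closed-form mean, a quick sanity check on any such generator.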

15.
In this paper, we present growth curve models with an auxiliary variable whose data distribution is uncertain and is modelled as a mixture of standard components, such as normal distributions. The multimodality of the auxiliary random variable motivates and necessitates the use of mixtures of normal distributions in our model. We have observed that Dirichlet process priors, composed of discrete and continuous components, are appropriate for simultaneously determining the number of components and estimating the parameters, and are especially useful in the aforementioned multimodal scenario. A model applying a Dirichlet mixture of normals (DMN) in growth curve models under a Bayesian formulation is presented, and algorithms for computing the number of components, as well as for estimating the parameters, are also provided. The simulation results show that our model gives improved goodness-of-fit statistics over models without the DMN, and that the estimates of the number of components and of the parameters are reasonably accurate.
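The way a Dirichlet process prior handles the unknown number of components can be seen through its Chinese-restaurant marginal: each observation joins an existing component with probability proportional to its size, or opens a new one with probability proportional to the concentration alpha. The simulation below is illustrative, not the authors' growth-curve sampler.

```python
import numpy as np

def crp_n_clusters(n, alpha, rng):
    """Number of occupied tables after seating n customers in a Chinese
    restaurant process with concentration alpha (a marginal view of the
    Dirichlet process prior on the number of mixture components)."""
    counts = []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)   # open a new table/component
        else:
            counts[k] += 1     # join an existing one
    return len(counts)

rng = np.random.default_rng(7)
draws = [crp_n_clusters(n=200, alpha=1.0, rng=rng) for _ in range(300)]
# prior mean: E[K] = sum_{i=1}^{200} alpha/(alpha+i-1) ~= 5.9 for alpha = 1
```

The prior on the number of components is thus implicit and grows only logarithmically in the sample size.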

16.
We address the task of choosing prior weights for models that are to be used for weighted model averaging. Models that are very similar should usually be given smaller weights than models that are quite distinct. Otherwise, the importance of a model in the weighted average could be increased by augmenting the set of models with duplicates of the model or virtual duplicates of it. Similarly, the importance of a particular model feature (a certain covariate, say) could be exaggerated by including many models with that feature. Ways of forming a correlation matrix that reflects the similarity between models are suggested. Then, weighting schemes are proposed that assign prior weights to models on the basis of this matrix. The weighting schemes give smaller weights to models that are more highly correlated. Other desirable properties of a weighting scheme are identified, and we examine the extent to which these properties are held by the proposed methods. The weighting schemes are applied to real data, and prior weights, posterior weights and Bayesian model averages are determined. For these data, empirical Bayes methods were used to form the correlation matrices that yield the prior weights. Predictive variances are examined, as empirical Bayes methods can result in unrealistically small variances.

17.
A Bayes factor between two models can be greatly affected by the prior distributions on the model parameters. When prior information is weak, very dispersed proper prior distributions are known to create a problem for the Bayes factor when competing models differ in dimension, and it is of even greater concern when one of the models is of infinite dimension. Therefore, we propose an innovative method which uses training samples to calibrate the prior distributions so that they achieve a reasonable level of ‘information’. Then the calibrated Bayes factor can be computed over the remaining data. This method makes no assumption on model forms (parametric or nonparametric) and can be used with both proper and improper priors. We illustrate, through simulation studies and a real data example, that the calibrated Bayes factor yields robust and reliable model preferences under various situations.

18.
Due to computational challenges and non-availability of conjugate prior distributions, Bayesian variable selection in quantile regression models is often a difficult task. In this paper, we address these two issues for quantile regression models. In particular, we develop an informative stochastic search variable selection (ISSVS) for quantile regression models that introduces an informative prior distribution. We adopt prior structures which incorporate historical data into the current data by quantifying them with a suitable prior distribution on the model parameters. This allows ISSVS to search more efficiently in the model space and choose the more likely models. In addition, a Gibbs sampler is derived to facilitate the computation of the posterior probabilities. A major advantage of ISSVS is that it avoids instability in the posterior estimates for the Gibbs sampler as well as convergence problems that may arise from choosing vague priors. Finally, the proposed methods are illustrated with both simulation and real data.

19.
We present a new class of models to fit longitudinal data, obtained with a suitable modification of the classical linear mixed-effects model. For each sample unit, the joint distribution of the random effect and the random error is a finite mixture of scale mixtures of multivariate skew-normal distributions. This extension allows us to model the data in a more flexible way, taking into account skewness, multimodality and discrepant observations at the same time. The scale mixtures of skew-normal form an attractive class of asymmetric heavy-tailed distributions that includes the skew-normal, skew-Student-t, skew-slash and the skew-contaminated normal distributions as special cases, being a flexible alternative to the use of the corresponding symmetric distributions in this type of models. A simple efficient MCMC Gibbs-type algorithm for posterior Bayesian inference is employed. In order to illustrate the usefulness of the proposed methodology, two artificial and two real data sets are analyzed.
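The skew-normal building block of this class has a simple stochastic representation, Y = delta*|Z0| + sqrt(1 - delta^2)*Z1 with delta = alpha/sqrt(1 + alpha^2); dividing such draws by the square root of a positive mixing variable yields the skew-t, skew-slash, and related members. A sketch of the base case:

```python
import numpy as np

def rskewnorm(alpha, size, rng):
    """Standard skew-normal draws via the convolution representation
    Y = delta*|Z0| + sqrt(1 - delta^2)*Z1, delta = alpha/sqrt(1 + alpha^2)."""
    delta = alpha / np.sqrt(1.0 + alpha ** 2)
    z0 = np.abs(rng.standard_normal(size))
    z1 = rng.standard_normal(size)
    return delta * z0 + np.sqrt(1.0 - delta ** 2) * z1

rng = np.random.default_rng(8)
y = rskewnorm(alpha=3.0, size=200_000, rng=rng)
# theoretical mean: E[Y] = delta * sqrt(2/pi) ~= 0.757 for alpha = 3
```

This same representation is what makes Gibbs-type samplers for these models tractable: conditioning on |Z0| and the mixing variable reduces each step to a normal linear model.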

20.
Bayesian selection of variables is often difficult to carry out because of the challenges in specifying prior distributions for the regression parameters of all possible models, specifying a prior distribution on the model space, and computation. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection. Several theoretical and computational properties of the prior are derived and illustrated with several examples. For the second, we propose a method for specifying an informative prior on the model space, and for the third we propose novel methods for computing the marginal distribution of the data. The new computational algorithms only require Gibbs samples from the full model to facilitate the computation of the prior and posterior model probabilities for all possible models. Several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables, in that the elicitation is based on a prior prediction y0 for the response vector and a quantity a0 quantifying the uncertainty in y0. Then, y0 and a0 are used to specify a prior for the regression coefficients semi-automatically. Examples using real data are given to demonstrate the methodology.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号