Similar Literature
20 similar documents retrieved (search time: 156 ms)
1.
We consider a two-component mixture model where one component distribution is known while the mixing proportion and the other component distribution are unknown. These kinds of models were first introduced in biology to study differences in expression between genes. The estimation methods proposed to date have all assumed that the unknown distribution belongs to a parametric family. In this paper, we show how this assumption can be relaxed. First, we note that the above model is generally not identifiable, but we show that under moment and symmetry conditions some 'almost everywhere' identifiability results can be obtained. Where such identifiability conditions are fulfilled, we propose an estimation method for the unknown parameters which is shown to be strongly consistent under mild conditions. We discuss applications of our method to microarray data analysis and to the training data problem. We compare our method to the parametric approach using simulated data and, finally, we apply our method to real data from microarray experiments.
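The parametric baseline that the paper's semiparametric estimator is compared against can be sketched with a short EM routine. This is a minimal sketch, assuming the unknown component is Gaussian purely for illustration — exactly the assumption the paper relaxes — with a standard normal null component as in microarray settings:

```python
import numpy as np
from scipy.stats import norm

def em_known_null(x, n_iter=200):
    """EM for f(x) = (1-p)*N(0,1) + p*N(mu, sigma^2), null component known.
    Illustrative parametric baseline; the paper's own estimator avoids
    assuming a Gaussian form for the unknown component."""
    p, mu, sigma = 0.5, np.mean(x), np.std(x)
    for _ in range(n_iter):
        # E-step: posterior probability that each point is non-null
        num = p * norm.pdf(x, mu, sigma)
        w = num / (num + (1 - p) * norm.pdf(x, 0, 1))
        # M-step: update mixing proportion and unknown-component parameters
        p = w.mean()
        mu = np.sum(w * x) / w.sum()
        sigma = np.sqrt(np.sum(w * (x - mu) ** 2) / w.sum())
    return p, mu, sigma

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 800), rng.normal(2, 1, 200)])
print(em_known_null(x))  # roughly (0.2, 2.0, 1.0)
```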

2.
Markov Random Fields with Higher-order Interactions
Discrete-state Markov random fields on regular arrays have played a significant role in spatial statistics and image analysis. For example, they are used to represent objects against background in computer vision and pixel-based classification of a region into different crop types in remote sensing. Convenience has generally favoured formulations that involve only pairwise interactions. Such models are in themselves unrealistic and, although they often perform surprisingly well in tasks such as the restoration of degraded images, they are unsatisfactory for many other purposes. In this paper, we consider particular forms of Markov random fields that involve higher-order interactions and therefore are better able to represent the large-scale properties of typical spatial scenes. Interpretations of the parameters are given and realizations from a variety of models are produced via Markov chain Monte Carlo. Potential applications are illustrated in two examples. The first concerns Bayesian image analysis and confirms that pairwise-interaction priors may perform very poorly for image functionals such as number of objects, even when restoration apparently works well. The second example describes a model for a geological dataset and obtains maximum-likelihood parameter estimates using Markov chain Monte Carlo. Despite the complexity of the formulation, realizations of the estimated model suggest that the representation is quite realistic.
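For reference, the pairwise-interaction baseline the paper argues against can be simulated with a single-site Gibbs sampler; a minimal sketch for a binary (Ising-type) field follows. The higher-order models in the paper would add terms for larger cliques to the conditional energy, which this sketch omits:

```python
import numpy as np

def ising_gibbs(n=32, beta=0.8, sweeps=200, seed=1):
    """Single-site Gibbs sampler for a pairwise-interaction binary MRF
    on an n x n array with free boundary conditions."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # sum over the (up to 4) nearest neighbours
                s = (x[i - 1, j] if i > 0 else 0) + (x[i + 1, j] if i < n - 1 else 0) \
                  + (x[i, j - 1] if j > 0 else 0) + (x[i, j + 1] if j < n - 1 else 0)
                # full conditional: P(x_ij = +1 | rest) = 1 / (1 + exp(-2*beta*s))
                x[i, j] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * beta * s)) else -1
    return x

img = ising_gibbs()
print(img.mean())  # overall magnetization of the realization
```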

3.
Hierarchical models enable the encoding of a variety of parametric structures. However, when presented with a large number of covariates upon which some component of a model hierarchy depends, the modeller may be unwilling or unable to specify a form for that dependence. Data-mining methods are designed to automatically discover relationships between many covariates and a response surface, easily accommodating non-linearities and higher-order interactions. We present a method of wrapping hierarchical models around data-mining methods, preserving the best qualities of the two paradigms. We fit the resulting semi-parametric models using an approximate Gibbs sampler called HEBBRU. Using a simulated dataset, we show that HEBBRU is useful for exploratory analysis and displays excellent predictive accuracy. Finally, we apply HEBBRU to an ornithological dataset drawn from the eBird database.

4.
We propose quantile regression (QR) in the Bayesian framework for a class of parametric nonlinear mixed effects models for longitudinal data. Estimation of the regression quantiles is based on a likelihood approach using the asymmetric Laplace density. Posterior computations are carried out via Gibbs sampling and the adaptive rejection Metropolis algorithm. To assess the performance of the Bayesian QR estimator, we compare it with the mean regression estimator using real and simulated data. Results show that the Bayesian QR estimator provides a fuller examination of the shape of the conditional distribution of the response variable. Because our approach assumes a parametric nonlinear mixed effects model, it may not generalize to models without a given model form.
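The link exploited here is that maximizing an asymmetric Laplace likelihood in its location parameter is equivalent to minimizing the check (pinball) loss. A minimal point-estimation sketch of that idea follows, with a hypothetical exponential-decay model standing in for a pharmacokinetic-style nonlinear curve; the full Bayesian machinery (Gibbs sampling, adaptive rejection Metropolis, random effects) is not reproduced:

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Pinball loss; minimizing it in the location parameter is equivalent
    to maximizing an asymmetric Laplace likelihood at quantile level tau."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def fit_quantile(x, y, model, theta0, tau=0.5):
    """Fit a (possibly nonlinear) regression quantile by minimizing check loss.
    `model(x, theta)` is any user-supplied parametric function."""
    obj = lambda theta: np.sum(check_loss(y - model(x, theta), tau))
    return minimize(obj, theta0, method="Nelder-Mead").x

# Hypothetical nonlinear model: exponential decay with heavy-tailed noise
rng = np.random.default_rng(2)
x = np.linspace(0, 5, 200)
y = 3.0 * np.exp(-0.7 * x) + 0.2 * rng.standard_t(3, size=200)
theta_hat = fit_quantile(x, y, lambda x, t: t[0] * np.exp(-t[1] * x),
                         theta0=[1.0, 1.0], tau=0.9)
print(theta_hat)  # parameters of the 0.9 conditional quantile curve
```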

5.
Stochastic Models, 2013, 29(4): 473–492

In this paper, we show how the time for convergence to stationarity of a Markov chain can be assessed using the Wasserstein metric, rather than the usual choice of total variation distance. The Wasserstein metric may be more easily applied in some applications, particularly those on continuous state spaces. Bounds on convergence time are established by considering the number of iterations required to approximately couple two realizations of the Markov chain to within ε tolerance. The particular application considered is the use of the Gibbs sampler in the Bayesian restoration of a degraded image, both when pixels take values on a continuous grey scale and when they can take only two colours. On finite state spaces, a bound in the Wasserstein metric can be used to find a bound in total variation distance. We use this relationship to obtain a precise O(N log N) bound on the convergence time of the stochastic Ising model that holds for appropriate values of its parameter, as well as for other binary image models. Our method employing convergence in the Wasserstein metric can also be applied to perfect sampling algorithms involving coupling from the past to obtain estimates of their running times.
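The quantity being bounded — the number of iterations until two coupled realizations are within ε — can be illustrated on a toy chain. This sketch uses a simple AR(1) chain driven by common random numbers (not the paper's image-restoration Gibbs sampler), where the coupling time is even available in closed form as a check:

```python
import numpy as np

def epsilon_coupling_time(x0, y0, rho=0.9, eps=1e-3, seed=3):
    """Iterations until two copies of an AR(1) chain, started at x0 and y0
    and driven by common random numbers, are within eps of each other --
    the quantity the paper bounds via the Wasserstein metric."""
    rng = np.random.default_rng(seed)
    x, y, t = x0, y0, 0
    while abs(x - y) >= eps:
        z = rng.normal()      # shared innovation couples the two paths
        x = rho * x + z
        y = rho * y + z
        t += 1
    return t

print(epsilon_coupling_time(10.0, -10.0))
# For this chain |x - y| = |x0 - y0| * rho^t, so t ≈ log(eps / 20) / log(rho)
```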

6.
In this article, we propose a parametric model for the distribution of time to first event when events are overdispersed and can be properly fitted by a Negative Binomial distribution. This is a very common situation in medical statistics, when the occurrence of events is summarized as a count for each patient and the simple Poisson model is not adequate to account for overdispersion of the data. In this situation, studying the time of occurrence of the first event can be of interest. From the Negative Binomial distribution of counts, we derive a new parametric model for time to first event and apply it to fit the distribution of time to first relapse in multiple sclerosis (MS). We develop the regression model with methods for covariate estimation. We show that, as the Negative Binomial model properly fits relapse count data, this new model matches the distribution of time to first relapse very closely, as tested in two large datasets of MS patients. Finally, we compare its performance in fitting time to first relapse in MS with that of other models widely used in survival analysis (the semiparametric Cox model and the parametric exponential, Weibull, log-logistic and log-normal models).
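One standard construction consistent with this abstract (though not necessarily the authors' exact parametrization) is the Poisson–gamma mixture: if each patient's relapse rate λ is Gamma-distributed, the counts are Negative Binomial and the time to first event follows a Lomax (Pareto type II) law in closed form. A sketch with a Monte Carlo check:

```python
import numpy as np

def surv_first_event(t, alpha, beta):
    """P(T1 > t) under a Poisson-gamma mixture: lambda ~ Gamma(alpha, rate=beta)
    and T1 | lambda ~ Exponential(lambda), so
    S(t) = E[exp(-lambda * t)] = (beta / (beta + t)) ** alpha  (a Lomax law)."""
    return (beta / (beta + t)) ** alpha

# Monte Carlo check of the closed form
rng = np.random.default_rng(4)
lam = rng.gamma(shape=2.0, scale=1 / 1.5, size=100_000)  # rate beta = 1.5
t1 = rng.exponential(1 / lam)                            # first-event times
print((t1 > 1.0).mean(), surv_first_event(1.0, 2.0, 1.5))  # should agree
```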

7.
Papers dealing with measures of predictive power in survival analysis have treated independence of censoring, or unbiasedness of the estimates under censoring, as the most important property. We argue that this property has been wrongly understood. Discussing the so-called measure of information gain, we point out that we cannot have unbiased estimates if all values greater than a given time τ are censored. This is because censoring before τ has a different effect than censoring after τ. Such a τ is often introduced by the design of a study. Independence can only be achieved under the assumption that the model remains valid after τ, which is impossible to verify. But if one is willing to make such an assumption, we suggest using multiple imputation to obtain a consistent estimate. We further show that censoring affects estimation of the measure differently for the Cox model than for parametric models, and we discuss the two cases separately. We also give some warnings about the use of the measure, especially when it comes to comparing essentially different models.

8.
Parametric nonlinear mixed effects models (NLMEs) are now widely used in biometrical studies, especially in pharmacokinetics research and HIV dynamics models, owing, among other things, to the computational advances of recent years. However, such models may not be flexible enough for complex longitudinal data analysis. Semiparametric NLMEs (SNMMs) have been proposed as an extension of NLMEs. These models are a good compromise, retaining nice features of both parametric and nonparametric models and resulting in more flexible models than standard parametric NLMEs. However, SNMMs are complex models for which estimation still remains a challenge. Previous estimation procedures are based on a combination of log-likelihood approximation methods for parametric estimation and smoothing splines techniques for nonparametric estimation. In this work, we propose new estimation strategies for SNMMs. On the one hand, we use the Stochastic Approximation version of the EM algorithm (SAEM) to obtain exact ML and REML estimates of the fixed effects and variance components. On the other hand, we propose a LASSO-type method to estimate the unknown nonlinear function, and we derive oracle inequalities for this nonparametric estimator. We combine the two approaches in a general estimation procedure that we illustrate with simulations and through the analysis of a real dataset of price evolution in on-line auctions.

9.
Nowadays, there is increasing interest in multi-point models and their applications in the Earth sciences. However, users not only ask for multi-point methods able to capture the uncertainties of complex structures and to reproduce the properties of a training image, but they also need quantitative tools for assessing whether a set of realizations has the required properties. Moreover, it is crucial to study the sensitivity of the realizations to the size of the data template and to analyze how fast realization-based statistics converge on average toward training-based statistics. In this paper, some similarity measures and convergence indexes, based on physically measurable quantities and high-order cumulants, are presented. In the case study, multi-point simulations of the spatial distribution of coarse-grained limestone and calcareous rock, generated using three templates of different sizes, are compared, and convergence toward training-based statistics is analyzed by taking into account increasing numbers of realizations.

10.
In this paper, we extend the focused information criterion (FIC) to copula models. Copulas are often used for applications where the joint tail behavior of the variables is of particular interest, and selecting a copula that captures this well is then essential. Traditional model selection methods such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) aim at finding the overall best-fitting model, which is not necessarily the one best suited for the application at hand. The FIC, on the other hand, evaluates and ranks candidate models based on the precision of their point estimates of a context-given focus parameter. This could be any quantity of particular interest, for example, the mean, a correlation, conditional probabilities, or measures of tail dependence. We derive FIC formulae for the maximum likelihood estimator, the two-stage maximum likelihood estimator, and the so-called pseudo-maximum-likelihood (PML) estimator combined with parametric margins. Furthermore, we confirm the validity of the AIC formula for the PML estimator combined with parametric margins. To study the numerical behavior of FIC, we have carried out a simulation study, and we have also analyzed a multivariate data set pertaining to abalones. The results from the study show that the FIC successfully ranks candidate models in terms of their performance, defined as how well they estimate the focus parameter. In terms of estimation precision, FIC clearly outperforms AIC, especially when the focus parameter relates to only a specific part of the model, such as the conditional upper-tail probability.
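The ingredients discussed — a PML copula fit, an information criterion, and a tail-dependence focus parameter — can be sketched for one candidate family. This is a minimal sketch, assuming the Clayton copula (whose lower tail dependence is 2^(-1/θ)); it computes plain AIC rather than the FIC formulae derived in the paper:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata

def clayton_loglik(theta, u, v):
    """Log-density of the Clayton copula, theta > 0:
    c(u,v) = (1+theta) * (uv)^(-(1+theta)) * (u^-theta + v^-theta - 1)^(-(2+1/theta))."""
    s = u ** -theta + v ** -theta - 1.0
    return np.sum(np.log1p(theta) - (theta + 1) * (np.log(u) + np.log(v))
                  - (1 / theta + 2) * np.log(s))

def fit_clayton_pml(x, y):
    """Pseudo-maximum-likelihood: rank-transform the margins, then fit theta.
    Returns theta, AIC, and the lower-tail dependence as a possible focus."""
    n = len(x)
    u, v = rankdata(x) / (n + 1), rankdata(y) / (n + 1)
    res = minimize_scalar(lambda t: -clayton_loglik(t, u, v),
                          bounds=(1e-3, 20), method="bounded")
    aic = 2 * 1 + 2 * res.fun          # one copula parameter; res.fun = -loglik
    return res.x, aic, 2 ** (-1 / res.x)

rng = np.random.default_rng(5)
z = rng.multivariate_normal([0, 0], [[1, .6], [.6, 1]], size=500)
print(fit_clayton_pml(z[:, 0], z[:, 1]))
```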

11.
Before biomarkers can be used in clinical trials or in patients' management, the laboratory assays that measure their levels have to go through development and analytical validation. One of the most critical performance metrics for validating any assay is the minimum value it can detect; any value below this limit is said to be below the limit of detection (LOD). Most existing approaches that model such LOD-restricted biomarkers are parametric in nature. These parametric models, however, depend heavily on distributional assumptions and can lose precision under model or distributional misspecification. Using an example from a prostate cancer clinical trial, we show how a critical relationship between a serum androgen biomarker and a prognostic factor of overall survival is completely missed by the widely used parametric Tobit model. Motivated by this example, we implement a semiparametric approach, through a pseudo-value technique, that effectively captures the important relationship between the LOD-restricted serum androgen and the prognostic factor. Our simulations show that the pseudo-value-based semiparametric model outperforms a commonly used parametric model for modeling below-LOD biomarkers by having lower mean square errors of estimation.
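The parametric Tobit model the paper criticizes treats below-LOD values as left-censored normals: observed values contribute the normal density, censored ones the normal CDF at the LOD. A minimal maximum-likelihood sketch, with hypothetical simulated data standing in for the biomarker:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, y, x, lod):
    """Left-censored (Tobit) negative log-likelihood for y = b0 + b1*x + e."""
    b0, b1, log_s = params
    s = np.exp(log_s)                    # keep sigma positive
    mu = b0 + b1 * x
    obs = y > lod
    ll = norm.logpdf(y[obs], mu[obs], s).sum()          # observed values
    ll += norm.logcdf((lod - mu[~obs]) / s).sum()       # values below LOD
    return -ll

rng = np.random.default_rng(6)
x = rng.normal(size=300)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 300)
lod = 0.5
y_obs = np.where(y > lod, y, lod)        # assay reports only "below LOD"
fit = minimize(tobit_negloglik, [0.0, 0.0, 0.0], args=(y_obs, x, lod))
print(fit.x[:2], np.exp(fit.x[2]))       # roughly (1.0, 0.5) and 1.0
```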

12.
Interval-censored survival data arise very frequently: the event of interest is not observed exactly but is known only to occur within some time interval. In this paper, we propose a location-scale regression model based on the log-generalized gamma distribution for modelling interval-censored data. We shall be concerned only with parametric forms. The proposed model represents a parametric family that has, as special submodels, other regression models broadly used in lifetime data analysis. Assuming interval-censored data, we consider a frequentist analysis, a jackknife estimator and a non-parametric bootstrap for the model parameters. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some techniques to perform global influence analysis.
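The interval-censored likelihood has the generic form ∏ᵢ [F(Rᵢ) − F(Lᵢ)], where each event is only known to lie in (Lᵢ, Rᵢ]. A minimal sketch using the Weibull — one of the special submodels of the log-generalized gamma family mentioned above — with hypothetical inspection-time data:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def ic_negloglik(params, left, right):
    """Interval-censored Weibull negative log-likelihood:
    each observation contributes log(F(right) - F(left))."""
    shape, scale = np.exp(params)        # keep both parameters positive
    F = lambda t: weibull_min.cdf(t, shape, scale=scale)
    return -np.sum(np.log(F(right) - F(left) + 1e-300))

rng = np.random.default_rng(7)
t = 2.0 * rng.weibull(1.5, 400)          # true (unobserved) event times
left = np.floor(t / 0.5) * 0.5           # inspections every 0.5 time units
right = left + 0.5
res = minimize(ic_negloglik, [0.0, 0.0], args=(left, right), method="Nelder-Mead")
print(np.exp(res.x))                     # roughly (1.5, 2.0)
```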

13.
We describe an approach, termed reified analysis, for linking the behaviour of mathematical models with inferences about the physical systems which the models represent. We describe the logical basis for the approach, based on coherent assessment of the implications of deficiencies in the mathematical model. We show how the statistical analysis may be carried out by specifying stochastic relationships between the model that we have, improved versions of the model that we might construct, and the system itself. We illustrate our approach with an example concerning the potential shutdown of the Thermohaline circulation in the Atlantic Ocean.

14.
When the results of biological experiments are tested for a possible difference between treatment and control groups, the inference is only valid if based upon a model that fits the experimental results satisfactorily. In dominant-lethal testing, foetal death has previously been assumed to follow a variety of models, including the Poisson, binomial, beta-binomial and various mixture models. However, discriminating between models has always been a particularly difficult problem. In this paper, we consider data from six separate dominant-lethal assay experiments and discriminate between the competing models which could be used to describe them. We adopt a Bayesian approach and illustrate how a variety of different models may be considered, using Markov chain Monte Carlo (MCMC) simulation techniques and comparing the results with the corresponding maximum likelihood analyses. We present an auxiliary variable method for determining the probability that any particular data cell is assigned to a given component in a mixture and we illustrate the value of this approach. Finally, we show how the Bayesian approach provides a natural and unique perspective on the model selection problem via reversible jump MCMC, and illustrate how probabilities associated with each of the different models may be calculated for each data set. In terms of estimation, we show how, by averaging over the different models, we obtain reliable and robust inference for any statistic of interest.
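To make the binomial-versus-beta-binomial discrimination concrete, here is a much simpler stand-in for the paper's reversible jump MCMC: fit both models by maximum likelihood on hypothetical litter counts and compare BIC values. The data and the criterion are illustrative only, not the paper's method:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom, betabinom

# Hypothetical litter data: n_i implants and y_i foetal deaths per female
n = np.array([10, 12, 9, 11, 10, 13, 8, 12])
y = np.array([2, 5, 0, 1, 6, 2, 0, 7])

# Binomial model: one common death probability p
nll_bin = lambda p: -binom.logpmf(y, n, p[0]).sum()
fit_b = minimize(nll_bin, [0.3], bounds=[(1e-4, 1 - 1e-4)])

# Beta-binomial model: litter-to-litter overdispersion via (a, b)
nll_bb = lambda t: -betabinom.logpmf(y, n, np.exp(t[0]), np.exp(t[1])).sum()
fit_bb = minimize(nll_bb, [0.0, 1.0], method="Nelder-Mead")

bic_b = 2 * fit_b.fun + 1 * np.log(len(y))    # 1 free parameter
bic_bb = 2 * fit_bb.fun + 2 * np.log(len(y))  # 2 free parameters
print(bic_b, bic_bb)   # the lower BIC suggests the better-supported model
```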

15.
This article introduces a nonparametric warping model for functional data. When the outcome of an experiment is a sample of curves, the data can be seen as realizations of a stochastic process which takes into account the variation between the different observed curves. The aim of this work is to define a mean pattern which represents the main behaviour of the set of all realizations. We therefore define the structural expectation of the underlying stochastic function, and we provide empirical estimators of this structural expectation and of each individual warping function. Consistency and asymptotic normality of these estimators are proved.

16.
Because model misspecification can lead to inconsistent and inefficient estimators and invalid tests of hypotheses, testing for misspecification is critically important. We focus here on several general-purpose goodness-of-fit tests which can be applied to assess the adequacy of a wide variety of parametric models without specifying an alternative model. The parametric bootstrap is the method of choice for computing the p-values of these tests; however, its consistency has never been rigorously proved in this setting. Using properties of locally asymptotically normal parametric models, we prove that, under quite general conditions, the parametric bootstrap provides a consistent estimate of the null distribution of the statistics under investigation.
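The procedure whose consistency is at issue is simple to sketch: fit the parametric model, simulate from the fit, refit on each simulated sample, and compare bootstrap test statistics to the observed one. A minimal sketch for a Kolmogorov–Smirnov test of normality with estimated parameters:

```python
import numpy as np
from scipy.stats import kstest, norm

def pb_gof_pvalue(x, n_boot=999, seed=8):
    """Parametric-bootstrap p-value for a KS goodness-of-fit test of the
    normal model with estimated parameters."""
    rng = np.random.default_rng(seed)
    mu, s = x.mean(), x.std(ddof=1)
    t_obs = kstest(x, norm(mu, s).cdf).statistic
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.normal(mu, s, size=len(x))   # simulate under the fitted null
        # re-estimate parameters on each bootstrap sample, as in the real fit
        t_boot[b] = kstest(xb, norm(xb.mean(), xb.std(ddof=1)).cdf).statistic
    return (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)

# Heavy-tailed data: the test should tend to reject normality
print(pb_gof_pvalue(np.random.default_rng(9).standard_t(5, size=200)))
```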

17.
Birnbaum–Saunders (BS) models are receiving considerable attention in the literature. Multivariate regression models are a useful tool of multivariate analysis that takes into account the correlation between variables, and diagnostic analysis is an important aspect of statistical modeling. In this paper, we formulate multivariate generalized BS regression models and carry out a diagnostic analysis for them. We consider the Mahalanobis distance as a global influence measure to detect multivariate outliers and use it to evaluate the adequacy of the distributional assumption. We also consider the local influence approach and study how a perturbation may impact the estimation of model parameters. We implement the results in the R software and illustrate them with real-world multivariate data to show their potential applications.
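The global influence measure used above is easy to sketch in isolation. Here it is computed from the plain sample mean and covariance with a chi-square cutoff — a simplification, since the paper applies it to residuals of a fitted BS regression rather than raw data:

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_flags(X, alpha=0.01):
    """Squared Mahalanobis distances and an outlier flag at the
    chi-square(1 - alpha) cutoff with df = number of columns."""
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', X - mu, S_inv, X - mu)
    return d2, d2 > chi2.ppf(1 - alpha, df=X.shape[1])

X = np.random.default_rng(10).normal(size=(200, 3))
X[0] += 6.0                              # plant one multivariate outlier
d2, flags = mahalanobis_flags(X)
print(np.where(flags)[0])                # should include index 0
```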

18.
Particle filters for mixture models with an unknown number of components
We consider the analysis of data under mixture models where the number of components in the mixture is unknown. We concentrate on mixture Dirichlet process models, and in particular we consider such models under conjugate priors. This conjugacy enables us to integrate out many of the parameters in the model and to discretize the posterior distribution. Particle filters are particularly well suited to such discrete problems, and we propose the use of the particle filter of Fearnhead and Clifford for this problem. The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than that of the particle filter algorithm of Chen and Liu, and in many situations it outperforms a Gibbs sampler. We also show how models without the required amount of conjugacy can be efficiently analyzed by the same particle filter algorithm.

19.
This is probably the first paper to discuss likelihood inference for a random set using a germ-grain model in which the individual grains are unobservable, edge effects occur and other complications appear. We consider the case where the grains form a disc process modelled by a marked point process, where the germs are the centres and the marks are the associated radii of the discs. We propose to use a recent parametric class of interacting disc process models, where the minimal sufficient statistic depends on various geometric properties of the random set, and the density is specified with respect to a given marked Poisson model (i.e. a Boolean model). We show how edge effects and other complications can be handled by considering a certain conditional likelihood. Our methodology is illustrated by analysing Peter Diggle's heather data set, where we discuss the results of simulation-based maximum likelihood inference and the effect of specifying different reference Poisson models.
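The reference Boolean model — Poisson germs with independent radius marks — is straightforward to simulate, and summaries such as the covered area fraction are the kind of geometric quantity the interacting model's sufficient statistic depends on. A minimal sketch (the interacting disc process itself tilts this density and would require MCMC, omitted here; exponential radii are an assumption for illustration):

```python
import numpy as np

def boolean_disc_model(intensity=50, r_mean=0.05, seed=11):
    """Simulate a Boolean model in the unit square: Poisson germ count,
    uniform germ locations, exponential radii as marks."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(intensity)
    centres = rng.uniform(0, 1, size=(n, 2))
    radii = rng.exponential(r_mean, size=n)
    return centres, radii

def covered_fraction(centres, radii, grid=200):
    """Area fraction of the union of discs, estimated by pixel counting."""
    xs = np.linspace(0, 1, grid)
    xx, yy = np.meshgrid(xs, xs)
    covered = np.zeros_like(xx, dtype=bool)
    for (cx, cy), r in zip(centres, radii):
        covered |= (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    return covered.mean()

c, r = boolean_disc_model()
print(covered_fraction(c, r))
```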

20.
The Dirichlet process has been used extensively in Bayesian nonparametric modeling and has proven to be very useful. In particular, mixed models with Dirichlet process random effects have been used to model many types of data and can often outperform their normal-random-effect counterparts. Here we examine the linear mixed model with Dirichlet process random effects from a classical viewpoint and derive the best linear unbiased estimator (BLUE) of the fixed effects. We are also able to calculate the resulting covariance matrix and find that the covariance is directly related to the precision parameter of the Dirichlet process, giving a new interpretation of this parameter. We also characterize the relationship between the BLUE and the ordinary least-squares (OLS) estimator and show how confidence intervals can be approximated.
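Once the covariance matrix V of the observations is known, the BLUE is the generalized least-squares estimator β̂ = (X'V⁻¹X)⁻¹X'V⁻¹y with covariance (X'V⁻¹X)⁻¹. In the paper V is the matrix induced by the Dirichlet process random effects and depends on its precision parameter; the exchangeable V below is a generic stand-in, not the paper's derived form:

```python
import numpy as np

def blue(X, y, V):
    """Generalized least squares / BLUE of the fixed effects:
    beta = (X' V^-1 X)^-1 X' V^-1 y, with covariance (X' V^-1 X)^-1."""
    Vi = np.linalg.inv(V)
    cov = np.linalg.inv(X.T @ Vi @ X)
    return cov @ (X.T @ Vi @ y), cov

# Toy example: exchangeable correlation standing in for the clustering
# that Dirichlet process random effects induce among observations
n, rho = 12, 0.4
V = rho * np.ones((n, n)) + (1 - rho) * np.eye(n)
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
y = X @ np.array([1.0, 0.5]) + \
    np.random.default_rng(12).multivariate_normal(np.zeros(n), V)
beta_hat, cov = blue(X, y, V)
print(beta_hat)   # roughly (1.0, 0.5)
```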
