Similar documents (20 results)
1.
Provided certain parameters are known, almost any linear map from R^p to R^1 can be adjusted to yield a consistent and unbiased estimator in the context of estimating the mixing proportion θ on the basis of an unclassified sample of observations taken from a mixture of two p-dimensional distributions in proportions θ and 1-θ. Attention is focused on a recently proposed estimator, θ̂, which has minimum variance over all such linear maps. Unfortunately, the form of θ̂ depends on the means of the component distributions and the covariance matrix of the mixture distribution. The effect of using appropriate sample estimates for these unknown parameters in forming θ̂ is investigated by deriving the asymptotic mean and variance of the resulting estimator. The relative efficiency of this estimator under normality is derived. Also, a study is undertaken of the performance of a similar type of estimator appropriate in the context where an observed data vector is not an observation from one or the other component distribution, but is recorded as an integrated measurement over a surface area that is a mixture of two categories whose characteristics have different statistical distributions. The asymptotic bias in this case is compared with some available practical results.
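
A minimal numerical sketch of the linear-map idea, assuming normal components with known means and known mixture covariance (the idealized setting before plug-in estimates are substituted): each observation x is mapped to a'(x − μ2)/a'(μ1 − μ2), which has expectation θ, and the variance-minimizing choice is a ∝ Σ⁻¹(μ1 − μ2). All parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameters (assumed known in this idealized sketch).
theta = 0.3                                   # mixing proportion to recover
mu1, mu2 = np.array([1.0, 0.5]), np.array([-1.0, 0.0])
cov = np.eye(2)                               # common component covariance

# Unclassified sample from the mixture.
n = 5000
z = rng.random(n) < theta
x = np.where(z[:, None],
             rng.multivariate_normal(mu1, cov, n),
             rng.multivariate_normal(mu2, cov, n))

# Mixture covariance: Sigma = cov + theta*(1-theta)*delta*delta'.
delta = mu1 - mu2
sigma_mix = cov + theta * (1 - theta) * np.outer(delta, delta)

# Minimum-variance linear map: a proportional to Sigma^{-1} delta.
a = np.linalg.solve(sigma_mix, delta)

# Unbiased estimator: average of a'(x - mu2) / a'(mu1 - mu2) over the sample.
theta_hat = (x - mu2) @ a / (a @ delta)
print(theta_hat.mean())                       # close to 0.3
```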

2.
Assume that a number of individuals are to be classified into one of two populations and that, at the same time, the proportion of members of each population needs to be estimated. The allocated proportions given by the Bayes classification rule are not consistent estimates of the true proportions, so a different classification rule is proposed; this rule yields consistent estimates with only a small increase in the probability of misclassification. As an illustration, the case of two normal distributions with equal covariance matrices is dealt with in detail.
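
To see why the allocated proportions from the Bayes rule are inconsistent, consider the equal-variance normal case the paper treats in detail. A short simulation with illustrative parameters shows the allocated proportion converging to θ·P(correct | pop 1) + (1−θ)·P(misallocated | pop 2) rather than to θ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
pi1 = 0.2                        # true proportion of population 1
n = 200_000
z = rng.random(n) < pi1
x = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))

# Bayes rule with known parameters: allocate to population 1 when
# pi1 * phi(x; 0, 1) > (1 - pi1) * phi(x; 2, 1).
alloc1 = pi1 * norm.pdf(x, 0, 1) > (1 - pi1) * norm.pdf(x, 2, 1)

print(alloc1.mean())             # about 0.16, noticeably below the true 0.2:
                                 # the allocated proportion is biased
```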

3.
The traditional mixture model assumes that a dataset is composed of several populations of Gaussian distributions. In real life, however, data often do not fit the restrictions of normality very well. It is likely that data from a single population exhibiting either asymmetrical or heavy-tail behavior could be erroneously modeled as two populations, resulting in suboptimal decisions. To avoid these pitfalls, we generalize the mixture model using adaptive kernel density estimators. Because kernel density estimators enforce no functional form, we can adapt to non-normal asymmetric, kurtotic, and tail characteristics in each population independently. This, in effect, robustifies mixture modeling. We adapt two computational algorithms, genetic algorithm with regularized Mahalanobis distance and genetic expectation maximization algorithm, to optimize the kernel mixture model (KMM) and use results from robust estimation theory in order to data-adaptively regularize both. Finally, we likewise extend the information criterion ICOMP to score the KMM. We use these tools to simultaneously select the best mixture model and classify all observations without making any subjective decisions. The performance of the KMM is demonstrated on two medical datasets; in both cases, we recover the clinically determined group structure and substantially improve patient classification rates over the Gaussian mixture model.
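
A much-simplified sketch of the core idea of replacing Gaussian components with kernel density estimates: a semiparametric EM that alternates responsibilities with weighted Gaussian KDEs (SciPy's `weights` argument, available since SciPy 1.2). The article's adaptive kernels, genetic-algorithm optimization, robust regularization, and ICOMP scoring are all omitted here.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
# Synthetic data: a skewed component and a roughly normal one.
x = np.concatenate([rng.gamma(2.0, 1.0, 300), rng.normal(8.0, 1.0, 200)])

# Initialize responsibilities with a crude hard split.
r = (x > 5.0).astype(float)           # P(component 2 | x), rough start

for _ in range(30):
    # M-step: mixing weight and weighted kernel density per component.
    pi2 = r.mean()
    f1 = gaussian_kde(x, weights=(1 - r))
    f2 = gaussian_kde(x, weights=r)
    # E-step: update responsibilities from the two kernel densities.
    d1 = (1 - pi2) * f1(x)
    d2 = pi2 * f2(x)
    r = d2 / (d1 + d2)

print(pi2)                            # estimated weight of component 2, ~0.4
```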

4.
Quantitative fatty acid signature analysis (QFASA) produces diet estimates containing the proportion of each species of prey in a predator's diet. Since the diet estimates are compositional and often contain an abundance of zeros (signifying the absence of a species in the diet), and since sample sizes are generally small, inference problems require the use of nonstandard statistical methodology. Recently, a mixture distribution involving the multiplicative logistic normal distribution (and its skew-normal extension) was introduced in relation to QFASA to manage the problematic zeros. In this paper, we examine an alternative mixture distribution, namely, the recently proposed zero-inflated beta (ZIB) distribution. A potential advantage of using the ZIB distribution over the previously considered mixture models is that it does not require transformation of the data. To assess the usefulness of the ZIB distribution in QFASA inference problems, a simulation study is first carried out which compares the small sample properties of the maximum likelihood estimators of the means. The fit of the distributions is then examined using ‘pseudo-predators’ generated from a large real-life prey base. Finally, confidence intervals for the true diet based on the ZIB distribution are compared with earlier results through a simulation study and harbor seal data.
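
A minimal sketch of maximum likelihood for a zero-inflated beta model on synthetic "diet proportion" data, assuming independent observations. Because the likelihood factorizes into a zero part and a beta part, the zero-inflation probability is estimated by the sample proportion of zeros and the beta parameters by numerical optimization; none of the QFASA-specific machinery is reproduced.

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import minimize

rng = np.random.default_rng(3)
# Synthetic compositional "diet proportions" with excess zeros.
n = 200
y = np.where(rng.random(n) < 0.3, 0.0, rng.beta(2.0, 5.0, n))

# The zero part separates from the beta part, so the MLE factorizes.
pi_hat = np.mean(y == 0)                        # zero-inflation probability
pos = y[y > 0]

def nll(params):
    a, b = np.exp(params)                       # enforce positivity
    return -beta.logpdf(pos, a, b).sum()

res = minimize(nll, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)

# Mean of the ZIB distribution: (1 - pi) * a / (a + b).
print(pi_hat, a_hat, b_hat, (1 - pi_hat) * a_hat / (a_hat + b_hat))
```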

5.
The World Health Organization (WHO) diagnostic criteria for diabetes mellitus were determined in part by evidence that in some populations the plasma glucose level 2 h after an oral glucose load is a mixture of two distinct distributions. We present a finite mixture model that allows the two component densities to be generalized linear models and the mixture probability to be a logistic regression model. The model allows us to estimate the prevalence of diabetes and the sensitivity and specificity of the diagnostic criteria as a function of covariates, and to estimate them in the absence of an external standard. Sensitivity is the probability that a test indicates disease conditionally on disease being present. Specificity is the probability that a test indicates no disease conditionally on no disease being present. We obtained maximum likelihood estimates via the EM algorithm and derived the standard errors from the information matrix and by the bootstrap. In the application to data from the diabetes in Egypt project, a two-component mixture model fits well and the two components are interpreted as normal and diabetes. The means and variances are similar to results found in other populations. The minimum misclassification cutpoints decrease with age, and are lower in urban areas and higher in rural areas than the 200 mg/dl cutpoint recommended by the WHO. These differences are modest and our results generally support the WHO criterion. Our methods allow the direct inclusion of concomitant data whereas past analyses were based on partitioning the data.
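
With a fitted two-component normal mixture in hand, sensitivity, specificity, and the minimum-misclassification cutpoint (where the two weighted component densities cross) follow directly. The parameter values below are hypothetical placeholders, not the Egypt-study estimates.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Hypothetical fitted mixture on the glucose scale, for illustration only.
p_dia = 0.12                      # mixing probability of the diabetes component
m0, s0 = 100.0, 20.0              # non-diabetic mean/sd (mg/dl)
m1, s1 = 230.0, 60.0              # diabetic mean/sd (mg/dl)

def sens(c):                      # P(test positive | diabetes)
    return 1 - norm.cdf(c, m1, s1)

def spec(c):                      # P(test negative | no diabetes)
    return norm.cdf(c, m0, s0)

# Minimum-misclassification cutpoint: the weighted densities cross between
# the two component means, so bracket the root there.
g = lambda c: p_dia * norm.pdf(c, m1, s1) - (1 - p_dia) * norm.pdf(c, m0, s0)
c_star = brentq(g, m0, m1)

print(c_star, sens(c_star), spec(c_star))
print(sens(200.0), spec(200.0))   # at the WHO 200 mg/dl cutpoint
```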

6.
A nonparametric mixture model specifies that observations arise from a mixture distribution ∫ f(x, θ) dG(θ), where the mixing distribution G is completely unspecified. A number of algorithms have been developed to obtain unconstrained maximum-likelihood estimates of G, but none of these algorithms leads to estimates when functional constraints are present. In many cases there is a natural interest in functionals φ(G) of the mixing distribution, such as its mean and variance, and profile likelihoods and confidence intervals for φ(G) are desired. In this paper we develop a penalized generalization of the ISDM algorithm of Kalbfleisch and Lesperance (1992) that can be used to solve the problem of constrained estimation. We also discuss its use in various applications. Convergence results and numerical examples are given for the generalized ISDM algorithm, and asymptotic results are developed for the likelihood-ratio test statistics in the multinomial case.
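
The condition driving ISDM-type algorithms can be illustrated without the paper's penalty or constraints: G maximizes the likelihood iff the gradient (directional derivative) D(θ; G) = Σᵢ f(xᵢ, θ)/f_G(xᵢ) − n is ≤ 0 for all θ, and the algorithm adds a support point wherever D is positive, then re-optimizes the weights. A sketch for an unconstrained Poisson mixture:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(4)
# Data from a two-point Poisson mixture.
x = np.concatenate([rng.poisson(1.0, 150), rng.poisson(6.0, 100)])

def mix_pmf(x, support, weights):
    """f_G(x) = sum_j w_j * Poisson(x; theta_j)."""
    return poisson.pmf(x[:, None], support[None, :]) @ weights

def gradient(theta_grid, support, weights):
    """ISDM gradient D(theta; G) = sum_i f(x_i, theta)/f_G(x_i) - n.
    G is the NPMLE iff this is <= 0 everywhere."""
    fg = mix_pmf(x, support, weights)
    ratio = poisson.pmf(x[:, None], theta_grid[None, :]) / fg[:, None]
    return ratio.sum(axis=0) - len(x)

grid = np.linspace(0.1, 12, 200)
# A candidate G; ISDM would add a support point at the argmax of D whenever
# max D > 0, then re-optimize the weights (e.g., by EM steps).
support, weights = np.array([1.0, 6.0]), np.array([0.6, 0.4])
d = gradient(grid, support, weights)
print(grid[np.argmax(d)], d.max())   # max near zero: candidate close to NPMLE
```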

7.
We revisit the problem of estimating the proportion π of true null hypotheses when a large number of hypothesis tests are performed independently in parallel. While the proportion is a quantity of interest in its own right in applications, the problem has arisen in assessing or controlling an overall false discovery rate. On the basis of a Bayes interpretation of the problem, the marginal distribution of the p-value is modeled as a mixture of the uniform distribution (null) and a non-uniform distribution (alternative), so that the parameter π of interest is characterized as the mixing proportion of the uniform component in the mixture. In this article, a nonparametric exponential mixture model is proposed to fit the p-values. As an alternative to the convex decreasing mixture model, the exponential mixture model has the advantages of identifiability, flexibility, and regularity. A computational algorithm is developed. The new approach is applied to a leukemia gene expression data set in which significance tests are performed over 3,051 genes. The new estimate of π for the leukemia data appears to be about 10% lower than three other estimates that are known to be conservative. Simulation results also show that the new estimate is usually lower and has smaller bias than the other three estimates.
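
As a baseline for the quantity being estimated (not the article's exponential-mixture estimator), the standard Storey-type estimate uses only the uniformity of null p-values: p-values above a threshold λ are almost entirely nulls, so π̂ = #{p > λ}/(n(1 − λ)). A sketch on simulated p-values, with a Beta alternative chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
# Simulated p-values: 80% true nulls (uniform), 20% alternatives
# (concentrated near zero via a Beta(0.2, 4) distribution).
n, pi0 = 3051, 0.8
p = np.where(rng.random(n) < pi0, rng.random(n), rng.beta(0.2, 4.0, n))

# Storey-type estimator: nulls are uniform on (0, 1), so the count of
# p-values above lambda estimates pi0 * n * (1 - lambda).
lam = 0.5
pi0_hat = np.mean(p > lam) / (1 - lam)
print(pi0_hat)        # close to 0.8, typically slightly conservative
```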

8.
We propose a Bayesian nonparametric procedure for density estimation for data in a closed, bounded interval, say [0,1]. To this end, we use a prior based on Bernstein polynomials. This corresponds to expressing the density of the data as a mixture of given beta densities, with random weights and a random number of components. The density estimate is then obtained as the corresponding predictive density function. Comparison with classical and Bayesian kernel estimates is provided. The proposed procedure is illustrated in an example; an MCMC algorithm for approximating the estimate is also discussed.
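
The beta-mixture structure is easiest to see in the classical (non-Bayesian) Bernstein density estimator, where the weights come from the empirical CDF; the article's Bayesian version instead puts a prior on the number of components and the weights and reports the predictive density. A sketch:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(6)
x = rng.beta(2.0, 5.0, 300)            # data supported on [0, 1]

def bernstein_density(t, data, k=20):
    """Classical Bernstein density estimate: a mixture of Beta(j, k-j+1)
    densities with weights read off the empirical CDF on a grid of k cells."""
    ecdf = lambda u: np.searchsorted(np.sort(data), u, side="right") / len(data)
    j = np.arange(1, k + 1)
    w = ecdf(j / k) - ecdf((j - 1) / k)          # mixture weights, sum to 1
    return beta.pdf(t[:, None], j, k - j + 1) @ w

t = np.linspace(0.001, 0.999, 5)
print(bernstein_density(t, x))
```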

9.
Suppose that φ is a Gaussian density and that g = f ∗ φ, where ∗ denotes convolution. From observations with density g, one wishes to estimate f. We analyze an estimate which is a linear combination of estimates of derivatives of g and show that this estimate converges in an L2 norm at a rate compatible with the pointwise optimal rate established by Fan (1991).
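
A crude numerical sketch of the series behind such estimates: since ĝ(t) = f̂(t)e^{−σ²t²/2} in the Fourier domain, f = Σₘ (−1)^m (σ²/2)^m g^{(2m)}/m!, so f is a linear combination of even derivatives of g. Finite differences on a KDE grid stand in for the paper's kernel derivative estimates; the truncation level and bandwidths here are arbitrary choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
# f is a two-point normal mixture; observations carry N(0, sigma^2) noise.
sigma = 0.5
f_draws = np.where(rng.random(4000) < 0.5,
                   rng.normal(-1.5, 0.4, 4000), rng.normal(1.5, 0.4, 4000))
y = f_draws + rng.normal(0.0, sigma, 4000)

# Estimate g and its even derivatives on a grid, then apply the truncated
# series f = sum_m (-1)^m (sigma^2/2)^m / m! * g^(2m).
grid = np.linspace(-4, 4, 801)
g = gaussian_kde(y)(grid)
d2 = np.gradient(np.gradient(g, grid), grid)          # g''
d4 = np.gradient(np.gradient(d2, grid), grid)         # g''''

c = sigma**2 / 2
f_hat = g - c * d2 + c**2 / 2 * d4                    # terms m = 0, 1, 2
print(grid[np.argmax(f_hat)])                         # near a true mode, +-1.5
```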

10.
In this article, we propose a penalized local log-likelihood method to locally select the number of components in nonparametric finite mixtures of regression models via a proportion-shrinkage method. Mean functions and variance functions are estimated simultaneously. We show that the number of components can be estimated consistently, and we further establish the asymptotic normality of the functional estimates. We use a modified EM algorithm to estimate the unknown functions. Simulations are conducted to demonstrate the performance of the proposed method. We illustrate our method via an empirical analysis of housing price index data for the United States.

11.
Linear maps of a single unclassified observation are used to estimate the mixing proportion in a mixture of two populations with homogeneous variances in the presence of covariates. With complete knowledge of the parameters of the individual populations, the linear map for which the estimator is unbiased and has minimum variance amongst all similar estimators can be determined. A plug-in estimator based on independent training samples from the component populations can be constructed and is asymptotically equivalent to Cochran's classification statistic V* for covariate classification; see Memon and Okamoto (1970). Under normality assumptions, an asymptotic expansion of the distribution of the plug-in estimator is available. In the absence of covariates, our estimator reduces to that suggested by Walker (1980), who investigated the problem based on large unclassified samples from a mixture of two populations with heterogeneous variances. In contrast, the distribution of Walker's estimator appears intractable for moderate sample sizes, even under normality assumptions.

12.
In an important class of problems involving mixture distributions, interest focuses on the mixture proportions, considering other possible parameters as nuisance parameters. We formulate a new variation on such problems that arose in a study of the link between the number of cells in a charge-coupled-device image sensor that become defective because of cosmic radiation and the storage conditions of such sensors. In this variation, the component densities are bivariate, there are two classes, and only a subset of the mixture proportions is of relevance. We propose a nonparametric method to estimate the relevant subset of the proportions and apply our method to the data in our study.

13.
In many real-world applications, the traditional theory of analysis of covariance (ANCOVA) leads to inadequate and unreliable results because the response observations violate the essential Gaussian assumption, whether through population heterogeneity, the presence of outliers, or both. In this paper, we develop a Gaussian mixture ANCOVA model for modelling heterogeneous populations with a finite number of subpopulations. We provide the maximum likelihood estimates of the model parameters via an EM algorithm. We also derive the adjusted effect estimators for treatments and covariates. The Fisher information matrix of the model and asymptotic confidence intervals for the parameters are also discussed. We performed a simulation study to assess the performance of the proposed model. A real-world example is also worked out to explain the methodology.
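
A toy EM for a two-component mixture of linear regressions, the simplest relative of the Gaussian mixture ANCOVA described above (one covariate, component-specific intercepts and slopes, no treatment structure). All parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
# Two latent subpopulations sharing a covariate but differing in intercept.
n = 400
x = rng.uniform(0, 10, n)
z = rng.random(n) < 0.5
y = np.where(z, 1.0 + 2.0 * x, 8.0 + 2.0 * x) + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x])
betas = [np.array([0.0, 1.0]), np.array([10.0, 3.0])]   # rough starting values
sig, pi2 = [2.0, 2.0], 0.5

for _ in range(100):
    # E-step: responsibilities from the two regression densities.
    d1 = (1 - pi2) * norm.pdf(y, X @ betas[0], sig[0])
    d2 = pi2 * norm.pdf(y, X @ betas[1], sig[1])
    r = d2 / (d1 + d2)
    # M-step: weighted least squares and weighted residual scale per component.
    for k, w in enumerate([1 - r, r]):
        sw = np.sqrt(w)
        betas[k] = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        sig[k] = np.sqrt(np.sum(w * (y - X @ betas[k]) ** 2) / w.sum())
    pi2 = r.mean()

print(betas, sig, pi2)    # intercepts near 1 and 8, slopes near 2
```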

14.
In this paper we introduce a continuous tree mixture model, a mixture of undirected graphical models with tree-structured graphs, which can be viewed as a nonparametric approach to multivariate analysis. We estimate its parameters, the component edge sets and the mixture proportions, through a regularized maximum likelihood procedure. Our new algorithm, which combines the expectation-maximization algorithm with a modified version of Kruskal's algorithm, simultaneously estimates and prunes the mixture component trees. Simulation studies indicate that this method performs better than the alternative Gaussian graphical mixture model. The proposed method is also applied to a water-level data set and compared with the results of a Gaussian mixture model.
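
The tree-estimation step can be illustrated with a single Chow-Liu-style tree: build a maximum-weight spanning tree on pairwise mutual information, here computed under a Gaussian simplification as −½·log(1 − ρ²). The article wraps this step (a modified Kruskal algorithm) inside an EM over several component trees, which is not reproduced here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(9)
# Toy data with chain dependence x0 -> x1 -> x2 and an independent x3.
n = 1000
x0 = rng.normal(size=n)
x1 = 0.9 * x0 + rng.normal(scale=0.5, size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=n)
x3 = rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3])

# Pairwise Gaussian mutual information: MI = -0.5 * log(1 - rho^2).
rho = np.corrcoef(data, rowvar=False)
mi = -0.5 * np.log(1 - np.clip(rho**2, 0, 0.999))
np.fill_diagonal(mi, 0.0)

# Maximum-weight spanning tree = minimum spanning tree on negated weights
# (Kruskal's algorithm does this same job on sorted edges).
tree = minimum_spanning_tree(-mi).toarray()
edges = list(zip(*np.nonzero(tree)))
print(edges)      # expect the chain edges (0,1), (1,2); x3 attaches weakly
```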

15.
The cure fraction (the proportion of patients who are cured of disease) is of interest to both patients and clinicians and is a useful measure to monitor trends in survival of curable disease. The paper extends the non-mixture and mixture cure fraction models to estimate the proportion cured of disease in population-based cancer studies by incorporating a finite mixture of two Weibull distributions to provide more flexibility in the shape of the estimated relative survival or excess mortality functions. The methods are illustrated by using public use data from England and Wales on survival following diagnosis of cancer of the colon where interest lies in differences between age and deprivation groups. We show that the finite mixture approach leads to improved model fit and estimates of the cure fraction that are closer to the empirical estimates. This is particularly so in the oldest age group where the cure fraction is notably lower. The cure fraction is broadly similar in each deprivation group, but the median survival of the 'uncured' is lower in the more deprived groups. The finite mixture approach overcomes some of the limitations of the more simplistic cure models and has the potential to model the complex excess hazard functions that are seen in real data.
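
The mixture-cure survival function described above is a direct formula: S(t) = π + (1 − π)·[p·S_W1(t) + (1 − p)·S_W2(t)], with the two Weibull survival functions supplying the extra flexibility. A sketch with illustrative parameter values (not the England-and-Wales fits):

```python
import numpy as np

def mixture_cure_survival(t, cure, p, shape1, scale1, shape2, scale2):
    """Survival under a mixture cure model whose 'uncured' survival is
    itself a two-Weibull mixture:
        S(t) = cure + (1 - cure) * [p * S_W1(t) + (1 - p) * S_W2(t)].
    S(t) -> cure as t grows, so the curve flattens at the cure fraction."""
    s1 = np.exp(-(t / scale1) ** shape1)
    s2 = np.exp(-(t / scale2) ** shape2)
    return cure + (1 - cure) * (p * s1 + (1 - p) * s2)

t = np.linspace(0, 10, 6)                     # years since diagnosis
print(mixture_cure_survival(t, cure=0.45, p=0.7,
                            shape1=1.2, scale1=0.8,
                            shape2=0.9, scale2=3.0))
# Output starts at 1.0 and flattens toward the cure fraction 0.45.
```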

16.
We consider moving average processes {X_s, s ∈ L}, where L is a triangular lattice in the plane R². To estimate the parameters of such processes, Adjengue & Moore (1993) considered likelihood and Gaussian pseudo-likelihood methods. We consider here two other methods. The first is based on estimating the correlations and exploiting the relation between these correlations and the parameters of the process. The second relies on a linear approximation of the process. The asymptotic properties of the proposed estimators are analyzed and compared. A simulation study allows us to compare the estimators for fixed sample sizes.

17.
Mild to moderate skew in errors can substantially impact regression mixture model results; one approach to overcoming this is to transform the outcome into an ordered categorical variable and use a polytomous regression mixture model. This is effective for retaining differential effects in the population; however, bias in parameter estimates and model fit warrant further examination of this approach at higher levels of skew. The current study used Monte Carlo simulations; 3,000 observations were drawn from each of two subpopulations differing in the effect of X on Y. Five hundred simulations were performed in each of 10 scenarios varying in the level of skew in one or both classes. Model comparison criteria supported the accurate two-class model, preserving the differential effects, while parameter estimates were notably biased. The appropriate number of classes can be captured with this approach, but we suggest caution when interpreting the magnitude of the effects.

18.
In the literature, assuming independence of the random variables X and Y, statistical estimation of the stress–strength parameter R = P(X > Y) has been intensively investigated. However, in some real applications the strength variable X can be highly dependent on the stress variable Y. In this paper, unlike the common practice in the literature, we discuss estimation of the parameter R where, more realistically, X and Y are dependent random variables following a bivariate Rayleigh model. We derive the Bayes estimates and highest posterior density credible intervals of the parameters using suitable priors. Because the Bayes estimates have no closed form, we use an approximation based on Laplace's method and a Markov chain Monte Carlo technique to obtain the Bayes estimates of R and the unknown parameters. Finally, simulation studies are conducted to evaluate the performance of the proposed estimators, and analyses of two data sets are provided.
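
For intuition about the target quantity, here is a toy Bayesian analysis of R = P(X > Y) in the much simpler independent-exponential case, where conjugate gamma priors make posterior sampling trivial; the article's dependent bivariate Rayleigh model, Laplace approximation, and HPD intervals are not reproduced. For rates λ and μ, R = μ/(λ + μ).

```python
import numpy as np

rng = np.random.default_rng(12)
# Strength X ~ Exp(rate lam), stress Y ~ Exp(rate mu), independent, so
# R = P(X > Y) = mu / (lam + mu).  (Toy model, not bivariate Rayleigh.)
x = rng.exponential(1 / 0.5, 30)           # true lam = 0.5
y = rng.exponential(1 / 1.5, 30)           # true mu  = 1.5, so R = 0.75

# Conjugate Gamma(a, b) priors on the rates give Gamma posteriors:
# lam | x ~ Gamma(a + n, b + sum(x)), and similarly for mu.
a, b = 0.01, 0.01                           # vague prior hyperparameters
lam_post = rng.gamma(a + len(x), 1 / (b + x.sum()), 20_000)
mu_post = rng.gamma(a + len(y), 1 / (b + y.sum()), 20_000)
r_post = mu_post / (lam_post + mu_post)

print(r_post.mean())                        # Bayes estimate of R
print(np.quantile(r_post, [0.025, 0.975]))  # equal-tailed credible interval
```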

19.
We advocate the use of an indirect inference method to estimate the parameter of a COGARCH(1,1) process from equally spaced observations. This requires that the true model can be simulated and that a reasonable estimation method is available for an approximate auxiliary model. We follow previous approaches and use linear projections leading to an auxiliary autoregressive model for the squared COGARCH returns. The asymptotic theory of the indirect inference estimator relies on a uniform strong law of large numbers and asymptotic normality of the parameter estimates of the auxiliary model, which require continuity and differentiability of the COGARCH process with respect to its parameter and which we prove via Kolmogorov's continuity criterion. This leads to consistent and asymptotically normal indirect inference estimates under moment conditions on the driving Lévy process. A simulation study shows that the method yields a substantial finite-sample bias reduction compared with previous estimators.
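
The indirect-inference mechanics can be shown on a toy model that, unlike COGARCH, is trivial to simulate: estimate an MA(1) parameter through an AR(3) auxiliary model, matching auxiliary estimates between observed and simulated data under common random numbers. Everything here is illustrative and deliberately not a COGARCH implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(10)

def simulate_ma1(theta, n, rng):
    e = rng.standard_normal(n + 1)
    return e[1:] + theta * e[:-1]

def ar_fit(x, p=3):
    """Auxiliary model: AR(p) coefficients by least squares."""
    Y = x[p:]
    X = np.column_stack([x[p - j - 1:-(j + 1)] for j in range(p)])
    return np.linalg.lstsq(X, Y, rcond=None)[0]

# "Observed" data from an MA(1) with theta = 0.5.
x_obs = simulate_ma1(0.5, 2000, rng)
beta_obs = ar_fit(x_obs)

def ii_objective(theta):
    # Fixed seeds = common random numbers, so the objective is smooth in theta.
    sims = [ar_fit(simulate_ma1(theta, 2000, np.random.default_rng(s)))
            for s in range(20)]
    beta_sim = np.mean(sims, axis=0)
    return np.sum((beta_obs - beta_sim) ** 2)   # identity weight matrix

res = minimize_scalar(ii_objective, bounds=(-0.95, 0.95), method="bounded")
print(res.x)      # close to the true theta = 0.5
```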

20.
Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models.

The cause of the problem is that, when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problems, we show that leave-k-out cross-validation, in which sub-samples are taken without replacement, does not suffer from the same problem.
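
A small illustration of the enumeration instability, using scikit-learn's Gaussian mixture and BIC as a stand-in for the article's software: resampling with replacement duplicates observations, and the selected number of classes can drift above the truth even without any model violation. The replication count of 50 is kept small purely for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(11)
# A clean two-component Gaussian mixture (no model violations).
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 1, 150)])[:, None]

def n_classes_by_bic(data, k_max=4):
    bics = [GaussianMixture(k, n_init=3, random_state=0).fit(data).bic(data)
            for k in range(1, k_max + 1)]
    return int(np.argmin(bics)) + 1

# Non-parametric bootstrap: resample rows with replacement, re-enumerate.
counts = np.zeros(5, dtype=int)           # counts[k] = times k classes chosen
for b in range(50):
    idx = rng.integers(0, len(x), len(x))
    counts[n_classes_by_bic(x[idx])] += 1

print(counts)   # duplicated observations can inflate the selected k above 2
```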

