Found 20 similar articles.
1.
The limiting distribution of the log-likelihood-ratio statistic for testing the number of components in finite mixture models can be very complex. We propose two alternative methods. One method is generalized from a locally most powerful test. The test statistic is asymptotically normal, but its asymptotic variance depends on the true null distribution. The other method uses a bootstrap log-likelihood-ratio statistic, which has a uniform limiting distribution on [0,1]. When tested against local alternatives, both methods have the same asymptotic power. Simulation results indicate that the asymptotic results become applicable when the sample size reaches 200 for the bootstrap log-likelihood-ratio test, but the generalized locally most powerful test needs larger sample sizes. In addition, the asymptotic variance of the locally most powerful test statistic must be estimated from the data. The bootstrap method avoids this problem, but needs more computational effort. The user may choose the bootstrap method and let the computer do the extra work, or choose the locally most powerful test and spend considerable time deriving the asymptotic variance for the given model.
2.
Jiahua Chen 《Revue canadienne de statistique》1994,22(3):387-399
The number of components is an important feature in finite mixture models. Because of the irregularity of the parameter space, the log-likelihood-ratio statistic does not have a chi-square limit distribution. It is very difficult to find a test with a specified significance level, and this is especially true for testing k - 1 versus k components. Most of the existing work has concentrated on finding a comparable approximation to the limit distribution of the log-likelihood-ratio statistic. In this paper, we use a statistic similar to the usual log likelihood ratio, but its null distribution is asymptotically normal. A simulation study indicates that the method has good power at detecting extra components. We also discuss how to improve the power of the test, and some simulations are performed.
3.
Loukia Meligkotsidou 《Statistics and Computing》2007,17(2):93-107
In this paper we present Bayesian analysis of finite mixtures of multivariate Poisson distributions with an unknown number
of components. The multivariate Poisson distribution can be regarded as the discrete counterpart of the multivariate normal
distribution, which is suitable for modelling multivariate count data. Mixtures of multivariate Poisson distributions allow
for overdispersion and for negative correlations between variables. To perform Bayesian analysis of these models we adopt
a reversible jump Markov chain Monte Carlo (MCMC) algorithm with birth and death moves for updating the number of components.
We present results obtained from applying our modelling approach to simulated and real data. Furthermore, we apply our approach
to a problem in multivariate disease mapping, namely joint modelling of diseases with correlated counts.
4.
Carl M. Harris 《Communications in Statistics - Theory and Methods》2013,42(9):987-1007
Finite mixtures of distributions have seen increasing use in the applied literature. In the continuous case, linear combinations of exponentials and gammas have been shown to be well suited for modeling purposes. In the discrete case, the focus has primarily been on continuous mixing, usually of Poisson distributions and typically using gammas to describe the random parameter. But many of these applications are forced, especially when a continuous mixing distribution is used. Instead, it is often preferable to try finite mixtures of geometrics or negative binomials, since these are the fundamental building blocks of all discrete random variables. To date, a major stumbling block to their use has been the lack of easy routines for estimating the parameters of such models. This problem has now been alleviated by the adaptation to the discrete case of numerical procedures recently developed for exponential, Weibull, and gamma mixtures. The new methods have been applied to four previously studied data sets, and significant improvements reported in goodness-of-fit, with resultant implications for each affected study.
5.
We propose data-generating structures that can be represented as nonlinear autoregressive models with single and finite mixtures of scale mixtures of skew normal innovations. This class of models covers symmetric/asymmetric and light/heavy-tailed distributions, and so provides a useful generalization of the symmetrical nonlinear autoregressive models. Since semiparametric and nonparametric curve estimation are the approaches for exploring the structure of a nonlinear time series data set, in this article the semiparametric estimator of the nonlinear function of the model is investigated based on the conditional least squares method and a nonparametric kernel approach. Also, an Expectation–Maximization-type algorithm to perform maximum likelihood (ML) inference for the unknown parameters of the model is proposed. Furthermore, strong and weak consistency results for the semiparametric estimator in this class of models are presented. Finally, to illustrate the usefulness of the proposed model, some simulation studies and an application to a real data set are considered.
6.
Athanasios Christou Micheas 《Journal of applied statistics》2014,41(12):2596-2615
We investigate marked non-homogeneous Poisson processes using finite mixtures of bivariate normal components to model the spatial intensity function. We employ a Bayesian hierarchical framework for estimation of the parameters in the model, and propose an approach for including covariate information in this context. The methodology is exemplified through an application involving modeling of and inference for tornado occurrences.
7.
Vännman Kerstin 《Communications in Statistics - Theory and Methods》2013,42(6):1569-1584
The distribution of the estimated mean of the nonstandard mixture of distributions that has a discrete probability mass at zero and a gamma distribution for positive values is derived. Furthermore, for the studied nonstandard mixture of distributions, the distribution of the standardized statistic (estimator - true mean)/standard deviation of estimator is derived. The results are used to study the accuracy of the confidence interval for the mean based on a large sample approximation. Quantiles for the standardized statistic are also calculated.
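
As a rough illustration of the setting in this abstract (not the paper's exact derivation), the sketch below computes the mean and variance of a variable that is zero with some probability and gamma-distributed otherwise, and builds the large-sample confidence interval for the mean whose accuracy the paper studies. The function names and the parameters `p_zero`, `shape` and `scale` are assumptions introduced here for illustration.

```python
import math
from statistics import NormalDist, fmean, stdev


def zero_gamma_moments(p_zero, shape, scale):
    """Mean and variance of X, where X = 0 with probability p_zero
    and X ~ Gamma(shape, scale) otherwise (hypothetical parameter names)."""
    q = 1.0 - p_zero
    mean = q * shape * scale
    # Var(X) = E[X^2] - E[X]^2 simplifies to q * shape * scale^2 * (1 + p_zero * shape).
    var = q * shape * scale ** 2 * (1.0 + p_zero * shape)
    return mean, var


def large_sample_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    centre = fmean(sample)
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width
```

For small samples with many zeros the normal approximation can be poor, which is presumably why exact quantiles of the standardized statistic are of interest.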
8.
A procedure is described which detects the number of components in a mixture distribution having normal components. The test is based on the behavior of the sample order statistics near the center of the distribution. Numerical results are presented and comparisons with tests proposed by Shapiro and Francia (1972) and Baker (1958) are provided.
9.
The number of l-overlapping success runs of length k in n trials, which was introduced and studied recently, is presently reconsidered in the Bernoulli case and two exact formulas
are derived for its probability distribution function in terms of multinomial and binomial coefficients respectively. A recurrence
relation concerning this distribution, as well as its mean, is also obtained. Furthermore, the number of l-overlapping success runs of length k in n Bernoulli trials arranged on a circle is presently considered for the first time and its probability distribution function
and mean are derived. Finally, the latter distribution is related to the first, two open problems regarding limiting distributions
are stated, and numerical illustrations are given in two tables. All results are new and they unify and extend several results
of various authors on binomial and circular binomial distributions of order k.
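
The abstract does not define the statistic, so the following is only a sketch under an assumed convention: consecutive counted runs of length k may share at most l trials (0 <= l < k), which gives non-overlapping counting for l = 0 and fully overlapping counting for l = k - 1. The function name is hypothetical, and only the linear (non-circular) case is covered.

```python
def count_l_overlapping_runs(trials, k, l):
    """Count l-overlapping success runs of length k in a 0/1 sequence.

    Assumed convention: a maximal streak of s successes contributes
    floor((s - l) / (k - l)) runs when s >= k, and none otherwise.
    """
    if not 0 <= l < k:
        raise ValueError("need 0 <= l < k")
    total = 0
    streak = 0
    # The trailing 0 forces the final streak to be counted.
    for outcome in list(trials) + [0]:
        if outcome:
            streak += 1
        else:
            if streak >= k:
                total += (streak - l) // (k - l)
            streak = 0
    return total
```

For example, a streak of five successes contains two runs of length 3 when one shared trial is allowed (trials 1 to 3 and 3 to 5), but only one when no overlap is allowed.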
10.
Wen-Liang Hung Shou-Jen Chang-Chien Miin-Shen Yang 《Journal of applied statistics》2012,39(10):2259-2274
The EM algorithm is the standard method for estimating the parameters in finite mixture models. Yang and Pan [25] proposed a generalized classification maximum likelihood procedure, called the fuzzy c-directions (FCD) clustering algorithm, for estimating the parameters in mixtures of von Mises distributions. Two main drawbacks of the EM algorithm are its slow convergence and the dependence of the solution on the initial value used. The choice of initial values is of great importance in the algorithm-based literature as it can heavily influence the speed of convergence of the algorithm and its ability to locate the global maximum. On the other hand, the algorithmic frameworks of EM and FCD are closely related. Therefore, the drawbacks of FCD are the same as those of the EM algorithm. To resolve these problems, this paper proposes another clustering algorithm, which can self-organize local optimal cluster numbers without using cluster validity functions. Numerical results clearly indicate that the proposed algorithm is superior in performance to the EM and FCD algorithms. Finally, we apply the proposed algorithm to two real data sets.
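
To make the E-step and M-step mentioned above concrete, here is a minimal sketch of EM for a two-component univariate normal mixture. It is a swapped-in textbook case, not the von Mises setting or the algorithm proposed in the paper, and every name in it (`em_two_normals`, `weighted_mean_sd`, the starting values) is an assumption chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_mean_sd(data, weights, floor=1e-3):
    """Weighted mean and sd; the sd is floored so it stays positive.
    Assumes the weights do not all vanish."""
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, data)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, data)) / total
    return mean, max(math.sqrt(var), floor)


def em_two_normals(data, mean_a, mean_b, sd=1.0, weight=0.5, n_iter=50):
    """Run n_iter EM iterations from the given starting values.

    Returns (weight of component a, (mean_a, sd_a), (mean_b, sd_b),
    log-likelihood evaluated before each update).
    """
    sd_a = sd_b = sd
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component a.
        dens_a = [weight * normal_pdf(x, mean_a, sd_a) for x in data]
        dens_b = [(1.0 - weight) * normal_pdf(x, mean_b, sd_b) for x in data]
        trace.append(sum(math.log(a + b) for a, b in zip(dens_a, dens_b)))
        resp_a = [a / (a + b) for a, b in zip(dens_a, dens_b)]
        resp_b = [1.0 - r for r in resp_a]
        # M-step: re-estimate the weight, means and sds from the posteriors.
        weight = sum(resp_a) / len(data)
        mean_a, sd_a = weighted_mean_sd(data, resp_a)
        mean_b, sd_b = weighted_mean_sd(data, resp_b)
    return weight, (mean_a, sd_a), (mean_b, sd_b), trace
```

Running the function from several different starting means and comparing the final log-likelihoods is one simple way to probe the sensitivity to initial values that the abstract describes.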
11.
Robust mixture modelling using the t distribution (total citations: 2; self-citations: 0; citations by others: 2)
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
12.
The distributions of some transformations of the sample correlation coefficient r are studied here, when the parent population is a mixture of two standard bivariate normals. The behavior of these transformations is assessed through the first four standard moments. It is shown that there is a close relationship between the behavior of the transformed variables and the lack of normality as evinced by the 'kurtosis' defined in the bivariate population.
13.
The authors propose a procedure for determining the unknown number of components in mixtures by generalizing a Bayesian testing method proposed by Mengersen & Robert (1996). The testing criterion they propose involves a Kullback‐Leibler distance, which may be weighted or not. They give explicit formulas for the weighted distance for a number of mixture distributions and propose a stepwise testing procedure to select the minimum number of components adequate for the data. Their procedure, which is implemented using the BUGS software, exploits a fast collapsing approach which accelerates the search for the minimum number of components by avoiding full refitting at each step. The performance of their method is compared, using both distances, to the Bayes factor approach.
14.
Fortunato Pesarin 《Statistical Methods and Applications》1992,1(1):87-101
Summary: This paper deals with nonparametric methods for combining dependent permutation or randomization tests. Particularly, they
are nonparametric with respect to the underlying dependence structure. The methods are based on a without replacement resampling
procedure (WRRP) conditional on the observed data, also called conditional simulation, which provides suitable estimates, as
good as computing time permits, of the permutational distribution of any statistic. A class C of combining functions is characterized
in such a way that all its members, under suitable and reasonable conditions, are found to be consistent and unbiased. Moreover,
for some of its members, their almost sure asymptotic equivalence with respect to best tests, in particular cases, is shown.
An application to a multivariate permutational t-paired test is also discussed.
15.
Let P be a probability distribution (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e. a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129–136]. However, depending on the complexity of the distribution P, the application of this algorithm can be quite resource-consuming. One possibility to overcome the problem is to group the original data and calculate the k-mean on the basis of the grouped data. As a result, the new k-mean will be biased, and our aim is to measure the loss of the quality of approximation caused by such an approach.
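
The idea in this abstract can be sketched as follows, assuming squared-error loss and fixed-width bins; both choices, and all function names, are assumptions made here and not necessarily those of the paper. Lloyd's algorithm alternates between assigning points to their nearest center and moving each center to the mean of its points, and the loss of the centers fitted on grouped data can then be compared with the loss of the centers fitted on the original data.

```python
import math


def lloyd_1d(points, centers, max_iter=100):
    """Lloyd's algorithm on the real line with squared-error loss."""
    centers = sorted(centers)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def squared_loss(points, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in points) / len(points)


def group_data(points, width):
    """Replace each point by the midpoint of its bin of the given width."""
    return [(math.floor(x / width) + 0.5) * width for x in points]
```

Evaluating `squared_loss` on the original points for both sets of centers gives one simple measure of the approximation quality lost through grouping.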
16.
A stochastic volatility in mean model with correlated errors using the symmetrical class of scale mixtures of normal distributions is introduced in this article. The scale mixture of normal distributions is an attractive class of symmetric distributions that includes the normal, Student-t, slash and contaminated normal distributions as special cases, providing a robust alternative to estimation in stochastic volatility in mean models in the absence of normality. Using a Bayesian paradigm, an efficient method based on Markov chain Monte Carlo (MCMC) is developed for parameter estimation. The methods developed are applied to analyze daily stock return data from the São Paulo Stock, Mercantile & Futures Exchange index (IBOVESPA). The Bayesian predictive information criteria (BPIC) and the logarithm of the marginal likelihood are used as model selection criteria. The results reveal that the stochastic volatility in mean model with correlated errors and slash distribution provides a significant improvement in model fit for the IBOVESPA data over the usual normal model.
17.
Exact sampling distributions of sums of squares in the unbalanced one-way random model are obtained under heterogeneous error variances. These distributions are used to investigate the effect of heteroscedasticity and unbalancedness on the probability of a negative estimate of the group variance component. The computed results reveal that heteroscedasticity affects the probability of a negative estimate in all situations of group sizes. Further, the probability decreases with heterogeneity of error variances for balanced situations and increases with variability among group sizes in the case of equal error variances.
18.
Robust automatic selection techniques for the smoothing parameter of a smoothing spline are introduced. They are based on a robust predictive error criterion and can be viewed as robust versions of Cp and cross-validation. They lead to smoothing splines which are stable and reliable in terms of mean squared error over a large spectrum of model distributions.
19.
Abstract: This article addresses the problem of repeats detection used in the comparison of significant repeats in sequences. The case of self-overlapping leftmost repeats for large sequences generated by a homogeneous stationary Markov chain has not been treated in the literature. In this work, we are interested in approximating the distribution of the number of sufficiently long self-overlapping leftmost repeats in a homogeneous stationary Markov chain. Using the Chen–Stein method, we show that this distribution is approximated by the Poisson distribution. Moreover, we show that this approximation can be extended to the case where the sequences are generated by an m-order Markov chain.
20.
《Journal of Statistical Computation and Simulation》2012,82(3):564-581
In healthcare studies, count data sets measured with covariates often exhibit heterogeneity and contain extreme values. To analyse such count data sets, we use a finite mixture of regression model framework and investigate a robust estimation approach, called the L2E [D.W. Scott, On fitting and adapting of density estimates, Comput. Sci. Stat. 30 (1998), pp. 124–133], to estimate the parameters. The L2E is based on an integrated L2 distance between parametric conditional and true conditional mass functions. In addition to studying the theoretical properties of the L2E estimator, we compare the performance of L2E with the maximum likelihood (ML) estimator and a minimum Hellinger distance (MHD) estimator via Monte Carlo simulations for correctly specified and gross-error contaminated mixture of Poisson regression models. These show that the L2E is a viable robust alternative to the ML and MHD estimators. More importantly, we use the L2E to perform a comprehensive analysis of a Western Australia hospital inpatient obstetrical length of stay (LOS) (in days) data that contains extreme values. It is shown that the L2E provides a two-component Poisson mixture regression fit to the LOS data which is better than those based on the ML and MHD estimators. The L2E fit identifies admission type as a significant covariate that profiles the predominant subpopulation of normal-stayers as planned patients and the small subpopulation of long-stayers as emergency patients.