1.

On testing the number of components in finite mixture models with known relevant component distributions

Jiahua Chen Ping Cheng 《Revue canadienne de statistique》1997,25(3):389-400

The limiting distribution of the log-likelihood-ratio statistic for testing the number of components in finite mixture models can be very complex. We propose two alternative methods. One method is generalized from a locally most powerful test. The test statistic is asymptotically normal, but its asymptotic variance depends on the true null distribution. Another method is to use a bootstrap log-likelihood-ratio statistic which has a uniform limiting distribution in [0,1]. When tested against local alternatives, both methods have the same power asymptotically. Simulation results indicate that the asymptotic results become applicable when the sample size reaches 200 for the bootstrap log-likelihood-ratio test, but the generalized locally most powerful test needs larger sample sizes. In addition, the asymptotic variance of the locally most powerful test statistic must be estimated from the data. The bootstrap method avoids this problem, but needs more computational effort. The user may choose the bootstrap method and let the computer do the extra work, or choose the locally most powerful test and spend quite some time to derive the asymptotic variance for the given model.

2.

Generalized likelihood-ratio test of the number of components in finite mixture models

Jiahua Chen 《Revue canadienne de statistique》1994,22(3):387-399

The number of components is an important feature in finite mixture models. Because of the irregularity of the parameter space, the log-likelihood-ratio statistic does not have a chi-square limit distribution. It is very difficult to find a test with a specified significance level, and this is especially true for testing k − 1 versus k components. Most of the existing work has concentrated on finding a comparable approximation to the limit distribution of the log-likelihood-ratio statistic. In this paper, we use a statistic similar to the usual log likelihood ratio, but its null distribution is asymptotically normal. A simulation study indicates that the method has good power at detecting extra components. We also discuss how to improve the power of the test, and some simulations are performed.

3.

Bayesian multivariate Poisson mixtures with an unknown number of components

Loukia Meligkotsidou 《Statistics and Computing》2007,17(2):93-107

In this paper we present Bayesian analysis of finite mixtures of multivariate Poisson distributions with an unknown number of components. The multivariate Poisson distribution can be regarded as the discrete counterpart of the multivariate normal distribution, which is suitable for modelling multivariate count data. Mixtures of multivariate Poisson distributions allow for overdispersion and for negative correlations between variables. To perform Bayesian analysis of these models we adopt a reversible jump Markov chain Monte Carlo (MCMC) algorithm with birth and death moves for updating the number of components. We present results obtained from applying our modelling approach to simulated and real data. Furthermore, we apply our approach to a problem in multivariate disease mapping, namely joint modelling of diseases with correlated counts.

4.

On finite mixtures of geometric and negative binomial distributions

Carl M. Harris 《Communications in Statistics - Theory and Methods》2013,42(9):987-1007

Finite mixtures of distributions have been getting increasing use in the applied literature. In the continuous case, linear combinations of exponentials and gammas have been shown to be well suited for modeling purposes. In the discrete case, the focus has primarily been on continuous mixing, usually of Poisson distributions and typically using gammas to describe the random parameter. But many of these applications are forced, especially when a continuous mixing distribution is used. Instead, it is often preferable to try finite mixtures of geometrics or negative binomials, since these are the fundamental building blocks of all discrete random variables. To date, a major stumbling block to their use has been the lack of easy routines for estimating the parameters of such models. This problem has now been alleviated by the adaptation to the discrete case of numerical procedures recently developed for exponential, Weibull, and gamma mixtures. The new methods have been applied to four previously studied data sets, and significant improvements reported in goodness-of-fit, with resultant implications for each affected study.

5.

Nonlinear semiparametric autoregressive model with finite mixtures of scale mixtures of skew normal innovations

A. Hajrajabi M. Maleki 《Journal of applied statistics》2019,46(11):2010-2029

We propose data generating structures which can be represented as nonlinear autoregressive models with single and finite mixtures of scale mixtures of skew normal innovations. This class of models covers symmetric/asymmetric and light/heavy-tailed distributions, so it provides a useful generalization of the symmetrical nonlinear autoregressive models. As semiparametric and nonparametric curve estimation are the approaches for exploring the structure of a nonlinear time series data set, in this article the semiparametric estimator for the nonlinear function of the model is investigated based on the conditional least squares method and a nonparametric kernel approach. Also, an Expectation–Maximization-type algorithm to perform maximum likelihood (ML) inference for the unknown parameters of the model is proposed. Furthermore, some strong and weak consistency results for the semiparametric estimator in this class of models are presented. Finally, to illustrate the usefulness of the proposed model, some simulation studies and an application to a real data set are considered.

6.

Hierarchical Bayesian modeling of marked non-homogeneous Poisson processes with finite mixtures and inclusion of covariate information

Athanasios Christou Micheas 《Journal of applied statistics》2014,41(12):2596-2615

We investigate marked non-homogeneous Poisson processes using finite mixtures of bivariate normal components to model the spatial intensity function. We employ a Bayesian hierarchical framework for estimation of the parameters in the model, and propose an approach for including covariate information in this context. The methodology is exemplified through an application involving modeling of and inference for tornado occurrences.

7.

On the distribution of the estimated mean from nonstandard mixtures of distributions

Vännman Kerstin 《Communications in Statistics - Theory and Methods》2013,42(6):1569-1584

The distribution of the estimated mean of the nonstandard mixture of distributions that has a discrete probability mass at zero and a gamma distribution for positive values is derived. Furthermore, for the studied nonstandard mixture of distributions, the distribution of the standardized statistic (estimator - true mean)/standard deviation of estimator is derived. The results are used to study the accuracy of the confidence interval for the mean based on a large sample approximation. Quantiles for the standardized statistic are also calculated.

8.

Detecting the number of components in a finite mixture having normal components

Mary Maine Thomas Boullion Gaspard T. Rizzuto 《Communications in Statistics - Theory and Methods》2013,42(2):611-620

A procedure is described which detects the number of components in a mixture distribution having normal components. The test is based on the behavior of the sample order statistics near the center of the distribution. Numerical results are presented and comparisons with tests proposed by Shapiro and Francia (1972) and Baker (1958) are provided.

9.

On binomial and circular binomial distributions of order k for l-overlapping success runs of length k

Frosso S. Makri Andreas N. Philippou 《Statistical Papers》2005,46(3):411-432

The number of l-overlapping success runs of length k in n trials, which was introduced and studied recently, is presently reconsidered in the Bernoulli case and two exact formulas are derived for its probability distribution function in terms of multinomial and binomial coefficients respectively. A recurrence relation concerning this distribution, as well as its mean, is also obtained. Furthermore, the number of l-overlapping success runs of length k in n Bernoulli trials arranged on a circle is presently considered for the first time and its probability distribution function and mean are derived. Finally, the latter distribution is related to the first, two open problems regarding limiting distributions are stated, and numerical illustrations are given in two tables. All results are new and they unify and extend several results of various authors on binomial and circular binomial distributions of order k.
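
The abstract above does not spell out how l-overlapping runs are counted. As a rough illustration only, the sketch below assumes the usual left-to-right convention: a success run of length k is counted when it shares at most l trials with the previously counted run, so that l = 0 gives non-overlapping counting and l = k − 1 gives fully overlapping counting. The function name `count_l_overlapping_runs` is hypothetical and not taken from the paper, and only the linear (non-circular) case is covered.

```python
def count_l_overlapping_runs(trials, k, l):
    """Count l-overlapping success runs of length k in a linear sequence.

    Assumed convention (not stated in the abstract): scan left to right;
    each counted run may reuse at most l trials of the previous counted run.
    """
    if not 0 <= l < k:
        raise ValueError("need 0 <= l < k")
    count = 0
    streak = 0  # successes available for the next run, including reused ones
    for outcome in trials:
        if outcome:
            streak += 1
            if streak == k:
                count += 1
                streak = l  # keep the last l successes as the allowed overlap
        else:
            streak = 0  # a failure breaks the current streak
    return count


# Four consecutive successes, runs of length 3:
four = [1, 1, 1, 1]
print(count_l_overlapping_runs(four, k=3, l=0))  # non-overlapping counting
print(count_l_overlapping_runs(four, k=3, l=2))  # fully overlapping counting
```

Under this convention a maximal block of s consecutive successes contributes floor((s − l)/(k − l)) runs whenever s ≥ k, which gives a quick way to sanity-check the counter.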

10.

Self-updating clustering algorithm for estimating the parameters in mixtures of von Mises distributions

Wen-Liang Hung Shou-Jen Chang-Chien Miin-Shen Yang 《Journal of applied statistics》2012,39(10):2259-2274

The EM algorithm is the standard method for estimating the parameters in finite mixture models. Yang and Pan [25] proposed a generalized classification maximum likelihood procedure, called the fuzzy c-directions (FCD) clustering algorithm, for estimating the parameters in mixtures of von Mises distributions. Two main drawbacks of the EM algorithm are its slow convergence and the dependence of the solution on the initial value used. The choice of initial values is of great importance in the algorithm-based literature as it can heavily influence the speed of convergence of the algorithm and its ability to locate the global maximum. On the other hand, the algorithmic frameworks of EM and FCD are closely related. Therefore, the drawbacks of FCD are the same as those of the EM algorithm. To resolve these problems, this paper proposes another clustering algorithm, which can self-organize local optimal cluster numbers without using cluster validity functions. The numerical results clearly indicate that the proposed algorithm is superior in performance to the EM and FCD algorithms. Finally, we apply the proposed algorithm to two real data sets.

11.

Robust mixture modelling using the t distribution

Peel D. McLachlan G. J. 《Statistics and Computing》2000,10(4):339-348

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.

12.

On the distribution of r in samples from the mixtures of bivariate normal populations

K. Kocherlakota S. Kocherlakota 《Communications in Statistics - Theory and Methods》2013,42(19):1943-1966

The distributions of some transformations of the sample correlation coefficient r are studied here, when the parent population is a mixture of two standard bivariate normals. The behavior of these transformations is assessed through the first four standard moments. It is shown that there is a close relationship between the behavior of the transformed variables and the lack of normality as evinced by the 'kurtosis' defined in the bivariate population.

13.

A fast distance‐based approach for determining the number of components in mixtures

Sujit K. Sahu Russell C. H. Cheng 《Revue canadienne de statistique》2003,31(1):3-22

The authors propose a procedure for determining the unknown number of components in mixtures by generalizing a Bayesian testing method proposed by Mengersen & Robert (1996). The testing criterion they propose involves a Kullback‐Leibler distance, which may be weighted or not. They give explicit formulas for the weighted distance for a number of mixture distributions and propose a stepwise testing procedure to select the minimum number of components adequate for the data. Their procedure, which is implemented using the BUGS software, exploits a fast collapsing approach which accelerates the search for the minimum number of components by avoiding full refitting at each step. The performance of their method is compared, using both distances, to the Bayes factor approach.

14.

A resampling procedure for nonparametric combination of several dependent tests

Fortunato Pesarin 《Statistical Methods and Applications》1992,1(1):87-101

This paper deals with nonparametric methods for combining dependent permutation or randomization tests. In particular, they are nonparametric with respect to the underlying dependence structure. The methods are based on a without replacement resampling procedure (WRRP) conditional on the observed data, also called conditional simulation, which provides suitable estimates, as good as computing time permits, of the permutational distribution of any statistic. A class C of combining functions is characterized in such a way that all its members, under suitable and reasonable conditions, are found to be consistent and unbiased. Moreover, for some of its members, their almost sure asymptotic equivalence with respect to best tests, in particular cases, is shown. An example application to a multivariate permutational t-paired test is also discussed.

15.

On the quality of k-means clustering based on grouped data

Meelis Käärik Kalev Pärna 《Journal of statistical planning and inference》2009,139(11):3836-3841

Let P be a probability distribution (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e. a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129–136]. However, depending on the complexity of the distribution P, the application of this algorithm can be quite resource-consuming. One possibility to overcome the problem is to group the original data and calculate the k-mean on the basis of the grouped data. As a result, the new k-mean will be biased, and our aim is to measure the loss of the quality of approximation caused by such an approach.
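
To make the grouping idea concrete, here is a minimal sketch, not the authors' procedure: it runs weighted Lloyd iterations on the real line with squared-error loss, once on the raw data and once on binned data (bin midpoints weighted by counts), and evaluates both sets of centres on the raw data. The squared-error loss, the equal-width binning, the quantile-style initialization, and all function names (`lloyd_1d`, `group`, `loss`) are assumptions made for illustration.

```python
import random


def lloyd_1d(points, weights, k, iters=100):
    """Weighted Lloyd iterations on the real line; returns sorted centres."""
    pts = sorted(points)
    # Initial centres: k roughly evenly spaced order statistics.
    centres = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        sums = [0.0] * k
        mass = [0.0] * k
        for x, w in zip(points, weights):
            # Assign each point to its nearest centre.
            j = min(range(k), key=lambda i: (x - centres[i]) ** 2)
            sums[j] += w * x
            mass[j] += w
        # Move each centre to the weighted mean of its cell (keep it if empty).
        new = [sums[j] / mass[j] if mass[j] > 0 else centres[j] for j in range(k)]
        if new == centres:
            break
        centres = new
    return sorted(centres)


def loss(points, centres):
    """Mean squared distance from each point to its nearest centre."""
    return sum(min((x - c) ** 2 for c in centres) for x in points) / len(points)


def group(points, width):
    """Bin points into equal-width intervals; return midpoints and counts."""
    counts = {}
    for x in points:
        b = int(x // width)  # floor division also handles negative values
        counts[b] = counts.get(b, 0) + 1
    mids = [(b + 0.5) * width for b in counts]
    return mids, [float(c) for c in counts.values()]


# Simulated two-cluster data (illustrative only).
rng = random.Random(0)
data = [rng.gauss(-2, 1) for _ in range(500)] + [rng.gauss(3, 1) for _ in range(500)]

full = lloyd_1d(data, [1.0] * len(data), k=2)
mids, counts = group(data, width=1.0)
grouped = lloyd_1d(mids, counts, k=2)

# Compare the quality of both solutions on the original, ungrouped data.
print("loss with raw-data centres:    ", loss(data, full))
print("loss with grouped-data centres:", loss(data, grouped))
```

The grouped run touches only a handful of bins per iteration instead of every observation, which is where the computational saving comes from; the gap between the two printed losses is one simple way to look at the quality loss the abstract refers to.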

16.

Stochastic volatility in mean models with scale mixtures of normal distributions and correlated errors: A Bayesian approach

C.A. Abanto-Valle H.S. Migon 《Journal of statistical planning and inference》2011,141(5):1875-1887

A stochastic volatility in mean model with correlated errors using the symmetrical class of scale mixtures of normal distributions is introduced in this article. The scale mixture of normal distributions is an attractive class of symmetric distributions that includes the normal, Student-t, slash and contaminated normal distributions as special cases, providing a robust alternative to estimation in stochastic volatility in mean models in the absence of normality. Using a Bayesian paradigm, an efficient method based on Markov chain Monte Carlo (MCMC) is developed for parameter estimation. The methods developed are applied to analyze daily stock return data from the São Paulo Stock, Mercantile & Futures Exchange index (IBOVESPA). The Bayesian predictive information criteria (BPIC) and the logarithm of the marginal likelihood are used as model selection criteria. The results reveal that the stochastic volatility in mean model with correlated errors and slash distribution provides a significant improvement in model fit for the IBOVESPA data over the usual normal model.

17.

On the distributions of components sums of squares in one-way unbalanced random model under heterogeneous error variances

B. Singh D.D. Joshi 《Communications in Statistics - Theory and Methods》2013,42(11):4137-4144

Exact sampling distributions of sums of squares in the unbalanced one-way random model are obtained under heterogeneous error variances. These distributions are used to investigate the effect of heteroscedasticity and unbalancedness on the probability of a negative estimate of the group variance component. The computed results reveal that heteroscedasticity affects the probability of a negative estimate in all situations of group sizes. Further, the probability decreases with heterogeneity of error variances for balanced situations and increases with variability among group sizes in the equal error variances case.

18.

Resistant selection of the smoothing parameter for smoothing splines

Eva Cantoni Elvezio Ronchetti 《Statistics and Computing》2001,11(2):141-146

Robust automatic selection techniques for the smoothing parameter of a smoothing spline are introduced. They are based on a robust predictive error criterion and can be viewed as robust versions of C_p and cross-validation. They lead to smoothing splines which are stable and reliable in terms of mean squared error over a large spectrum of model distributions.

19.

Statistical analysis of the number of self-overlapping leftmost repeats in an homogeneous stationary Markov chain on finite states

Ferhat Ziram Dominique Cellier François Charlot 《Communications in Statistics - Theory and Methods》2013,42(20):6087-6101

This article addresses the problem of repeats detection used in the comparison of significant repeats in sequences. The case of self-overlapping leftmost repeats for large sequences generated by a homogeneous stationary Markov chain has not been treated in the literature. In this work, we are interested in approximating the distribution of the number of sufficiently long self-overlapping leftmost repeats in a homogeneous stationary Markov chain. Using the Chen–Stein method, we show that this distribution is approximated by the Poisson distribution. Moreover, we show that this approximation can be extended to the case where the sequences are generated by an m-order Markov chain.

20.

On the performance of L2E estimation in modelling heterogeneous count responses with extreme values

《Journal of Statistical Computation and Simulation》2012,82(3):564-581

In healthcare studies, count data sets measured with covariates often exhibit heterogeneity and contain extreme values. To analyse such count data sets, we use a finite mixture of regression model framework and investigate a robust estimation approach, called the L₂E [D.W. Scott, On fitting and adapting of density estimates, Comput. Sci. Stat. 30 (1998), pp. 124–133], to estimate the parameters. The L₂E is based on an integrated L₂ distance between parametric conditional and true conditional mass functions. In addition to studying the theoretical properties of the L₂E estimator, we compare the performance of L₂E with the maximum likelihood (ML) estimator and a minimum Hellinger distance (MHD) estimator via Monte Carlo simulations for correctly specified and gross-error contaminated mixture of Poisson regression models. These show that the L₂E is a viable robust alternative to the ML and MHD estimators. More importantly, we use the L₂E to perform a comprehensive analysis of a Western Australia hospital inpatient obstetrical length of stay (LOS) (in days) data that contains extreme values. It is shown that the L₂E provides a two-component Poisson mixture regression fit to the LOS data which is better than those based on the ML and MHD estimators. The L₂E fit identifies admission type as a significant covariate that profiles the predominant subpopulation of normal-stayers as planned patients and the small subpopulation of long-stayers as emergency patients.