20 similar articles found (search time: 134 ms)
1.
Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices. The result is a family of six mixture models, including parsimonious models. Parameter estimates for this family of models are derived using an alternating expectation-conditional maximization algorithm and convergence is determined based on Aitken’s acceleration. Model selection is carried out using the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL). This novel family of mixture models is then applied to simulated and real data where clustering performance meets or exceeds that of established model-based clustering methods. The simulation studies include a comparison of the BIC and the ICL as model selection techniques for this novel family of models. Application to simulated data with larger dimensionality is also explored.
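As a rough illustration of the two ingredients named in this abstract, the sketch below shows one common form of an Aitken-acceleration stopping rule and a BIC computation. The function names, the tolerance, and the sign convention for the BIC (larger is better) are assumptions made for this example; the paper itself may use a different variant.

```python
import math

def aitken_asymptote(l_prev, l_curr, l_next):
    """Estimate the limiting log-likelihood from three successive values.

    Uses a = (l_next - l_curr) / (l_curr - l_prev) and
    l_inf = l_curr + (l_next - l_curr) / (1 - a).
    Returns None when the estimate is not defined.
    """
    denom = l_curr - l_prev
    if denom == 0.0:
        return None
    a = (l_next - l_curr) / denom
    if a >= 1.0:  # increments are not shrinking, no finite estimate
        return None
    return l_curr + (l_next - l_curr) / (1.0 - a)

def aitken_converged(loglik_history, tol=1e-6):
    """Hypothetical stopping rule: asymptotic estimate is within tol of the latest value."""
    if len(loglik_history) < 3:
        return False
    l_inf = aitken_asymptote(*loglik_history[-3:])
    if l_inf is None:
        return False
    return abs(l_inf - loglik_history[-1]) < tol

def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*loglik - n_params*log(n_obs)."""
    return 2.0 * loglik - n_params * math.log(n_obs)
```

In an EM-type loop, `aitken_converged` would be called after each iteration with the running list of log-likelihood values, and `bic` would be evaluated once per fitted model to compare members of the family.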
2.
In high-dimensional data, one often seeks a few interesting low-dimensional projections which reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes will not work well when the number of observations is small relative to the number of variables, known as the large p (dimension), small n (sample size) problem. This paper discusses how the relationship between sample size and dimensionality affects classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.
3.
Let X be an \(N(\mu, \sigma^2)\) distributed characteristic with unknown σ. We present the minimax version of the two-stage t test having minimal maximal average sample size among all two-stage t tests obeying the classical two-point condition on the operating characteristic. We give several examples. Furthermore, the minimax version of the two-stage t test is compared with the corresponding two-stage Gauß test.
4.
A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions
has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In
this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical
mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates.
The proposed methodology is illustrated by analyzing a real data example.
5.
This paper deals with the problem of testing statistical hypotheses when both the hypotheses and data are fuzzy. To this end, we first introduce the concept of fuzzy p-value and then develop an approach for testing fuzzy hypotheses by comparing a fuzzy p-value and a fuzzy significance level. Numerical examples are provided to illustrate the approach for different cases.
6.
7.
The skew t-distribution includes both the skew normal and the normal distributions as special cases. Inference for the skew t-model becomes problematic in these cases because the expected information matrix is singular and the parameter corresponding to the degrees of freedom takes a value at the boundary of its parameter space. In particular, the distributions of the likelihood ratio statistics for testing the null hypotheses of skew normality and normality are not asymptotically \(\chi ^2\). The asymptotic distributions of the likelihood ratio statistics are considered by applying the results of Self and Liang (J Am Stat Assoc 82:605–610, 1987) for boundary-parameter inference in terms of reparameterizations designed to remove the singularity of the information matrix. The Self–Liang asymptotic distributions are mixtures, and it is shown that their accuracy can be improved substantially by correcting the mixing probabilities. Furthermore, although the asymptotic distributions are non-standard, versions of Bartlett correction are developed that afford additional accuracy. Bootstrap procedures for estimating the mixing probabilities and the Bartlett adjustment factors are shown to produce excellent approximations, even for small sample sizes.
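To make the idea of a mixture reference distribution concrete, the sketch below computes a p-value under the textbook boundary case: a mixture of a point mass at zero (\(\chi^2_0\)) and a \(\chi^2_1\) distribution. This is only an illustration under stated assumptions. The function name is hypothetical, the default 50:50 weight is the classical single-boundary-parameter case rather than the mixture derived in the paper, and the `weight_chi2_1` argument merely shows where a corrected mixing probability would enter.

```python
import math

def boundary_lrt_pvalue(stat, weight_chi2_1=0.5):
    """P-value of a likelihood ratio statistic under the mixture
    (1 - w) * chi2_0 + w * chi2_1, where chi2_0 is a point mass at zero.

    weight_chi2_1 (w) is an assumed mixing probability; 0.5 is the
    classical value for a single parameter on the boundary.
    """
    if not 0.0 <= weight_chi2_1 <= 1.0:
        raise ValueError("mixing probability must lie in [0, 1]")
    if stat <= 0.0:
        return 1.0
    # For one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return weight_chi2_1 * math.erfc(math.sqrt(stat / 2.0))
```

With the default weight, the p-value is half of the naive \(\chi^2_1\) p-value, which is why ignoring the boundary makes the test conservative in this simple case.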
8.
9.
In the present paper the distribution theory of the maximum and minimum of the r-th concomitants from k independent subgroups, each of the same size m, from the Morgenstern family is investigated. Some applications of the results to estimation of the scale parameter of a marginal variable in the bivariate uniform distribution and to a selection problem are discussed.
10.
11.
Constrained monotone EM algorithms for mixtures of multivariate t distributions (total citations: 1, self-citations: 0, citations by others: 1)
Mixtures of multivariate t distributions provide a robust parametric extension to the fitting of data with respect to normal mixtures. In the presence of
some noise component, potential outliers or data with longer-than-normal tails, one way to broaden the model is
to consider t distributions. In this framework, the degrees of freedom can act as a robustness parameter, tuning the heaviness of the tails
and downweighting the effect of the outliers on parameter estimation. The aim of this paper is to extend to mixtures
of multivariate elliptical distributions some theoretical results about likelihood maximization on constrained parameter
spaces. Further, a constrained monotone algorithm implementing maximum likelihood mixture decomposition of multivariate t distributions is proposed, to achieve improved convergence capabilities and robustness. Monte Carlo numerical simulations
and a real data study illustrate the better performance of the algorithm, comparing it to earlier proposals.
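The downweighting role of the degrees of freedom can be seen in the standard E-step weight for a t model, \(u = (\nu + p)/(\nu + \delta)\), where \(\delta\) is the squared Mahalanobis distance and p the dimension. The sketch below shows the univariate case (p = 1) only; the function name and the sample values are assumptions for illustration, and it is not the constrained algorithm proposed in the paper.

```python
def t_weights(xs, mu, sigma, nu):
    """Univariate t-model weights u_i = (nu + 1) / (nu + ((x_i - mu) / sigma)^2).

    Points far from mu get weights well below 1, so they contribute
    less to weighted updates of location and scale.
    """
    if sigma <= 0.0 or nu <= 0.0:
        raise ValueError("sigma and nu must be positive")
    return [(nu + 1.0) / (nu + ((x - mu) / sigma) ** 2) for x in xs]

def weighted_mean(xs, weights):
    """Location update that uses the weights above."""
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

# Hypothetical data: four points near zero and one gross outlier.
data = [-0.5, 0.0, 0.3, 0.2, 10.0]
w_small_nu = t_weights(data, mu=0.0, sigma=1.0, nu=3.0)
w_large_nu = t_weights(data, mu=0.0, sigma=1.0, nu=1000.0)
```

With a small \(\nu\) the outlier at 10 receives a weight close to zero, while with a very large \(\nu\) all weights approach 1 and the update approaches the ordinary mean, as for a normal mixture component.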
12.
This paper derives the distribution of the probabilities of misclassification that arise from the use of the
linear discriminant function. The statistical background is two independent doubly truncated t populations with distinct location parameters and common scale parameter and degrees of freedom. The behavior of the linear
discriminant function is studied by comparing the distribution function of the errors of misclassification under the truncated
t and truncated normal models.
13.
Martin Bachmaier, Statistical Papers, 2009, 50(3): 649-660
An exact confidence set for that x-coordinate where a quadratic regression model has a given gradient is derived. The limits of the confidence set are given by mathematical formulae. They are implemented in Fortran programs that can be downloaded from the web. The confidence set need not be an interval. Its increase and its changing shape for increasing confidence level are extensively described and visualized in a figure that relates to data from nitrogen-rate trials in Germany. The wheat yields in this example are modeled as quadratic functions of the nitrogen input in order to determine a confidence set for the economically optimum nitrogen fertilization. The disadvantage that the confidence set does not distinguish between concave and convex parabolae, between profit maxima and minima, is also discussed.
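One way to see why such a confidence set need not be an interval is a Fieller-type construction, sketched below. For the model \(y = b_0 + b_1 x + b_2 x^2\), the gradient at x is \(b_1 + 2 b_2 x\), and the set collects all x satisfying \((b_1 + 2 b_2 x - g)^2 \le t^2 \, \widehat{\mathrm{Var}}(b_1 + 2 b_2 x)\). This is an assumed, generic construction and may differ from the formulae in the paper; the function name, argument names, and return labels are hypothetical.

```python
import math

def gradient_confidence_set(b1, b2, v11, v22, c12, g, tcrit):
    """Solve (b1 + 2*b2*x - g)^2 <= tcrit^2 * (v11 + 4*x*c12 + 4*x^2*v22) for x.

    v11, v22, c12 are the estimated variances and covariance of b1 and b2;
    tcrit is the critical value of the reference distribution.
    Returns (kind, (lo, hi)) where kind is 'interval', 'complement'
    (x <= lo or x >= hi), or 'real_line' (bounds are None).
    """
    d = b1 - g
    t2 = tcrit ** 2
    # Coefficients of the quadratic inequality a*x^2 + b*x + c <= 0.
    a = 4.0 * (b2 ** 2 - t2 * v22)
    b = 4.0 * (b2 * d - t2 * c12)
    c = d ** 2 - t2 * v11
    if a == 0.0:
        raise ValueError("degenerate case a == 0 is not handled in this sketch")
    disc = b ** 2 - 4.0 * a * c
    if disc < 0.0:
        if a < 0.0:
            return ("real_line", (None, None))
        raise ValueError("inputs are inconsistent with a valid covariance matrix")
    root = math.sqrt(disc)
    lo, hi = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    if a > 0.0:
        return ("interval", (lo, hi))
    return ("complement", (lo, hi))
```

The sign of the leading coefficient decides the shape: when \(b_2\) is clearly different from zero the set is a bounded interval, and when it is not, the set becomes the complement of an interval or the whole real line.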
14.
Francesco Dalla Valle, Fortunato Pesarin, Luigi Salmaso, Statistical Methods and Applications, 2002, 11(3): 265-276
Exact permutation testing of effects in unreplicated two-level multifactorial designs is developed based on the notion of
realigning observations and on paired permutations. This approach preserves the exchangeability of error components for testing
up to k effects. Advantages and limitations of exact permutation procedures for unreplicated factorials are discussed and a simulation
study on paired permutation testing is presented.
15.
This paper presents a novel framework for maximum likelihood (ML) estimation in skew-t factor analysis (STFA) models in the presence of missing values or nonresponses. As a robust extension of the ordinary factor analysis model, the STFA model assumes a restricted version of the multivariate skew-t distribution for the latent factors and the unobservable errors to accommodate non-normal features such as asymmetry and heavy tails or outliers. An EM-type algorithm is developed to carry out ML estimation and imputation of missing values under a missing at random mechanism. The practical utility of the proposed methodology is illustrated through real and synthetic data examples.
16.
Based on an FQ-System for continuous unimodal distributions, which was introduced by Scheffner (1998), we propose a purely data-driven method
for density estimation, which provides good results even for small samples. This procedure avoids problems and
uncertainties such as bandwidth selection for kernel density estimates.
17.
The paper considers joint maximum likelihood (ML) and semiparametric (SP) estimation of copula parameters in a bivariate t-copula. Analytical expressions for the asymptotic covariance matrix involving integrals over special functions are derived,
which can be evaluated numerically. These direct evaluations of the Fisher information matrix are compared to Hessian evaluations
based on numerical differentiation in a simulation study showing a satisfactory performance of the computationally less demanding
Hessian evaluations. Individual asymptotic confidence intervals for the t-copula parameters and the corresponding tail dependence coefficient are derived. For two financial datasets these confidence
intervals are calculated using both direct evaluation of the Fisher information and numerical evaluation of the Hessian matrix.
These confidence intervals are compared to parametric and nonparametric BCA bootstrap intervals based on ML and SP estimation,
respectively, showing a preference for asymptotic confidence intervals based on numerical Hessian evaluations.
18.
For measuring the goodness of \(2^{m}4^{1}\) designs, Wu and Zhang (1993) proposed the minimum aberration (MA) criterion. MA \(2^{m}4^{1}\) designs have been constructed using the idea of complementary designs when the number of two-level factors, m, exceeds n/2, where n is the total number of runs. In this paper, the structures of MA \(2^{m}4^{1}\) designs are obtained when m>5n/16. Based on these structures, some methods are developed for constructing MA \(2^{m}4^{1}\) designs for 5n/16<m<n/2 as well as for n/2≤m<n. When m≤5n/16, there is no general method for constructing MA \(2^{m}4^{1}\) designs. In this case, we obtain lower bounds for \(A_{30}\) and \(A_{31}\), where \(A_{30}\) and \(A_{31}\) are the numbers of type 0 and type 1 words with length three respectively. And a method for constructing weak minimum aberration (WMA) \(2^{m}4^{1}\) designs (\(A_{30}\) and \(A_{31}\) achieving the lower bounds) is demonstrated. Some MA or WMA \(2^{m}4^{1}\) designs with 32 or 64 runs are tabulated for practical use, which supplement the tables in Wu and Zhang (1993), Zhang and Shao (2001) and Mukerjee and Wu (2001).
19.
This paper (i) discusses the R-chart with asymmetric probability control limits under the assumption that the distribution of the quality characteristic
under study is either exponential, Laplace, or logistic, (ii) examines the effect of the estimated probability limits on the
performance of the R-chart, and (iii) obtains the desired probability limits of the R-chart that has a specified false alarm rate when probability limits must be estimated from preliminary samples taken from
either the exponential, Laplace, or logistic processes.
20.
Under the null hypothesis, the family of Kφ-divergence statistics for goodness-of-fit has an asymptotic chi-squared distribution; however, for small samples, the chi-squared approximation in some cases does not agree well with the exact distribution. In this paper, a closer approximation to the exact distribution is obtained by extracting the φ-dependent second-order component from the distribution. Moreover, numerical results are presented for moderate sample sizes with a moderate number of cells.