Similar literature
 20 similar documents found (search time: 15 ms)
1.
The K-means algorithm and the normal mixture model method are two common clustering methods. The K-means algorithm is a popular heuristic approach that gives reasonable clustering results if the component clusters are ball-shaped. Currently, there are no analytical results for this algorithm when the component distributions deviate from a ball shape. This paper analytically studies how the K-means algorithm changes its classification rule as the normal component distributions become more elongated under the homoscedastic assumption, and compares this rule with the Bayes rule from the mixture model method. We show that the classification rules of both methods are linear, but the slopes of the two classification lines change in opposite directions as the component distributions become more elongated. The classification performance of the K-means algorithm is then compared to that of the mixture model method via simulation. The comparison, which is limited to two clusters, shows that the K-means algorithm consistently provides poor classification performance as the component distributions become more elongated, while the mixture model method can potentially, but not necessarily, take advantage of this change and provide much better classification performance.
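As a rough illustration of the two linear rules this abstract contrasts, the sketch below compares the equal-prior Bayes rule for two homoscedastic bivariate normal components with a nearest-center rule. It assumes, purely for illustration, that the K-means centers coincide with the true component means, which the abstract does not claim. The function names and all numeric values are hypothetical.

```python
def bayes_label(x, mu1, mu2, sigma):
    """Equal-prior Bayes rule for two normal components sharing covariance sigma (2x2).

    The log density ratio is (mu1 - mu2)' sigma^{-1} (x - (mu1 + mu2) / 2),
    which is linear in x; a positive value favours component 1.
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("sigma must be invertible")
    diff = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    # w = sigma^{-1} (mu1 - mu2), using the closed-form 2x2 inverse.
    w = ((d * diff[0] - b * diff[1]) / det,
         (-c * diff[0] + a * diff[1]) / det)
    mid = ((mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2)
    score = w[0] * (x[0] - mid[0]) + w[1] * (x[1] - mid[1])
    return 1 if score > 0 else 2


def nearest_center_label(x, c1, c2):
    """Nearest-center rule (assumption: centers equal the true means)."""
    d1 = (x[0] - c1[0]) ** 2 + (x[1] - c1[1]) ** 2
    d2 = (x[0] - c2[0]) ** 2 + (x[1] - c2[1]) ** 2
    return 1 if d1 < d2 else 2


# Hypothetical setup: components elongated along the first axis.
mu1, mu2 = (0.0, 0.0), (2.0, 1.0)
spherical = [[1.0, 0.0], [0.0, 1.0]]
elongated = [[9.0, 0.0], [0.0, 0.25]]
point = (3.0, 0.2)

print(bayes_label(point, mu1, mu2, spherical), nearest_center_label(point, mu1, mu2))
print(bayes_label(point, mu1, mu2, elongated), nearest_center_label(point, mu1, mu2))
```

With a spherical covariance the two rules agree on this point. With the elongated covariance the Bayes boundary tilts while the nearest-center boundary stays the perpendicular bisector of the two centers, so the labels can differ.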

2.
The estimation of the mean of a univariate normal population with unknown variance is considered when uncertain non-sample prior information is available. Alternative estimators are defined to incorporate both the sample and the non-sample information in the estimation process. Some of the important statistical properties of the restricted, preliminary test, and shrinkage estimators are investigated. The performances of the estimators are compared on the criteria of unbiasedness and mean square error in order to search for a ‘best’ estimator. Both analytical and graphical methods are explored. No estimator uniformly dominates the others. However, if the non-sample information regarding the value of the mean is close to its true value, the shrinkage estimator outperforms the rest of the estimators. Received: June 19, 1999; revised version: March 23, 2000
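To make the idea of a preliminary test estimator concrete, here is a minimal sketch in its common textbook form: use the prior value when a two-sided t-test does not reject it, and the sample mean otherwise. The abstract does not give the paper's exact definitions, so the function name, the parameter `t_crit`, and the sample values are assumptions for illustration only.

```python
import math
import statistics


def preliminary_test_estimate(sample, mu0, t_crit):
    """Textbook-style preliminary test estimator of a normal mean.

    mu0 is the non-sample (prior) value of the mean, and t_crit is the
    two-sided critical value of Student's t with len(sample) - 1 degrees
    of freedom, supplied by the caller (assumption: no quantile routine
    is available here). Requires a sample with nonzero variance.
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t_stat = math.sqrt(n) * (xbar - mu0) / s
    # Keep the prior value unless the test rejects H0: mean = mu0.
    return mu0 if abs(t_stat) <= t_crit else xbar


sample = [4.9, 5.1, 5.0, 5.2, 4.8]  # hypothetical data
t_crit = 2.776                       # 5% two-sided critical value, 4 d.f.
print(preliminary_test_estimate(sample, 5.05, t_crit))  # prior close to the data
print(preliminary_test_estimate(sample, 3.0, t_crit))   # prior far from the data
```

When the prior value is close to the data the estimator returns the prior value itself, and when it is far away the estimator falls back to the sample mean.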

3.
4.
We propose optimal procedures for partitioning k multivariate normal populations into two disjoint subsets with respect to a given standard vector. Multivariate normal populations are defined as good or bad according to whether their Mahalanobis distances to a known standard vector are small or large. Partitioning k multivariate normal populations is reduced to partitioning k non-central chi-square or non-central F distributions with respect to the corresponding non-centrality parameters, depending on whether the covariance matrices are known or unknown. The minimum required sample size for each population is determined to ensure that the probability of a correct decision attains a certain level. An example is given to illustrate our procedures.
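The good/bad definition in this abstract rests on the Mahalanobis distance between a population mean and the standard vector. The sketch below computes the squared distance for the bivariate case and applies a cutoff. The cutoff value, the function names, and the numbers are hypothetical, and the sketch does not reproduce the paper's partitioning procedure or its sample-size calculation.

```python
def mahalanobis_sq(mean, standard, sigma):
    """Squared Mahalanobis distance (mean - standard)' sigma^{-1} (mean - standard).

    sigma is a 2x2 covariance matrix given as nested lists; the closed-form
    2x2 inverse is used, so this sketch covers the bivariate case only.
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("sigma must be invertible")
    dx = mean[0] - standard[0]
    dy = mean[1] - standard[1]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def is_good(mean, standard, sigma, cutoff_sq):
    """Label a population 'good' when its squared distance is at most cutoff_sq."""
    return mahalanobis_sq(mean, standard, sigma) <= cutoff_sq


standard = (0.0, 0.0)                # hypothetical standard vector
sigma = [[4.0, 0.0], [0.0, 1.0]]     # hypothetical known covariance
print(mahalanobis_sq((2.0, 1.0), standard, sigma))
print(is_good((2.0, 1.0), standard, sigma, cutoff_sq=3.0))
```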

5.
6.
A p-value is developed for testing the equivalence of the variances of a bivariate normal distribution. The unknown correlation coefficient is a nuisance parameter in the problem. If the correlation is known, the proposed p-value provides an exact test. For large samples, the p-value can be computed by replacing the unknown correlation with the sample correlation, and the resulting test is quite satisfactory. For small samples, it is proposed to compute the p-value by replacing the unknown correlation with a scalar multiple of the sample correlation. However, a single scalar is not satisfactory, and it is proposed to use different scalars depending on the magnitude of the sample correlation coefficient. To implement this approach, tables are obtained that provide sub-intervals for the sample correlation coefficient and the scalar to be used when the sample correlation coefficient falls in a particular sub-interval. Once such tables are available, the proposed p-value is quite easy to compute since it has an explicit analytic expression. Numerical results on the type I error probability and power of such a test are reported, and the proposed p-value test is also compared to another test based on a rejection region. The results are illustrated with two examples: one dealing with the comparability of two measuring devices, and one dealing with the assessment of bioequivalence.

7.
This paper extends the results of canonical correlation analysis of Anderson [2002. Canonical correlation analysis and reduced-rank regression in autoregressive models. Ann. Statist. 30, 1134–1154] to a vector AR(1) process with vector ARCH(1) innovations. We obtain the limiting distributions of the sample matrices, the canonical correlations and the canonical vectors of the process. The extension is important because many time series in economics and finance exhibit conditional heteroscedasticity. We also use simulation to demonstrate the effects of ARCH innovations on canonical correlation analysis in finite samples. Both the limiting distributions and the simulation results show that overlooking ARCH effects in canonical correlation analysis can easily lead to erroneous inference.

8.
9.
The weighted likelihood is a generalization of the likelihood designed to borrow strength from similar populations while making minimal assumptions. If the weights are properly chosen, the maximum weighted likelihood estimate may perform better than the maximum likelihood estimate (MLE). In a previous article, the minimum averaged mean squared error (MAMSE) weights were proposed, and simulations showed that they can outperform the MLE in many cases. In this paper, we study the asymptotic properties of the MAMSE weights. In particular, we prove that the MAMSE-weighted mixture of empirical distribution functions converges uniformly to the target distribution and that the maximum weighted likelihood estimate is strongly consistent. A short simulation illustrates the use of the bootstrap in this context.

10.
We discuss a general application of categorical data analysis to mutations along the HIV genome. We consider a multidimensional table for several positions at the same time. Due to the complexity of the multidimensional table, we may collapse it by pooling some categories. However, the association between the remaining variables may not be the same as before collapsing. We discuss the collapsibility of tables and the change in the meaning of parameters after collapsing categories. We also address this problem with a log-linear model. We present a parameterization with the consensus output as the reference cell as is appropriate to explain genomic mutations in HIV. We also consider five null hypotheses and some classical methods to address them. We illustrate methods for six positions along the HIV genome, through consideration of all triples of positions.

11.
The asymptotically best linear unbiased estimate (ABLUE) of the normal mean is discussed. The estimate is based on k selected order statistics chosen from a singly or doubly censored large sample of size n (> k). The coefficients, the asymptotic relative efficiency of the estimate, and the optimum spacing of k real numbers between 0 and 1, which determines the optimum ranks of the order statistics, are provided. A comparison between the ABLUE and the iterated maximum likelihood estimate is made.

12.
13.
In this paper the problem of estimating the scale matrix in a complex elliptically contoured distribution (complex ECD) is addressed. An extended Haff–Stein identity for this model is derived. It is shown that the minimax estimators of the covariance matrix obtained under the complex normal model remain robust under the complex ECD model when the Stein loss function is employed.

14.
A general modeling procedure for analyzing genetic data is reviewed. We review an ANOVA-type model that can handle both continuous and discrete genetic variables in one modeling framework. Unlike regression-type models, which typically set the phenotype variable as the response, this ANOVA model treats the phenotype variable as an explanatory variable. By reversing the role of the phenotype variable, the usual high-dimensional problem is turned into a low-dimensional one. Instead, the ANOVA model always includes an interaction term between the genetic locations and the phenotype variable to find potential associations between them. The interaction term is designed to be low rank, as a product of bilinear terms, so that the required number of parameters is kept manageable. We compare the performance of the reviewed ANOVA model with other popular methods on microarray and SNP data sets.

15.
This paper presents a class of estimators for the mean of a normal population and determines the conditions on the characterizing scalars under which the class of estimators uniformly dominates the conventional sample mean according to the mean-square-error criterion.

16.
17.
In this paper a set of residuals for the multivariate linear regression model is introduced. These residuals are shown to be independent with known distributions which do not depend on the parameters of the model. Transformations of the mentioned residuals may be used to construct exact α goodness-of-fit tests for the multivariate regression model.

18.
19.
R. Van de Ven, N. C. Weber. Statistics, 2013, 47(3-4): 345-352
Upper and lower bounds are obtained for the mean of the negative binomial distribution. These bounds are simple functions of a percentile determined by the shape parameter. The result is then used to obtain a robust estimate of the mean when the shape parameter is known.

20.
In this paper hypothesis tests are constructed for the family of skew normal distributions. The proposed tests utilize the fact that the moment generating function of the skew normal variable satisfies a simple differential equation. The empirical counterpart of this equation, involving the empirical moment generating function, yields simple consistent test statistics. Finite-sample results as well as results from real data are provided for the proposed procedures.
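The abstract does not state the differential equation, so the following is a sketch under an assumption: for the standard skew normal distribution with shape parameter λ and δ = λ/√(1+λ²), the moment generating function is M(t) = 2·exp(t²/2)·Φ(δt), and differentiating it gives M′(t) = t·M(t) + δ·√(2/π)·exp((1−δ²)t²/2). The code checks this identity numerically with a central difference. The function names and the step size `h` are hypothetical, and the paper's test statistics are not reproduced.

```python
import math


def skew_normal_mgf(t, lam):
    """MGF of the standard skew normal: 2 * exp(t^2 / 2) * Phi(delta * t)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    phi_cdf = 0.5 * (1.0 + math.erf(delta * t / math.sqrt(2.0)))
    return 2.0 * math.exp(t * t / 2.0) * phi_cdf


def ode_residual(t, lam, h=1e-5):
    """Numerical M'(t) minus the right-hand side t*M(t) + delta*sqrt(2/pi)*exp(...)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    # Central-difference approximation of the derivative of the MGF.
    derivative = (skew_normal_mgf(t + h, lam) - skew_normal_mgf(t - h, lam)) / (2.0 * h)
    rhs = (t * skew_normal_mgf(t, lam)
           + delta * math.sqrt(2.0 / math.pi) * math.exp((1.0 - delta * delta) * t * t / 2.0))
    return derivative - rhs


# The residual should be close to zero for any t and shape parameter.
for t in (-1.0, 0.0, 0.7):
    print(t, ode_residual(t, lam=2.0))
```

A test statistic in the spirit of the abstract would replace M(t) with the empirical moment generating function of the data and measure how far the resulting residual is from zero.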
