Similar literature
20 similar documents found.
1.
2.
3.
In high-dimensional data, one often seeks a few interesting low-dimensional projections that reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes do not work well when the number of observations is small relative to the number of variables, known as the large p (dimension), small n (sample size) problem. This paper discusses how the relationship between sample size and dimensionality affects classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.

4.
Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices. The result is a family of six mixture models, including parsimonious models. Parameter estimates for this family of models are derived using an alternating expectation-conditional maximization algorithm and convergence is determined based on Aitken’s acceleration. Model selection is carried out using the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL). This novel family of mixture models is then applied to simulated and real data where clustering performance meets or exceeds that of established model-based clustering methods. The simulation studies include a comparison of the BIC and the ICL as model selection techniques for this novel family of models. Application to simulated data with larger dimensionality is also explored.
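
As an illustration of the model selection step mentioned in this abstract, the following minimal sketch picks the candidate with the best BIC. It assumes the "smaller is better" convention BIC = -2 log L + k log n; some model-based clustering software uses the opposite sign. The function names and the candidate tuples are hypothetical and are not taken from the paper.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC under the (assumed) smaller-is-better convention: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_bic(candidates, n_obs):
    """Return the name of the candidate with the smallest BIC.

    candidates: iterable of (name, maximized_log_likelihood, n_free_parameters).
    These tuples are illustrative inputs, not output of any particular library.
    """
    best = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))
    return best[0]
```

A richer model is preferred only when its gain in log-likelihood outweighs the k log n penalty for its extra parameters.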

5.
Traditionally, when applying the two-sample t test, some pre-testing occurs. That is, in the applied sciences the theory-based assumptions of normal distributions and of homogeneity of the variances are often tested before the intended t test. This paper shows that such pre-testing leads to unknown final type I and type II risks if the respective statistical tests are performed using the same set of observations. To give an impression of the extent of the resulting misinterpreted risks, some theoretical deductions are given and, in particular, a systematic simulation study is carried out. As a result, we propose that it is preferable to apply no pre-tests for the t test and no t test at all, but instead to use the Welch test as a standard test: its power comes close to that of the t test when the variances are homogeneous, and for unequal variances and skewness values \(|\gamma_1| < 3\), it keeps the so-called 20% robustness, whereas the t test as well as Wilcoxon’s U test cannot be recommended for most cases.
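
For readers unfamiliar with the Welch test recommended in this abstract, here is a minimal sketch of its test statistic and the Welch-Satterthwaite degrees of freedom, using only the Python standard library. The function name `welch_t` is an illustrative choice; computing a p-value would additionally require a t-distribution CDF, which is omitted here.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's two-sample test with unequal variances.

    t  = (mean(x) - mean(y)) / sqrt(s_x^2/n_x + s_y^2/n_y)
    df = Welch-Satterthwaite approximation to the degrees of freedom.
    """
    nx, ny = len(x), len(y)
    # Sample variances (denominator n - 1) divided by the sample sizes.
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

With equal sample sizes and equal sample variances, the degrees of freedom reduce to 2n - 2, the value used by the pooled-variance t test.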

6.
This paper deals with the problem of testing statistical hypotheses when both the hypotheses and data are fuzzy. To this end, we first introduce the concept of fuzzy p-value and then develop an approach for testing fuzzy hypotheses by comparing a fuzzy p-value and a fuzzy significance level. Numerical examples are provided to illustrate the approach for different cases.

7.
This paper (i) discusses the R-chart with asymmetric probability control limits under the assumption that the distribution of the quality characteristic under study is either exponential, Laplace, or logistic, (ii) examines the effect of the estimated probability limits on the performance of the R-chart, and (iii) obtains the desired probability limits of the R-chart that has a specified false alarm rate when probability limits must be estimated from preliminary samples taken from either the exponential, Laplace, or logistic processes.

8.
In the present paper the distribution theory of the maximum and minimum of the rth concomitants from k independent subgroups, each of the same size m, from the Morgenstern family is investigated. Some applications of the results to the estimation of the scale parameter of a marginal variable in the bivariate uniform distribution and to a selection problem are discussed.

9.
10.
An exact confidence set for the x-coordinate at which a quadratic regression model has a given gradient is derived. The limits of the confidence set are given by mathematical formulae. They are implemented in Fortran programs that can be downloaded from the web. The confidence set need not be an interval. Its growth and its changing shape with increasing confidence level are extensively described and visualized in a figure that relates to data from nitrogen-rate trials in Germany. The wheat yields in this example are modeled as quadratic functions of the nitrogen input in order to determine a confidence set for the economically optimum nitrogen fertilization. The disadvantage that the confidence set does not distinguish between concave and convex parabolae, that is, between profit maxima and minima, is also discussed.

11.
Exact permutation testing of effects in unreplicated two-level multifactorial designs is developed based on the notion of realigning observations and on paired permutations. This approach preserves the exchangeability of error components for testing up to k effects. Advantages and limitations of exact permutation procedures for unreplicated factorials are discussed and a simulation study on paired permutation testing is presented.

12.
For measuring the goodness of \(2^m 4^1\) designs, Wu and Zhang (1993) proposed the minimum aberration (MA) criterion. MA \(2^m 4^1\) designs have been constructed using the idea of complementary designs when the number of two-level factors, m, exceeds n/2, where n is the total number of runs. In this paper, the structures of MA \(2^m 4^1\) designs are obtained when m > 5n/16. Based on these structures, some methods are developed for constructing MA \(2^m 4^1\) designs for 5n/16 < m < n/2 as well as for n/2 ≤ m < n. When m ≤ 5n/16, there is no general method for constructing MA \(2^m 4^1\) designs. In this case, we obtain lower bounds for \(A_{30}\) and \(A_{31}\), where \(A_{30}\) and \(A_{31}\) are the numbers of type 0 and type 1 words of length three, respectively. A method for constructing weak minimum aberration (WMA) \(2^m 4^1\) designs (with \(A_{30}\) and \(A_{31}\) achieving the lower bounds) is demonstrated. Some MA or WMA \(2^m 4^1\) designs with 32 or 64 runs are tabulated for practical use, which supplement the tables in Wu and Zhang (1993), Zhang and Shao (2001) and Mukerjee and Wu (2001).

13.
A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.

14.
Based on an FQ-System for continuous unimodal distributions, which was introduced by Scheffner (1998), we propose a purely data-driven method for density estimation that provides good results even for small samples. This procedure does not involve problems or uncertainties such as bandwidth selection for kernel density estimates.

15.
The r largest order statistics approach is widely used in extreme value analysis because it may use more information from the data than just the block maxima. In practice, the choice of r is critical. If r is too large, bias can occur; if too small, the variance of the estimator can be high. The limiting distribution of the r largest order statistics, denoted by GEV\(_r\), extends that of the block maxima. Two specification tests are proposed to select r sequentially. The first is a score test for the GEV\(_r\) distribution. Due to the special characteristics of the GEV\(_r\) distribution, the classical chi-square asymptotics cannot be used. The simplest approach is to use the parametric bootstrap, which is straightforward to implement but computationally expensive. An alternative fast weighted bootstrap or multiplier procedure is developed for computational efficiency. The second test uses the difference in estimated entropy between the GEV\(_r\) and GEV\(_{r-1}\) models, applied to the r largest order statistics and the \(r-1\) largest order statistics, respectively. The asymptotic distribution of the difference statistic is derived. In a large scale simulation study, both tests held their size and had substantial power to detect various misspecification schemes. A new approach to address the issue of multiple, sequential hypotheses testing is adapted to this setting to control the false discovery rate or familywise error rate. The utility of the procedures is demonstrated with extreme sea level and precipitation data.

16.
Under the null hypothesis, the \(K_\phi\)-divergence family of goodness-of-fit statistics has an asymptotic chi-squared distribution; for small samples, however, the chi-squared approximation in some cases does not agree well with the exact distribution. In this paper, a closer approximation to the exact distribution is obtained by extracting the \(\phi\)-dependent second-order component from the distribution. Moreover, numerical results are presented for moderate sample sizes with a moderate number of cells.

17.
In this paper we compare two robust pseudo-likelihoods for a parameter of interest, including in the presence of nuisance parameters. These functions are obtained by computing quasi-likelihood and empirical likelihood from the estimating equations which define robust M-estimators. Application examples in the context of linear transformation models are considered. Monte Carlo studies are performed in order to assess the finite-sample performance of the inferential procedures based on quasi- and empirical likelihood, when the objective is the construction of robust confidence regions.

18.
19.
Let X be an \(N(\mu, \sigma^2)\) distributed characteristic with unknown \(\sigma\). We present the minimax version of the two-stage t test having minimal maximal average sample size among all two-stage t tests obeying the classical two-point condition on the operating characteristic. We give several examples. Furthermore, the minimax version of the two-stage t test is compared with the corresponding two-stage Gauß test.

20.