Similar Documents (20 results)
1.
Reduced k-means clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that both the clustering of objects and the low-dimensional subspace reflecting the cluster structure are obtained simultaneously. In this paper, the relationship between conventional k-means clustering and reduced k-means clustering is discussed. Conditions ensuring almost sure convergence of the reduced k-means estimator as the sample size increases without bound are presented, and results are provided for a more general model covering both conventional and reduced k-means clustering. Moreover, a consistent selection of the number of clusters and the number of dimensions is described.
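As an illustration of the simultaneous clustering-and-subspace idea, here is a toy alternating scheme in Python. All names and the unweighted SVD refit are our own simplifications; this is a sketch of the reduced k-means objective, not the estimator analyzed in the paper.

    import numpy as np

    def reduced_kmeans(X, n_clusters, n_dims, n_iter=50, seed=0):
        """Toy alternating scheme for reduced k-means: cluster the objects
        in a low-dimensional subspace while refitting that subspace."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        labels = rng.integers(n_clusters, size=n)
        A = np.linalg.qr(rng.standard_normal((p, n_dims)))[0]  # orthonormal loadings
        for _ in range(n_iter):
            Z = Xc @ A  # objects projected into the current subspace
            # centroid of each cluster in the projected space
            centroids = np.vstack([Z[labels == g].mean(axis=0) if np.any(labels == g)
                                   else Z[rng.integers(n)] for g in range(n_clusters)])
            # reassign each object to its nearest projected centroid
            d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # refit the subspace from the cluster means in the full space
            M = np.vstack([Xc[labels == g].mean(axis=0)
                           for g in range(n_clusters) if np.any(labels == g)])
            A = np.linalg.svd(M, full_matrices=False)[2][:n_dims].T
        return labels, A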

2.
The k nearest neighbors (k-NN) classifier is one of the most popular methods for statistical pattern recognition and machine learning. In practice, the size k, the number of neighbors used for classification, is usually set arbitrarily to one or some other small number, or chosen by cross-validation. In this study, we propose a novel alternative for choosing k. Based on a k-NN multivariate multi-sample test, we assign each candidate k a permutation-test Z-score and set the number of neighbors to the k with the highest Z-score. The approach is computationally efficient because we derive closed-form expressions for the mean and variance of the test statistic under the permutation distribution for multiple sample groups. Several simulated and real-world data sets are analyzed to investigate the performance of the approach, and its usefulness is demonstrated by evaluating prediction accuracies with the Z-score as the criterion for selecting k. We also compare the approach to widely used cross-validation procedures. The results show that the k selected by our approach yields high prediction accuracy when informative features are used for classification, whereas cross-validation may fail in some cases.
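A hedged sketch of the selection rule described above. The paper's closed-form permutation moments are replaced here by Monte Carlo estimates, and the coincidence statistic is one plausible choice of k-NN multi-sample statistic, so this is an illustration rather than the authors' implementation.

    import numpy as np
    from scipy.spatial.distance import cdist

    def coincidence_stat(D, y, k):
        """Fraction of k-nearest-neighbour pairs sharing a class label
        (in the spirit of Schilling's k-NN statistic).  Assumes distinct
        points, so each row's nearest entry is the point itself."""
        nn = np.argsort(D, axis=1)[:, 1:k + 1]
        return np.mean(y[nn] == y[:, None])

    def z_score_for_k(X, y, k, n_perm=500, seed=0):
        """Standardize the observed statistic against its label-permutation
        distribution; the paper derives these moments in closed form, while
        here they are estimated by Monte Carlo."""
        rng = np.random.default_rng(seed)
        D = cdist(X, X)
        obs = coincidence_stat(D, y, k)
        null = np.array([coincidence_stat(D, rng.permutation(y), k)
                         for _ in range(n_perm)])
        return (obs - null.mean()) / null.std()

    # pick the k with the highest Z-score, e.g.:
    # best_k = max(range(1, 16), key=lambda k: z_score_for_k(X, y, k))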

3.
Dimension reduction in regression is an efficient way of overcoming the curse of dimensionality in non-parametric regression. Motivated by recent developments in dimension reduction for time series, this paper extends the central mean subspace from univariate time series to a single-input transfer function model. We use the central mean subspace as a dimension reduction tool for bivariate time series when the dimension and lag are known, estimating it with the Nadaraya–Watson kernel smoother. Furthermore, we develop a data-dependent approach based on a modified Schwarz Bayesian criterion to estimate the unknown dimension and lag. Finally, we show that the approach works well for bivariate time series through an expository demonstration, two simulations, and a real data analysis of El Niño and fish population data.
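For reference, the Nadaraya–Watson smoother used in the estimation step is just a kernel-weighted average of responses. A minimal univariate sketch with a Gaussian kernel (the kernel and the bandwidth h are illustrative choices; the paper applies the smoother to the reduced index of the lagged series):

    import numpy as np

    def nadaraya_watson(x_train, y_train, x_eval, h):
        """Nadaraya-Watson kernel regression for univariate predictors:
        a weighted average of responses, weights decaying with distance."""
        # pairwise Gaussian kernel weights between evaluation and training points
        w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
        return (w * y_train).sum(axis=1) / w.sum(axis=1)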

4.
Contours may be viewed as the 2D outlines of images of objects. This type of data arises in medical imaging as well as in computer vision, can be modeled as data on a manifold, and can be studied using statistical shape analysis. Practically speaking, each observed contour, while theoretically infinite dimensional, must be discretized for computation. The coordinates of each contour are obtained at k sampling points, so the contour is represented as a k-dimensional complex vector. While a large k gives a closer approximation to the original contour, it also raises the computational cost of the subsequent analysis. The goal of this study is to determine reasonable values of k that keep the computational cost low while maintaining accuracy. To do this, we consider two methods for selecting sample points and determine lower bounds on k that achieve a desired approximation error under two different criteria. Because this process is computationally inefficient to perform on a large scale, we then develop models for predicting the lower bounds on k from simple characteristics of the contours.
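To make the k-versus-accuracy trade-off concrete, here is a hedged sketch of one natural scheme, resampling the contour at k points equally spaced in arc length, together with a crude error measure; the paper's two sampling methods and its error criteria may differ.

    import numpy as np

    def resample_contour(z, k):
        """Resample a closed contour (complex-valued points) at k points,
        approximately equally spaced in arc length."""
        z = np.append(z, z[0])                       # close the curve
        seg = np.abs(np.diff(z))                     # segment lengths
        s = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arc length
        t = np.linspace(0.0, s[-1], k, endpoint=False)
        return np.interp(t, s, z.real) + 1j * np.interp(t, s, z.imag)

    def approx_error(z_dense, k):
        """Crude discretization error: mean distance from each point of the
        dense contour to the nearest of the k sampled vertices."""
        zk = resample_contour(z_dense, k)
        return np.mean(np.min(np.abs(z_dense[:, None] - zk[None, :]), axis=1))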

5.
Liu and Singh (1993, 2006) introduced a depth-based d-variate extension of the nonparametric two-sample scale test of Siegel and Tukey (1960). Liu and Singh (2006) generalized this depth-based test for scale homogeneity of k ≥ 2 multivariate populations. Motivated by the work of Gastwirth (1965), we propose k-sample percentile modifications of Liu and Singh's proposals. The test statistic is shown to be asymptotically normal when k = 2, and compares favorably with Liu and Singh (2006) when the underlying distributions are either symmetric with light tails or asymmetric. For the skewed distributions considered in this paper, the power of the proposed tests can attain twice the power of the Liu–Singh test for d ≥ 1. Finally, in the k-sample case, the asymptotic distribution of the proposed percentile-modified Kruskal–Wallis-type test is shown to be χ² with k − 1 degrees of freedom; the power properties of this k-sample test are similar to those of the proposed two-sample test. The Canadian Journal of Statistics 39: 356–369; 2011 © 2011 Statistical Society of Canada

6.
Consider the four classes of Lehmann-type alternatives G = F^k (k > 1); G = 1 − (1−F)^k (k < 1); G = F^k (k < 1); and G = 1 − (1−F)^k (k > 1), where F and G are two continuous cumulative distribution functions. If an optimal precedence test (one with maximal power) is determined for one of these four classes, the optimal tests for the other classes of alternatives can be derived. This is applied using the results of Lin and Sukhatme (1992), who derived the best precedence test of the null hypothesis that the lifetimes of two types of items on test have the same distribution; their test has maximum power, for fixed k, in the class of alternatives G = 1 − (1−F)^k with k < 1. Best precedence tests for the other three classes of Lehmann-type alternatives are derived using their results. Finally, a comparison of precedence tests with Wilcoxon's two-sample test is presented.
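These alternatives are easy to simulate, which helps when reproducing power comparisons: for integer k, F^k is the distribution of the maximum of k independent draws from F and 1 − (1−F)^k that of the minimum, while fractional k can be handled through the quantile function. A small sketch with a standard exponential F (all choices illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 3

    # If X_1..X_k are iid with CDF F, the max has CDF F^k and the min has
    # CDF 1 - (1 - F)^k, so integer-k Lehmann alternatives come for free.
    draws = rng.exponential(size=(n, k))
    sample_Fk   = draws.max(axis=1)   # G = F^k,          k > 1
    sample_1mFk = draws.min(axis=1)   # G = 1 - (1-F)^k,  k > 1

    # For fractional k, invert directly: G = F^k  =>  X = F^{-1}(U^{1/k}).
    k_frac = 0.5
    u = rng.uniform(size=n)
    sample_Fk_frac = -np.log1p(-u ** (1.0 / k_frac))  # exponential quantile function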

7.
K-means inverse regression was developed as an easy-to-use dimension reduction procedure for multivariate regression. This approach is similar to the original sliced inverse regression method, except that the slices are produced explicitly by K-means clustering of the response vectors. In this article, we propose K-medoids clustering as an alternative slicing approach and compare its performance to K-means in a simulation study. Although the two methods often produce comparable results, K-medoids tends to perform better in the presence of outliers. Besides isolating outliers, K-medoids clustering has the further advantage of accommodating a broader range of dissimilarity measures, which could prove useful in other graphical regression applications where slicing is required.
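A compact sketch of the shared machinery, assuming n > p: run sliced inverse regression, but let a clusterer define the slices. Swapping in a K-medoids implementation (e.g. KMedoids from the scikit-learn-extra package, if available) changes only the slicing line; this is a generic illustration, not the authors' code.

    import numpy as np
    from sklearn.cluster import KMeans

    def clustered_sir(X, Y, n_slices, n_dims, clusterer=None):
        """Sliced inverse regression with clustering-based slices.  Pass any
        object with fit_predict (e.g. KMedoids) to change the slicing rule;
        KMeans is the default.  Assumes n > p so the covariance is invertible."""
        n, p = X.shape
        clusterer = clusterer or KMeans(n_clusters=n_slices, n_init=10)
        slices = clusterer.fit_predict(np.asarray(Y).reshape(n, -1))
        Xc = X - X.mean(axis=0)
        L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
        W = np.linalg.inv(L).T            # whitening: cov(Xc @ W) = I
        Z = Xc @ W
        M = np.zeros((p, p))
        for h in np.unique(slices):
            m = Z[slices == h].mean(axis=0)
            M += np.mean(slices == h) * np.outer(m, m)  # weighted slice means
        vecs = np.linalg.eigh(M)[1][:, ::-1]            # eigenvectors, largest first
        return W @ vecs[:, :n_dims]                     # directions on the X scale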

8.
Li et al. (Li, B., Artemiou, A., and Li, L. (2011). Principal support vector machine for linear and nonlinear sufficient dimension reduction. Ann. Stat. 39:3182–3210) presented the novel idea of using support vector machines (SVMs) to perform sufficient dimension reduction. In this work, we investigate the potential improvement in recovering the dimension reduction subspace when the SVM algorithm is modified to treat class imbalance, following several proposals in the machine learning literature. We find that in most situations, treating the imbalanced nature of the slices improves the estimation. Our results are verified through simulations and real data applications.
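A rough sketch of how one might combine an SVM-based dimension reduction step with an off-the-shelf imbalance correction. The cumulative slicing scheme and the use of class_weight='balanced' are our illustrative assumptions, not necessarily the proposals compared in the paper.

    import numpy as np
    from sklearn.svm import LinearSVC

    def psvm_directions(X, y, n_slices=5, n_dims=2, balanced=True):
        """Sketch of principal-SVM-style dimension reduction: fit one linear
        SVM per cumulative dichotomy of the sliced response, then take the
        leading eigenvectors of the outer products of the SVM normals."""
        Xc = X - X.mean(axis=0)
        cuts = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])
        M = np.zeros((X.shape[1], X.shape[1]))
        for c in cuts:
            lab = (y > c).astype(int)  # cumulative dichotomy, often imbalanced
            svm = LinearSVC(class_weight='balanced' if balanced else None,
                            C=1.0, max_iter=10000)
            svm.fit(Xc, lab)
            psi = svm.coef_.ravel()    # normal vector of the separating hyperplane
            M += np.outer(psi, psi)
        return np.linalg.eigh(M)[1][:, ::-1][:, :n_dims]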

9.
We discuss the covariate dimension reduction properties of conditional density ratios in the estimation of balanced contrasts of expectations. Conditional density ratios, as well as related sufficient summaries, can be used to replace the covariates with a smaller number of variables. For example, for comparisons among k populations the covariates can be replaced with k − 1 conditional density ratios. The dimension reduction properties of conditional density ratios are directly connected with sufficiency, the dimension reduction concepts considered in regression theory, and propensity theory. The theory presented here extends the ideas in propensity theory to situations in which propensities do not exist and develops an approach to dimension reduction outside of the potential outcomes or counterfactual framework. Under general conditions, we show that a principal components transformation of the estimated conditional density ratios can be used to investigate whether a sufficient summary of dimension lower than k − 1 exists and to identify such a lower dimensional summary.
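One concrete way to act on this, sketched under our own assumptions: estimate the k − 1 log density ratios with a multinomial logistic model and inspect a principal components decomposition of the fitted ratios. The paper's estimator of the conditional density ratios may well differ.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.decomposition import PCA

    def density_ratio_summary(X, group):
        """Estimate the k-1 conditional log density ratios via multinomial
        logistic regression, then run PCA on them to look for a sufficient
        summary of dimension lower than k-1."""
        fit = LogisticRegression(max_iter=1000).fit(X, group)
        logp = fit.predict_log_proba(X)
        log_ratios = logp[:, :-1] - logp[:, [-1]]  # k-1 log ratios vs last group
        pca = PCA().fit(log_ratios)
        # small trailing variance ratios suggest a lower-dimensional summary
        print("explained variance ratios:",
              np.round(pca.explained_variance_ratio_, 3))
        return pca.transform(log_ratios)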

10.
Jae Keun Yoo, Statistics, 2018, 52(2):409–425
In this paper, a model-based approach to reducing the dimension of the response variables in multivariate regression is proposed, following the response dimension reduction framework developed by Yoo and Cook [Response dimension reduction for the conditional mean in multivariate regression. Comput Statist Data Anal. 2008;53:334–343]. The related dimension reduction subspace is estimated by maximum likelihood, assuming an additive error. In the new approach, the linearity condition assumed in the methodological development of Yoo and Cook (2008) is expressed through the covariance matrix of the random error. Numerical studies show potential advantages of the proposed approach over Yoo and Cook (2008), and a real data example is presented for illustration.

11.
The two-sample problem and its extension to the k-sample problem are well known in the statistical literature, but the discrete version of the k-sample problem is relatively unexplored. In this work we propose a k-sample non-parametric test for discrete distributions based on mutual information. A detailed power study with comparisons to alternative tests is provided. Finally, a comparison of some English soccer league teams based on their goal-scoring patterns is discussed.
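A minimal sketch of such a test, assuming the statistic is the plug-in mutual information between group label and outcome with a permutation null; the paper's exact statistic and calibration may differ.

    import numpy as np

    def mutual_information(table):
        """Plug-in mutual information between group label and discrete
        outcome, from a k x m contingency table of counts."""
        p = table / table.sum()
        px = p.sum(axis=1, keepdims=True)
        py = p.sum(axis=0, keepdims=True)
        nz = p > 0
        return np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))

    def mi_permutation_test(labels, values, n_perm=2000, seed=0):
        """Permutation p-value for the k-sample test: permute the group
        labels over the pooled sample and recompute the MI each time."""
        rng = np.random.default_rng(seed)
        groups, outcomes = np.unique(labels), np.unique(values)
        def mi(lab):
            tab = np.array([[np.sum((lab == g) & (values == o)) for o in outcomes]
                            for g in groups], dtype=float)
            return mutual_information(tab)
        obs = mi(labels)
        null = np.array([mi(rng.permutation(labels)) for _ in range(n_perm)])
        return obs, (1 + np.sum(null >= obs)) / (1 + n_perm)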

12.
13.
Many multivariate statistical procedures are based on the assumption of normality, and different approaches have been proposed for testing this assumption. The vast majority of these tests, however, are designed exclusively for cases where the sample size n is larger than the dimension p of the variable, and the null distributions of their test statistics are usually derived under the asymptotic regime in which p is fixed and n increases. In this article, a test that utilizes principal components to test for nonnormality is proposed for cases where p/n → c. The power and size of the test are examined through Monte Carlo simulations, and it is argued that the test remains well behaved and consistent against most nonnormal distributions under this type of asymptotics.
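As a loose illustration of the idea (not the paper's statistic, whose null distribution is derived under the p/n → c regime): project the data onto leading principal components and check the scores for univariate normality.

    import numpy as np
    from scipy import stats

    def pc_normality_pvalues(X, n_components=5):
        """Heuristic companion to PC-based normality testing: Shapiro-Wilk
        p-values for the scores on the leading principal components."""
        Xc = X - X.mean(axis=0)
        Vt = np.linalg.svd(Xc, full_matrices=False)[2]  # right singular vectors
        scores = Xc @ Vt[:n_components].T
        return [stats.shapiro(scores[:, j])[1] for j in range(scores.shape[1])]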

14.
Tests of homogeneity of normal means against alternatives restricted by an ordering on the means are considered. Emphasis is placed on the simply ordered case, μ1 ≤ μ2 ≤ ⋯ ≤ μk, and the simple tree ordering, μ1 ≤ μj for j = 2, 3, …, k. A modification of the likelihood-ratio test is proposed which is asymptotically equivalent to it but more robust to violations of the hypothesized ordering. At points satisfying the hypothesized ordering, the new test has power similar to that of the likelihood-ratio test, provided the degrees of freedom are not too small. The modified test is shown to be unbiased and consistent.
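Under the simple ordering, the restricted mean estimates are the weighted isotonic regression of the sample means, which is the computational core of likelihood-ratio-type tests here. A sketch assuming known unit variances; the names and the chi-bar-squared-type statistic below are illustrative, not the paper's modified test.

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    def simple_order_stat(samples):
        """Likelihood-ratio-type statistic for H0: mu_1 = ... = mu_k against
        the simply ordered alternative mu_1 <= ... <= mu_k, with known unit
        variances.  Restricted means come from weighted isotonic regression
        (the pool-adjacent-violators algorithm)."""
        means = np.array([np.mean(s) for s in samples])
        ns = np.array([len(s) for s in samples], dtype=float)
        iso = IsotonicRegression().fit(np.arange(len(samples)), means,
                                       sample_weight=ns)
        restricted = iso.predict(np.arange(len(samples)))
        grand = np.sum(ns * means) / ns.sum()
        # weighted sum of squares between restricted and pooled means
        return np.sum(ns * (restricted - grand) ** 2)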

15.
16.
Exact k-sample permutation tests for binary data are presented for three commonly encountered hypothesis tests. The tests are derived under both the population and randomization models. The generating function for the number of cases in the null distribution is obtained, and the asymptotic distributions of the test statistics are derived. Actual significance levels are computed for the asymptotic versions of the tests. Random sampling of the null distribution is suggested as a superior alternative to the asymptotics, and an efficient computer technique for implementing the random sampling is described. Finally, some numerical examples are presented and sample-size guidelines are given for computer implementation of the exact tests.
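Random sampling of the null distribution, as recommended above, is straightforward to implement. A sketch for one of the hypotheses, using a chi-squared-type statistic of our own choosing:

    import numpy as np

    def binary_ksample_perm_test(y, g, n_perm=5000, seed=0):
        """Monte Carlo version of a k-sample permutation test for binary
        data: permute the 0/1 responses over the pooled sample and
        recompute a chi-squared-type statistic each time.  Assumes the
        pooled success proportion is strictly between 0 and 1."""
        rng = np.random.default_rng(seed)
        groups = np.unique(g)
        def stat(yy):
            pbar = yy.mean()  # invariant under permutation
            t = 0.0
            for gr in groups:
                n_g = np.sum(g == gr)
                t += (yy[g == gr].sum() - n_g * pbar) ** 2 / (n_g * pbar * (1 - pbar))
            return t
        obs = stat(y)
        null = np.array([stat(rng.permutation(y)) for _ in range(n_perm)])
        return obs, (1 + np.sum(null >= obs)) / (1 + n_perm)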

17.
A novel distribution-free k-sample test of location shifts based on kernel density functional estimation is introduced and studied. The proposed test parallels one-way analysis of variance and the Kruskal–Wallis (KW) test in aiming to test the locations of unknown distributions. In contrast to rank (score)-transformed non-parametric approaches such as the KW test, the proposed F-test uses the measurement responses together with well-known kernel density estimation (KDE) to estimate the locations and construct the test statistic. A practical optimal bandwidth selection procedure is also provided. Our simulation studies and a real data example indicate that the proposed analysis of kernel density functional estimates (ANDFE) test is superior to existing competitors for fat-tailed or heavy-tailed distributions when the k groups differ mainly in location rather than shape, especially with unbalanced data. ANDFE is also highly recommended when it is unclear whether the groups differ mainly in shape or location. The Canadian Journal of Statistics 48: 167–186; 2020 © 2019 Statistical Society of Canada

18.
Zerbet and Nikulin presented the statistic Z_k for detecting outliers in the exponential distribution and compared it with Dixon's statistic D_k. In this article, we extend the approach to the gamma distribution and compare the result with Dixon's statistic. The results show that the test based on Z_k is more powerful than the test based on Dixon's statistic.
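The exact forms of Z_k and D_k are given in the cited papers; as a hedged illustration of the general recipe, here is a Dixon-type ratio for a single large outlier together with a simulated critical value under an exponential null.

    import numpy as np

    def dixon_ratio(x):
        """Dixon-type ratio for one suspected large outlier: the gap of the
        maximum relative to the sample range."""
        xs = np.sort(x)
        return (xs[-1] - xs[-2]) / (xs[-1] - xs[0])

    def critical_value(n, alpha=0.05, n_sim=20000, seed=0):
        """Upper critical value of the ratio under an exponential null,
        estimated by simulation."""
        rng = np.random.default_rng(seed)
        sims = np.array([dixon_ratio(rng.exponential(size=n))
                         for _ in range(n_sim)])
        return np.quantile(sims, 1 - alpha)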

19.
Dimension reduction with bivariate responses, especially a mix of continuous and categorical responses, can be of special interest; one immediate application is regression with censoring. In this paper, we propose two novel model-free methods to reduce the dimension of the covariates in a bivariate regression. Both methods enjoy a simple asymptotic chi-squared distribution for testing the dimension of the regression, and also allow the contributions of the covariates to be tested easily without pre-specifying a parametric model. The new methods outperform the current one in both simulations and the analysis of real data. The well-known PBC data are used to illustrate the application of our method to censored regression.

20.
Testing for equality of competing risks based on their cumulative incidence functions (CIFs) or their cause-specific hazard rates (CSHRs) has been considered by many authors. The finite-sample distributions of the existing test statistics are in general complicated, and the use of their asymptotic distributions can lead to conservative tests. In this paper we show how to perform some of these tests using instead the conditional distributions of the corresponding test statistics (conditional on the observed data). The resulting conditional tests are first developed for k = 2 and are then extended to k > 2 by performing a sequence of two-sample tests and by combining several risks into one. A simulation study comparing the powers of several tests based on their conditional and asymptotic distributions shows that using conditional tests leads to a gain in power. A real-life example is also discussed to show how to implement such conditional tests.
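For intuition, a sketch of a conditional (permutation) two-sample comparison of CIFs in the simplest uncensored case, with groups coded 0/1; the tests in the paper handle censoring and use other statistics.

    import numpy as np

    def empirical_cif(times, causes, cause, grid):
        """Empirical cumulative incidence function for one risk in the
        absence of censoring: P(T <= t, cause = j) along a time grid."""
        return np.array([np.mean((times <= t) & (causes == cause)) for t in grid])

    def cif_perm_test(times, causes, group, cause, n_perm=2000, seed=0):
        """Conditional (permutation) two-sample test: permute the group
        labels and recompute the sup-distance between the two CIFs."""
        rng = np.random.default_rng(seed)
        grid = np.unique(times)
        def sup_diff(g):
            f0 = empirical_cif(times[g == 0], causes[g == 0], cause, grid)
            f1 = empirical_cif(times[g == 1], causes[g == 1], cause, grid)
            return np.max(np.abs(f0 - f1))
        obs = sup_diff(group)
        null = np.array([sup_diff(rng.permutation(group)) for _ in range(n_perm)])
        return obs, (1 + np.sum(null >= obs)) / (1 + n_perm)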
