
1.

A Novel Method for Visualization of Clustering Results

Chao Gao 《Communications in Statistics - Simulation and Computation》2013,42(5):1049-1056

Sammon mapping is a nonlinear dimension-reduction technique that can be used for visualization. To avoid the numerical complexity of the traditional Sammon mapping algorithm, Kovacs and Abonyi (2004) proposed a modified Sammon mapping method. However, this modification can only be applied to fuzzy clustering results. Using a property of the Fermat point, we develop a new method in this article that can be applied to any clustering results. Unlike other visualization methods, we transfer the information in the clustering results into concentric circles around the Fermat points. Our procedure can therefore display the data structure in a more informative way, and the clustering results become easier to understand, especially for nonprofessionals. The effectiveness of the proposed method is studied by applying it to a real data set.
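The abstract does not say how the Fermat points are obtained. As a rough illustration only, the Fermat point of a set of points (the point minimising the summed Euclidean distance to them) can be approximated with Weiszfeld's iteration. The function names `fermat_point` and `total_distance` are hypothetical and are not taken from the article.

```python
import math


def total_distance(center, points):
    """Sum of Euclidean distances from `center` to every point."""
    return sum(math.dist(center, p) for p in points)


def fermat_point(points, iterations=500, tol=1e-10, eps=1e-12):
    """Approximate the Fermat point (geometric median) by Weiszfeld's iteration.

    This is a generic numerical sketch, not the construction used in the article.
    """
    dim = len(points[0])
    # Start from the centroid of the points.
    current = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iterations):
        # Each point is weighted by the inverse of its distance to the
        # current estimate; `eps` guards against division by zero.
        weights = [1.0 / max(math.dist(current, p), eps) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    print(fermat_point(triangle))
```

The centroid is used as the starting value because it is cheap to compute and usually lies close to the minimiser.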

2.

The Dantzig Discriminant Analysis with High Dimensional Data

Yanli Zhang Lei Huo Yunhui Zeng 《Communications in Statistics - Theory and Methods》2014,43(23):5012-5025

It is well known that linear discriminant analysis (LDA) works well and is asymptotically optimal in fixed-p, large-n situations. However, Bickel and Levina (2004) showed that LDA is as bad as random guessing when p > n. This article studies sparse discriminant analysis via Dantzig penalized least squares. Our method avoids estimating the high-dimensional covariance matrix and does not need a sparsity assumption on the inverse of the covariance matrix. We show theoretically that the new discriminant analysis is asymptotically optimal. Simulation and real data studies show that the classifier performs better than existing sparse methods.

3.

A computationally efficient nonparametric approach for changepoint detection

Kaylea Haynes Paul Fearnhead Idris A. Eckley 《Statistics and Computing》2017,27(5):1293-1305

In this paper we build on an approach proposed by Zou et al. (2014) for nonparametric changepoint detection. This approach defines the best segmentation for a data set as the one which minimises a penalised cost function, with the cost function defined in terms of minus a non-parametric log-likelihood for data within each segment. Minimising this cost function is possible using dynamic programming, but their algorithm had a computational cost that is cubic in the length of the data set. To speed up computation, Zou et al. (2014) resorted to a screening procedure, which means that the estimated segmentation is no longer guaranteed to be the global minimum of the cost function. We show that the screening procedure adversely affects the accuracy of the changepoint detection method, and show how a faster dynamic programming algorithm, pruned exact linear time (PELT) (Killick et al. 2012), can be used to find the optimal segmentation with a computational cost that can be close to linear in the amount of data. PELT requires a penalty to avoid under- or over-fitting the model, which can have a detrimental effect on the quality of the detected changepoints. To overcome this issue we use a relatively new method, changepoints over a range of penalties (Haynes et al. 2016), which finds all of the optimal segmentations for multiple penalty values over a continuous range. We apply our method to detect changes in heart rate during physical activity.
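The pruned dynamic programme described above can be sketched as follows. This is a minimal illustration under a loud assumption: instead of the nonparametric cost used in the paper, it uses a simple squared-error cost for a change in mean, for which the PELT pruning constant is zero. The names `pelt` and `segment_cost` are hypothetical.

```python
def segment_cost(prefix, prefix_sq, start, end):
    """Squared-error cost of data[start:end] around its own mean.

    ASSUMPTION: this Gaussian change-in-mean cost stands in for the
    nonparametric cost of the paper.
    """
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def pelt(data, penalty):
    """Return the changepoints minimising total segment cost plus a
    penalty per segment, using PELT-style pruning."""
    n = len(data)
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    best = [-penalty] + [0.0] * n   # best[t]: optimal penalised cost of data[:t]
    last_change = [0] * (n + 1)     # last changepoint before t in that optimum
    candidates = [0]                # changepoint positions still worth checking

    for t in range(1, n + 1):
        costs = [best[s] + segment_cost(prefix, prefix_sq, s, t)
                 for s in candidates]
        idx = min(range(len(candidates)), key=costs.__getitem__)
        best[t] = costs[idx] + penalty
        last_change[t] = candidates[idx]
        # Pruning step: a candidate whose unpenalised cost already exceeds
        # best[t] can never be the optimal last changepoint at a later time.
        candidates = [s for s, c in zip(candidates, costs) if c <= best[t]]
        candidates.append(t)

    # Backtrack through the stored last changepoints.
    changepoints = []
    t = n
    while last_change[t] > 0:
        t = last_change[t]
        changepoints.append(t)
    return sorted(changepoints)


if __name__ == "__main__":
    series = [0.0] * 20 + [10.0] * 20
    print(pelt(series, penalty=5.0))
```

The pruning step is what separates this from plain optimal partitioning: without it, the inner loop visits every earlier time point and the cost is quadratic in the length of the data.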

4.

Automatic clustering algorithm for fuzzy data

Wen-Liang Hung Jenn-Hwai Yang 《Journal of applied statistics》2015,42(7):1503-1518

Coppi et al. [7] applied Yang and Wu's [20] idea to propose a possibilistic k-means (PkM) clustering algorithm for LR-type fuzzy numbers. The memberships in the objective function of PkM no longer need to satisfy the fuzzy k-means constraint that the memberships of a data point across classes sum to one. However, the clustering performance of PkM depends on the initialization and the weighting exponent. In this paper, we propose a robust clustering method based on a self-updating procedure. The proposed algorithm not only solves the initialization problem but also obtains a good clustering result. Several numerical examples demonstrate the effectiveness and accuracy of the proposed clustering method, especially its robustness to initial values and noise. Finally, three real fuzzy data sets are used to illustrate the superiority of the proposed algorithm.

5.

The Asymptotic Approximation of EPMC for Linear Discriminant Rules Using a Moore-Penrose Inverse Matrix in High Dimension

Takayuki Yamada Takashi Seo 《Communications in Statistics - Theory and Methods》2013,42(18):3329-3338

We consider the discriminant rule in a high-dimensional setting, i.e., when the number of feature variables p is comparable to or larger than the number of observations N. The discriminant rule must be modified in order to cope with a singular sample covariance matrix in high dimension. One way to do so is to use the Moore-Penrose inverse matrix. Recently, Srivastava (2006) proposed a maximum likelihood ratio rule that uses the Moore-Penrose inverse of the sample covariance matrix. In this article, we consider the linear discriminant rule that uses the Moore-Penrose inverse of the sample covariance matrix (LDRMP). For a discriminant rule, the expected probability of misclassification (EPMC) is commonly used as a measure of classification accuracy. We investigate properties of the EPMC for LDRMP in high dimension, as well as those of the maximum likelihood rule given by Srivastava (2006). From our asymptotic results, we show that the classification accuracy of LDRMP depends on a new distance. Additionally, our asymptotic result is verified by Monte Carlo simulation.

6.

Discriminant analysis under the common principal components model

P. T. Pepler D. W. Uys D. G. Nel 《Communications in Statistics - Simulation and Computation》2017,46(6):4812-4827

For two or more populations whose covariance matrices have a common set of eigenvectors but different sets of eigenvalues, the common principal components (CPC) model is appropriate. Pepler et al. (2015) proposed a regularized CPC covariance matrix estimator and showed that this estimator outperforms the unbiased and pooled estimators in situations where the CPC model is applicable. This article extends their work to the context of discriminant analysis for two groups, by plugging the regularized CPC estimator into the ordinary quadratic discriminant function. Monte Carlo simulation results show that CPC discriminant analysis offers significant improvements in misclassification error rates in certain situations, and at worst performs similarly to ordinary quadratic and linear discriminant analysis. Based on these results, CPC discriminant analysis is recommended for situations where the sample size is small compared to the number of variables, in particular for cases where there is uncertainty about the population covariance matrix structures.

7.

A Model Selection Criterion for Discriminant Analysis of Several Groups When the Dimension is Larger than the Total Sample Size

《Communications in Statistics - Theory and Methods》2012,41(13-14):2419-2436

This article deals with a criterion for selecting variables in multiple-group discriminant analysis of high-dimensional data. The variable selection models considered for discriminant analysis in Fujikoshi (1985, 2002) are the ones based on additional information due to Rao (1948, 1970). Our criterion is based on the Akaike information criterion (AIC) for this model. The AIC has been used successfully in the literature for model selection when the dimension p is smaller than the sample size N. However, the case p > N has not been considered in the literature, because the MLE cannot be computed owing to the singularity of the within-group covariance matrix. A popular method used to address the singularity problem in high-dimensional classification is the regularized method, which replaces the within-group sample covariance matrix with a ridge-type covariance estimate to stabilize the estimate. In this article, we propose an AIC-type criterion obtained by replacing the MLE of the within-group covariance matrix with a ridge-type estimator. This idea follows Srivastava and Kubokawa (2008). Simulations revealed that our proposed criterion performs well.

8.

Importance tempering

Robert Gramacy Richard Samworth Ruth King 《Statistics and Computing》2010,20(1):1-7

Simulated tempering (ST) is an established Markov chain Monte Carlo (MCMC) method for sampling from a multimodal density π(θ). Typically, ST involves introducing an auxiliary variable k taking values in a finite subset of [0,1] and indexing a set of tempered distributions, say π_k(θ) ∝ π(θ)^k. In this case, small values of k encourage better mixing, but samples from π are only obtained when the joint chain for (θ,k) reaches k=1. However, the entire chain can be used to estimate expectations under π of functions of interest, provided that importance sampling (IS) weights are calculated. Unfortunately this method, which we call importance tempering (IT), can disappoint. This is partly because the most immediately obvious implementation is naïve and can lead to high variance estimators. We derive a new optimal method for combining multiple IS estimators and prove that the resulting estimator has a highly desirable property related to the notion of effective sample size. We briefly report on the success of the optimal combination in two modelling scenarios requiring reversible-jump MCMC, where the naïve approach fails.
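For draws taken at temperature k from π_k(θ) ∝ π(θ)^k, the importance weight towards π is proportional to π(θ)/π(θ)^k = π(θ)^(1−k). The sketch below computes these weights on the log scale, a self-normalised estimate, and the usual effective sample size (Σw)²/Σw². It illustrates only the single-temperature building blocks; the optimal combination across temperatures derived in the paper is not reproduced here, and all function names are hypothetical.

```python
import math


def importance_log_weights(log_density_values, k):
    """Unnormalised log weights (1 - k) * log pi(theta) for draws from pi^k."""
    return [(1.0 - k) * lp for lp in log_density_values]


def _stabilised_weights(log_weights):
    """Exponentiate after subtracting the maximum to avoid overflow."""
    shift = max(log_weights)
    return [math.exp(lw - shift) for lw in log_weights]


def self_normalised_estimate(values, log_weights):
    """Self-normalised importance sampling estimate of E_pi[h(theta)]."""
    weights = _stabilised_weights(log_weights)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def effective_sample_size(log_weights):
    """Effective sample size (sum w)^2 / sum w^2 of a weighted sample."""
    weights = _stabilised_weights(log_weights)
    return sum(weights) ** 2 / sum(w * w for w in weights)


if __name__ == "__main__":
    # Hypothetical log pi(theta) values for four draws made at k = 0.5.
    log_pi = [-0.1, -1.3, -0.7, -2.4]
    lw = importance_log_weights(log_pi, k=0.5)
    print(effective_sample_size(lw))
```

At k = 1 every weight is equal and the effective sample size equals the number of draws; as k decreases the weights become more uneven and the effective sample size drops.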

9.

On selecting a prior for the precision parameter of Dirichlet process mixture models

Robert M. Dorazio 《Journal of Statistical Planning and Inference》2009

In hierarchical mixture models the Dirichlet process is used to specify latent patterns of heterogeneity, particularly when the distribution of latent parameters is thought to be clustered (multimodal). The parameters of a Dirichlet process include a precision parameter α and a base probability measure G₀. In problems where α is unknown and must be estimated, inferences about the level of clustering can be sensitive to the choice of prior assumed for α. In this paper an approach is developed for computing a prior for the precision parameter α that can be used in the presence or absence of prior information about the level of clustering. This approach is illustrated in an analysis of counts of stream fishes. The results of this fully Bayesian analysis are compared with an empirical Bayes analysis of the same data and with a Bayesian analysis based on an alternative commonly used prior.
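To see why inferences about clustering are sensitive to α, recall the standard result that, for n observations from a Dirichlet process with precision α, the expected number of distinct clusters is Σ_{i=1}^{n} α/(α + i − 1). The sketch below evaluates that sum. It illustrates this textbook identity only, not the prior construction developed in the paper, and the function name `expected_clusters` is hypothetical.

```python
def expected_clusters(alpha, n):
    """Expected number of distinct clusters among n draws from a
    Dirichlet process with precision parameter alpha."""
    if alpha <= 0 or n < 0:
        raise ValueError("alpha must be positive and n non-negative")
    # The i-th observation (i = 0, ..., n - 1) starts a new cluster with
    # probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, expected_clusters(alpha, n=100))
```

Because the expected cluster count grows with α for fixed n, a prior on α is implicitly a prior on the level of clustering.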

10.

Bayesian Markov chain Monte Carlo imputation for the transiting exoplanets with an application in clustering analysis

Huei-Wen Teng Yen-Ju Chao 《Journal of applied statistics》2015,42(5):1120-1132


11.

Linear discriminant analysis for multiple functional data analysis

Sugnet Gardner-Lubbe 《Journal of applied statistics》2021,48(11):1917

In multivariate data analysis, Fisher linear discriminant analysis is useful for optimally separating two classes of observations by finding a linear combination of p variables. Functional data analysis deals with the analysis of continuous functions and can thus be seen as a generalisation of multivariate analysis in which the dimension p of the analysis space tends to infinity. Several authors have proposed methods to perform discriminant analysis in this infinite-dimensional space. Here, a methodology is introduced to perform discriminant analysis not on single infinite-dimensional functions, but by finding a linear combination of p infinite-dimensional continuous functions, providing a set of continuous canonical functions which are optimally separated in the canonical space. KEYWORDS: Functional data analysis, linear discriminant analysis, classification

12.

Designing an accelerated degradation experiment with a reciprocal Weibull degradation rate

《Journal of statistical planning and inference》2006,136(1):282-297


13.

An intuitive clustering algorithm for spherical data with application to extrasolar planets

Wen-Liang Hung Shou-Jen Chang-Chien Miin-Shen Yang 《Journal of applied statistics》2015,42(10):2220-2232

This paper proposes an intuitive clustering algorithm capable of automatically self-organizing data groups based on the original data structure. Comparisons between the proposed algorithm and the EM [1] and spherical k-means [7] algorithms are given. These numerical results show the effectiveness of the proposed algorithm, using the correct classification rate and the adjusted Rand index as evaluation criteria [5,6]. In 1995, Mayor and Queloz announced the detection of the first extrasolar planet (exoplanet) around a Sun-like star. Since then, the observational efforts of astronomers have led to the detection of more than 1000 exoplanets. These discoveries may provide important information for understanding the formation and evolution of planetary systems. The proposed clustering algorithm is therefore used to study the data gathered on exoplanets. Two main implications are suggested: (1) there are three major clusters, which correspond to the exoplanets in the regimes of disc, ongoing tidal and tidal interactions, respectively, and (2) the stellar metallicity does not play a key role in exoplanet migration.
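The adjusted Rand index used as an evaluation criterion above compares two labelings of the same items by counting pairs that are grouped together in both, corrected for chance agreement. The following is a minimal standard-library sketch of the usual pair-counting formula; the function name `adjusted_rand_index` is hypothetical, and the convention of returning 1.0 when the index is undefined is an assumption of this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # Pairs placed together in both labelings (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # ASSUMPTION: treat fewer than two items as perfect agreement
    expected = sum_rows * sum_cols / total_pairs
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        return 1.0  # ASSUMPTION: both labelings are trivial, index undefined
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 0, 0, 0, 0]
    print(adjusted_rand_index(truth, found))
```

The index is invariant to relabeling of clusters, which is why it suits comparisons between algorithms that number their clusters arbitrarily.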

14.

Finite mixtures of canonical fundamental skew $t$-distributions

Sharon X. Lee Geoffrey J. McLachlan 《Statistics and Computing》2016,26(3):573-589

This paper introduces a finite mixture of canonical fundamental skew $t$ (CFUST) distributions for a model-based approach to clustering where the clusters are asymmetric and possibly long-tailed (Lee and McLachlan, arXiv:1401.8182 [stat.ME], 2014b). The family of CFUST distributions includes the restricted multivariate skew $t$ and unrestricted multivariate skew $t$ distributions as special cases. In recent years, a few versions of the multivariate skew $t$ (MST) mixture model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.

15.

Optimal sampling frequency for high frequency data using a finite mixture model

《Journal of the Korean Statistical Society》2014,43(2):251-262


16.

Means and variances for a family of similarity indices used in cluster analysis

Ahmed N. Albatineh 《Journal of statistical planning and inference》2010


17.

Variable selection of linear programming discriminant estimator

Dong Xia 《Communications in Statistics - Theory and Methods》2017,46(7):3321-3341


18.

Convexity-based clustering criteria: theory, algorithms, and applications in statistics

Hans-Hermann Bock 《Statistical Methods and Applications》2004,12(3):293-317

This paper deals with the construction of optimum partitions for a clustering criterion which is based on a convex function of the class centroids, as a generalization of the classical SSQ clustering criterion for n data points. We formulate a dual optimality problem involving two sets of variables and derive a maximum-support-plane (MSP) algorithm for constructing a (sub-)optimum partition as a generalized k-means algorithm. We present various modifications of the basic criterion and describe the corresponding MSP algorithm. It is shown that the method can also be used for solving optimality problems in classical statistics (maximizing Csiszár's φ-divergence) and for simultaneous classification of the rows and columns of a contingency table.

19.

Using High-Frequency Data in Dynamic Portfolio Choice

Federico M. Bandi Jeffrey R. Russell 《Econometric Reviews》2013,32(1-3):163-198

This article evaluates the economic benefit of methods that have been suggested to optimally sample (in an MSE sense) high-frequency return data for the purpose of realized variance/covariance estimation in the presence of market microstructure noise (Bandi and Russell, 2005a, 2008). We compare certainty equivalents derived from volatility-timing trading strategies relying on optimally sampled realized variances and covariances, on realized variances and covariances obtained by sampling every 5 minutes, and on realized variances and covariances obtained by sampling every 15 minutes. In our sample, we show that a risk-averse investor who is given the option of choosing variance/covariance forecasts derived from MSE-based optimal sampling methods versus forecasts obtained from 5- and 15-minute intervals (as generally proposed in the literature) would be willing to pay up to about 80 basis points per year to achieve the level of utility that is guaranteed by optimal sampling. We find that the gains yielded by optimal sampling are economically large, statistically significant, and robust to realistic transaction costs.
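Realized variance is the sum of squared log returns over a period, and the sampling interval determines which prices enter that sum. The sketch below computes it for a price series sampled every `step` observations. It shows only the effect of the sampling interval, not the MSE-based optimal sampling rule of Bandi and Russell, and the function name and the toy prices are hypothetical.

```python
import math


def realized_variance(prices, step=1):
    """Sum of squared log returns using every `step`-th price."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    sampled = prices[::step]
    returns = [math.log(later / earlier)
               for earlier, later in zip(sampled, sampled[1:])]
    return sum(r * r for r in returns)


if __name__ == "__main__":
    # Toy prices that bounce between two levels, mimicking bid-ask noise
    # around a constant efficient price.
    noisy_prices = [100.0, 100.1, 100.0, 100.1, 100.0, 100.1, 100.0]
    print(realized_variance(noisy_prices, step=1))
    print(realized_variance(noisy_prices, step=2))
```

In the toy series the bounce inflates the estimate at the finest interval and vanishes at the coarser one, which is the trade-off between noise and information that optimal sampling is meant to balance.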

20.

A Note on Criterion-Robust Optimal Designs for Model Discrimination and Parameter Estimation in Polynomial Regression Models

Mei-Mei Zen Chia-Hao Chan Yi-Hsiung Lin 《Communications in Statistics - Theory and Methods》2013,42(5):584-593

Consider the problem of discriminating between the polynomial regression models on [−1, 1] and estimating parameters in the models. Zen and Tsai (2002) proposed a multiple-objective optimality criterion, the M_γ-criterion, which uses weight γ (0 ≤ γ ≤ 1) for model discrimination and α = β = (1 − γ)/2 for parameter estimation in each model. In this article, we generalize it to a wider setup with different values of α and β. For instance, α = 2β suggests that the “smaller” model is more likely to be the true model. Using similar techniques, the corresponding criterion-robust optimal design is investigated. A study of the original criterion-robust optimal design with α = β, through M-efficiency, shows that it is good enough for any wider setup.