10 similar documents found.
1.
Classification of gene expression microarray data is important in the diagnosis of diseases such as cancer, but the analysis of microarray data is often challenging because the number of genes is typically much larger than the sample size. Consequently, classification methods for microarray data often rely on regularization techniques to stabilize the classifier and improve classification performance. Numerous regularization techniques, such as covariance-matrix regularization, are available, which in practice makes the choice among them difficult. In this paper, we compare the classification performance of five covariance-matrix regularization methods applied to the linear discriminant function using two simulated high-dimensional data sets and five well-known, high-dimensional microarray data sets. In our simulation study, we found that the minimum distance empirical Bayes method reported in Srivastava and Kubokawa [Comparison of discrimination methods for high dimensional data, J. Japan Statist. Soc. 37(1) (2007), pp. 123–134] and the new linear discriminant analysis reported in Thomaz, Kitani, and Gillies [A Maximum Uncertainty LDA-based approach for Limited Sample Size problems – with application to Face Recognition, J. Braz. Comput. Soc. 12(1) (2006), pp. 1–12] performed consistently well and often outperformed three other prominent regularization methods. Finally, we conclude with some recommendations for practitioners.
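The abstract does not spell out the five regularizers, so the following is only a minimal sketch of the general idea: a two-class linear discriminant function whose pooled covariance matrix is made invertible when the number of genes p exceeds the sample size. The regularizer used here, which raises every eigenvalue below the average eigenvalue up to that average, is our assumption; it is in the spirit of maximum-uncertainty LDA as we understand it, but it is not presented as the exact method of Thomaz, Kitani, and Gillies, nor as the minimum distance empirical Bayes rule. All function names are hypothetical.

```python
import numpy as np

def pooled_covariance(X0, X1):
    """Pooled within-class sample covariance of two classes (rows = samples)."""
    n0, n1 = X0.shape[0], X1.shape[0]
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    return ((n0 - 1) * S0 + (n1 - 1) * S1) / (n0 + n1 - 2)

def floor_eigenvalues(S):
    """Raise every eigenvalue below the average eigenvalue up to that average.

    The result is positive definite whenever trace(S) > 0, even if S is singular.
    """
    vals, vecs = np.linalg.eigh(S)
    floored = np.maximum(vals, vals.mean())
    return (vecs * floored) @ vecs.T  # V diag(floored) V^T

def fit_regularized_lda(X0, X1):
    """Two-class linear discriminant built on the eigenvalue-floored covariance."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = floor_eigenvalues(pooled_covariance(X0, X1))
    w = np.linalg.solve(S, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2.0  # midpoint between projected class means
    return w, threshold

def predict(w, threshold, X):
    """Assign class 1 when the discriminant score exceeds the midpoint threshold."""
    return (X @ w > threshold).astype(int)
```

With p = 200 features and 10 samples per class, the raw pooled covariance has rank at most 18 and cannot be inverted, while the floored version is positive definite, so `np.linalg.solve` succeeds. Swapping `floor_eigenvalues` for another rule (for example, adding a multiple of the identity) is how one would compare regularizers within the same framework.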
2.
The mixed-Weibull distribution has been used to model a wide range of failure data sets, and in many practical situations the number of components in the mixture model is unknown. This article therefore considers parameter estimation for the mixed-Weibull distribution and discusses the important issue of how to determine the number of components. Two approaches are proposed: one is the method of moments, and the other is a regularization-type fuzzy clustering algorithm. Finally, numerical examples and two real data sets are given to illustrate the features of the proposed approaches.
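A method-of-moments fit of a K-component mixed-Weibull distribution matches sample moments to the model's moments. Assuming component weights p_k, shape parameters β_k, and scale parameters η_k (our notation), the density is f(x) = Σ_k p_k (β_k/η_k)(x/η_k)^(β_k - 1) exp(-(x/η_k)^β_k), and the r-th raw moment is Σ_k p_k η_k^r Γ(1 + r/β_k). The sketch below only evaluates these quantities and draws samples; it does not reproduce the article's estimating equations or its fuzzy-clustering approach, and the function names are hypothetical.

```python
import math
import numpy as np

def mixture_raw_moment(r, weights, shapes, scales):
    """r-th raw moment E[X^r] of a Weibull mixture: sum_k p_k * eta_k^r * Gamma(1 + r/beta_k)."""
    return sum(p * eta ** r * math.gamma(1 + r / beta)
               for p, beta, eta in zip(weights, shapes, scales))

def mixture_pdf(x, weights, shapes, scales):
    """Mixture density evaluated at the points x (x >= 0)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for p, beta, eta in zip(weights, shapes, scales):
        z = x / eta
        total += p * (beta / eta) * z ** (beta - 1) * np.exp(-z ** beta)
    return total

def sample_mixture(n, weights, shapes, scales, rng):
    """Draw n observations: pick a component per draw, then scale a unit-scale Weibull draw."""
    comps = rng.choice(len(weights), size=n, p=weights)
    shapes = np.asarray(shapes, dtype=float)
    scales = np.asarray(scales, dtype=float)
    return scales[comps] * rng.weibull(shapes[comps])
```

Solving the method-of-moments equations would then mean choosing parameters so that `mixture_raw_moment(r, ...)` equals the sample mean of `x**r` for enough values of r; a K-component mixture has 3K - 1 free parameters, so the number of moments needed grows with K, which is one reason the choice of K matters.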
3.
Miki Aoyagi, Communications in Statistics - Theory and Methods, 2013, 42(15): 2667-2687
The coefficient of the leading term of the generalization error in Bayesian estimation is called the Bayesian learning coefficient. In this article, we first introduce Vandermonde matrix type singularities and establish certain orthogonality conditions for them. It has recently been recognized that Vandermonde matrix type singularities are related to the Bayesian learning coefficients of several hierarchical learning models. By applying these orthogonality conditions, we show that the log canonical threshold of these singularities also corresponds to the Bayesian learning coefficient for normal mixture models, and we obtain explicit computational results in dimension one.
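For context, in Watanabe's singular learning theory (assuming the true distribution is realizable by the model), the expected Bayesian generalization error G_n, measured as the Kullback-Leibler divergence from the true distribution to the Bayes predictive distribution, has the asymptotic form below, where n is the sample size and λ is the learning coefficient. The notation is ours and may differ from the article's.

```latex
\mathbb{E}\left[G_n\right] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
\qquad
\lambda = \frac{d}{2} \ \text{for a regular model with } d \text{ parameters.}
```

For singular models such as normal mixtures, λ can be smaller than d/2 and equals the log canonical threshold of the Kullback-Leibler divergence viewed as a function of the parameter, which is why computing log canonical thresholds yields the learning coefficient.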
4.
Journal of Statistical Computation and Simulation, 2012, 82(1): 109-121
We are often faced with sparse observations over a finite number of cells and wish to estimate the cell probabilities. Local polynomial smoothers and local likelihood estimators have been proposed to improve on the histogram, which would produce too many zero estimates. We propose a relativized local polynomial smoother for this problem, which gives greater weight to estimation errors in low-probability cells. A simulation study shows that the proposed estimators behave well with respect to natural error criteria, especially when observations are sparse.
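The relativized weighting proposed in the article is not specified in the abstract, so the sketch below shows only the plain local linear smoother that it modifies: at each cell, a kernel-weighted straight line is fitted to the observed cell proportions, and its intercept is taken as the smoothed probability. The Epanechnikov kernel, the clip-and-renormalize step, and the function name are assumptions made for illustration.

```python
import numpy as np

def local_linear_cell_probs(counts, bandwidth):
    """Local linear smoothing of observed cell proportions over the cell index.

    Uses an Epanechnikov kernel; negative fitted values are clipped to 0 and the
    result is renormalized to sum to 1. bandwidth must exceed 1 so that every
    local fit uses at least two cells.
    """
    if bandwidth <= 1:
        raise ValueError("bandwidth must be greater than 1")
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    p_hat = counts / counts.sum()            # raw histogram estimates
    cells = np.arange(k, dtype=float)
    est = np.empty(k)
    for i in range(k):
        d = cells - cells[i]
        u = d / bandwidth
        w = np.where(np.abs(u) < 1, 0.75 * (1 - u ** 2), 0.0)  # kernel weights
        X = np.column_stack([np.ones(k), d])
        XtW = X.T * w                         # X^T W without forming diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ p_hat)
        est[i] = beta[0]                      # intercept = fitted value at cell i
    est = np.clip(est, 0.0, None)
    return est / est.sum()
```

Applied to counts such as `[0, 3, 0, 0, 5, 1, 0, 0, 2, 0]` with bandwidth 2.5, cells with zero counts receive positive probability whenever nearby cells are occupied, which is the improvement over the histogram that the abstract refers to.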
5.
Searls (1964) showed that when the coefficient of variation is known, the sample mean is dominated in mean squared error by an improved estimator that uses that coefficient. In this article we illustrate that the same holds for a general class of estimators. Expressions for the minimum mean squared error and the relative efficiency are given for general distributions. The improvement, as measured by relative efficiency, is independent of the form of the distribution.
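The simplest member of such a class, the scaled mean kȳ, shows where the known coefficient of variation v = σ/μ enters (assuming μ ≠ 0 and finite variance σ²; notation ours). Minimizing the mean squared error over k gives:

```latex
\begin{aligned}
\operatorname{MSE}(k\bar{y}) &= k^{2}\,\frac{\sigma^{2}}{n} + (k-1)^{2}\mu^{2},\\
k^{*} &= \frac{\mu^{2}}{\mu^{2} + \sigma^{2}/n} = \frac{n}{n + v^{2}}, \qquad v = \sigma/\mu,\\
\operatorname{MSE}(k^{*}\bar{y}) &= \frac{\sigma^{2}}{n + v^{2}},\\
\operatorname{RE} &= \frac{\operatorname{MSE}(\bar{y})}{\operatorname{MSE}(k^{*}\bar{y})} = 1 + \frac{v^{2}}{n}.
\end{aligned}
```

The multiplier n/(n + v²) is the one usually associated with Searls' estimator, and the relative efficiency depends only on n and v, not on the shape of the distribution, consistent with the last sentence of the abstract.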
6.
Inflated statistical significance of Student's t test associated with small intersubject correlation
Journal of Statistical Computation and Simulation, 2012, 82(9): 691-696
The independence assumption in statistical significance testing becomes increasingly crucial and unforgiving as sample size increases. Seemingly inconsequential violations of this assumption can substantially increase the probability of a Type I error when sample sizes are large. For Student's t test, within-sample correlations of only 0.01 to 0.05 can lead to rejection of a true null hypothesis with high probability when N is 50, 100, or larger.
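The mechanism can be seen in a small Monte Carlo sketch. It assumes a pooled two-sample t test with n observations per group and an equicorrelated normal model within each group, induced by a component shared by all observations in that group; the article's exact simulation design may differ. Under this model the variance of a group mean is (1 + (n - 1)ρ)/n rather than 1/n, so even a small ρ is magnified by n. The critical value 1.972 is an approximate two-sided 5% point of the t distribution with 198 degrees of freedom (n = 100), and the function name is hypothetical.

```python
import numpy as np

def simulate_type_one_error(n, rho, reps, t_crit, rng):
    """Monte Carlo rejection rate of the pooled two-sample t test when H0 is true.

    Each group has n normal observations with mean 0 and variance 1 that are
    equicorrelated with correlation rho, induced by a component shared within the group.
    """
    shared = rng.standard_normal((reps, 2, 1))   # one shared draw per group and replicate
    noise = rng.standard_normal((reps, 2, n))
    y = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise
    means = y.mean(axis=2)
    pooled_var = y.var(axis=2, ddof=1).mean(axis=1)  # equal group sizes
    t = (means[:, 0] - means[:, 1]) / np.sqrt(pooled_var * 2.0 / n)
    return float(np.mean(np.abs(t) > t_crit))
```

With ρ = 0 the estimated rejection rate should sit near the nominal 0.05; with ρ = 0.05 and n = 100, the variance inflation factor 1 + 99 × 0.05 = 5.95 pushes it far above that level.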
7.
Journal of Statistical Computation and Simulation, 2012, 82(12): 1393-1406
Traditionally, using a control chart to monitor a process assumes that process observations are normally and independently distributed. In practice, however, successive products in many processes are linked, so the resulting observations are autocorrelated rather than independent. In this setting, monitoring the process under an independence assumption is inappropriate. This study examines a generally weighted moving average (GWMA) control chart with time-varying control limits for monitoring the mean of a process based on autocorrelated observations from a first-order autoregressive (AR(1)) process with random error. Simulation is used to evaluate the average run length (ARL) of exponentially weighted moving average (EWMA) and GWMA control charts. Comparisons of ARLs indicate that the GWMA control chart detects various shifts more quickly at low levels of autocorrelation than at high levels. The GWMA control chart is more sensitive than the EWMA control chart in detecting small shifts in the process mean.
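A minimal sketch of the GWMA statistic with time-varying limits follows. It assumes the commonly used form G_t = Σ_{j=1}^{t} (q^{(j-1)^α} - q^{j^α}) X_{t-j+1} + q^{t^α} μ0, with 0 < q < 1 and α > 0, which reduces to the EWMA with smoothing constant λ = 1 - q when α = 1; the limits μ0 ± Lσ√Q_t, with Q_t the sum of squared weights, are derived for independent observations. Parameter and function names are our assumptions, and the article's handling of AR(1) data (for example, how σ is set) is not reproduced.

```python
import numpy as np

def gwma_weights(t, q, alpha):
    """Weights q**((j-1)**alpha) - q**(j**alpha) for j = 1..t; j = 1 is the newest observation."""
    j = np.arange(1, t + 1, dtype=float)
    return q ** ((j - 1) ** alpha) - q ** (j ** alpha)

def gwma_statistics(x, mu0, q, alpha):
    """GWMA statistic at each time t, started from the in-control mean mu0."""
    x = np.asarray(x, dtype=float)
    stats = np.empty(len(x))
    for t in range(1, len(x) + 1):
        w = gwma_weights(t, q, alpha)
        newest_first = x[t - 1::-1]           # x_t, x_{t-1}, ..., x_1
        stats[t - 1] = w @ newest_first + q ** (t ** alpha) * mu0
    return stats

def gwma_limits(t_max, mu0, sigma, q, alpha, L):
    """Time-varying limits mu0 +/- L*sigma*sqrt(sum of squared weights), for independent data."""
    half = np.array([L * sigma * np.sqrt(np.sum(gwma_weights(t, q, alpha) ** 2))
                     for t in range(1, t_max + 1)])
    return mu0 - half, mu0 + half

def ar1_series(n, phi, mean, sigma_e, rng):
    """AR(1) observations X_t = mean + phi*(X_{t-1} - mean) + e_t, started at stationarity."""
    x = np.empty(n)
    prev = mean + rng.normal(0.0, sigma_e / np.sqrt(1 - phi ** 2))
    for t in range(n):
        prev = mean + phi * (prev - mean) + rng.normal(0.0, sigma_e)
        x[t] = prev
    return x

def run_length(stats, lower, upper):
    """1-based time of the first out-of-control signal, or None if there is none."""
    out = np.flatnonzero((stats < lower) | (stats > upper))
    return int(out[0]) + 1 if out.size else None
```

An ARL estimate is then the average of `run_length` over many simulated AR(1) series whose `mean` is set to μ0 plus the shift of interest; setting `alpha=1.0` gives the EWMA chart for comparison.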
8.
Kazuhiro Ohtani, Statistical Papers, 1999, 40(1): 75-87
In this paper, we derive an exact formula for the risk function of the pre-test estimator for a normal variance that uses the Stein-variance estimator (the PTSV estimator) under the asymmetric LINEX loss function. Fixing the critical value of the pre-test at unity, which has been suggested as a reasonable choice in a certain sense, we numerically examine the risk performance of the PTSV estimator using the derived risk function. Our numerical results show that the PTSV estimator dominates the usual variance estimator when over-estimation is more severe than under-estimation, but not when under-estimation is more severe. It is also shown that the dominance of the PTSV estimator over the original Stein-variance estimator is robust to the extension from the quadratic loss function to the LINEX loss function.
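The PTSV risk formula itself is not reproduced here. As a building block, the sketch computes the exact LINEX risk of a simple scaled estimator kS of σ², where S is the sum of squared deviations of n normal observations from their sample mean, using the fact that S/σ² follows a chi-square distribution with n - 1 degrees of freedom. Measuring the error as Δ = kS/σ² - 1 and using the loss b(e^{aΔ} - aΔ - 1) are our assumptions about the setup; with a > 0 over-estimation is penalized more, and with a < 0 under-estimation is. Function names are hypothetical.

```python
import numpy as np

def linex_loss(delta, a, b=1.0):
    """LINEX loss b*(exp(a*delta) - a*delta - 1); a > 0 penalizes delta > 0 (over-estimation) more."""
    delta = np.asarray(delta, dtype=float)
    return b * (np.exp(a * delta) - a * delta - 1.0)

def linex_risk_scaled(k, n, a, b=1.0):
    """Exact LINEX risk of k*S as an estimator of sigma^2, with delta = k*S/sigma^2 - 1.

    Uses S/sigma^2 ~ chi-square(n - 1) and its moment generating function,
    which is finite only when 2*a*k < 1.
    """
    if 2 * a * k >= 1:
        raise ValueError("risk is infinite when 2*a*k >= 1")
    nu = n - 1
    e_exp = np.exp(-a) * (1 - 2 * a * k) ** (-nu / 2)  # E[exp(a*delta)]
    e_delta = k * nu - 1                                # E[delta]
    return b * (e_exp - a * e_delta - 1.0)
```

For n = 10 and a = 1, the estimator S/(n + 1) has lower LINEX risk than the unbiased S/(n - 1), illustrating how shrinking a variance estimator pays off when over-estimation is the costlier error.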
9.
Asymptotics of cross-validated risk estimation in estimator selection and performance assessment (total citations: 1; self-citations: 0; citations by others: 1)
Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold cross-validation, Monte Carlo cross-validation, and bootstrap procedures. For estimator selection, finite sample risk bounds are derived and applied to establish the asymptotic optimality of cross-validation, in the sense that a selector based on a cross-validated risk estimator performs asymptotically as well as an optimal oracle selector based on the risk under the true, unknown data generating distribution. The asymptotic results are derived under the assumption that the size of the validation sets converges to infinity and hence do not cover leave-one-out cross-validation. For performance assessment, cross-validated risk estimators are shown to be consistent and asymptotically linear for the risk under the true data generating distribution and confidence intervals are derived for this unknown risk. Unlike previously published results, the theorems derived in this and our related articles apply to general data generating distributions, loss functions (i.e., parameters), estimators, and cross-validation procedures.
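V-fold cross-validation, one of the procedures covered by the framework, can be sketched for estimator selection with squared-error loss. The candidate estimators here, least-squares polynomial fits of different degrees, are an illustrative assumption, as are the function names; the same splits are reused for every candidate so that their cross-validated risks are comparable.

```python
import numpy as np

def v_fold_splits(n, V, rng):
    """Randomly partition indices 0..n-1 into V validation sets of near-equal size."""
    return np.array_split(rng.permutation(n), V)

def cv_risk(x, y, degree, folds):
    """V-fold cross-validated squared-error risk of a least-squares polynomial fit."""
    fold_risks = []
    for val_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[val_idx] = False                # fit on everything outside the validation set
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[val_idx])
        fold_risks.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(fold_risks))         # average of per-fold empirical risks

def cv_select(x, y, degrees, V, rng):
    """Pick the candidate with the smallest cross-validated risk, using one shared split."""
    folds = v_fold_splits(len(x), V, rng)
    risks = {d: cv_risk(x, y, d, folds) for d in degrees}
    return min(risks, key=risks.get), risks
```

`cv_select` returns the degree with the smallest cross-validated risk together with all estimated risks; the asymptotic-optimality result in the abstract concerns selectors of this kind, under the condition that the validation-set size grows.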