20 similar documents retrieved (search time: 15 ms)
1.
Abstract: This paper examines the high-dimensional asymptotics of the naive Hotelling T2 statistic. Naive Bayes has been used in high-dimensional pattern recognition as a way to avoid singularities in the estimated covariance matrix. The naive Hotelling T2 statistic, which is equivalent to the estimator of the naive canonical correlation, is a statistically important quantity in naive Bayes, and its high-dimensional behavior has been studied under several conditions. In this paper, asymptotic normality of the naive Hotelling T2 statistic under a high-dimension, low-sample-size setting is established using the central limit theorem for martingale difference sequences.
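The naive statistic can be sketched concretely: it replaces the sample covariance matrix with its diagonal, which keeps the quadratic form well defined even when the dimension p exceeds the sample size n. A minimal one-sample illustration (the function name and setup are ours, not the paper's):

```python
import numpy as np

def naive_hotelling_t2(x, mu0):
    """One-sample naive Hotelling T2: the usual statistic with the sample
    covariance matrix replaced by its diagonal, so it stays well defined
    when p > n. (Illustrative sketch, not the paper's exact formulation.)"""
    x = np.asarray(x, dtype=float)
    n, _ = x.shape
    xbar = x.mean(axis=0)
    s2 = x.var(axis=0, ddof=1)       # diagonal entries of the sample covariance
    return n * np.sum((xbar - mu0) ** 2 / s2)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 200))       # n = 10 far below p = 200
t2 = naive_hotelling_t2(x, np.zeros(200))
```

The full Hotelling T2 would require inverting a 200 x 200 sample covariance of rank at most 9; the diagonal substitute avoids that singularity entirely.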
2.
Fangli Dong 《Journal of Statistical Computation and Simulation》2019,89(11):2151-2174
This paper proposes an algorithm for the classification of multi-dimensional datasets based on conjugate Bayesian Multiple Kernel Grouping Learning (BMKGL). The conjugate Bayesian framework improves computational efficiency, and using multiple kernels instead of a single kernel avoids the kernel selection problem, which is itself computationally expensive. Through grouping parameter learning, BMKGL can simultaneously integrate information from different dimensions and identify the dimensions that contribute most to the variation in the outcome, yielding an interpretable model. BMKGL can also select the most suitable combination of kernels for each dimension, extracting the most appropriate measure per dimension and improving classification accuracy. Simulation results show that the learning process achieves better prediction performance and stability than several popular classifiers, such as the k-nearest neighbours, support vector machine and naive Bayes classifiers. BMKGL also outperforms previous methods in accuracy and interpretability on the heart disease and EEG datasets.
3.
Classification of gene expression microarray data is important in the diagnosis of diseases such as cancer, but often the analysis of microarray data presents difficult challenges because the gene expression dimension is typically much larger than the sample size. Consequently, classification methods for microarray data often rely on regularization techniques to stabilize the classifier for improved classification performance. In particular, numerous regularization techniques, such as covariance-matrix regularization, are available, which, in practice, lead to a difficult choice of regularization methods. In this paper, we compare the classification performance of five covariance-matrix regularization methods applied to the linear discriminant function using two simulated high-dimensional data sets and five well-known, high-dimensional microarray data sets. In our simulation study, we found the minimum distance empirical Bayes method reported in Srivastava and Kubokawa [Comparison of discrimination methods for high dimensional data, J. Japan Statist. Soc. 37(1) (2007), pp. 123–134], and the new linear discriminant analysis reported in Thomaz, Kitani, and Gillies [A Maximum Uncertainty LDA-based approach for Limited Sample Size problems – with application to Face Recognition, J. Braz. Comput. Soc. 12(1) (2006), pp. 1–12], to perform consistently well and often outperform three other prominent regularization methods. Finally, we conclude with some recommendations for practitioners.
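As a concrete illustration of the general idea, one simple covariance-matrix regularization is ridge-type shrinkage of the pooled covariance, which keeps the linear discriminant computable when p exceeds n. This is an illustrative sketch only, not one of the five methods compared in the paper:

```python
import numpy as np

def regularized_lda_weights(x0, x1, lam=0.5):
    """Linear discriminant direction w = (S + lam*I)^{-1} (xbar1 - xbar0)
    with a ridge-regularized pooled covariance S. Illustrative only: a
    simple stand-in for the covariance regularizations the paper compares."""
    x0, x1 = np.asarray(x0, float), np.asarray(x1, float)
    n0, n1 = len(x0), len(x1)
    p = x0.shape[1]
    diff = x1.mean(axis=0) - x0.mean(axis=0)
    s_pooled = (((n0 - 1) * np.cov(x0, rowvar=False)
                 + (n1 - 1) * np.cov(x1, rowvar=False))
                / (n0 + n1 - 2))
    # s_pooled is singular when n0 + n1 - 2 < p; lam*I restores invertibility
    return np.linalg.solve(s_pooled + lam * np.eye(p), diff)

rng = np.random.default_rng(1)
x0 = rng.normal(0.0, 1.0, size=(15, 50))   # n = 30 total, p = 50: singular S
x1 = rng.normal(0.5, 1.0, size=(15, 50))
w = regularized_lda_weights(x0, x1)
```

A new observation x would then be assigned by the sign of w @ (x - (xbar0 + xbar1) / 2), exactly as with the unregularized linear discriminant.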
4.
Discriminant analysis is one of the three major methods of multivariate statistical analysis and is widely applied in many fields. Distance discriminant, Fisher discriminant, and Bayes discriminant are usually regarded as three different methods of discriminant analysis. This paper shows that distance discriminant and Bayes discriminant are the two substantive methods: the former is in effect based on percentiles or confidence intervals, while the latter is based on probabilities. The celebrated Fisher discriminant merely applies a linear transformation to the discriminant variables, following the idea of analysis of variance, before distance discriminant is applied, and so is not itself a substantive discriminant method. This paper combines the Fisher transformation with Bayes discriminant: the Fisher transformation is applied first, and Bayes discriminant is then carried out under the maximum-probability principle, yielding a new discriminant approach with further improved efficiency. Theoretical and empirical analyses show that Bayes discriminant based on the Fisher transformation is widely applicable and achieves the highest discriminant efficiency.
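The proposed route has two steps: project the data onto Fisher's discriminant direction, then apply the Bayes rule by maximum posterior probability to the one-dimensional scores. A minimal two-group sketch under normality (function names and the simulated data are illustrative, not from the paper):

```python
import numpy as np

def fisher_direction(x0, x1):
    """Fisher transformation: direction w maximizing the ratio of
    between-group to within-group variance for two groups."""
    sw = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    return np.linalg.solve(sw, x1.mean(axis=0) - x0.mean(axis=0))

def bayes_group(z, means, variances, priors):
    """Bayes discriminant on the 1-D Fisher scores: assign to the group
    with the largest posterior (maximum-probability principle)."""
    post = [p * np.exp(-(z - m) ** 2 / (2 * v)) / np.sqrt(v)
            for m, v, p in zip(means, variances, priors)]
    return int(np.argmax(post))

rng = np.random.default_rng(2)
x0 = rng.normal(0.0, 1.0, size=(60, 5))
x1 = rng.normal(1.5, 1.0, size=(60, 5))
w = fisher_direction(x0, x1)
z0, z1 = x0 @ w, x1 @ w                     # 1-D Fisher scores per group
means = [z0.mean(), z1.mean()]
variances = [z0.var(ddof=1), z1.var(ddof=1)]
group = bayes_group(x1.mean(axis=0) @ w, means, variances, [0.5, 0.5])
```

Unequal priors or unequal score variances shift the Bayes boundary away from the midpoint used by plain distance discriminant, which is where the claimed efficiency gain comes from.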
5.
Mohanad Fayez Al-khasawneh 《Journal of Statistical Computation and Simulation》2019,89(12):2175-2186
In this article, we consider the Bayes and empirical Bayes problem of estimating the current population mean of a finite population when sample data are available from (m-1) other similar finite populations. We investigate a general class of linear estimators and obtain the optimal linear Bayes estimator of the finite population mean under a squared-error loss function that accounts for the cost of sampling. The optimal linear Bayes estimator and the sample size are obtained as functions of the parameters of the prior distribution. The corresponding empirical Bayes estimates are obtained by replacing the unknown hyperparameters with consistent estimates. A Monte Carlo study is conducted to evaluate the performance of the proposed empirical Bayes procedure.
6.
We consider the empirical Bayes decision theory where the component problems are the optimal fixed sample size decision problem and a sequential decision problem. With these components, an empirical Bayes decision procedure selects both a stopping rule function and a terminal decision rule function. Empirical Bayes stopping rules are constructed for each case and the asymptotic behaviours are investigated.
7.
In this article, a variable selection procedure, called surrogate selection, is proposed which can be applied when a support vector machine or kernel Fisher discriminant analysis is used in a binary classification problem. Surrogate selection applies the lasso after substituting the kernel discriminant scores for the binary group labels, as well as values for the input variable observations. Empirical results are reported, showing that surrogate selection performs well.
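The core idea, regressing the kernel discriminant scores rather than the 0/1 labels on the input variables with the lasso, can be sketched as follows; the coordinate-descent solver and the simulated stand-in for the kernel scores are ours, not the paper's:

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=300):
    """Plain coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam*||b||_1.
    In surrogate selection y would be the kernel discriminant scores rather
    than the group labels; here y is simulated. Illustrative names only."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / ((X[:, j] ** 2).sum() / n)
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
scores = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)    # stand-in kernel scores
beta = lasso_cd(X, scores, lam=0.3)
```

Variables with nonzero coefficients are the ones surrogate selection retains; here only the first column drives the scores, so the lasso should zero out the rest.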
8.
Ming Ha Lee 《Communications in Statistics - Theory and Methods》2013,42(5):883-897
This study investigates the statistical properties of adaptive Hotelling's T2 charts with run rules, in which the sample size and sampling interval are allowed to vary according to the current and past sampling points. The adaptive charts include the variable sample size (VSS), variable sampling interval (VSI), and variable sample size and sampling interval (VSSI) charts. The adaptive Hotelling's T2 charts with run rules are compared with the fixed-sampling-rate Hotelling's T2 chart with run rules. The numerical results show that the VSS, VSI, and VSSI features improve the performance of the Hotelling's T2 chart with run rules.
9.
In this paper, we study the empirical Bayes two-action problem under a linear loss function. Upper bounds on the regret of empirical Bayes testing rules are investigated. Previous results on this problem construct empirical Bayes tests using kernel-type estimators of nonparametric functionals. Further, they assume specific forms, such as the continuous one-parameter exponential family, for the family of distributions {Fθ : θ ∈ Ω} of the observations. In this paper, we present a new general approach to establishing upper bounds (in terms of rates of convergence) for empirical Bayes tests in this problem. Our results are given for any family of continuous distributions and apply to empirical Bayes tests based on any type of nonparametric functional estimation. We show that our bounds are very sharp in the sense that they reduce to existing optimal or nearly optimal rates of convergence when applied to specific families of distributions.
10.
Frederick Mosteller 《The American Statistician》2013,67(1):20-22
Tests for redundancy of variables in linear two-group discriminant analysis are well known and frequently used. We give a survey of similar tests, including the one-sample T2 as a special case, for the situation in which only the mean vector (but no covariance matrix) is available in one sample. We then show that a relation between linear regression and discriminant functions found by Fisher (1936) can be generalized to this situation. Relating regression and discriminant analysis to a multivariate linear model sheds more light on the relationship between them. Practical and didactic advantages of the regression approach to T2 tests and discriminant analysis are outlined.
11.
Somnath Datta, Dipankar Bandyopadhyay, Glen A. Satten 《Scandinavian Journal of Statistics》2010,37(4):680-700
Abstract. A right-censored version of a U-statistic with a kernel of degree m ≥ 1 is introduced via a mean-preserving reweighting scheme, which is also applicable when the dependence between failure times and the censoring variable is explainable through observable covariates. Its asymptotic normality and an expression for its standard error are obtained through a martingale argument. We study the performance of our U-statistic by simulation and compare it with theoretical results. A doubly robust version of this reweighted U-statistic is also introduced to gain efficiency under correct models while preserving consistency in the face of model misspecification. Using a Kendall's kernel, we obtain a test statistic for testing homogeneity of failure times across multiple failure causes in a multiple-decrement model. The performance of the proposed test is studied through simulations. Its usefulness is also illustrated by applying it to a real data set on graft-versus-host disease.
12.
This paper discusses a supervised classification approach for the differential diagnosis of Raynaud's phenomenon (RP). The classification of data from healthy subjects and from patients suffering from primary and secondary RP is obtained by means of a set of classifiers derived within the framework of linear discriminant analysis. A set of functional variables and shape measures extracted from rewarming/reperfusion curves are proposed as discriminant features. Since the prediction of group membership is based on a large number of these features, the high-dimension/small-sample-size problem is addressed to overcome the singularity of the within-group covariance matrix. Results on a data set of 72 subjects demonstrate that a satisfactory classification of the subjects can be achieved through the proposed methodology.
13.
Ricardo Leiva 《Communications in Statistics - Theory and Methods》2014,43(5):989-1012
Although devised in 1936 by Fisher, discriminant analysis is still rapidly evolving as the complexity of contemporary data sets grows exponentially. Our classification rules address these complexities by modeling various correlations in higher-order data. Moreover, our classification rules are suitable for data sets where the number of response variables is comparable to or larger than the number of observations. We assume that the higher-order observations have a separable variance-covariance matrix and two different Kronecker product structures on the mean vector. In this article, we develop quadratic classification rules among g different populations where each individual has κth-order (κ ≥ 2) measurements. We also provide computational algorithms to compute the maximum likelihood estimates of the model parameters and, eventually, the sample classification rules.
14.
We consider the sequential procedures developed by Robbins and Siegmund (1974), Louis (1975) and Zoubeidi (1992) for comparing the means of two treatments. We let the procedures have equal power functions and compare their Bayes and minimax risks using the invariance property of their power functions. For each of several formulations of the problem we determine the most relatively efficient procedure and compute its expected total sample size.
15.
Mohan Delampady, Anirban Dasgupta, George Casella, Herman Rubin, William E. Strawderman 《Revue canadienne de statistique》2001,29(3):437-450
Following the developments in DasGupta et al. (2000), the authors propose and explore a new method for constructing proper default priors and a method for selecting a Bayes estimate from a family. Their results are based on asymptotic expansions of certain marginal correlations. For ease of exposition, most results are presented for location families and squared error loss only. The default prior methodology amounts, ultimately, to the minimization of Fisher information, and hence, Bickel's prior works out as the default prior if the location parameter is bounded. As for the selected Bayes estimate, it corresponds to ‘Gaussian tilting’ of an initial reference prior.
16.
Ashok K. Singh 《Revue canadienne de statistique》1978,6(2):201-218
A class of invariant Bayes rules is derived for testing homogeneity of k (≥ 2) different populations against the (k choose t) slippage alternatives under which some (unknown) subset of size t of the given populations has a parameter larger than that of the remaining k − t, where t is a given integer between 1 and k − 1. For a similar problem in nonparametric settings, locally best tests based on ranks are derived.
17.
This paper investigates sufficient statistics for the parameters of vector-valued (multivariate) ARMA models when a finite sample is available. In the simplest case, ARMA(1,1), using the factorization theorem we present a sufficient statistic whose dimension depends on the sample size and is in fact larger than the sample size. In this case, under some restrictions, we solve this problem and present a sufficient statistic whose dimension does not depend on the sample size. In the general case, owing to the complexity of the problem, we use modified versions of the likelihood function to find an approximate sufficient statistic in terms of the periodogram. The dimension of this sufficient statistic depends on the sample size, but it is much lower than the sample size.
18.
For binomial data analysis, many methods based on empirical Bayes interpretations have been developed, in which a variance‐stabilizing transformation and a normality assumption are usually required. To achieve the greatest model flexibility, we conduct nonparametric Bayesian inference for binomial data and employ a special nonparametric Bayesian prior—the Bernstein–Dirichlet process (BDP)—in the hierarchical Bayes model for the data. The BDP is a special Dirichlet process (DP) mixture based on beta distributions, and the posterior distribution resulting from it has a smooth density defined on [0, 1]. We examine two Markov chain Monte Carlo procedures for simulating from the resulting posterior distribution, and compare their convergence rates and computational efficiency. In contrast to existing results for posterior consistency based on direct observations, the posterior consistency of the BDP, given indirect binomial data, is established. We study shrinkage effects and the robustness of the BDP‐based posterior estimators in comparison with several other empirical and hierarchical Bayes estimators, and we illustrate through examples that the BDP‐based nonparametric Bayesian estimate is more robust to the sample variation and tends to have a smaller estimation error than those based on the DP prior. In certain settings, the new estimator can also beat Stein's estimator, Efron and Morris's limited‐translation estimator, and many other existing empirical Bayes estimators. The Canadian Journal of Statistics 40: 328–344; 2012 © 2012 Statistical Society of Canada
19.
Peter Hall, Richard J. Samworth 《Journal of the Royal Statistical Society. Series B, Statistical Methodology》2005,67(3):363-379
Summary. It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging. However, for larger sampling fractions there is no asymptotic difference between the risk of the regular nearest neighbour classifier and its bagged version. In particular, neither achieves the large sample performance of the Bayes classifier. In contrast, when the sampling fractions converge to 0, but the resample sizes diverge to ∞, the bagged classifier converges to the optimal Bayes rule and its risk converges to the risk of the latter. These results are most readily seen when the two populations have well-defined densities, but they may also be derived in other cases, where densities exist in only a relative sense. Cross-validation can be used effectively to choose the sampling fraction. Numerical calculation is used to illustrate these theoretical properties.
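A minimal sketch of a bagged nearest-neighbour classifier with a controllable resample size m; keeping m well below n is the regime in which the improvement is claimed. The one-dimensional setup and all names are illustrative:

```python
import numpy as np

def bagged_1nn(x_train, y_train, x_test, m, B=200, replace=True, seed=0):
    """Bagged 1-nearest-neighbour classifier: each of B resamples of size m
    (with or without replacement) casts a vote, namely the label of the
    resample's nearest training point; the majority vote wins."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    votes = np.zeros(len(x_test))
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=replace)
        xs, ys = x_train[idx], y_train[idx]
        d = np.abs(x_test[:, None] - xs[None, :])   # 1-D features for brevity
        votes += ys[d.argmin(axis=1)]
    return (votes / B > 0.5).astype(int)

rng = np.random.default_rng(4)
x_train = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
y_train = np.repeat([0, 1], 100)
x_test = np.array([-2.0, -1.5, 1.5, 2.0])
pred = bagged_1nn(x_train, y_train, x_test, m=100)  # m = 50% of n = 200
```

With m = n the vote degenerates toward the plain 1-NN rule; shrinking the sampling fraction while letting m grow with n is what drives the bagged classifier toward the Bayes rule.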
20.
《Journal of Statistical Computation and Simulation》2012,82(3-4):255-271
We study a situation in which N independent classifications between N(−1, 1) and N(1, 1) are to be made simultaneously. This problem was the featured example in Robbins' (1951) introduction of the compound decision problem and has been used many times since to illustrate various aspects of the developing theory of compound and empirical Bayes decisions. Here we study the moderate-N risk behavior of recently developed so-called "extended" bootstrap and Bayes procedures for the problem. The behavior of these rules is compared with that of the bootstrap and Bayes rules originally suggested by Robbins.