Similar literature
1.
Abstract

This paper examines the high dimensional asymptotics of the naive Hotelling T² statistic. Naive Bayes has been utilized in high dimensional pattern recognition as a method to avoid singularities in the estimated covariance matrix. The naive Hotelling T² statistic, which is equivalent to the estimator of the naive canonical correlation, is a statistically important quantity in naive Bayes and its high dimensional behavior has been studied under several conditions. In this paper, asymptotic normality of the naive Hotelling T² statistic under a high dimension low sample size setting is developed using the central limit theorem of a martingale difference sequence.
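
As a rough illustration of why a "naive" statistic sidesteps the singular covariance matrix, the sketch below computes a one-sample T²-type statistic that uses only the diagonal of the sample covariance matrix. The one-sample form, the function name `naive_hotelling_t2`, and the toy data are assumptions made for illustration; the paper's exact definition may differ.

```python
from statistics import mean, variance


def naive_hotelling_t2(samples, mu0=None):
    """One-sample naive T^2: n * sum_j (xbar_j - mu0_j)^2 / s_jj.

    Only the per-coordinate sample variances s_jj are used, so no p x p
    covariance matrix has to be inverted (assumed illustrative form).
    """
    n = len(samples)
    p = len(samples[0])
    if mu0 is None:
        mu0 = [0.0] * p
    t2 = 0.0
    for j in range(p):
        column = [row[j] for row in samples]
        t2 += n * (mean(column) - mu0[j]) ** 2 / variance(column)
    return t2


# Toy data with more variables (p = 4) than observations (n = 3):
# the full sample covariance matrix would be singular here.
data = [
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 0.0, 2.0, 1.5],
    [2.0, 4.0, 1.0, 0.5],
]
print(naive_hotelling_t2(data))
```

The statistic stays well defined for p > n as long as every coordinate has non-zero sample variance, which is the practical appeal of the diagonal ("naive") estimate.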

2.
This paper proposes an algorithm for classifying multi-dimensional datasets based on conjugate Bayesian Multiple Kernel Grouping Learning (BMKGL). The conjugate Bayesian framework improves computational efficiency. Using multiple kernels instead of a single kernel avoids the kernel selection problem, which is itself computationally expensive. Through grouping parameter learning, BMKGL can simultaneously integrate information from different dimensions and, for the sake of interpretability, identify the dimensions that contribute most to the variation in the outcome. Meanwhile, BMKGL can select the most suitable combination of kernels for different dimensions so as to extract the most appropriate measure for each dimension and improve classification accuracy. The simulation results illustrate that the learning process offers better predictive performance and stability than some popular classifiers, such as the k-nearest neighbours algorithm, the support vector machine and the naive Bayes classifier. BMKGL also outperforms previous methods in terms of accuracy and interpretability on the heart disease and EEG datasets.

3.
Classification of gene expression microarray data is important in the diagnosis of diseases such as cancer, but often the analysis of microarray data presents difficult challenges because the gene expression dimension is typically much larger than the sample size. Consequently, classification methods for microarray data often rely on regularization techniques to stabilize the classifier for improved classification performance. In particular, numerous regularization techniques, such as covariance-matrix regularization, are available, which, in practice, lead to a difficult choice of regularization methods. In this paper, we compare the classification performance of five covariance-matrix regularization methods applied to the linear discriminant function using two simulated high-dimensional data sets and five well-known, high-dimensional microarray data sets. In our simulation study, we found the minimum distance empirical Bayes method reported in Srivastava and Kubokawa [Comparison of discrimination methods for high dimensional data, J. Japan Statist. Soc. 37(1) (2007), pp. 123–134], and the new linear discriminant analysis reported in Thomaz, Kitani, and Gillies [A Maximum Uncertainty LDA-based approach for Limited Sample Size problems – with application to Face Recognition, J. Braz. Comput. Soc. 12(1) (2006), pp. 1–12], to perform consistently well and often outperform three other prominent regularization methods. Finally, we conclude with some recommendations for practitioners.

4.
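An exploration of a Bayes discriminant method based on the Fisher transformation (cited by: 1; self-citations: 0; citations by others: 1)
Discriminant analysis is one of the three major methods of multivariate statistical analysis and is widely applied in many fields. Distance discrimination, Fisher discrimination and Bayes discrimination are usually regarded as three different discriminant methods. This study shows that distance discrimination and Bayes discrimination are the two substantive discriminant methods: the former actually relies on percentiles or confidence intervals, and the latter on probabilities. The well-known Fisher discrimination, by contrast, merely applies a linear transformation to the discriminating variables, following the idea of analysis of variance, and then performs distance discrimination, so it cannot really be counted as a substantive discriminant method. This paper combines the Fisher transformation with Bayes discrimination: the Fisher transformation is applied first, and Bayes discrimination is then carried out under the maximum-probability principle. This yields a new approach to discrimination that can further improve discriminant efficiency. Theoretical and empirical analyses show that Bayes discrimination based on the Fisher transformation is applicable in a wide range of settings and has the highest discriminant efficiency.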

5.
In this article, we consider the Bayes and empirical Bayes problem of estimating the current population mean of a finite population when sample data are available from (m-1) other similar finite populations. We investigate a general class of linear estimators and obtain the optimal linear Bayes estimator of the finite population mean under a squared error loss function that accounts for the cost of sampling. The optimal linear Bayes estimator and the sample size are obtained as functions of the parameters of the prior distribution. The corresponding empirical Bayes estimates are obtained by replacing the unknown hyperparameters with their respective consistent estimates. A Monte Carlo study is conducted to evaluate the performance of the proposed empirical Bayes procedure.

6.
We consider the empirical Bayes decision theory where the component problems are the optimal fixed sample size decision problem and a sequential decision problem. With these components, an empirical Bayes decision procedure selects both a stopping rule function and a terminal decision rule function. Empirical Bayes stopping rules are constructed for each case and the asymptotic behaviours are investigated.

7.
In this article, a variable selection procedure, called surrogate selection, is proposed which can be applied when a support vector machine or kernel Fisher discriminant analysis is used in a binary classification problem. Surrogate selection applies the lasso after substituting the kernel discriminant scores for the binary group labels, as well as values for the input variable observations. Empirical results are reported, showing that surrogate selection performs well.

8.
This study investigates the statistical properties of the adaptive Hotelling's T² charts with run rules in which the sample size and sampling interval are allowed to vary according to the current and past sampling points. The adaptive charts include variable sample size (VSS), variable sampling interval (VSI), and variable sample size and sampling interval (VSSI) charts. The adaptive Hotelling's T² charts with run rules are compared with the fixed sampling rate Hotelling's T² chart with run rules. The numerical results show that the VSS, VSI, and VSSI features improve the performance of the Hotelling's T² chart with run rules.

9.
In this paper, we study the empirical Bayes two-action problem under a linear loss function. Upper bounds on the regret of empirical Bayes testing rules are investigated. Previous results on this problem construct empirical Bayes tests using kernel type estimators of nonparametric functionals. Further, they have assumed specific forms, such as the continuous one-parameter exponential family, for the family {F_θ : θ ∈ Ω} of distributions of the observations. In this paper, we present a new general approach to establishing upper bounds (in terms of rate of convergence) for empirical Bayes tests for this problem. Our results are given for any family of continuous distributions and apply to empirical Bayes tests based on any type of nonparametric method of functional estimation. We show that our bounds are very sharp in the sense that they reduce to existing optimal or nearly optimal rates of convergence when applied to specific families of distributions.

10.
Chapter Notes     
Tests for redundancy of variables in linear two-group discriminant analysis are well known and frequently used. We give a survey of similar tests, including the one-sample T² as a special case, in the situation in which only the mean vector (but no covariance matrix) is available in one sample. Then we show that a relation between linear regression and discriminant functions found by Fisher (1936) can be generalized to this situation. Relating regression and discriminant analysis to a multivariate linear model sheds more light on the relationship between them. Practical and didactical advantages of the regression approach to T² tests and discriminant analysis are outlined.

11.
Abstract. A right‐censored version of a U‐statistic with a kernel of degree m ≥ 1 is introduced by the principle of a mean preserving reweighting scheme which is also applicable when the dependence between failure times and the censoring variable is explainable through observable covariates. Its asymptotic normality and an expression of its standard error are obtained through a martingale argument. We study the performances of our U‐statistic by simulation and compare them with theoretical results. A doubly robust version of this reweighted U‐statistic is also introduced to gain efficiency under correct models while preserving consistency in the face of model mis‐specifications. Using a Kendall's kernel, we obtain a test statistic for testing homogeneity of failure times for multiple failure causes in a multiple decrement model. The performance of the proposed test is studied through simulations. Its usefulness is also illustrated by applying it to a real data set on graft‐versus‐host‐disease.

12.
This paper discusses a supervised classification approach for the differential diagnosis of Raynaud's phenomenon (RP). The classification of data from healthy subjects and from patients suffering from primary and secondary RP is obtained by means of a set of classifiers derived within the framework of linear discriminant analysis. A set of functional variables and shape measures extracted from rewarming/reperfusion curves are proposed as discriminant features. Since the prediction of group membership is based on a large number of these features, the high dimension/small sample size problem is considered to overcome the singularity problem of the within-group covariance matrix. Results on a data set of 72 subjects demonstrate that a satisfactory classification of the subjects can be achieved through the proposed methodology.

13.
Although devised in 1936 by Fisher, discriminant analysis is still rapidly evolving, as the complexity of contemporary data sets grows exponentially. Our classification rules explore these complexities by modeling various correlations in higher-order data. Moreover, our classification rules are suitable for data sets where the number of response variables is comparable to or larger than the number of observations. We assume that the higher-order observations have a separable variance-covariance matrix and two different Kronecker product structures on the mean vector. In this article, we develop quadratic classification rules among g different populations where each individual has κth order (κ ≥ 2) measurements. We also provide the computational algorithms to compute the maximum likelihood estimates for the model parameters and eventually the sample classification rules.

14.
We consider the sequential procedures developed by Robbins and Siegmund (1974), Louis (1975) and Zoubeidi (1992) for comparing the means of two treatments. We let the procedures have equal power functions and compare their Bayes and minimax risks using the invariance property of their power functions. For each of several formulations of the problem we determine the most relatively efficient procedure and compute its expected total sample size.

15.
Following the developments in DasGupta et al. (2000), the authors propose and explore a new method for constructing proper default priors and a method for selecting a Bayes estimate from a family. Their results are based on asymptotic expansions of certain marginal correlations. For ease of exposition, most results are presented for location families and squared error loss only. The default prior methodology amounts, ultimately, to the minimization of Fisher information, and hence, Bickel's prior works out as the default prior if the location parameter is bounded. As for the selected Bayes estimate, it corresponds to ‘Gaussian tilting’ of an initial reference prior.

16.
A class of invariant Bayes rules is derived for testing homogeneity of k (≥ 2) different populations against the $\binom{k}{t}$ slippage alternatives that some (unknown) subset of size t of the given populations has parameter larger than the remaining k − t, where t is a given integer between 1 and k − 1. For a similar problem in nonparametric situations, locally best tests based on ranks are derived.

17.
This paper is an investigation on the sufficient statistic for the parameters of the vector-valued (multivariate) ARMA models, when a finite sample is available. In the simplest case ARMA(1,1), by using the factorization theorem, we present a sufficient statistic whose dimension depends on the sample size and this dimension is even larger than the sample size. In this case and under some restrictions, we have solved this problem and have presented a sufficient statistic whose dimension does not depend on the sample size. In the general case, due to the complexity of the problem, we will use the modified versions of the likelihood function to find an approximate sufficient statistic in terms of the periodogram. The dimension of this sufficient statistic depends on the sample size; however, this dimension is much lower than the sample size.

18.
For binomial data analysis, many methods based on empirical Bayes interpretations have been developed, in which a variance‐stabilizing transformation and a normality assumption are usually required. To achieve the greatest model flexibility, we conduct nonparametric Bayesian inference for binomial data and employ a special nonparametric Bayesian prior—the Bernstein–Dirichlet process (BDP)—in the hierarchical Bayes model for the data. The BDP is a special Dirichlet process (DP) mixture based on beta distributions, and the posterior distribution resulting from it has a smooth density defined on [0, 1]. We examine two Markov chain Monte Carlo procedures for simulating from the resulting posterior distribution, and compare their convergence rates and computational efficiency. In contrast to existing results for posterior consistency based on direct observations, the posterior consistency of the BDP, given indirect binomial data, is established. We study shrinkage effects and the robustness of the BDP‐based posterior estimators in comparison with several other empirical and hierarchical Bayes estimators, and we illustrate through examples that the BDP‐based nonparametric Bayesian estimate is more robust to the sample variation and tends to have a smaller estimation error than those based on the DP prior. In certain settings, the new estimator can also beat Stein's estimator, Efron and Morris's limited‐translation estimator, and many other existing empirical Bayes estimators. The Canadian Journal of Statistics 40: 328–344; 2012 © 2012 Statistical Society of Canada

19.
Summary.  It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging. However, for larger sampling fractions there is no asymptotic difference between the risk of the regular nearest neighbour classifier and its bagged version. In particular, neither achieves the large sample performance of the Bayes classifier. In contrast, when the sampling fractions converge to 0, but the resample sizes diverge to ∞, the bagged classifier converges to the optimal Bayes rule and its risk converges to the risk of the latter. These results are most readily seen when the two populations have well-defined densities, but they may also be derived in other cases, where densities exist in only a relative sense. Cross-validation can be used effectively to choose the sampling fraction. Numerical calculation is used to illustrate these theoretical properties.
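
The without-replacement scheme described in this summary can be sketched as follows: each bag is a random subset of the training set, a 1-nearest-neighbour vote is taken within each bag, and the majority label wins. The function names, the number of bags, and the toy data are illustrative assumptions rather than details from the paper; the resample size of 3 out of 8 points is chosen to sit below the 50% threshold mentioned above.

```python
import random
from collections import Counter


def nearest_label(train, x):
    """Label of the training point closest to x (squared Euclidean distance)."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return closest[1]


def bagged_nn_predict(train, x, resample_size, n_bags=101, seed=0):
    """Majority vote of 1-NN classifiers, each fitted on a subset drawn
    without replacement from the training set."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_bags):
        subset = rng.sample(train, resample_size)  # sampling without replacement
        votes[nearest_label(subset, x)] += 1
    return votes.most_common(1)[0][0]


# Toy training set: (feature vector, label) pairs, hypothetical values.
train = [
    ((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
    ((0.3, 0.2), 0), ((0.2, 0.4), 0), ((0.4, 0.1), 0),
    ((3.0, 3.0), 1), ((3.2, 2.9), 1),
]

# Resample size 3 is 37.5% of n = 8, i.e. below the 50% threshold.
print(bagged_nn_predict(train, (0.1, 0.1), resample_size=3))
```

In practice the sampling fraction would be tuned, for instance by cross-validation as the summary suggests, rather than fixed by hand as it is here.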

20.
We study a situation in which N independent classifications between N(−1, 1) and N(1, 1) are to be faced simultaneously. This problem was the featured example in Robbins' (1951) introduction of the compound decision problem and has been used many times since to illustrate various aspects of the developing theory of compound and empirical Bayes decisions. We here study the moderate N risk behavior of recently developed so-called “extended” bootstrap and Bayes procedures for the problem. The behavior of these rules is compared to that of the bootstrap and Bayes rules originally suggested by Robbins.
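
For orientation, the sketch below shows a simple plug-in rule for this two-point problem, in the spirit of Robbins' example rather than the "extended" bootstrap and Bayes procedures studied in the paper. If a proportion p of the means equal +1, the likelihood ratio is exp(2x), so the Bayes rule decides +1 when x > ½·ln((1 − p)/p); since E[X] = 2p − 1, p can be estimated from the sample mean. The function names and the truncation constant `eps` are assumptions made for illustration.

```python
import math


def bayes_threshold(p):
    """Cut-off c such that deciding theta = +1 when x > c is the Bayes rule
    for N(-1, 1) versus N(1, 1) when P(theta = +1) = p."""
    return 0.5 * math.log((1.0 - p) / p)


def compound_rule(xs, eps=1e-6):
    """Plug-in compound rule: estimate p from the sample mean, then apply
    the Bayes cut-off to every observation."""
    # E[X] = 2p - 1, so p_hat = (mean + 1) / 2, truncated away from 0 and 1.
    p_hat = min(max((sum(xs) / len(xs) + 1.0) / 2.0, eps), 1.0 - eps)
    cutoff = bayes_threshold(p_hat)
    return [1 if x > cutoff else -1 for x in xs]


# Hypothetical observations: the sample mean is close to 0, so the cut-off is close to 0.
xs = [-1.2, -0.8, 0.9, 1.1]
print(compound_rule(xs))
```

When most observations come from N(1, 1), the estimated p rises and the cut-off moves below zero, so borderline observations are classified as +1; this borrowing of information across the N problems is what distinguishes a compound rule from treating each classification separately.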
