1.

A deletion/substitution/addition algorithm for classification neural networks, with applications to biomedical data

Blythe Durbin, Sandrine Dudoit, Mark J. van der Laan 《Journal of Statistical Planning and Inference》2008

Neural networks are a popular machine learning tool, particularly in applications such as protein structure prediction; however, overfitting can pose an obstacle to their effective use. Due to the large number of parameters in a typical neural network, one may obtain a network fit that perfectly predicts the learning data, yet fails to generalize to other data sets. One way of reducing the size of the parameter space is to alter the network topology so that some edges are removed; however, it is often not immediately apparent which edges should be eliminated. We propose a data-adaptive method of selecting an optimal network architecture using a deletion/substitution/addition algorithm. Results of this approach to classification are presented on simulated data and the breast cancer data of Wolberg and Mangasarian [1990. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Nat. Acad. Sci. 87, 9193–9196].

2.

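A Comparison of Random Forests and Bagging Classification Trees for Classification (cited by: 5; self-citations: 0; citations by others: 5)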

Ma Jingyi, Xie Bangchang 《Statistics & Information Forum》2010,25(10):18-22

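Using experimental data, this paper explains from two theoretical perspectives why the random forest algorithm outperforms the bagging classification tree algorithm. Expressing the two algorithms in two different frameworks removes some ambiguities in their analysis. The second framework in particular shows more clearly that random forests outperform bagging classification trees because random forests have a smaller bias.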

3.

A Permutation Based Procedure for Classification Assessment

《Communications in Statistics - Theory and Methods》2012,41(16-17):3126-3137

This article proposes a permutation procedure for evaluating the performance of different classification methods. In particular, we focus on two of the most widely used classification methodologies: latent class analysis and k-means clustering. The classification performance is assessed by means of a permutation procedure which allows for a direct comparison of the methods and the development of a statistical test, and which points out better potential solutions. Our proposal provides an innovative framework for the validation of the data partitioning and offers a guide to choosing which classification procedure should be used.
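The abstract does not spell out its procedure, so the following is only a minimal sketch of a generic label-permutation test, not the authors' method. It assumes a reference labelling is available and uses simple agreement as the statistic; the names `agreement` and `permutation_p_value` and the toy data are hypothetical.

```python
import random


def agreement(labels_a, labels_b):
    """Fraction of positions where the two label sequences agree."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def permutation_p_value(reference, assigned, n_perm=2000, seed=0):
    """One-sided permutation p-value for the observed agreement.

    The null distribution is built by shuffling `assigned`, which breaks
    any association with `reference` while keeping class sizes fixed.
    """
    rng = random.Random(seed)
    observed = agreement(reference, assigned)
    shuffled = list(assigned)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if agreement(reference, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (assumed): two classes of ten items, two items misassigned.
reference = [0] * 10 + [1] * 10
assigned = [0] * 9 + [1] + [1] * 9 + [0]
obs, p = permutation_p_value(reference, assigned)
print(obs, p)
```

Running the same function on the assignments from two competing methods gives a rough way to compare them on a common scale, under the assumption that agreement with a reference labelling is the quantity of interest.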

4.

The Role of Classification Trees and Expert Knowledge in Building Bayesian Networks: A Case Study in Medicine

L. Stracqualursi, P. Agati 《Communications in Statistics - Theory and Methods》2014,43(4):839-850

In clinical research, early and prompt detection of the risk class of a new patient may play a crucial role in determining the effectiveness of the treatment and, consequently, in achieving a satisfying prognosis of the patient's chances. There exist a number of popular rule-based algorithms for classification, whose performance is very attractive whenever data on a large number of patients are available. However, when datasets include data on only a few hundred patients, the most common approaches give unstable results, and developing effective decision-support systems becomes scientifically challenging. Since rules can be derived from different models as well as from expert knowledge resources, each having its advantages and weaknesses, this article suggests a “hybrid” approach to address the classification problem when the number of patients is too small to use a single technique effectively. The hybrid strategy was applied to a case study and its predictive performance was compared with that of each single approach; because misclassifying high-risk patients is serious, special attention was paid to the specificity. The results show that the hybrid strategy outperforms each single strategy involved.

5.

A Classification Method for Directional Data with Application to the Human Skull

Ashis Sengupta 《Communications in Statistics - Theory and Methods》2013,42(3):457-466

There are many well-known methods for classification problems with linear data, for both known and unknown distributions. Here, we deal with classification involving data on the torus and the cylinder. A new method involving a generalized likelihood ratio test is developed for classifying into two populations using directional data. The approach assumes that one of the probabilities of misclassification is known. The procedure is constructed by applying the Gibbs sampler to the conditionally specified distribution. A parametric bootstrap approach is also presented. An application to data involving linear and circular measurements on human skulls from two tribal populations is given.

6.

A Data Classification Method Based on Immune Negative Selection with Segmented Matching

Xu Xuesong, Wang Sichun 《Statistical Research》2012,29(4):108-112

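Based on the immune negative selection principle, a negative selection classifier using masked segmented matching is designed, overcoming the shortcomings of the r-contiguous-bits matching rule. A classification rule encoding suited to immune optimization and an evaluation based on the classification information score are given. The rules are optimized as a population through immune evolution in order to reduce the data rule set. This avoids the lack of global optimization ability in traditional classification algorithms and improves the recognition of samples. Experimental results show that the proposed method improves data classification accuracy and outperforms traditional classification methods in both classification accuracy and average information score.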
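For context, the baseline rule that this abstract sets out to improve on can be sketched as follows. This is the commonly described r-contiguous-bits rule, under which two equal-length strings match when they agree in at least r consecutive positions; it is not the paper's masked segmented matching, whose details the abstract does not give. The function name and the example strings are hypothetical.

```python
def r_contiguous_match(detector, sample, r):
    """Return True if the strings agree in at least r consecutive positions."""
    if len(detector) != len(sample):
        raise ValueError("detector and sample must have equal length")
    if r < 1:
        raise ValueError("r must be at least 1")
    run = 0  # length of the current run of agreeing positions
    for d, s in zip(detector, sample):
        run = run + 1 if d == s else 0
        if run >= r:
            return True
    return False


# Assumed example strings: the longest run of agreeing positions is 3.
detector = "10110100"
sample = "10010110"
print(r_contiguous_match(detector, sample, 3))
print(r_contiguous_match(detector, sample, 4))
```

In this example the strings match for r = 3 but not for r = 4, which shows how sensitive the rule is to the choice of r.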

7.

An Optimal Semiparametric Method for Two‐group Classification

《Scandinavian Journal of Statistics》2018,45(3):806-846

In classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of the two groups. However, for a typical case‐control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t‐statistic for two‐group classification. Biometrics 2015, 71: 404–416) proposed the generalized t‐statistic to obtain a linear discriminant function, which allows for heterogeneity of the case group. Their procedure has an optimality property in the class of consideration. We perform a further study of the problem and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in that no further improvement is possible to construct the linear discriminant function more efficiently. We conduct simulation studies and real data examples to illustrate the finite sample performance and the gain that it produces in comparison with existing methods.

8.

A Remedy for Kernel Estimation Under Random Design

Alois Kneip, Joachim Engel 《Statistics》2013,47(3):201-225

Two common kernel-based methods for non-parametric regression estimation suffer from well-known drawbacks when the design is random. The Gasser-Müller estimator is inadmissible due to its high variance while the Nadaraya-Watson estimator has zero asymptotic efficiency because of poor bias behavior. Under asymptotic consideration, the local linear estimator avoids these two drawbacks of kernel estimators and achieves minimax optimality. However, when based on compact support kernels its finite sample behavior is disappointing because sudden kinks may show up in the estimate.

This paper proposes a modification of the kernel estimator, called the binned convolution estimator, leading to a fast O(n) method. Provided the design density is continuously differentiable and the conditional fourth moments exist, the binned convolution estimator has asymptotic properties identical to those of the local linear estimator.
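To make the comparison concrete, here is a minimal sketch of two of the estimators named in this abstract, the Nadaraya-Watson estimator and the local linear estimator, using a compact-support (Epanechnikov) kernel. The binned convolution estimator itself is not reproduced, since the abstract does not define it. The function names, bandwidth and design points are assumptions chosen for illustration.

```python
def epanechnikov(u):
    """Compact-support kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of the responses around x0."""
    weights = [epanechnikov((x - x0) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no design points within the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def local_linear(x0, xs, ys, h):
    """Intercept of the kernel-weighted least squares line fitted around x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        w = epanechnikov((x - x0) / h)
        d = x - x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("need at least two distinct design points in the window")
    return (s2 * t0 - s1 * t1) / det


# Irregular design with an exactly linear response y = 2x + 1 (assumed data).
xs = [0.0, 0.1, 0.25, 0.3, 0.5, 0.55, 0.7, 0.9, 1.0]
ys = [2.0 * x + 1.0 for x in xs]
nw_at_boundary = nadaraya_watson(0.0, xs, ys, h=0.4)
ll_at_boundary = local_linear(0.0, xs, ys, h=0.4)
print(nw_at_boundary, ll_at_boundary)
```

At the left boundary the local linear fit reproduces the linear trend, while the Nadaraya-Watson average is pulled toward the interior values. This is one simple illustration of the bias behaviour the abstract refers to.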

9.

A Jackknife Method for Estimation of Variance Components

Christian Lavergne 《Statistics》2013,47(1-2):1-13

This paper concerns a method of estimation of variance components in a random effect linear model. It is mainly a resampling method and relies on the Jackknife principle. The derived estimators are presented as least squares estimators in an appropriate linear model, and one of them appears as a MINQUE (Minimum Norm Quadratic Unbiased Estimation) estimator. Our resampling method is illustrated by an example given by C. R. Rao [7] and some optimal properties of our estimator are derived for this example. In the last part, this method is used to derive an estimation of variance components in a random effect linear model when one of the components is assumed to be known.
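The jackknife principle the paper relies on can be illustrated with the ordinary delete-one jackknife. The sketch below is generic and does not reproduce the paper's variance component estimators; the function names and the sample data are hypothetical.

```python
def jackknife(data, statistic):
    """Delete-one jackknife: bias-corrected estimate and variance estimate."""
    n = len(data)
    full = statistic(data)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    bias = (n - 1) * (loo_mean - full)
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out)
    return full - bias, variance


def mean(values):
    return sum(values) / len(values)


def plug_in_variance(values):
    """Biased variance estimate that divides by n rather than n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


data = [2.0, 4.0, 4.0, 5.0, 7.0]  # assumed sample
mean_estimate, mean_variance = jackknife(data, mean)
corrected_variance, _ = jackknife(data, plug_in_variance)
print(mean_estimate, mean_variance, corrected_variance)
```

For the sample mean, the jackknife variance equals the usual s²/n. Applying the bias correction to the plug-in variance recovers the unbiased sample variance, which is the simplest case of using the jackknife to estimate a variance.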

10.

A Likelihood Integrated Method for Exploratory Graphical Analysis of Change Point Problem with Directional Data

Ashis SenGupta, Arnab Kumar Laha 《Communications in Statistics - Theory and Methods》2013,42(11):1783-1791

In this article we introduce a new likelihood-based method, called the likelihood integrated method, which is distinct from the well-known integrated likelihood method. We use the likelihood integrated to propose a simple exploratory graphical analysis for the change point problem in the context of directional data. The method is applied to the analysis of two real-life data sets. In most cases, the results obtained with this simple method are quite similar to those obtained earlier by different formal methods. A small simulation study is conducted to assess the effectiveness of this procedure in indicating the presence of a change point.