Similar literature
1.
Neural networks are a popular machine learning tool, particularly in applications such as protein structure prediction; however, overfitting can pose an obstacle to their effective use. Due to the large number of parameters in a typical neural network, one may obtain a network fit that perfectly predicts the learning data, yet fails to generalize to other data sets. One way of reducing the size of the parameter space is to alter the network topology so that some edges are removed; however, it is often not immediately apparent which edges should be eliminated. We propose a data-adaptive method of selecting an optimal network architecture using a deletion/substitution/addition algorithm. Results of this approach to classification are presented on simulated data and the breast cancer data of Wolberg and Mangasarian [1990. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Nat. Acad. Sci. 87, 9193–9196].

2.
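A comparison of random forests and bagged classification trees for classification   Total citations: 5, self-citations: 0, citations by others: 5
Drawing on experimental data, this paper explains from two theoretical perspectives why the random forest algorithm outperforms the bagged classification tree algorithm. Formulating the two algorithms within two different frameworks removes some ambiguities in their analysis. The second framework in particular makes it clear that random forests outperform bagged classification trees because they have smaller bias.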

3.
Communications in Statistics: Theory and Methods, 2012, 41(16-17): 3126-3137
This article proposes a permutation procedure for evaluating the performance of different classification methods. In particular, we focus on two of the most widely used classification methodologies: latent class analysis and k-means clustering. Classification performance is assessed by means of a permutation procedure that allows for a direct comparison of the methods, supports the development of a statistical test, and identifies potentially better solutions. Our proposal provides an innovative framework for validating a data partition and offers guidance on which classification procedure to use.
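The abstract does not spell out the permutation procedure, so the following is only a generic sketch of how a permutation test can compare two classification methods. It assumes each method has been scored per observation as correct (1) or incorrect (0). The function name `paired_permutation_test` and the sign-flipping scheme are illustrative assumptions, not the authors' procedure.

```python
import random


def paired_permutation_test(correct_a, correct_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the accuracy difference of methods A and B.

    correct_a / correct_b hold 1 (correct) or 0 (incorrect) for the same observations.
    Hypothetical helper for illustration only.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both methods must be scored on the same observations")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    n = len(diffs)
    observed = sum(diffs) / n  # observed difference in accuracy

    extreme = 0
    for _ in range(n_perm):
        # Under the null hypothesis the labels "A" and "B" are exchangeable
        # within each observation, so each paired difference may flip sign.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A small p-value suggests that the accuracy gap between the two methods is unlikely to arise from chance relabelling alone.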

4.
In clinical research, early and prompt detection of the risk class of a new patient may play a crucial role in determining the effectiveness of the treatment and, consequently, in achieving a satisfactory prognosis. There exist a number of popular rule-based algorithms for classification, whose performance is very attractive whenever data on a large number of patients are available. However, when datasets include only a few hundred patients, the most common approaches give unstable results, and developing effective decision-support systems becomes scientifically challenging. Since rules can be derived from different models as well as from expert knowledge resources, each having its own advantages and weaknesses, this article suggests a “hybrid” approach to address the classification problem when the number of patients is too small to use a single technique effectively. The hybrid strategy was applied to a case study and its predictive performance was compared with that of each single approach; because misclassifying high-risk patients is serious, special attention was paid to the specificity. The results show that the hybrid strategy outperforms each single strategy involved.

5.
There are many well-known methods for classification problems involving linear data with either known or unknown distributions. Here, we deal with classification involving data on the torus and the cylinder. A new method involving a generalized likelihood ratio test is developed for classifying into two populations using directional data. The approach assumes that one of the probabilities of misclassification is known. The procedure is constructed by applying the Gibbs sampler to the conditionally specified distribution. A parametric bootstrap approach is also presented. An application to data involving linear and circular measurements on human skulls from two tribal populations is given.

6.
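Xu Xuesong, Wang Sichun. 《统计研究》 (Statistical Research), 2012, 29(4): 108-112
Based on the immune negative selection principle, a negative selection classifier using masked segment matching is designed, overcoming the shortcomings of the r-contiguous-bits matching method. An encoding of classification rules suited to immune optimization is given, together with an evaluation based on the classification information score. Immune evolution is then used to optimize the population and thereby reduce the data rule set. This avoids the lack of global optimization capability found in traditional classification algorithms and improves the ability to recognize samples. Experimental results show that the proposed method improves the accuracy of data classification and outperforms traditional classification methods in both classification accuracy and average information score.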
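For orientation, the r-contiguous-bits rule that this abstract takes as its baseline can be sketched as below, together with a basic negative selection loop. This shows only the standard baseline as commonly described. The paper's masked segment matching is not specified in the abstract and is not reproduced here. The names `r_contiguous_match` and `generate_detectors`, and the `max_tries` cap, are assumptions made for illustration.

```python
import random


def r_contiguous_match(a, b, r):
    """Return True if equal-length strings a and b agree in at least r contiguous positions."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0  # length of the current run of agreements
        if run >= r:
            return True
    return False


def generate_detectors(self_set, length, r, n_detectors, seed=0, max_tries=100000):
    """Negative selection: keep random bit strings that match no 'self' string."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        # A candidate that matches any self string is discarded (censored).
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors
```

A sample is then flagged as non-self when at least one surviving detector matches it under the same rule.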

7.
In classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of the two groups. However, for a typical case-control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t-statistic for two-group classification. Biometrics 2015, 71: 404–416) proposed the generalized t-statistic to obtain a linear discriminant function that allows for heterogeneity of the case group. Their procedure has an optimality property within the class under consideration. We study the problem further and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in that the linear discriminant function cannot be constructed more efficiently. We conduct simulation studies and analyze real data examples to illustrate the finite sample performance and the gain over existing methods.
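The optimality property of the classical linear discriminant function mentioned at the start of this abstract can be written out as follows. The symbols are assumptions introduced here, since the abstract gives no notation: $\mu_0$ and $\mu_1$ are the control and case group means, $\Sigma$ is the common variance–covariance matrix, and $w$ is the coefficient vector of the discriminant function $F(x) = w^{\top}x$.

```latex
\[
  w^{*} \;=\; \arg\max_{w \neq 0}\;
  \frac{w^{\top}(\mu_1 - \mu_0)}{\sqrt{w^{\top}\Sigma\, w}}
  \;\propto\; \Sigma^{-1}(\mu_1 - \mu_0),
  \qquad
  F(x) = {w^{*}}^{\top} x .
\]
```

The maximizer follows from the Cauchy–Schwarz inequality and is determined only up to a positive scale factor. The generalized t-statistic and the estimator proposed in the article are not reproduced here.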

8.
Two common kernel-based methods for non-parametric regression estimation suffer from well-known drawbacks when the design is random. The Gasser-Müller estimator is inadmissible due to its high variance while the Nadaraya-Watson estimator has zero asymptotic efficiency because of poor bias behavior. Under asymptotic consideration, the local linear estimator avoids these two drawbacks of kernel estimators and achieves minimax optimality. However, when based on compact support kernels its finite sample behavior is disappointing because sudden kinks may show up in the estimate.

This paper proposes a modification of the kernel estimator, called the binned convolution estimator, which leads to a fast O(n) method. Provided the design density is continuously differentiable and the conditional fourth moments exist, the binned convolution estimator has asymptotic properties identical to those of the local linear estimator.
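To make the contrast between the Nadaraya-Watson and local linear estimators concrete, here is a minimal sketch of both in plain Python. It assumes an Epanechnikov kernel (a compact-support kernel) and a bandwidth `h` chosen by the caller. The function names are illustrative, and the binned convolution estimator itself is not implemented because the abstract does not define it.

```python
def epanechnikov(u):
    """Compact-support kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x, xs, ys, h):
    """Locally constant fit: kernel-weighted average of the responses near x."""
    weights = [epanechnikov((xi - x) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no design points within the bandwidth")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


def local_linear(x, xs, ys, h):
    """Intercept of a kernel-weighted least squares line fitted around x."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        w = epanechnikov((xi - x) / h)
        d = xi - x
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    denom = s2 * s0 - s1 * s1
    if denom == 0.0:
        raise ValueError("too few distinct design points within the bandwidth")
    return (s2 * t0 - s1 * t1) / denom
```

On exactly linear data the local linear fit reproduces the line even at the boundary of the design, whereas the Nadaraya-Watson average is pulled toward the interior; this is one concrete form of the bias behavior the abstract refers to.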

9.
This paper concerns a method for estimating variance components in a random effects linear model. It is mainly a resampling method and relies on the jackknife principle. The derived estimators are presented as least squares estimators in an appropriate linear model, and one of them turns out to be a MINQUE (Minimum Norm Quadratic Unbiased Estimation) estimator. Our resampling method is illustrated by an example given by C. R. Rao [7], and some optimality properties of our estimator are derived for this example. In the last part, the method is used to estimate variance components in a random effects linear model when one of the components is assumed to be known.
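The jackknife principle on which this method relies can be sketched generically as follows. This is the textbook delete-one jackknife for an arbitrary estimator, not the variance component estimators derived in the paper. The function name `jackknife` and its return convention are assumptions for illustration.

```python
def jackknife(data, estimator):
    """Delete-one jackknife: return (bias-corrected estimate, variance estimate)."""
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("the jackknife needs at least two observations")
    theta_hat = estimator(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    bias_corrected = n * theta_hat - (n - 1) * loo_mean
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out)
    return bias_corrected, variance
```

For the sample mean, the jackknife variance reduces to the usual s²/n, which gives a quick sanity check on an implementation.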

10.
In this article we introduce a new likelihood-based method, called the likelihood integrated method, which is distinct from the well-known integrated likelihood method. We use the likelihood integrated method to propose a simple exploratory graphical analysis for the change point problem in the context of directional data. The method is applied to the analysis of two real-life data sets. In most cases, the results obtained with this simple method are quite similar to those obtained earlier by different formal methods. A small simulation study is conducted to assess the effectiveness of the procedure in indicating the presence of a change point.
