期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

陈创练郑挺国《统计研究》2018,35(8):23-38

本文拓展构建了后顾、同期和前瞻三种类型的货币政策规则,并基于实时数据和最终数据实证分析数据修订和实时估计对货币政策参数的影响效应。研究结果发现,数据修订对泰勒规则的影响取决于不同模型,而且在三种模型设定中,盯住产出缺口和通胀目标的时变参数均在不同程度上受数据修订的影响。特别是,对于最终数据,采用同期性货币政策规则展开估计最为有效;而对于实时数据,则基于后顾性货币政策规则模型估计是最佳的。最后,本文在数据选择和模型匹配上提出相应的对策建议。相似文献

2.

大数据环境下基于SVM-WNB的网络舆情分类研究

张宸韩夏《统计与决策》2017,(14):45-48

当前网络舆情信息存在数据量大、流动快及数据非结构化等特点,难以实现对其快速、准确的分类.SVM算法和朴素贝叶斯算法都是性能优秀的传统分类算法,但无法满足快速处理海量数据.文章利用Hadoop平台可并行处理分布式数据存储的优良特性,提出了HSVM_WNB分类算法,将采集的舆情文档依照HDFS架构进行本地化存储,并通过MapReduce进程完成并行分类处理.最后利用实验验证,本算法能够有效提升网络舆情分类能力与分类效率. 相似文献

3.

Rough Set综合评价算法的优化及应用 总被引：2，自引：0，他引：2

潘德宝傅春《统计与决策》2006,(13):144-145

粗集理论是由波兰数学家Z.Pawlak,在20世纪80年代初提出的一种处理模糊和不精确性问题的新型数学工具,粗集理论在处理有限元集合数据时,即不需要关于数据的任何附加信息,也无需预先给定某些特征或属性的数量描述,如统计学中的概率分布、模糊集理论中的隶属度或隶属函数等,通过对大量数据进行分析,根据论域中等价关系的依赖关系,剔除相容信息,抽取潜在有价值的规则知识.依据粗糙集理论的对象分类能力,以及粗糙集理论中的知识依赖性和属性重要性度量方法,可以得出一种完全数据驱动的综合评价方法,它克服了传统方法存在的主观性和片面性的问题,本文将属性的同分辨能力数引入到粗糙集评价,优化了评价算法. 相似文献

4.

基于Borda法不确定偏好序下的双边匹配决策模型

《统计与信息论坛》2017,(12):22-26

针对不确定偏好序下的双边匹配决策问题,提出了一种决策方法。首先,对具有不确定偏好序信息的双边匹配决策问题进行了描述,依据传统Borda法的思想,针对不确定偏好序的特点构建Borda分值计算规则;然后,基于综合Borda分值给出了满意度测度方法,在此基础上,建立双边满意匹配优化模型,求解模型获得双边匹配方案;最后,通过算例验证了提出方法的可行性和有效性。相似文献

5.

行政记录整合的贝叶斯分层记录链接模型及应用

丁东洋周丽莉《统计与信息论坛》2016,(7):30-35

记录链接的技术问题与统计理论密切相关,尤其是在建立记录链接分类规则时需要构建统计模型,识别关键变量以完成数据匹配。在贝叶斯框架下构建分层模型整合行政记录,通过多元回归可以实现匹配错误率的估计,而且一对一限制下的记录链接允许通过模块反映记录信息的来源变化,基于MCMC模拟的后验分布计算方便,有助于提高数据整合效率。相似文献

6.

基于整合治愈率模型的信贷违约时点预测

范新妍等《统计研究》2021,38(2):99-113

传统信用评分方法主要利用统计分类方法,只能预测借款人是否会发生违约,但不能预测违约发生的时点。治愈率模型是二分类和生存分析的混合模型,不仅可以预测是否会发生违约,而且可以预测违约发生的时点,比传统二分类方法可以提供更多的信息。另外,随着大数据的发展,数据源越来越多,针对相同或者相似任务,可以收集到多个数据集,本文提出了融合多源数据的整合治愈率模型,可以对多个数据集同时建模和估计参数,通过复合惩罚函数进行组间和组内双层变量选择,并通过促进两个子模型回归系数符号相同,提高模型的可解释性。通过数值模拟发现,所提方法在变量选择和参数估计上均有明显优势。最后,将所提方法应用于信用贷款的违约时点预测中,模型表现良好。相似文献

7.

基于最大匹配算法的似然导向中文分词方法

《统计与信息论坛》2019,(3):18-23

综合基于规则的分词方法与基于统计的分词方法在分词效果上的优势,提出一种基于最大匹配算法的似然导向中文分词方法。新方法在分词阶段,将训练数据的统计信息融入到基于规则的最大匹配分词算法中,并根据共现性自动识别后续词;在判定阶段,利用具有马尔可夫性的n-gram模型对分词阶段获得的多组分词模式进行判定,并基于最大似然原理确定最优的分词模式以提高分词准确率。实验结果表明,新方法有效提高了分词准确率和召回率,适用于中文文本信息挖掘。相似文献

8.

基于Bayes概率边界域的粗集分类方法及其在高频数据中的应用

下载免费PDF全文

来升强谢邦昌朱建平《统计研究》2010,27(3):76-82

作为一种近似处理的工具,粗集主要用于不确定情况下的决策分析,并且不需要任何事先的数据假定。但当前的主流粗集分类方法仍然需要先经过离散化的步骤,这就损失了数值型变量提供的高质量信息。本文对隶属函数重新加以概率定义,并提出了一种基于Bayes概率边界域的粗集分类技术,比较好的解决了当前粗集方法所面临的数值型属性分类的不适应、分类规则不完备等一系列问题。相似文献

9.

不平衡数据分类问题的FL逻辑回归算法

陈钟毓尹居良《统计与决策》2023,(5):33-37

针对不平衡数据的分类问题，文章利用焦点损失函数可以挖掘困难样本的特性，提出了一种新的逻辑回归算法。首先，定义逻辑回归模型新的损失函数；其次，基于牛顿迭代法，设计FL逻辑回归算法；最后，在比较实验中，运用随机森林进行特征选择，以阈值优化逻辑回归模型为分类模型进行实验。实验结果表明，与传统逻辑回归算法相比，改进后的算法提高了少数类样本的分类精度，增强了模型的整体分类性能。相似文献

10.

基因表达数据特征子集的冗余研究

《统计与信息论坛》2019,(5):10-15

过滤式特征选择是一种在基因表达数据上广泛使用且简单有效的方法。针对其特征子集冗余性问题,使用皮尔逊相关系数,提出一种带冗余去除的特征选择算法。研究了在不同相关强度下特征子集冗余去除及分类准确度效果。实验选用三个不同的基因表达数据集,使用支持向量机、k近邻、随机森林作为分类器分别进行了测试。实验结果表明,带冗余去除的过滤式特征选择方法在不同分类器上均能获得良好的分类性能,另外,此方法在降低特征子集维度的同时能够提高分类准确度。相似文献

11.

A classifier for multi-dimensional datasets based on Bayesian multiple kernel grouping learning

Fangli Dong 《Journal of Statistical Computation and Simulation》2019,89(11):2151-2174

This paper proposes an algorithm for the classification of multi-dimensional datasets based on the conjugate Bayesian Multiple Kernel Grouping Learning (BMKGL). Using conjugate Bayesian framework improves the computation efficiency. Multiple kernels instead of a single kernel avoid the kernel selection problem which is also a computationally expensive work. Through grouping parameter learning, BMKGL can simultaneously integrate information from different dimensions and find the dimensions which contribute more to the variations of the outcome for the purpose of interpretable property. Meanwhile, BMKGL can select the most suitable combination of kernels for different dimensions so as to extract the most appropriate measure for each dimension and improve the accuracy of classification results. The simulation results illustrate that our learning process has better performance in prediction results and stability compared to some popular classifiers, such as k-nearest neighbours algorithm, support vector machine algorithm and naive Bayes classifier. BMKGL also outperforms previous methods in terms of accuracy and interpretation for the heart disease and EEG datasets. 相似文献

12.

非平衡数据集的改进SMOTE再抽样算法 总被引：1，自引：0，他引：1

下载免费PDF全文

薛薇《统计研究》2012,29(6):95-98

非平衡数据集的不均衡学习特点通常表现为负类的分类效果不理想。改进SMOTE再抽样算法,将过抽样和欠抽样方式有机结合,有针对性地选择近邻并采用不同策略合成样本。实验表明,分类器在经此算法处理后的非平衡数据集的正负两类上,均可获得较理想的分类效果。相似文献

13.

SEQUENTIAL,BOTTOM‐UP VARIABLE SELECTION FOR HIGH‐DIMENSIONAL CLASSIFICATION

Peter Hall Hugh Miller 《Australian & New Zealand Journal of Statistics》2010,52(4):403-421

Most methods for variable selection work from the top down and steadily remove features until only a small number remain. They often rely on a predictive model, and there are usually significant disconnections in the sequence of methodologies that leads from the training samples to the choice of the predictor, then to variable selection, then to choice of a classifier, and finally to classification of a new data vector. In this paper we suggest a bottom‐up approach that brings the choices of variable selector and classifier closer together, by basing the variable selector directly on the classifier, removing the need to involve predictive methods in the classification decision, and enabling the direct and transparent comparison of different classifiers in a given problem. Specifically, we suggest ‘wrapper methods’, determined by classifier type, for choosing variables that minimize the classification error rate. This approach is particularly useful for exploring relationships among the variables that are chosen for the classifier. It reveals which variables have a high degree of leverage for correct classification using different classifiers; it shows which variables operate in relative isolation, and which are important mainly in conjunction with others; it permits quantification of the authority with which variables are selected; and it generally leads to a reduced number of variables for classification, in comparison with alternative approaches based on prediction. 相似文献

14.

基于KPCA和RS理论的支持向量分类机及其应用研究

王桂明袁美玲《统计与信息论坛》2008,23(12):9-14

核主成分分析KPCA是近年来提出的一个十分有效的数据降维方法,但它并不能保证所提取的第一主成分最适用于降维后的数据分类。粗糙集RS理论是处理这类问题的一个有效方法。提出一个基于KPCA与RS理论的支持向量分类机SVC,利用RS理论和信息熵原理对运用KP（A进行特征提取后的训练样本进行特征选择,保留重要特征,力求减小求解问题的规模,提高SVC的性能。在构建2006年上市公司财务困境预警模型的数值实验中,以KPCA、RS理论作为前置系统的SVC取得了良好效果。相似文献

15.

Covariance structure approximation via gLasso in high-dimensional supervised classification

Tatjana Pavlenko Anders Björkström Annika Tillander 《Journal of applied statistics》2012,39(8):1643-1666

Recent work has shown that the Lasso-based regularization is very useful for estimating the high-dimensional inverse covariance matrix. A particularly useful scheme is based on penalizing the ?₁ norm of the off-diagonal elements to encourage sparsity. We embed this type of regularization into high-dimensional classification. A two-stage estimation procedure is proposed which first recovers structural zeros of the inverse covariance matrix and then enforces block sparsity by moving non-zeros closer to the main diagonal. We show that the block-diagonal approximation of the inverse covariance matrix leads to an additive classifier, and demonstrate that accounting for the structure can yield better performance accuracy. Effect of the block size on classification is explored, and a class of asymptotically equivalent structure approximations in a high-dimensional setting is specified. We suggest a variable selection at the block level and investigate properties of this procedure in growing dimension asymptotics. We present a consistency result on the feature selection procedure, establish asymptotic lower an upper bounds for the fraction of separative blocks and specify constraints under which the reliable classification with block-wise feature selection can be performed. The relevance and benefits of the proposed approach are illustrated on both simulated and real data. 相似文献

16.

Multi-Step Classification Trees

Youngjae Chang 《统计学通讯:模拟与计算》2013,42(9):1728-1744

Many algorithms originated from decision trees have been developed for classification problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy, namely high misclassification rates when there are many irrelevant variables. We propose multi-step classification trees with adaptive variable selection (the multi-step GUIDE classification tree (MG) and the multi-step CRUISE classification tree (MC) to handle this problem. The variable selection step and the fitting step comprise the multi-step method.

We compare the performance of classification trees in the presence of irrelevant variables. MG and MC perform better than Random Forest and C4.5 with an extremely noisy dataset. Furthermore, the prediction accuracy of our proposed algorithm is relatively stable even when the number of irrelevant variables increases, while that of other algorithms worsens. 相似文献

17.

A modified area under the ROC curve and its application to marker selection and classification

《Journal of the Korean Statistical Society》2014,43(2):161-175

The area under the ROC curve (AUC) can be interpreted as the probability that the classification scores of a diseased subject is larger than that of a non-diseased subject for a randomly sampled pair of subjects. From the perspective of classification, we want to find a way to separate two groups as distinctly as possible via AUC. When the difference of the scores of a marker is small, its impact on classification is less important. Thus, a new diagnostic/classification measure based on a modified area under the ROC curve (mAUC) is proposed, which is defined as a weighted sum of two AUCs, where the AUC with the smaller difference is assigned a lower weight, and vice versa. Using mAUC is robust in the sense that mAUC gets larger as AUC gets larger as long as they are not equal. Moreover, in many diagnostic situations, only a specific range of specificity is of interest. Under normal distributions, we show that if the AUCs of two markers are within similar ranges, the larger mAUC implies the larger partial AUC for a given specificity. This property of mAUC will help to identify the marker with the higher partial AUC, even when the AUCs are similar. Two nonparametric estimates of an mAUC and their variances are given. We also suggest the use of mAUC as the objective function for classification, and the use of the gradient Lasso algorithm for classifier construction and marker selection. Application to simulation datasets and real microarray gene expression datasets show that our method finds a linear classifier with a higher ROC curve than some other existing linear classifiers, especially in the range of low false positive rates. 相似文献

18.

Reduced-rank multi-label classification

Ting Yuan Junhui Wang 《Statistics and Computing》2017,27(1):181-191

Multi-label classification is a natural generalization of the classical binary classification for classifying multiple class labels. It differs from multi-class classification in that the multiple class labels are not exclusive. The key challenge is to improve the classification accuracy by incorporating the intrinsic dependency structure among the multiple class labels. In this article we propose to model the dependency structure via a reduced-rank multi-label classification model, and to enforce a group lasso regularization for sparse estimation. An alternative optimization scheme is developed to facilitate the computation, where a constrained manifold optimization technique and a gradient descent algorithm are alternated to maximize the resultant regularized log-likelihood. Various simulated examples and two real applications are conducted to demonstrate the effectiveness of the proposed method. More importantly, its asymptotic behavior is quantified in terms of the estimation and variable selection consistencies, as well as the model selection consistency via the Bayesian information criterion. 相似文献

19.

Adaptive kernel scaling support vector machine with application to a prostate cancer image study

Xin Liu Wenqing He 《Journal of applied statistics》2022,49(6):1465

The support vector machine (SVM) is a popularly used classifier in applications such as pattern recognition, texture mining and image retrieval owing to its flexibility and interpretability. However, its performance deteriorates when the response classes are imbalanced. To enhance the performance of the support vector machine classifier in the imbalanced cases we investigate a new two stage method by adaptively scaling the kernel function. Based on the information obtained from the standard SVM in the first stage, we conformally rescale the kernel function in a data adaptive fashion in the second stage so that the separation between two classes can be effectively enlarged with incorporation of observation imbalance. The proposed method takes into account the location of the support vectors in the feature space, therefore is especially appealing when the response classes are imbalanced. The resulting algorithm can efficiently improve the classification accuracy, which is confirmed by intensive numerical studies as well as a real prostate cancer imaging data application. 相似文献

20.

Sparse concordance-based ordinal classification

Yiwei Fan Jiaqi Gu Guosheng Yin 《Scandinavian Journal of Statistics》2023,50(3):934-961

Ordinal classification is an important area in statistical machine learning, where labels exhibit a natural order. One of the major goals in ordinal classification is to correctly predict the relative order of instances. We develop a novel concordance-based approach to ordinal classification, where a concordance function is introduced and a penalized smoothed method for optimization is designed. Variable selection using the

L_{1} $$ {L}_1 $$

penalty is incorporated for sparsity considerations. Within the set of classification rules that maximize the concordance function, we find optimal thresholds to predict labels by minimizing a loss function. After building the classifier, we derive nonparametric estimation of class conditional probabilities. The asymptotic properties of the estimators as well as the variable selection consistency are established. Extensive simulations and real data applications show the robustness and advantage of the proposed method in terms of classification accuracy, compared with other existing methods. 相似文献