10 similar articles found (search time: 0 ms)
1.
Krystallenia Drosou 《Journal of applied statistics》2017,44(3):533-553
One of the major issues in the medical field is correct diagnosis, including the limits of human expertise in diagnosing disease manually. The use of machine learning classifiers, such as support vector machines (SVM), in medical diagnosis is gradually increasing. However, traditional classification algorithms can perform poorly on highly imbalanced data sets, in which negative examples (i.e. negative for a disease) outnumber positive examples (i.e. positive for a disease). SVM constitutes a significant improvement, and its mathematical formulation allows different weights to be incorporated to deal with imbalanced data. In the present work, an extensive study of four medical data sets is conducted using a variant of SVM called the proximal support vector machine (PSVM), proposed by Fung and Mangasarian [9]. Additionally, to deal with the imbalanced nature of the medical data sets, we applied both a variant of SVM, referred to as the two-cost support vector machine, and a modification of PSVM, referred to as modified PSVM. Both algorithms incorporate different weights, one for each class of examples.
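The idea of giving each class its own weight can be illustrated with a minimal sketch. It assumes a common heuristic in which a class weight is inversely proportional to the class frequency; the abstract does not say how the two-cost SVM or the modified PSVM choose their weights, so the function names and the weighting rule below are illustrative assumptions only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class, inversely proportional to class frequency.

    Assumed heuristic: n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_hinge_loss(scores, labels, weights):
    """Sum of hinge losses, each scaled by the weight of the example's class."""
    return sum(weights[y] * max(0.0, 1.0 - y * s) for s, y in zip(scores, labels))


# Hypothetical imbalanced data set: 90 negative and 10 positive examples.
labels = [-1] * 90 + [1] * 10
weights = balanced_class_weights(labels)
# The rare positive class receives the larger weight, so misclassifying
# a positive example costs more than misclassifying a negative one.
```

With such weights, an error on a rare positive example contributes more to the training objective than an error on a common negative example, which is the general mechanism by which cost-sensitive SVM variants counter class imbalance.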
2.
3.
4.
Birsen Eygi Erdogan 《Journal of Statistical Computation and Simulation》2013,83(8):1543-1555
The purpose of this study was to apply support vector machines (SVMs) to bank bankruptcy analysis using practical steps. Although the prediction of the financial distress of companies is done using several statistical and machine learning techniques, bank classification and bankruptcy prediction still need to be investigated because few investigations have been conducted in this field of banking. In this study, SVMs were implemented to analyse financial ratios. Data sets from Turkish commercial banks were used. This study shows that SVMs with the Gaussian kernel are capable of extracting useful information from financial data and can be used as part of an early warning system.
5.
Unbalanced data classification has been a long-standing issue in the field of medical vision science. We introduce support vector machines (SVM) with active learning (AL) to improve the prediction of unbalanced classes in the medical imaging field. A standard SVM algorithm with four different AL approaches is proposed: (1) the first uses random sampling to select the initial pool for the AL algorithm; (2) the second doubles the training instances of the rare category to reduce the unbalanced ratio before the AL algorithm; (3) the third uses a balanced pool with an equal number from each category; and (4) the fourth uses a balanced pool and implements balanced sampling throughout the AL algorithm. Grid pixel data of two scleroderma lung disease patterns, lung fibrosis (LF) and honeycomb (HC), were extracted from computed tomography images of 71 patients to produce a training set of 348 HC and 3009 LF instances and a test set of 291 HC and 2665 LF instances. In our research, SVM with AL using balanced sampling, compared with random sampling, increased the test sensitivity of HC by 56 percentage points (17.5% vs. 73.5%) and 47 percentage points (23% vs. 70%) for the original and denoised datasets, respectively. SVM with AL and balanced sampling can improve classification performance on unbalanced data.
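The balanced-pool idea in approaches (3) and (4) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the pool size of 50 per class and the use of integer placeholders instead of grid pixel features are all assumptions. Only the training-set class counts (348 HC, 3009 LF) come from the abstract.

```python
import random


def balanced_sample(instances, labels, per_class, seed=0):
    """Draw `per_class` instances from each category, without replacement."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(instances, labels):
        by_class.setdefault(y, []).append(x)
    pool = []
    for y, items in by_class.items():
        if len(items) < per_class:
            raise ValueError(f"class {y!r} has only {len(items)} instances")
        pool.extend((x, y) for x in rng.sample(items, per_class))
    rng.shuffle(pool)  # avoid presenting the classes in blocks
    return pool


def sensitivity(true_positives, false_negatives):
    """Fraction of actual positives that the classifier detects."""
    return true_positives / (true_positives + false_negatives)


# Class counts quoted in the abstract; the instances are placeholders.
labels = ["HC"] * 348 + ["LF"] * 3009
instances = list(range(len(labels)))
pool = balanced_sample(instances, labels, per_class=50)  # hypothetical pool size
```

Sensitivity is the metric the abstract reports for the rare HC class, which is why it is computed from true positives and false negatives alone rather than from overall accuracy.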
6.
Andreas Christmann 《Allgemeines Statistisches Archiv》2004,88(4):375-396
Summary: This paper describes common features in data sets from motor vehicle insurance companies and proposes a general approach which exploits knowledge of such features in order to model high-dimensional data sets with a complex dependency structure. The results of the approach can be a basis to develop insurance tariffs. The approach is applied to a collection of data sets from several motor vehicle insurance companies. As an example, we use a nonparametric approach based on a combination of two methods from modern statistical machine learning, i.e. kernel logistic regression and ε-support vector regression.
*This work was supported by the Deutsche Forschungsgemeinschaft (SFB 475, Reduction of complexity in multivariate data structures) and by the Forschungsband Do-MuS from the University of Dortmund. I am grateful to Mr. A. Wolfstein and Dr. W. Terbeck from the Verband öffentlicher Versicherer in Düsseldorf, Germany, for making available the data set and for many helpful discussions.
7.
Mohammad Moqaddasi Amiri, Leili Tapak 《Journal of Statistical Computation and Simulation》2019,89(15):2801-2812
Hierarchical study designs occur in many areas such as epidemiology, psychology, sociology, public health, engineering, and agriculture. Such designs impose correlation in the data structure that needs to be accounted for in the modelling process. In this study, a three-level mixed-effects least squares support vector regression (MLS-SVR) model is proposed to extend the standard least squares support vector regression (LS-SVR) model for handling cluster-correlated data. The MLS-SVR model incorporates multiple random effects, which allow it to handle unequal numbers of observations for each case at non-fixed time points (a very unbalanced situation) and correlation between subjects simultaneously. The methodology consists of a regression modelling step that is performed straightforwardly by solving a linear system. The proposed model is illustrated through numerical studies on simulated data sets and a real data example on human Brucellosis frequency. The generalization performance of the proposed MLS-SVR is evaluated by comparison with ordinary LS-SVR and some other parametric models.
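The remark that fitting reduces to solving a linear system can be illustrated for the standard LS-SVR, which the abstract names as the baseline. The sketch below is not the mixed-effects MLS-SVR (it has no random effects); it assumes a Gaussian kernel with width parameter `gamma` and a regularization constant `C`, and all function names and the toy data are hypothetical. The system solved is [[0, 1ᵀ], [1, K + I/C]] [b; α] = [0; y].

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ls_svr(X, y, C=100.0, gamma=1.0):
    """Return the bias b and dual coefficients alpha of a standard LS-SVR."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # constraint row: sum(alpha) = 0
    for i in range(n):
        row = [1.0]
        for j in range(n):
            ridge = 1.0 / C if i == j else 0.0
            row.append(rbf_kernel(X[i], X[j], gamma) + ridge)
        A.append(row)
    solution = solve(A, [0.0] + list(y))
    return solution[0], solution[1:]


def predict(X_train, b, alpha, x, gamma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * k(x_i, x)."""
    return b + sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X_train))


# Toy data (hypothetical): y = x^2 at four points.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
b, alpha = fit_ls_svr(X_train, y_train, C=1e6)
fitted = [predict(X_train, b, alpha, x) for x in X_train]
```

Because the loss is squared error with equality constraints, no quadratic programming is needed; a large `C` makes the fit nearly interpolate the training targets, while a smaller `C` smooths it.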
8.
To enhance modeling flexibility, the authors propose a nonparametric hazard regression model, for which the ordinary and weighted least squares estimation and inference procedures are studied. The proposed model does not assume any parametric specifications on the covariate effects, which is suitable for exploring the nonlinear interactions between covariates, time and some exposure variable. The authors propose the local ordinary and weighted least squares estimators for the varying‐coefficient functions and establish the corresponding asymptotic normality properties. Simulation studies are conducted to empirically examine the finite‐sample performance of the new methods, and a real data example from a recent breast cancer study is used as an illustration. The Canadian Journal of Statistics 37: 659–674; 2009 © 2009 Statistical Society of Canada
9.
F. DuBois Bowman, Amita K. Manatunga 《Journal of the Royal Statistical Society. Series C, Applied statistics》2005,54(2):301-316
Summary. In many longitudinal studies, a subject's response profile is closely associated with his or her risk of experiencing a related event. Examples of such event risks include recurrence of disease, relapse, drop-out and non-compliance. When evaluating the effect of a treatment, it is sometimes of interest to consider the joint process consisting of both the response and the risk of an associated event. Motivated by a prevention of depression study among patients with malignant melanoma, we examine a joint model that incorporates the risk of discontinuation into the analysis of serial depression measures. We present a maximum likelihood estimator for the mean response and event risk vectors. We test hypotheses about functions of mean depression and withdrawal risk profiles from our joint model, predict depression from updated patient histories, characterize associations between components of the joint process and estimate the probability that a patient's depression and risk of withdrawal exceed specified levels. We illustrate the application of our joint model by using the depression data.
10.
In many case-control studies, it is common to utilize paired data when treatments are being evaluated. In this article, we propose and examine an efficient distribution-free test to compare two independent samples, where each is based on paired observations. We extend and modify the density-based empirical likelihood ratio test presented by Gurevich and Vexler [7] to formulate an appropriate parametric likelihood ratio test statistic corresponding to the hypothesis of our interest and then to approximate the test statistic nonparametrically. We conduct an extensive Monte Carlo study to evaluate the proposed test. The results of the performed simulation study demonstrate the robustness of the proposed test with respect to values of test parameters. Furthermore, an extensive power analysis via Monte Carlo simulations confirms that the proposed method outperforms the classical and general procedures in most cases related to a wide class of alternatives. An application to a real paired data study illustrates that the proposed test can be efficiently implemented in practice.