Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
This article extends the work of DiPillo (1976) on the Biased Minimum χ² Rule. The optimum value of k (the biasing factor) is determined and the true probability of misclassification is found. The proportion improvements reported in the 1976 paper are shown to be conservative. Some suggestions for algorithms to determine the optimal value of k are presented.

2.
This article extends the biased minimum χ² rule to the unequal covariance matrix case and to the case of several populations. The biased procedure is shown to improve the performance of the commonly used classification procedures. Results of sampling experiments over a broad range of conditions are provided to demonstrate this improvement.

3.
Class specific stratified posterior probability estimators of misclassification probabilities in discriminant analysis simulations are introduced. These estimators afford a significant variance reduction over the usual count estimators. Sufficient conditions for a variance reduction are given. The stratified posterior probability estimator is generalized to other class specific expectations.

4.
In this paper the rank method for forced discrimination in two-population problems, introduced by Randles, Broffitt, Ramberg and Hogg (1978), is extended to cover settings involving more than two populations. Several methods of ranking are compared to the normal theory procedure in a Monte Carlo study. Asymptotic theory is included which confirms that the rank method does balance the limiting probabilities of misclassification in a two-population setting.

5.
The influence of observations in estimating the misclassification probability in multiple discriminant analysis is studied using the common omission approach. An empirical influence function for the misclassification probability is also derived. It can give a very good approximation to the omission approach, but the computational load is much reduced. Various extensions of the measures are suggested. The proposed measures are applied to the famous Iris data set. The same three observations are identified as having the most influence under different measures.

6.
Approximate QDF misclassification probabilities have been derived for bivariate normal populations with known parameter values. The effects of unequal covariances and population distance on the misclassification probabilities are examined.

7.
The purpose of this paper is to examine the multiple group (>2) discrimination problem in which the group sizes are unequal and the variables used in the classification are correlated with skewed distributions. Using statistical simulation based on data from a clinical study, we compare the performances, in terms of misclassification rates, of nine statistical discrimination methods. These methods are linear and quadratic discriminant analysis applied to untransformed data, rank transformed data, and inverse normal scores data, as well as fixed kernel discriminant analysis, variable kernel discriminant analysis, and variable kernel discriminant analysis applied to inverse normal scores data. It is found that the parametric methods with transformed data generally outperform the other methods, and the parametric methods applied to inverse normal scores usually outperform the parametric methods applied to rank transformed data. Although the kernel methods often have very biased estimates, the variable kernel method applied to inverse normal scores data provides considerable improvement in terms of total nonerror rate.

8.
When a binary dependent variable is misclassified, that is, recorded in the category other than where it really belongs, probit and logit estimates are biased and inconsistent. In some cases, the probability of misclassification may vary systematically with covariates, and thus be endogenous. In this paper, we develop an estimation approach that corrects for endogenous misclassification, validate our approach using a simulation study, and apply it to the analysis of a treatment program designed to improve family dynamics. Our results show that endogenous misclassification could lead to potentially incorrect conclusions unless corrected using an appropriate technique.

9.
We propose a hybrid two-group classification method that integrates linear discriminant analysis, a polynomial expansion of the basis (or variable space), and a genetic algorithm with multiple crossover operations to select variables from the expanded basis. Using new product launch data from the biochemical industry, we found that the proposed algorithm offers mean percentage decreases in the misclassification error rate of 50%, 56%, 59%, 77%, and 78% in comparison to a support vector machine, artificial neural network, quadratic discriminant analysis, linear discriminant analysis, and logistic regression, respectively. These improvements correspond to annual cost savings of $4.40–$25.73 million.

10.
We consider M-estimation under a two-sample semiparametric model in which the log ratio of two unknown density functions has a known parametric form. This two-sample semiparametric model, arising naturally from case-control studies and logistic discriminant analysis, can be regarded as a biased sampling model. A new class of M-estimators is constructed on the basis of the maximum semiparametric likelihood estimator of the underlying distribution function. It is shown that the proposed M-estimators are consistent and asymptotically normally distributed. A simulation study is presented to demonstrate the performance of the proposed M-estimators.

11.
We have compared the efficacy of five imputation algorithms readily available in SAS for the quadratic discriminant function. Here, we have generated training data sets with missing data, including monotone missing-at-random observations, under several parametric configurations, and used a Monte Carlo simulation to examine the expected probabilities of misclassification for the two-class quadratic statistical discrimination problem under five different imputation methods. Specifically, we have compared the efficacy of the complete observation-only method and the mean substitution, regression, predictive mean matching, propensity score, and Markov Chain Monte Carlo (MCMC) imputation methods. We found that the MCMC and propensity score multiple imputation approaches are, in general, superior to the other imputation methods for the configurations and training-sample sizes we considered.

12.
The quadratic discriminant function (QDF) is commonly used for the two-group classification problem when the covariance matrices in the two populations are substantially unequal. This procedure is optimal when both populations are multivariate normal with known means and covariance matrices. This study examined the robustness of the QDF to non-normality. Sampling experiments were conducted to estimate expected actual error rates for the QDF when sampling from a variety of non-normal distributions. Results indicated that the QDF was robust to non-normality except when the distributions were highly skewed, in which case relatively large deviations from optimal were observed. In all cases studied the average probabilities of misclassification were relatively stable while the individual population error rates exhibited considerable variability.

13.
In simulation studies for discriminant analysis, misclassification errors are often computed using the Monte Carlo method, by testing a classifier on large samples generated from known populations. Although large samples are expected to follow the underlying distributions closely, they may not do so in a small interval or region, and thus may lead to unexpected results. We demonstrate with an example that the LDA misclassification error computed via the Monte Carlo method may often be smaller than the Bayes error. We give a rigorous explanation and recommend a method to properly compute misclassification errors.
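
The comparison this abstract describes can be illustrated with a minimal sketch. It is not the authors' example: it assumes two univariate normal populations, N(0, 1) and N(delta, 1), with equal priors and known parameters, so the optimal rule cuts at the midpoint delta/2 and the Bayes error is Φ(−delta/2). The function names, the value delta = 2, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def bayes_error(delta):
    """Exact Bayes error for N(0,1) vs N(delta,1), equal priors: Phi(-delta/2)."""
    # Phi(-x) = 0.5 * erfc(x / sqrt(2)), evaluated at x = delta / 2.
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))


def monte_carlo_error(delta, n_per_class, rng):
    """Estimate the midpoint rule's error rate from simulated test samples."""
    threshold = delta / 2.0
    errors = 0
    for _ in range(n_per_class):
        # A draw from population 0 is misclassified if it falls above the cut.
        if rng.gauss(0.0, 1.0) > threshold:
            errors += 1
        # A draw from population 1 is misclassified if it falls at or below it.
        if rng.gauss(delta, 1.0) <= threshold:
            errors += 1
    return errors / (2 * n_per_class)


if __name__ == "__main__":
    delta = 2.0  # assumed separation between the two population means
    rng = random.Random(0)
    print("exact Bayes error:   ", bayes_error(delta))
    print("Monte Carlo estimate:", monte_carlo_error(delta, 20000, rng))
```

Because the Monte Carlo estimate is itself a random quantity, a single run can land on either side of the exact value, including below the Bayes error, even though no classification rule has a true error rate below it.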

14.
In this article we study the effect of truncation on the performance of an open vector-at-a-time sequential sampling procedure (P*_B) proposed by Bechhofer, Kiefer and Sobel, for selecting the multinomial event which has the largest probability. The performance of the truncated version (P*_BT) is compared to that of the original basic procedure (P*_B). The performance characteristics studied include the probability of a correct selection, the expected number of vector-observations (n) to terminate sampling, and the variance of n. Both procedures guarantee the specified probability of a correct selection. Exact results and Monte Carlo sampling results are obtained. It is shown that P*_BT is far superior to P*_B in terms of E{n} and Var{n}, particularly when the event probabilities are equal. The performance of P*_BT is also compared to that of a closed vector-at-a-time sequential sampling procedure proposed for the same problem by Ramey and Alam; this procedure has heretofore been claimed to be the best one for this problem. It is shown that P*_BT is superior to the Ramey-Alam procedure for most of the specifications of practical interest.

15.
Five biased estimators of the slope in straight line regression are considered. For each, the estimate of the “bias parameter”, k, is a function of N, the number of observations, and r̂², the square of the least squares estimate of the standardized slope, β. The estimators include that of Farebrother, the ridge estimator of Hoerl, Kennard, and Baldwin, Vinod's shrunken estimators, and a new modification of one of the latter. Properties of the estimators are studied for 13 combinations of N and β. Results of simulation experiments provide empirical evidence concerning the values of means and variances of the biased estimators of the slope and estimates of the “bias parameter”, the mean square errors of the estimators, and the frequency of improvement relative to least squares. Adjustments to degrees of freedom in the biased regression analysis of variance table are also considered. An extension of the new modification to the case of p > 1 independent variables is presented in an Appendix.

16.
The purpose of this study was to predict placement and nonplacement outcomes for mildly handicapped three- through five-year-old children given knowledge of developmental screening test data. Discrete discriminant analysis (Anderson, 1951; Cochran & Hopkins, 1961; Goldstein & Dillon, 1978) was used to classify children into either a placement or nonplacement group using developmental information retrieved from longitudinal Child Find records (1982-89). These records were located at the Florida Diagnostic and Learning Resource System (FDLRS) in Sarasota, Florida and provided usable data for 602 children. The developmental variables included performance on screening test activities from the Comprehensive Identification Process (Zehrbach, 1975), and consisted of: (a) gross motor skills, (b) expressive language skills, and (c) social-emotional skills. These three dichotomously scored developmental variables generated eight mutually exclusive and exhaustive combinations of screening data. Combined with one of three different types of cost-of-misclassification functions, each child in a random cross-validation sample of 100 was classified into one of the two outcome groups minimizing the expected cost of misclassification based on the remaining 502 children. For each cost function designed by the researchers, a comparison was made between classifications from the discrete discriminant analysis procedure and actual placement outcomes for the 100 children. A logit analysis and a standard discriminant analysis were likewise conducted using the 502 children and compared with results of the discrete discriminant analysis for selected cost functions.

17.
When the probability of selecting an individual in a population is proportional to its lifelength, it is called length biased sampling. A nonparametric maximum likelihood estimator (NPMLE) of survival in a length biased sample is given in Vardi (1982). In this study, we examine the performance of Vardi's NPMLE in estimating the true survival curve when observations are from a length biased sample. We also compute estimators based on a linear combination (LCE) of empirical distribution function (EDF) estimators and weighted estimators. In our simulations, we consider observations from a mixture of two different distributions, one from F and the other from G which is a length biased distribution of F. Through a series of simulations with various proportions of length biasing in a sample, we show that the NPMLE and the LCE closely approximate the true survival curve. Throughout the survival curve, the EDF estimators overestimate the survival. We also consider a case where the observations are from three different weighted distributions. Again, both the NPMLE and the LCE closely approximate the true distribution, indicating that the length biasedness is properly adjusted for. Finally, an efficiency study shows that Vardi's estimators are more efficient than the EDF estimators in the lower percentiles of the survival curves.
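
The effect of length bias on a survival estimate can be sketched in a few lines. This is not Vardi's mixture NPMLE or the LCE from the abstract; it covers only the simpler case of a fully length biased sample, where weighting each observation by the inverse of its length is a standard correction. The choice F = Gamma(shape 2, scale 1), whose length biased version is Gamma(shape 3, scale 1), and all function names are assumptions made for illustration.

```python
import math
import random


def edf_survival(sample, t):
    """Unweighted empirical survival estimate: fraction of observations > t."""
    return sum(1 for x in sample if x > t) / len(sample)


def weighted_survival(sample, t):
    """Survival estimate with weights proportional to 1/x to undo length bias."""
    total = sum(1.0 / x for x in sample)
    return sum(1.0 / x for x in sample if x > t) / total


def true_survival(t):
    """Survival function of the assumed target F = Gamma(shape=2, scale=1)."""
    return (1.0 + t) * math.exp(-t)


if __name__ == "__main__":
    rng = random.Random(0)
    # Sampling with probability proportional to x from Gamma(2, 1) yields
    # Gamma(3, 1), so draw the length biased sample from Gamma(3, 1) directly.
    sample = [rng.gammavariate(3.0, 1.0) for _ in range(20000)]
    t = 2.0
    print("true survival:    ", true_survival(t))
    print("unweighted EDF:   ", edf_survival(sample, t))
    print("inverse-length wt:", weighted_survival(sample, t))
```

In this assumed setup the unweighted estimate sits well above the true survival, in line with the abstract's remark that EDF estimators overestimate survival, while the inverse-length weighted estimate tracks the true curve.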

18.
The well-known chi-squared goodness-of-fit test for a multinomial distribution is generally biased when the observations are subject to misclassification. In Pardo and Zografos (2000) the problem was considered using a double sampling scheme and φ-divergence test statistics. A new problem appears if the null hypothesis is not simple because it is necessary to give estimators for the unknown parameters. In this paper the minimum φ-divergence estimators are considered and some of their properties are established. The proposed φ-divergence test statistics are obtained by calculating φ-divergences between probability density functions and by replacing parameters by their minimum φ-divergence estimators in the derived expressions. Asymptotic distributions of the new test statistics are also obtained. The testing procedure is illustrated with an example.

19.
Continuing the work of Draper, Guttman and Kanemasu (1971), we obtain the appropriate distribution and evaluate some actual probability levels for entry of variables in stepwise regression when the denominator of the F-statistic is a biased estimate of the residual variance and there are two possible entry candidates.

20.
Estimated associations between an outcome variable and misclassified covariates tend to be biased when methods of estimation that ignore the classification error are applied. Available methods to account for misclassification often require the use of a validation sample (i.e. a gold standard). In practice, however, such a gold standard may be unavailable or impractical. We propose a Bayesian approach to adjust for misclassification in a binary covariate in the random effect logistic model when a gold standard is not available. This Markov Chain Monte Carlo (MCMC) approach uses two imperfect measures of a dichotomous exposure under the assumptions of conditional independence and non-differential misclassification. A simulated numerical example and a real clinical example are given to illustrate the proposed approach. Our results suggest that the estimated log odds of inpatient care and the corresponding standard deviation are much larger in our proposed method compared with the models ignoring misclassification. Ignoring misclassification produces downwardly biased estimates and underestimates uncertainty.
