Similar literature
20 similar articles found; search took 31 ms
1.
This article proposes a discriminant function and an accompanying algorithm for analyzing positively skewed data. The performance of the algorithm based on the proposed discriminant function (LNDF) is compared with that of the conventional linear discriminant function (LDF) and quadratic discriminant function (QDF), as well as with the nonparametric support vector machine (SVM) and Random Forests (RFs) classifiers, using real and simulated datasets. A maximum reduction of approximately 81% in the error rate relative to the QDF was observed for ten-variate data. The overall results indicate better performance of the proposed discriminant function under certain circumstances.

2.
The sample linear discriminant function (LDF) is known to perform poorly when the number of features p is large relative to the size of the training samples. A simple and rarely applied alternative to the sample LDF is the sample Euclidean distance classifier (EDC). Raudys and Pikelis (1980) compared the sample LDF with three other discriminant functions, including the sample EDC, when classifying individuals from two spherical normal populations. They concluded that the sample EDC outperforms the sample LDF when p is large relative to the training sample size. This paper derives conditions under which the two classifiers are equivalent when all parameters are known and employs a Monte Carlo simulation to compare the sample EDC with the sample LDF not only for the spherical normal case but also for several nonspherical parameter configurations. For many practical situations, the sample EDC performs as well as or better than the sample LDF, even for nonspherical covariance configurations.

3.
The purpose of this paper is to investigate the performance of the LDF (linear discriminant function) and QDF (quadratic discriminant function) for classifying observations from three types of univariate and multivariate non-normal distributions on the basis of the misclassification rate. The theoretical and the empirical results are described for univariate distributions, and the empirical results are presented for multivariate distributions. It is also shown that the sign of the skewness of each population and the kurtosis have essential effects on the performance of the two discriminant functions. The variations of the population-specific misclassification rates depend greatly on the sample size. For high-dimensional population distributions, if the sample sizes are sufficient, the QDF performs better than the LDF. As an application, we give criteria for choosing between the two discriminant functions.

4.
The linear discriminant function (LDF) is known to be optimal in the sense of achieving an optimal error rate when sampling from multivariate normal populations with equal covariance matrices. Use of the LDF in nonnormal situations is known to lead to some strange results. This paper will focus on an evaluation of misclassification probabilities when the power transformation could have been used to achieve at least approximate normality and equal covariance matrices in the sampled populations for the distribution of the observed random variables. Attention is restricted to the two-population case with bivariate distributions.

5.
Using the techniques developed by Subrahmaniam and Ching’anda (1978), we study the robustness to nonnormality of the linear discriminant functions. It is seen that the LDF procedure is quite robust against the likelihood ratio rule. The latter yields in all cases much smaller overall error rates; however, the disparity between the error rates of the LDF and LR procedures is not large enough to warrant the recommendation to use the more complicated LR procedure.

6.
In this article, a randomized estimator of the empirical distribution function (EDF), called the random weighting empirical distribution function (RWEDF), is introduced, one special case of which is equivalent to the Bayesian bootstrap. The consistency of the RWEDF is established under certain conditions. By substituting this new EDF for the classical EDF, we obtain new versions of some EDF test statistics for goodness-of-fit. The simulation results show that the new tests are more powerful than the corresponding tests based on the classical EDF in some cases.

7.
The procedure of statistical discrimination is simple in theory but not so simple in practice. An observation x0, possibly multivariate, is to be classified into one of several populations π1,…,πk which have, respectively, the density functions f1(x),…,fk(x). The decision procedure is to evaluate each density function at x0 to see which function gives the largest value fi(x0), and then to declare that x0 belongs to the population corresponding to the largest value. If these densities can be assumed to be normal with equal covariance matrices, then the decision procedure is known as Fisher’s linear discriminant function (LDF) method. In the case of unequal covariance matrices the procedure is called the quadratic discriminant function (QDF) method. If the densities cannot be assumed to be normal, then the LDF and QDF might not perform well. Several different procedures have appeared in the literature which offer discriminant procedures for nonnormal data. However, these procedures are generally difficult to use and are not readily available as canned statistical programs.
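The decision procedure described above can be sketched in a few lines. This is a minimal illustration, assuming univariate normal densities with known parameters; the function names and the parameter values are hypothetical and do not come from the paper.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def classify(x0, populations):
    """Return the index i of the population whose density f_i(x0) is largest.

    `populations` is a list of (mean, variance) pairs, one per population.
    """
    densities = [normal_pdf(x0, m, v) for m, v in populations]
    return max(range(len(densities)), key=densities.__getitem__)


# Hypothetical parameters with equal variances (the LDF-type case):
# the boundary is linear, here the midpoint x = 1 between the two means.
equal_var = [(0.0, 1.0), (2.0, 1.0)]
labels_equal = [classify(x, equal_var) for x in (-3.0, 0.9, 1.1, 5.0)]

# Hypothetical parameters with unequal variances (the QDF-type case):
# the boundary is quadratic, so both tails go to the wider population.
unequal_var = [(0.0, 1.0), (0.0, 9.0)]
labels_unequal = [classify(x, unequal_var) for x in (-5.0, 0.0, 5.0)]

print(labels_equal, labels_unequal)
```

In practice the means and variances are unknown and are replaced by sample estimates, which is where the sample LDF and QDF come from.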

Another approach to discriminant analysis is to use some sort of mathematical transformation on the samples so that their distribution function is approximately normal, and then use the convenient LDF and QDF methods. One transformation that applies to all distributions equally well is the rank transformation. The result of this transformation is that a very simple and easy-to-use procedure is made available. This procedure is quite robust, as is evidenced by comparisons of the rank transform results with several published simulation studies.
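A rough sketch of the rank-transform idea for two univariate samples follows. It assumes equal priors and a common variance, in which case the univariate LDF reduces to assigning an observation to the group with the nearer mean. The data, the helper names, and the convention used to rank a new observation against the pooled training sample are all illustrative assumptions, not details taken from the paper.

```python
def average_ranks(values):
    """Ranks 1..n of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_of_new(x0, pooled):
    """Rank of a new observation relative to the pooled training sample.

    Assumed convention: a value equal to a training value gets that value's
    rank; otherwise it falls halfway between the neighbouring ranks.
    """
    below = sum(1 for v in pooled if v < x0)
    ties = sum(1 for v in pooled if v == x0)
    return below + (ties + 1) / 2.0


def nearest_mean(value, mean_a, mean_b):
    """Univariate LDF with equal priors and common variance."""
    return "a" if abs(value - mean_a) <= abs(value - mean_b) else "b"


# Hypothetical positively skewed training samples.
group_a = [0.2, 0.5, 0.9, 1.4, 30.0]
group_b = [2.5, 3.1, 4.8, 7.0, 55.0]
pooled = group_a + group_b
ranks = average_ranks(pooled)
ranks_a, ranks_b = ranks[:len(group_a)], ranks[len(group_a):]

x0 = 6.0
raw_label = nearest_mean(x0, sum(group_a) / len(group_a), sum(group_b) / len(group_b))
rank_label = nearest_mean(rank_of_new(x0, pooled),
                          sum(ranks_a) / len(ranks_a), sum(ranks_b) / len(ranks_b))
print(raw_label, rank_label)
```

With these made-up numbers the raw-scale rule is pulled toward group a by the outlying value 30.0, while the rank-based rule assigns x0 to group b, whose bulk it sits among.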

8.
This article reports results of an extensive simulation study which investigated the performances of some commonly used methods of estimating error rates in discriminant analysis. Earlier research papers limited their comparisons of these methods to independent training data. This study allows for a simple auto-regressive dependence among the training data. The results suggest that the estimation methods based on the normal distribution perform adequately well under conditions of negative or mild positive correlation in the data, and small dimensions (p) of the observation vectors. For large p or strong positive correlation structures the conclusion is that one of the better non-parametric methods should be used. Special circumstances and conditions which notably affect the relative performances of the methods are identified.

9.
Classification of data consisting of both categorical and continuous variables between two groups is often handled by the sample location linear discriminant function confined to each of the locations specified by the observed values of the categorical variables. Homoscedasticity of across-location conditional dispersion matrices of the continuous variables is often assumed. Quite often, interactions between continuous and categorical variables cause across-location heteroscedasticity. In this article, we examine the effect of heterogeneous across-location conditional dispersion matrices on the overall expected and actual error rates associated with the sample location linear discriminant function. Performance of the sample location linear discriminant function is evaluated against the results for the restrictive classifier adjusted for across-location heteroscedasticity. Conclusions based on a Monte Carlo study are reported.

10.
The parametric and nonparametric methods for estimating the error rates in linear discriminant analysis are examined both in normal and in nonnormal situations. A Monte Carlo experiment was carried out under the assumption that two population distributions were characterized by a mixture of two multivariate normal distributions. The bootstrap bias-corrected apparent error rate compares favourably to other available estimators for nonnormal populations with small Mahalanobis distance. The methods for error estimation are also applied to a practical problem in medical diagnosis.

11.
When the probability of selecting an individual in a population is proportional to its lifelength, it is called length biased sampling. A nonparametric maximum likelihood estimator (NPMLE) of survival in a length biased sample is given in Vardi (1982). In this study, we examine the performance of Vardi's NPMLE in estimating the true survival curve when observations are from a length biased sample. We also compute estimators based on a linear combination (LCE) of empirical distribution function (EDF) estimators and weighted estimators. In our simulations, we consider observations from a mixture of two different distributions, one from F and the other from G, which is a length biased distribution of F. Through a series of simulations with various proportions of length biasing in a sample, we show that the NPMLE and the LCE closely approximate the true survival curve. Throughout the survival curve, the EDF estimators overestimate the survival. We also consider a case where the observations are from three different weighted distributions. Again, both the NPMLE and the LCE closely approximate the true distribution, indicating that the length biasedness is properly adjusted for. Finally, an efficiency study shows that Vardi's estimators are more efficient than the EDF estimators in the lower percentiles of the survival curves.
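To see why the unadjusted EDF overestimates survival under length bias, consider the simpler case in which every observation is length biased. There, a standard adjustment weights each observation by the inverse of its length. The sketch below illustrates only that simplified case; it is not Vardi's NPMLE for a mixture of biased and unbiased observations, and the exponential lifetime model and function names are assumptions made for the example.

```python
import math
import random


def edf_survival(sample, t):
    """Unadjusted empirical survival: the fraction of observations above t."""
    return sum(1 for x in sample if x > t) / len(sample)


def inverse_length_survival(sample, t):
    """Survival estimate with each observation weighted by 1/x.

    Appropriate when every observation was drawn with probability
    proportional to its length.
    """
    total = sum(1.0 / x for x in sample)
    return sum(1.0 / x for x in sample if x > t) / total


# Assumed model: true lifetimes are Exp(1), so true survival is exp(-t).
# The length biased version of Exp(1) is a Gamma distribution with
# shape 2 and scale 1.
rng = random.Random(0)
biased_sample = [rng.gammavariate(2.0, 1.0) for _ in range(2000)]

t = 1.0
truth = math.exp(-t)
naive = edf_survival(biased_sample, t)
adjusted = inverse_length_survival(biased_sample, t)
print(truth, naive, adjusted)
```

Long lifetimes are over-represented in the sample, so the unadjusted estimate sits well above the true curve, while the inverse-length weights pull it back.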

12.
In this paper we derive some new tests for goodness-of-fit based on Rubin's empirical distribution function (EDF). Because Rubin's EDF for a given sample is a randomized distribution function, substituting it for the classical EDF in the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling statistics yields randomized statistics, of which the qth quantile and the expectation are chosen as test statistics. We show that the new tests are consistent under a simple hypothesis. Several power comparisons are also performed to show that the new tests are generally more powerful than the classical ones.
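The construction can be sketched for the Kolmogorov–Smirnov statistic. The sketch assumes that Rubin's EDF refers to the Bayesian bootstrap, in which the equal weights 1/n of the classical EDF are replaced by random Dirichlet(1,…,1) weights; the function names, the uniform null hypothesis, the sample size, and the number of draws are illustrative choices rather than details from the paper.

```python
import random


def dirichlet_weights(n, rng):
    """Dirichlet(1,...,1) weights, drawn by normalizing standard exponentials."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [v / total for v in g]


def weighted_ks(sample, weights, cdf):
    """sup_x |F_w(x) - cdf(x)| for the weighted step function F_w.

    Assumes a sample without ties, as expected for continuous data.
    """
    d = 0.0
    cum = 0.0
    for x, w in sorted(zip(sample, weights)):
        f0 = cdf(x)
        d = max(d, abs(cum - f0))  # just before the jump at x
        cum += w
        d = max(d, abs(cum - f0))  # just after the jump at x
    return d


def randomized_ks(sample, cdf, n_draws, rng):
    """Sorted draws of the randomized KS statistic for a fixed sample."""
    n = len(sample)
    return sorted(weighted_ks(sample, dirichlet_weights(n, rng), cdf)
                  for _ in range(n_draws))


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution (the assumed null hypothesis)."""
    return min(1.0, max(0.0, x))


rng = random.Random(0)
sample = [rng.random() for _ in range(30)]
n = len(sample)

classical = weighted_ks(sample, [1.0 / n] * n, uniform_cdf)
draws = randomized_ks(sample, uniform_cdf, 500, rng)

q = 0.5
quantile_stat = draws[int(q * (len(draws) - 1))]  # qth quantile of the draws
mean_stat = sum(draws) / len(draws)               # Monte Carlo expectation
print(classical, quantile_stat, mean_stat)
```

The quantile and the mean are computed here by Monte Carlo over the random weights; turning either into a test would additionally require critical values under the null hypothesis, which this sketch does not compute.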

13.
This article presents parametric bootstrap (PB) approaches for hypothesis testing and interval estimation for the regression coefficients of panel data regression models with incomplete panels. Some simulation results are presented to compare the performance of the PB approaches with the approximate inferences. Our studies show that the PB approaches perform satisfactorily for various sample sizes and parameter configurations, and the performance of the PB approaches is mostly better than that of the approximate methods with respect to the coverage probabilities and the Type I error rates. The PB inferences have almost exact coverage probabilities and Type I error rates. Furthermore, the PB procedure can be carried out in a few simulation steps, and the derivation is easier to understand and to extend to multi-way error component regression models with unbalanced panels. Finally, the proposed approaches are illustrated using a real data example.

14.
In this article, we propose some tests of fit based on sample entropy for the composite Gumbel (Extreme Value) hypothesis. The proposed test statistics are constructed using different entropy estimates. Through a Monte Carlo simulation, critical values of the test statistics for various sample sizes are obtained. Since the tests based on the empirical distribution function (EDF) are commonly used in practice, the power values of the entropy-based tests are compared with those of the EDF tests against various alternatives and for different sample sizes. Finally, two real data sets are modeled by the Gumbel distribution.
Keywords: Entropy estimator, Gumbel distribution, Monte Carlo simulation, test power

15.
This article focuses on reducing the additional variance due to randomization of the responses. The idea of additive scrambling and its inverse is used along with (i) a split-sample approach and (ii) a double-response approach. Specifically, our proposal is based on the randomized response model of Gupta et al. (2006). We selected this model for improvement because it provides estimators of the mean and the sensitivity level of a sensitive variable and is better than all of its earlier competitors; moreover, the sensitivity estimator of Gupta et al. (2006) is better than that of Gupta et al. (2010). Our suggested estimators are unbiased and perform better than the Gupta et al. (2006) estimator. The issue of privacy protection is also discussed.

16.
In this article, we propose a flexible parametric (FP) approach for adjusting for covariate measurement errors in regression that can accommodate replicated measurements on the surrogate (mismeasured) version of the unobserved true covariate on all the study subjects or on a sub-sample of the study subjects as error assessment data. We utilize the general framework of the FP approach proposed by Hossain and Gustafson in 2009 for adjusting for covariate measurement errors in regression. The FP approach is then compared with the existing non-parametric approaches when error assessment data are available on the entire sample of the study subjects (complete error assessment data), considering covariate measurement error in a multiple logistic regression model. We also developed the FP approach for the case in which error assessment data are available on a sub-sample of the study subjects (partial error assessment data) and investigated its performance using both simulated and real-life data. Simulation results reveal that, in comparable situations, the FP approach performs as well as or better than the competing non-parametric approaches in eliminating the bias that arises in the estimated regression parameters due to covariate measurement errors. It also results in better efficiency of the estimated parameters. Finally, the FP approach is found to perform adequately well in terms of bias correction, confidence coverage, and achieving appropriate statistical power under partial error assessment data.

17.
This article introduces a method of nonparametric bivariate density estimation based on a bivariate sample level crossing function, which leads to the construction of a bivariate level crossing empirical distribution function (BLCEDF). An efficiency function for this BLCEDF relative to the classical empirical distribution function (EDF) is derived. The BLCEDF gives more efficient estimates than the EDF in the tails of any underlying continuous distribution, for both small and large sample sizes. On the basis of the BLCEDF we define a bivariate level crossing kernel density estimator (BLCKDE) and study its properties. We apply the BLCEDF and BLCKDE to various distributions and provide results of simulations that confirm the theoretical properties. A real-world example is given.

18.
Errors of misclassification and their probabilities are studied for classification problems associated with univariate inverse Gaussian distributions. The effects of applying the linear discriminant function (LDF), based on normality, to inverse Gaussian populations are assessed by comparing probabilities (optimum and conditional) based on the LDF with those based on the likelihood ratio rule (LR) for the inverse Gaussian. Both theoretical and empirical results are presented.

19.
In this paper we study the problem of reducing the bias of the ratio estimator of the population mean in a ranked set sampling (RSS) design. We first propose a jackknifed RSS-ratio estimator and then introduce a class of almost unbiased RSS-ratio estimators of the population mean. We also present an unbiased RSS-ratio estimator of the mean using the idea of Hartley and Ross (Nature 174:270–271, 1954) which performs better than its counterpart with simple random sample data. We show that under certain conditions the proposed unbiased and almost unbiased RSS-ratio estimators perform better than the commonly used (biased) RSS-ratio estimator in estimating the population mean in terms of the mean square error. The theoretical results are augmented by a simulation study using a wheat yield data set from the Iranian Ministry of Agriculture to demonstrate the practical benefits of our proposed ratio-type estimators relative to the RSS-ratio estimator in reducing the bias in estimating the average wheat production.

20.
A new randomization approach is proposed for constructing goodness-of-fit tests in general. Some new test statistics are derived, which are based on the stochastic empirical distribution function (EDF). Note that the stochastic EDF for a set of given sample observations is a randomized distribution function. By substituting the stochastic EDF for the classical EDF in the Kolmogorov–Smirnov, Cramér–von Mises, Anderson–Darling, Berk–Jones, and Einmahl–McKeague statistics, randomized statistics are derived, of which the qth quantile and the expectation are chosen as test statistics. A simulation study shows that, in most cases, the new test statistics are more powerful than the corresponding ones based on the classical EDF or modified EDF.

