Similar literature
 20 similar articles found (search time: 15 ms)
1.
The Wilcoxon-Mann-Whitney test is a frequently applied nonparametric procedure for testing the equality of two distributions. In this paper, formulae for the necessary sample sizes and lower bounds for the power are derived that are applicable to arbitrary distributions. In particular, the case of discrete random variables is investigated. It is shown how the power bounds increase and the necessary sample sizes decrease when the number of possible outcomes decreases.

2.
The location linear discriminant function is used in a two-population classification problem when the available data are generated from both binary and continuous random variables. Asymptotic distribution of the studentized location linear discriminant function is derived directly without the inversion of the corresponding characteristic function. The resulting plug-in estimate of the overall error of misclassification consists of the estimate based on the limiting distribution of the discriminant plus a correction term up to the second order. By comparison, our estimate avoids exact knowledge of the Mahalanobis distances which is necessary when the expansions of Vlachonikolis (1985) are used in the case of an arbitrary cut-off point. An example is re-examined and analysed in the present context.

3.
Most discriminant functions refer to qualitatively distinct groups. Tallis et al. (1975) introduced the probit discriminant function for distinguishing between two ordered groups. They showed how to estimate this function for mixture sampling and continuous predictor variables. Here an estimation system is given for the more common separate sampling which is applicable to continuous and/or discrete predictor variables. When used solely with continuous variables, this method of estimation is more robust than Tallis's. The relationship of probit and logistic discrimination is discussed.

4.
The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients.

5.
We consider the problem of the effect of sample designs on discriminant analysis. The selection of the learning sample is assumed to depend on the population values of auxiliary variables. Under a superpopulation model with a multivariate normal distribution, unbiasedness and consistency are examined for the conventional estimators (derived under the assumptions of simple random sampling), maximum likelihood estimators, probability-weighted estimators and conditionally unbiased estimators of parameters. Four corresponding sampled linear discriminant functions are examined. The rates of misclassification of these four discriminant functions and the effect of sample design on these four rates of misclassification are discussed. The performances of these four discriminant functions are assessed in a simulation study.

6.
A new discrete counterpart of the gamma distribution for modelling discrete life data is defined based on the similar mathematical form and properties of the continuous version. The main statistical and reliability properties of this distribution are derived and it is shown that this model can deal with both over- and under-dispersed data. Geometric variables and finite sums of geometric variables, i.e., negative binomial variables, are shown to be special cases of the proposed discrete gamma. Also, the size-biased discrete gamma distribution is derived and discussed. Moreover, different estimation methods for the underlying parameters of this distribution are utilized and their performance is compared. Finally, an application to real-life data is used to elucidate the earlier results of this article.

7.
The purpose of this study was to predict placement and nonplacement outcomes for mildly handicapped three- through five-year-old children given knowledge of developmental screening test data. Discrete discriminant analysis (Anderson, 1951; Cochran & Hopkins, 1961; Goldstein & Dillon, 1978) was used to classify children into either a placement or nonplacement group using developmental information retrieved from longitudinal Child Find records (1982-89). These records were located at the Florida Diagnostic and Learning Resource System (FDLRS) in Sarasota, Florida and provided usable data for 602 children. The developmental variables included performance on screening test activities from the Comprehensive Identification Process (Zehrbach, 1975), and consisted of: (a) gross motor skills, (b) expressive language skills, and (c) social-emotional skills. These three dichotomously scored developmental variables generated eight mutually exclusive and exhaustive combinations of screening data. Combined with one of three different types of cost-of-misclassification functions, each child in a random cross-validation sample of 100 was classified into one of the two outcome groups by minimizing the expected cost of misclassification based on the remaining 502 children. For each cost function designed by the researchers, a comparison was made between classifications from the discrete discriminant analysis procedure and actual placement outcomes for the 100 children. A logit analysis and a standard discriminant analysis were likewise conducted using the 502 children and compared with results of the discrete discriminant analysis for selected cost functions.

8.
Some general remarks are made about likelihood factorizations, distinguishing parameter-based factorizations and concentration-graph factorizations. Two parametric families of distributions for mixed discrete and continuous variables are discussed. Conditions on graphs are given for the circumstances under which their joint analysis can be split into separate analyses, each involving a reduced set of component variables and parameters. The result shows marked differences between the two families although both involve the same necessary condition on prime graphs. This condition is both necessary and sufficient for simplified estimation in Gaussian and for discrete log linear models.

9.
The normal and Laplace are the two earliest known continuous distributions in statistics and the two most popular models for analyzing symmetric data. In this note, the exact distribution of the ratio | X / Y | is derived when X and Y are respectively normal and Laplace random variables distributed independently of each other. A MAPLE program is provided for computing the associated percentage points. An application of the derived distribution is provided to a discriminant problem.

10.
Nadarajah and Mitov [Communications in Statistics—Theory and Methods, 32, 2003, 47–60] derived an expectation formula for continuous multivariate random variables involving the joint survival function. Their result is extended here for discrete multivariate random variables. Examples proposing new discrete bivariate distributions are given.

11.
A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived in the small-sample, high-dimensional setting. This method has a median position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first, then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is compared with that of other classical methods through application to data.
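
The convex combination described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `blended_cell_probs`, the restriction to binary features, and the convention that `alpha` weights the full multinomial model are all assumptions made here for illustration, and the Aitchison-Aitken kernel smoothing step is left out.

```python
import itertools
import numpy as np

def blended_cell_probs(X, alpha):
    """Class-conditional cell probabilities for ONE class of binary data.

    Hypothetical helper: returns alpha * (full multinomial estimate)
    + (1 - alpha) * (first-order independence estimate) for every cell.
    """
    X = np.asarray(X)
    n, d = X.shape
    cells = list(itertools.product([0, 1], repeat=d))
    # Full multinomial model: relative frequency of each of the 2^d cells.
    full = np.array([np.mean(np.all(X == c, axis=1)) for c in cells])
    # First-order independence model: product of the marginal frequencies.
    p1 = X.mean(axis=0)  # estimated P(x_j = 1) for each feature j
    indep = np.array(
        [np.prod(np.where(np.array(c) == 1, p1, 1 - p1)) for c in cells]
    )
    return dict(zip(cells, alpha * full + (1 - alpha) * indep))

# Usage: four observations on two binary features.
probs = blended_cell_probs([[0, 0], [0, 0], [1, 1], [1, 1]], alpha=0.5)
```

With `alpha = 1` the estimate reduces to the raw cell frequencies, which are unstable in small samples; with `alpha = 0` it reduces to the independence model. In the abstract, the value in between is chosen by cross-validation.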

12.
When the elements of a random vector take any real values, formulas of product moments are obtained for continuous and discrete random variables using distribution/survival functions. The random product can be that of strictly increasing functions of random variables. For continuous cases, the derivation based on iterated integrals is employed. It is shown that Hoeffding’s covariance lemma is algebraically equal to a special case of this result. For discrete cases, the elements of a random vector can be non-integers and/or unequally spaced. A discrete version of Hoeffding’s covariance lemma is derived for real-valued random variables.
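
For reference, the classical continuous form of Hoeffding's covariance lemma mentioned in this abstract is given below. The symbols $F_{X,Y}$, $F_X$ and $F_Y$ (the joint and marginal distribution functions) are notation assumed here, not taken from the abstract, and the covariance is assumed to exist.

```latex
\operatorname{Cov}(X, Y)
  = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty}
    \bigl[ F_{X,Y}(x, y) - F_X(x)\, F_Y(y) \bigr] \, dx \, dy
```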

13.
A sampling experiment performed using data collected for a large clinical trial shows that the discriminant function estimates of the logistic regression coefficients for discrete variables may be severely biased. The simulations show that the mixed variable location model coefficient estimates have bias which is of the same magnitude as the bias in the coefficient estimates obtained using conditional maximum likelihood estimates but require about one-tenth of the computer time.

14.
We study the problem of classifying an individual into one of several populations based on mixed nominal, continuous, and ordinal data. Specifically, we obtain a classification procedure as an extension to the so-called location linear discriminant function, by specifying a general mixed-data model for the joint distribution of the mixed discrete and continuous variables. We outline methods for estimating misclassification error rates. Results of simulations of the performance of proposed classification rules in various settings vis-à-vis a robust mixed-data discrimination method are reported as well. We give an example utilizing data on croup in children.

15.
Classification of data consisting of both categorical and continuous variables between two groups is often handled by the sample location linear discriminant function confined to each of the locations specified by the observed values of the categorical variables. Homoscedasticity of across-location conditional dispersion matrices of the continuous variables is often assumed. Quite often, interactions between continuous and categorical variables cause across-location heteroscedasticity. In this article, we examine the effect of heterogeneous across-location conditional dispersion matrices on the overall expected and actual error rates associated with the sample location linear discriminant function. Performance of the sample location linear discriminant function is evaluated against the results for the restrictive classifier adjusted for across-location heteroscedasticity. Conclusions based on a Monte Carlo study are reported.

16.
We consider classifying an object based on mixed continuous and discrete variables between two populations. Mixed discrete and continuous covariates with identical means in both populations are amongst the variables. Under the location model with homogeneous location specific conditional dispersion matrices for both populations, the Bayes rule is given. Classification is implemented by a plug-in version of the Bayes rule with full covariate adjustment. An asymptotic expansion of the overall expected error of the procedure is derived. Our findings generalize several classical results.

17.
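Discriminant analysis has received growing attention and has produced important applied results, but in practice it is often applied mechanically, with too little attention paid to testing the applicability of discriminant analysis, the significance of the discrimination effect, the discriminating power of the discriminant variables, and the discriminating power of the discriminant functions. To apply discriminant analysis better, it should be subjected to statistical testing within a system of statistical tests. This system should include: a test of the applicability of discriminant analysis, a test of the significance of the discrimination effect, a test of the discriminating power of the discriminant variables, and a test of the discriminating power of the discriminant functions.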

18.
This paper discusses a supervised classification approach for the differential diagnosis of Raynaud's phenomenon (RP). The classification of data from healthy subjects and from patients suffering from primary and secondary RP is obtained by means of a set of classifiers derived within the framework of linear discriminant analysis. A set of functional variables and shape measures extracted from rewarming/reperfusion curves are proposed as discriminant features. Since the prediction of group membership is based on a large number of these features, the high dimension/small sample size problem is considered to overcome the singularity problem of the within-group covariance matrix. Results on a data set of 72 subjects demonstrate that a satisfactory classification of the subjects can be achieved through the proposed methodology.

19.
This article provides a strategy to identify the existence and direction of a causal effect in a generalized nonparametric and nonseparable model identified by instrumental variables. The causal effect concerns how the outcome depends on the endogenous treatment variable. The outcome variable, treatment variable, other explanatory variables, and the instrumental variable can be essentially any combination of continuous, discrete, or “other” variables. In particular, it is not necessary to have any continuous variables, none of the variables need to have large support, and the instrument can be binary even if the corresponding endogenous treatment variable and/or outcome is continuous. The outcome can be mismeasured or interval-measured, and the endogenous treatment variable need not even be observed. The identification results are constructive, and can be empirically implemented using standard estimation results.

20.
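An exploration of a Bayes discriminant method based on the Fisher transformation
Discriminant analysis is one of the three major methods of multivariate statistical analysis and is widely applied in many fields. Distance discrimination, Fisher discrimination and Bayes discrimination are usually regarded as three different methods of discriminant analysis. The research in this paper shows that distance discrimination and Bayes discrimination are two substantive discriminant methods: the former actually relies on percentiles or confidence intervals, and the latter on probabilities. The well-known Fisher discrimination, by contrast, merely applies a linear transformation to the discriminant variables, following the idea of analysis of variance, and then uses the result for distance discrimination; it cannot really be counted as a substantive discriminant method. This paper combines the Fisher transformation with Bayes discrimination, that is, it first applies the Fisher transformation and then performs Bayes discrimination by the maximum-probability principle, which yields a new discrimination approach that can further improve discrimination efficiency. Theoretical and empirical analyses show that Bayes discrimination based on the Fisher transformation is applicable in a wide range of settings and has the highest discrimination efficiency.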
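
The two-step idea in this abstract (Fisher transformation first, then Bayes discrimination by maximum probability) might be sketched as follows. This is only an illustration under stated assumptions, not the paper's procedure: the function names `fisher_transform` and `bayes_classify` are hypothetical, the class-conditional densities in the transformed space are assumed Gaussian with a pooled covariance matrix, and the priors are taken to be the class frequencies.

```python
import numpy as np

def fisher_transform(X, y, n_components):
    """Return the leading Fisher discriminant directions as columns of W."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Directions are eigenvectors of Sw^{-1} Sb with the largest eigenvalues.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    return evecs.real[:, order]

def bayes_classify(Z_train, y_train, Z_new):
    """Assign each row of Z_new to the class with the largest posterior.

    Assumes Gaussian classes with a pooled covariance (an assumption made
    here for illustration) and priors equal to the class frequencies.
    """
    classes = np.unique(y_train)
    n, k = Z_train.shape
    pooled = np.zeros((k, k))
    means, priors = [], []
    for c in classes:
        Zc = Z_train[y_train == c]
        m = Zc.mean(axis=0)
        pooled += (Zc - m).T @ (Zc - m)
        means.append(m)
        priors.append(len(Zc) / n)
    pooled /= n - len(classes)
    inv = np.linalg.inv(pooled)
    scores = []
    for m, p in zip(means, priors):
        diff = Z_new - m
        maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
        scores.append(np.log(p) - 0.5 * maha)  # log posterior up to a constant
    return classes[np.argmax(np.array(scores), axis=0)]
```

A typical call would compute `W = fisher_transform(X, y, n_components)` on the training data and then pass `X @ W` and `X_new @ W` to `bayes_classify`, so that the posterior comparison takes place in the transformed space.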
