In credit scoring, it is well known that AUC (the area under the curve) can be calculated in three ways: geometrically, as the probability of correctly ranking a good and bad pair, and from the Wilcoxon Rank-Sum statistic. This three-way equivalence was first presented by Hanley and McNeil in 1982 without considering tied scores and without giving analytical proofs. In this paper, we extend the three-way equivalence to the case with tied scores and provide analytic proofs for it.
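
The three-way equivalence can be checked numerically on a small sample with tied scores. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, it assumes that goods tend to score higher than bads, that a tied good/bad pair counts as half a correct ranking, and that tied scores receive midranks.

```python
from itertools import product


def auc_pairwise(good, bad):
    """P(good > bad) + 0.5 * P(good == bad) over all good/bad pairs."""
    wins = sum(1.0 if g > b else 0.5 if g == b else 0.0
               for g, b in product(good, bad))
    return wins / (len(good) * len(bad))


def midranks(values):
    """Ranks 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def auc_rank_sum(good, bad):
    """AUC from the Wilcoxon rank-sum W of the goods: (W - n_g(n_g+1)/2) / (n_g n_b)."""
    n_g, n_b = len(good), len(bad)
    ranks = midranks(list(good) + list(bad))
    w = sum(ranks[:n_g])
    return (w - n_g * (n_g + 1) / 2) / (n_g * n_b)


def auc_trapezoid(good, bad):
    """Geometric AUC: trapezoidal area under the ROC curve, one vertex per distinct score."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for t in sorted(set(good) | set(bad), reverse=True):
        tpr = sum(g >= t for g in good) / len(good)
        fpr = sum(b >= t for b in bad) / len(bad)
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


if __name__ == "__main__":
    good = [3, 5, 5, 7]   # hypothetical scores with ties
    bad = [1, 3, 5, 6]
    print(auc_pairwise(good, bad), auc_rank_sum(good, bad), auc_trapezoid(good, bad))
```

On this toy sample all three functions return the same value, which is what the equivalence with ties predicts.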

This paper provides a partial solution to a problem posed by J. Neyman (1965) regarding the characterization of the multivariate negative binomial distribution based on the properties of regression. It is shown that some of the properties of regression characterize the form of the nonsingular dispersion matrix of the parent distribution, which, interestingly enough, corresponds to only two types, namely those of the positive and negative multivariate binomial distributions.

In the area of diagnostics, it is common practice to leverage external data to augment a traditional study of diagnostic accuracy consisting of prospectively enrolled subjects to potentially reduce the time and/or cost needed for the performance evaluation of an investigational diagnostic device. However, the statistical methods currently being used for such leveraging may not clearly separate study design and outcome data analysis, and they may not adequately address possible bias due to differences in clinically relevant characteristics between the subjects constituting the traditional study and those constituting the external data. This paper is intended to draw attention in the field of diagnostics to the recently developed propensity score-integrated composite likelihood approach, which originally focused on therapeutic medical products. This approach applies the outcome-free principle to separate study design and outcome data analysis and can mitigate bias due to imbalance in covariates, thereby increasing the interpretability of study results. While this approach was conceived as a statistical tool for the design and analysis of clinical studies for therapeutic medical products, here, we will show how it can also be applied to the evaluation of sensitivity and specificity of an investigational diagnostic device leveraging external data. We consider two common scenarios for the design of a traditional diagnostic device study consisting of prospectively enrolled subjects, which is to be augmented by external data. The reader will be taken through the process of implementing this approach step-by-step following the outcome-free principle that preserves study integrity.

The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) of the ROC curve are widely used in discovery to compare the performance of diagnostic and prognostic assays. The ROC curve has the advantage that it is independent of disease prevalence. However, in this note, we remind scientists and clinicians that the performance of an assay upon translation to the clinic is critically dependent upon that very same prevalence. Without an understanding of prevalence in the test population, even robust bioassays with excellent ROC characteristics may perform poorly in the clinic. While the exact prevalence in the target population is not always known, simple plots of candidate assay performance as a function of prevalence rate give a better understanding of the likely real-world performance and a greater understanding of the likely impact of variation in that prevalence on translation to the clinic.
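
The dependence on prevalence can be made concrete with Bayes' rule. The sketch below is an illustration under assumed numbers, not the authors' analysis: the function name `predictive_values` and the 95% sensitivity and specificity are hypothetical, and the prevalence is assumed to lie strictly between 0 and 1.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary assay at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical assay: same ROC operating point, different test populations.
    for prevalence in (0.001, 0.01, 0.1, 0.5):
        ppv, npv = predictive_values(0.95, 0.95, prevalence)
        print(f"prevalence={prevalence:<6} PPV={ppv:.3f} NPV={npv:.3f}")
```

With these assumed values the positive predictive value falls well below one half once prevalence drops to 1%, even though sensitivity and specificity are unchanged.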

Many exploratory studies such as microarray experiments require the simultaneous comparison of hundreds or thousands of genes. In many microarray experiments, most genes are not expected to be differentially expressed. Under such a setting, a procedure that is designed to control the false discovery rate (FDR) is aimed at identifying as many potential differentially expressed genes as possible. The usual FDR controlling procedure is constructed based on the number of hypotheses. However, it can become very conservative when some of the alternative hypotheses are expected to be true. The power of a controlling procedure can be improved if the number of true null hypotheses ($m_0$) instead of the number of hypotheses is incorporated in the procedure [Y. Benjamini and Y. Hochberg, On the adaptive control of the false discovery rate in multiple testing with independent statistics, J. Educ. Behav. Statist. 25 (2000), pp. 60–83]. Nevertheless, $m_0$ is unknown and has to be estimated. The objective of this article is to evaluate some existing estimators of $m_0$ and to discuss the feasibility of incorporating these estimators into FDR controlling procedures under various experimental settings. The results of simulations can help the investigator to choose an appropriate procedure to meet the requirements of the study.
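
The effect of replacing the number of hypotheses by an estimate of the number of true nulls can be sketched as follows. This is a minimal illustration, not one of the procedures evaluated in the article: the threshold-based estimator in `estimate_m0` (count of p-values above an assumed cut-off `lam`, rescaled) is only one possible choice, and all names and p-values are hypothetical.

```python
def estimate_m0(pvalues, lam=0.5):
    """Assumed estimator: p-values above lam are treated as coming from true nulls."""
    m = len(pvalues)
    raw = sum(p > lam for p in pvalues) / (1 - lam)
    return min(float(m), max(1.0, raw))  # keep the estimate within [1, m]


def step_up(pvalues, alpha, m_eff):
    """Step-up rule: reject the k smallest p-values, k = max{i : p_(i) <= i * alpha / m_eff}."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m_eff:
            k = rank
    return sorted(order[:k])  # indices of rejected hypotheses


if __name__ == "__main__":
    pvalues = [0.001, 0.008, 0.012, 0.02, 0.028, 0.04, 0.6, 0.9]  # made-up data
    usual = step_up(pvalues, 0.05, len(pvalues))
    adaptive = step_up(pvalues, 0.05, estimate_m0(pvalues))
    print(len(usual), len(adaptive))
```

Calling `step_up` with `m_eff` equal to the number of hypotheses gives the usual procedure; passing the estimate instead enlarges every threshold, which is where the additional power comes from.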

Some matrix representations of diverse diagonal arrays are studied in this work; the results allow new definitions of classes of elliptical distributions indexed by kernels mixing Hadamard and usual products. A number of applications are derived in the setting of prior densities from the Bayesian multivariate regression model and families of non-elliptical distributions, such as the matrix multivariate generalized Birnbaum–Saunders density. The philosophy of the research about matrix representations of quadratic and inverse quadratic forms can be extended as a methodology for exploring possible new applications in non-standard distributions, matrix transformations and inference.

We introduce a matrix operator, which we call the “vecd” operator. This operator stacks up the “diagonals” of a symmetric matrix. It is more convenient for some statistical analyses than the commonly used “vech” operator. We show an explicit relationship between the vecd and vech operators. Using this relationship, various properties of the vecd operator are derived. As applications of the vecd operator, we derive concise and explicit expressions of the Wald and score tests for equal variances of a multivariate normal distribution and for the diagonality of variance coefficient matrices in a multivariate generalized autoregressive conditional heteroscedastic (GARCH) model, respectively.

We derive Bayesian interval estimators for the differences in the true positive rates and false positive rates of two dichotomous diagnostic tests applied to the members of two distinct populations. The populations have varying disease prevalences with unverified negatives. We compare the performance of the Bayesian credible interval to the Wald interval using Monte Carlo simulation for a spectrum of different TPRs, FPRs, and sample sizes. For the case of a low TPR and low FPR, we found that a Bayesian credible interval with relatively noninformative priors performed well. We obtain similar interval comparison results for the cases of a high TPR and high FPR, a high TPR and low FPR, and of a high TPR and mixed FPR after incorporating mildly informative priors.

In this article, by using the constant and random selection matrices, several properties of the maximum likelihood (ML) estimates and the ML estimator of a normal distribution with missing data are derived. The constant selection matrix allows us to obtain an explicit form of the ML estimates and the exact relationship between the EM algorithm and the score function. The random selection matrix allows us to clarify how the missing-data mechanism works in the proof of the consistency of the ML estimator, to derive the asymptotic properties of the sequence generated by the EM algorithm, and to derive the information matrix.

Yo Sheena. Statistics, 2013, 47(5): 387-399
We consider the orthogonally invariant estimation problem of the inverse of the scale matrix of the Wishart distribution using Stein's loss (entropy loss). For this problem, Krishnamoorthy and Gupta [2] (Krishnamoorthy, K. and Gupta, A. K. (1989). Improved minimax estimation of a normal precision matrix. Canad. J. Statist., 17: 91–102) proposed an estimator and showed its good performance in a Monte Carlo simulation. They conjectured that their estimator is minimax. Perron [3] (Perron, F. (1997). On a conjecture of Krishnamoorthy and Gupta. J. Multivariate Anal., 62: 110–120) proved its minimaxity for p = 2. In this paper we prove it for p = 3 by using a new method.

The accuracy of a binary diagnostic test is usually measured in terms of its sensitivity and its specificity, or through positive and negative predictive values. Another way to describe the validity of a binary diagnostic test is the risk of error and the kappa coefficient of the risk of error. The risk of error is the average loss that is caused when incorrectly classifying a non-diseased or a diseased patient, and the kappa coefficient of the risk of error is a measure of the agreement between the diagnostic test and the gold standard. In the presence of partial verification of the disease, the disease status of some patients is unknown, and therefore the evaluation of a diagnostic test cannot be carried out through the traditional method. In this paper, we have deduced the maximum likelihood estimators and variances of the risk of error and of the kappa coefficient of the risk of error in the presence of partial verification of the disease. Simulation experiments have been carried out to study the effect of the verification probabilities on the coverage of the confidence interval of the kappa coefficient.

For the characteristic values $\tau_i$ of the matrix $V := \mathrm{Diag}(p) - pp^T$ with $p = (p_1, \ldots, p_k)$, $p_1 \ge p_2 \ge \ldots \ge p_k \ge p_{k+1} > 0$ and $p_1 + p_2 + \ldots + p_k + p_{k+1} = 1$, the inequalities $p_1 \ge \tau_1 \ge p_2 \ge \tau_2 \ge \ldots \ge p_k \ge \tau_k > 0$ are given by Ronning (1982). These inequalities give, if $p$ and $p_{k+1}$ are unknown, the upper bound $1 \ge \tau_1$. However, in this note the bound $1/2 \ge \tau_1$ is derived. $V$ is proportional to the covariance matrix for multinomial, Dirichlet and multivariate hypergeometric distributions. A statistical application for the multinomial distribution is given.
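
The bound of 1/2 on the largest characteristic value can be probed numerically. The sketch below is only a spot check on random probability vectors, not the derivation in the note: the helper names are hypothetical, and power iteration is used as an assumed way to approximate the largest eigenvalue of the symmetric positive semidefinite matrix Diag(p) - pp^T.

```python
import random


def covariance_kernel(p):
    """V = Diag(p) - p p^T built from the first k cell probabilities."""
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]


def largest_eigenvalue(v, iterations=500):
    """Power iteration followed by a Rayleigh quotient on the final unit vector."""
    k = len(v)
    x = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        y = [sum(v[i][j] * x[j] for j in range(k)) for i in range(k)]
        norm = sum(t * t for t in y) ** 0.5
        if norm == 0.0:
            return 0.0
        x = [t / norm for t in y]
    y = [sum(v[i][j] * x[j] for j in range(k)) for i in range(k)]
    return sum(x[i] * y[i] for i in range(k))


if __name__ == "__main__":
    random.seed(0)
    worst = 0.0
    for _ in range(200):
        weights = [random.random() + 1e-6 for _ in range(random.randint(2, 6))]
        total = sum(weights)
        p = [w / total for w in weights[:-1]]  # drop the last cell, p_{k+1} > 0
        worst = max(worst, largest_eigenvalue(covariance_kernel(p)))
    print(worst)  # stays at or below 0.5 in this experiment
```

For p = (0.49, 0.49) with a remaining cell probability of 0.02, the matrix has eigenvalues 0.49 and 0.0098, so the bound is nearly attained when two cells carry almost all of the mass.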

This paper introduces a new information-theoretic measure of complexity called ICOMP as a decision rule for model selection and evaluation for multivariate linear models. The development of ICOMP is based on the generalization and utilization of the covariance complexity index of van Emden (1971) in estimation of the multivariate linear model. ICOMP is motivated by Akaike's (1973) Information Criterion (AIC), but it is a different procedure from AIC. In linear or nonlinear statistical models ICOMP uses an information-based characterization of: (i) the covariance matrix properties of the parameter estimates of a model starting from their finite sampling distributions, and (ii) the complexity of the inverse-Fisher information matrix (i-FIM) as a new criterion of achievable accuracy of the model. As a result, it provides a trade-off between the accuracy of the parameter estimates and the interaction of the residuals of a model via the measure of complexity of their respective covariances. It controls the risks of both insufficient and overparameterized models, and incorporates the assumption of dependence and the independence of the residuals in one criterion function. A model with minimum ICOMP is chosen to be the best model among all possible competing alternative models. ICOMP relieves the researcher of any need to consider the parameter dimension of a model explicitly. A real numerical example is shown in subset selection of variables in multivariate regression analysis to demonstrate the utility and versatility of the new approach.

Two results on the unimodality of the Dirichlet-multinomial distribution are proved, and a further result is also proved on the identifiability of mixtures of multinomial distributions. These properties are used in developing a method for eliciting a Dirichlet prior distribution. The elicitation method is based on the mode, and the region around the mode, of the Dirichlet-multinomial predictive distribution.

Genomic selection is today a hot topic in genetics. It consists of predicting breeding values of selection candidates, using the large number of genetic markers now available owing to the recent progress in molecular biology. One of the most popular methods chosen by geneticists is ridge regression. We focus on some predictive aspects of ridge regression and present theoretical results regarding the accuracy criterion, that is, the correlation between predicted value and true value. We show the influence of singular values, the regularization parameter, and the projection of the signal on the space spanned by the rows of the design matrix. Asymptotic results in a high-dimensional framework are given; in particular, we prove that the convergence to optimal accuracy highly depends on a weighted projection of the signal on each subspace. We discuss how to improve the prediction. Last, illustrations on simulated and real data are presented.


The robustness properties of maximum likelihood (ML) estimators of the parameters of the exponential power and generalized t distributions are examined together. The well-known asymptotic properties of ML estimators of location, scale and added skewness parameters in these distributions are studied. The ML estimators for location, scale and scale variant (skewness) parameters are represented as an iterative reweighting algorithm (IRA) to compute the estimates of these parameters simultaneously. Artificial data are generated to examine the performance of the IRA in computing the ML estimates of the parameters simultaneously. We compare the two distributions to test their fitting performance on real data sets. The goodness-of-fit test and information criteria confirm that robustness and fitting performance should be considered together in modeling in order to extract the most information from real data sets.


This paper develops almost sure convergence results for sums of negatively superadditive dependent random vectors in Hilbert spaces. We obtain a Chung-type SLLN and a Jajte-type SLLN for sequences of negatively superadditive dependent random vectors in Hilbert spaces. The rate of convergence is studied by considering almost sure convergence to 0 of tail series. As an application, the almost sure convergence of degenerate von Mises statistics is investigated.

We derive the optimal regression function (i.e., the best approximation in the L2 sense) when the vector of covariates has a random dimension. Furthermore, we consider applications of these results to problems in statistical regression and classification with missing covariates. It will be seen, perhaps surprisingly, that the correct regression function for the case with missing covariates can sometimes perform better than the usual regression function corresponding to the case with no missing covariates. This is because even if some of the covariates are missing, an indicator random variable δ, which is always observable, and is equal to 1 if there are no missing values (and 0 otherwise), may have far more information and predictive power about the response variable Y than the missing covariates do. We also propose kernel-based procedures for estimating the correct regression function nonparametrically. As an alternative estimation procedure, we also consider the least-squares method.
