Similar literature
20 similar documents found (search time: 31 ms)
1.
Kappa and B assess agreement between two observers independently classifying N units into k categories. We study their behavior under zero cells in the contingency table and unbalanced asymmetric marginal distributions. Zero cells arise when a cross-classification is never endorsed by both observers; biased marginal distributions occur when some categories are preferred differently between the observers. Simulations studied the distributions of the unweighted and weighted statistics for k=4, under fixed proportions of diagonal agreement and different patterns off-diagonal, with various sample sizes, and under various zero cell count scenarios. Marginal distributions were first uniform and homogeneous, and then unbalanced and asymmetric. Results for unweighted kappa and B statistics were comparable to work of Muñoz and Bangdiwala, even with zero cells. A slight increase in variation was observed as the sample size decreased. Weighted statistics did show greater variation as the number of zero cells increased, with weighted kappa increasing substantially more than weighted B. Under biased marginal distributions, weighted kappa with Cicchetti weights was higher than with squared weights. Both statistics for observer agreement behaved well under zero cells. The weighted B was less variable than the weighted kappa under similar circumstances and different weights. In general, B's performance and graphical interpretation make it preferable to kappa under the studied scenarios.
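
As an illustration of the two statistics compared in this abstract, the following minimal sketch computes unweighted Cohen's kappa and Bangdiwala's B from a k × k table of counts. It assumes the usual textbook definitions, kappa = (p_o - p_e) / (1 - p_e) and B = sum of squared diagonal counts divided by the sum of products of matching row and column totals. The function names and the example table are hypothetical and are not taken from the paper.

```python
def _margins(table):
    """Return (row totals, column totals) of a square table of counts."""
    k = len(table)
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    return row_tot, col_tot


def cohen_kappa(table):
    """Unweighted Cohen's kappa: chance-corrected proportion of agreement."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot, col_tot = _margins(table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def bangdiwala_b(table):
    """Unweighted Bangdiwala's B: squared diagonal counts over marginal rectangles."""
    k = len(table)
    row_tot, col_tot = _margins(table)
    num = sum(table[i][i] ** 2 for i in range(k))
    den = sum(row_tot[i] * col_tot[i] for i in range(k))
    return num / den


# Hypothetical 4 x 4 table with several zero cells far from the diagonal.
example = [
    [20, 5, 0, 0],
    [5, 20, 5, 0],
    [0, 5, 20, 5],
    [0, 0, 5, 10],
]
print(round(cohen_kappa(example), 3))   # about 0.592
print(round(bangdiwala_b(example), 3))  # about 0.491
```

Both functions return 1 for a purely diagonal table. The weighted versions studied in the abstract would additionally need a weight matrix (for example Cicchetti or squared weights) and are not sketched here.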

2.
Summary. We detail a general method for measuring agreement between two statistics. An application is two ratios of directly standardized rates which differ only by the choice of the standard. If the statistics have a high value for the coefficient of agreement then the expected squared difference between the statistics is small relative to the variance of the average of the two statistics, and inferences vary little by changing statistics. The estimation of a coefficient of agreement between two statistics is not straightforward because there is only one pair of observed values, each statistic calculated from the data. We introduce estimators of the coefficient of agreement for two statistics and discuss their use, especially as applied to functions of standardized rates.

3.
It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics such as the proportion of concordance, maximum attainable kappa and prevalence and bias adjusted kappa should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted based on the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
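
The companion statistics named in this abstract can be sketched for the two-rater case as follows. This is an illustrative sketch under stated assumptions: maximum attainable kappa is taken as (p_max - p_e) / (1 - p_e) with p_max the sum of the smaller of each pair of matching marginal proportions, and the prevalence and bias adjusted kappa (PABAK) is taken as (k · p_o - 1) / (k - 1), which reduces to 2 · p_o - 1 for two categories. The function name and the example table are hypothetical.

```python
def agreement_summary(table):
    """Return concordance, kappa, maximum attainable kappa and PABAK
    for a square table of counts from two raters."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row = [sum(r) / n for r in table]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n        # proportion of concordance
    p_exp = sum(r * c for r, c in zip(row, col))          # chance agreement
    p_max = sum(min(r, c) for r, c in zip(row, col))      # best agreement the margins allow
    return {
        "p_obs": p_obs,
        "kappa": (p_obs - p_exp) / (1 - p_exp),
        "kappa_max": (p_max - p_exp) / (1 - p_exp),
        "pabak": (k * p_obs - 1) / (k - 1),
    }


# Hypothetical 2 x 2 table with one very rare category.
summary = agreement_summary([[90, 5], [5, 0]])
for name, value in summary.items():
    print(name, round(value, 3))
```

In this example the raters agree on 90% of units, yet kappa is slightly negative because the skewed margins make chance agreement very high, while PABAK is 0.8. That contrast is the kind of marginal sensitivity the abstract warns about.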

4.
Dichotomization of continuous variables to discriminate a dichotomous outcome is often useful in statistical applications. If a true threshold for a continuous variable exists, the challenge is identifying it. This paper examines common methods for dichotomization to identify which ones recover a true threshold. We provide mathematical and numeric proofs demonstrating that maximizing the odds ratio, Youden’s statistic, Gini Index, chi-square statistic, relative risk and kappa statistic all theoretically recover a true threshold. A simulation study evaluating the ability of these statistics to recover a threshold when sampling from a population indicates that maximizing the chi-square statistic and Gini Index have the smallest bias and variability when the probability of being larger than the threshold is small, while maximizing Kappa or Youden’s statistic is best when this probability is larger. Maximizing the odds ratio is the most variable and biased of the methods.
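
To make one of the listed criteria concrete, the sketch below searches for the cutpoint that maximizes Youden's statistic, J = sensitivity + specificity - 1. It is a brute-force illustration, not the paper's procedure: the rule "x > c is classified as positive", the function name, and the toy data are all assumptions introduced here.

```python
def youden_threshold(x, y):
    """Return (cutpoint, J) maximizing Youden's J over the observed values of x.

    x: continuous measurements; y: 0/1 outcomes.
    A unit is classified as positive when x > cutpoint.
    Ties in J are resolved in favour of the smallest cutpoint.
    """
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best_c, best_j = None, float("-inf")
    for c in sorted(set(x)):
        true_pos = sum(1 for xi, yi in zip(x, y) if xi > c and yi == 1)
        true_neg = sum(1 for xi, yi in zip(x, y) if xi <= c and yi == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical data: outcomes become mostly positive for larger x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
print(youden_threshold(x, y))  # (3, 0.75)
```

The same loop can be reused for the other criteria in the abstract by replacing the expression for `j` with, for example, the chi-square statistic of the resulting 2 × 2 table.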

5.
A new statistical procedure for testing normality is proposed. The Q statistic is derived as the ratio of two linear combinations of the ordered random observations. The coefficients of the linear combinations use the expected values of the order statistics from the standard normal distribution. This is an omnibus test that detects deviations from normality resulting from either skewness or kurtosis. The statistic is independent of the origin and the scale under the null hypothesis of normality, and the null distribution of Q can be very well approximated by the Cornish-Fisher expansion. The powers for various alternative distributions were compared with several other test statistics by simulations.

6.
We consider a fixed number of arbitrarily dependent random variables with a common symmetric marginal distribution. For each order statistic based on the variables, we determine a common optimal bound, dependent in a simple way on the sample size and number of order statistics, for various measures of dispersion of the order statistics, expressed in terms of the same dispersion measure of the single original variable. The dispersion measures are connected with the notion of M-functional of a random variable location with respect to a symmetric and convex loss function. The measure is defined as the expected loss paid for the discrepancy between the M-functional and the variable. The most popular examples are the median absolute deviation and variance.

7.
The authors describe a model‐based kappa statistic for binary classifications which is interpretable in the same manner as Scott's pi and Cohen's kappa, yet does not suffer from the same flaws. They compare this statistic with the data‐driven and population‐based forms of Scott's pi in a population‐based setting where many raters and subjects are involved, and inference regarding the underlying diagnostic procedure is of interest. The authors show that Cohen's kappa and Scott's pi seriously underestimate agreement between experts classifying subjects for a rare disease; in contrast, the new statistic is robust to changes in prevalence. The performance of the three statistics is illustrated with simulations and prostate cancer data.

8.
A discrepancy measure to assess model fitness against a nonparametric alternative is proposed. First, a Polya tree prior is constructed so that the centering distribution is the null. Second, the prior is updated in the light of data to obtain the posterior centering distribution as the alternative. Third, a Kullback-Leibler divergence type of test statistic is derived to assess the discrepancy between the two centering distributions. The properties of the test statistic are derived, and a power comparison with several well-known test statistics is conducted. The use of the test statistic is illustrated using network traffic data.

9.
In this paper the work of Pancheva (1984) for extreme order statistics under nonlinear normalization is extended to order statistics with variable ranks. Two new results are proved. The first is that under nonlinear normalization, the nondegenerate type (family of types) of the distribution functions with two finite growth points is a possible weak limit of any central order statistic with regular rank sequence. The second result is that the possible nondegenerate weak limits of any central order statistic with regular rank under the traditional linear normalization and under the power normalization are the same. Finally, the class of all possible weak limits for lower and upper intermediate order statistics is derived under power normalization from the corresponding weak limits of extremes under power normalization.

10.
It is shown how various exact non-parametric inferences based on order statistics in one or two random samples can be generalized to situations with progressive type-II censoring, which is a kind of evolutionary right censoring. Ordinary type-II right censoring is a special case of such progressive censoring. These inferences include confidence intervals for a given parent quantile, prediction intervals for a given order statistic of a future sample, and related two-sample inferences based on exceedance probabilities. The proposed inferences are valid for any parent distribution with continuous distribution function. The key result is that each observable uncensored order statistic that becomes available with progressive type-II censoring can be represented as a mixture with known weights of underlying ordinary order statistics. The importance of this mixture representation lies in that various properties of such observable order statistics can be deduced immediately from well-known properties of ordinary order statistics.

11.
In this paper we consider the problem of testing exponentiality against IFR alternatives. A measure of deviation from exponentiality is developed and a class of test statistics is constructed on the basis of this measure. It is shown that the test statistic is an L-statistic. The asymptotic as well as the exact distributions of the test statistics are obtained and the test statistics are proved to be consistent. The Pitman efficiency has also been studied.

12.
Linear regression with compositional explanatory variables   Cited by: 1 (self-citations: 0, citations by others: 1)
Compositional explanatory variables should not be directly used in a linear regression model because any inference statistic can become misleading. While various approaches for this problem were proposed, here an approach based on the isometric logratio (ilr) transformation is used. It turns out that the resulting model is easy to handle, and that parameter estimation can be done as in ordinary linear regression. Moreover, it is possible to use the ilr variables for inference statistics in order to obtain an appropriate interpretation of the model.
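
A minimal sketch of an ilr transformation is given below, assuming one common choice of orthonormal basis (often called pivot coordinates), in which each coordinate contrasts one part with the geometric mean of the parts that follow it. The abstract does not say which basis the authors use, so this choice, the function name, and the example values are assumptions.

```python
import math


def ilr_pivot(x):
    """Map a D-part composition (all parts > 0) to D - 1 ilr coordinates.

    Coordinate j compares part j with the geometric mean of the
    remaining parts j+1, ..., D, scaled so that the basis is orthonormal.
    """
    logs = [math.log(v) for v in x]
    d = len(logs)
    z = []
    for j in range(d - 1):
        rest = logs[j + 1:]
        m = len(rest)
        z.append(math.sqrt(m / (m + 1)) * (logs[j] - sum(rest) / m))
    return z


# The coordinates depend only on ratios, so rescaling the parts changes nothing.
print(ilr_pivot([0.2, 0.3, 0.5]))
print(ilr_pivot([2, 3, 5]))
```

The resulting coordinates are unconstrained real numbers and can be passed as ordinary regressors to any least-squares routine, which is the sense in which estimation proceeds as in standard linear regression.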

13.
The maximum vertical distance between a receiver operating characteristic (ROC) curve and its chance diagonal is a common measure of effectiveness of the classifier that gives rise to this curve. This measure is known to be equivalent to a two-sample Kolmogorov–Smirnov statistic; so the absolute difference D between two such statistics is often used informally as a measure of difference between the corresponding classifiers. A significance test of D is of great practical interest, but the available Kolmogorov–Smirnov distribution theory precludes easy analytical construction of such a significance test. We, therefore, propose a Monte Carlo procedure for conducting the test, using the binormal model for the underlying ROC curves. We provide Splus/R routines for the computation, tabulate the results for a number of illustrative cases, apply the methods to some practical examples and discuss some implications.
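
The equivalence stated in this abstract can be checked numerically. The sketch below computes the maximum vertical distance between an empirical ROC curve and the chance diagonal, and separately the two-sample Kolmogorov–Smirnov statistic of the two score samples. It covers only the equivalence, not the authors' Monte Carlo test; the function names and scores are hypothetical, and the rule "score > t is called positive" is an assumption.

```python
def max_roc_distance(neg_scores, pos_scores):
    """Largest |TPR - FPR| over all thresholds, i.e. the maximum vertical
    distance between the empirical ROC curve and the chance diagonal."""
    best = 0.0
    for t in sorted(set(neg_scores) | set(pos_scores)):
        tpr = sum(s > t for s in pos_scores) / len(pos_scores)
        fpr = sum(s > t for s in neg_scores) / len(neg_scores)
        best = max(best, abs(tpr - fpr))
    return best


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical distribution functions of a and b."""
    best = 0.0
    for t in sorted(set(a) | set(b)):
        f_a = sum(s <= t for s in a) / len(a)
        f_b = sum(s <= t for s in b) / len(b)
        best = max(best, abs(f_a - f_b))
    return best


# Hypothetical classifier scores for negative and positive cases.
neg = [0.1, 0.2, 0.3, 0.4, 0.6]
pos = [0.35, 0.5, 0.7, 0.8, 0.9]
print(max_roc_distance(neg, pos), ks_two_sample(neg, pos))
```

The two values agree because TPR - FPR at threshold t equals the difference of the two empirical distribution functions at t. For two classifiers scored on the same cases, the quantity D in the abstract would then be the absolute difference of two such values.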

14.
The purpose of this article is to investigate hypothesis testing in functional comparative calibration models. Wald type statistics are considered which are asymptotically distributed according to the chi-square distribution. The statistics are based on maximum likelihood, corrected score, and method-of-moments estimators of the model parameters, which are shown to be consistent and asymptotically normally distributed. Results of analytical and simulation studies seem to indicate that the Wald statistics based on the method-of-moments estimators and the corrected score estimators are, as expected, less efficient than the Wald type statistic based on the maximum likelihood estimators for small n. Wald statistics based on moment estimators are simpler to compute than the other Wald statistics and their performance improves significantly as n increases. Comparisons with an alternative F statistic proposed in the literature are also reported.

15.
Uniform order statistics generated by two simulation methods are compared by means of Pitman’s measure of closeness. This measure, as a probability, is shown to be asymptotically 1/2. Some results are also established for fixed points of the cumulative distribution function (CDF) for a uniform order statistic. These fixed points are important for calculations involving the joint distribution of these order statistics.

16.
Shiue and Bain proposed an approximate F statistic for testing equality of two gamma distribution scale parameters in the presence of a common and unknown shape parameter. By generalizing Shiue and Bain's statistic we develop a new statistic for testing equality of L ≥ 2 gamma distribution scale parameters. We derive the distribution of the new statistic ESP for L = 2 and the equal sample size situation. For other situations the distribution of ESP is not known, and a test based on the ESP statistic has to be performed using simulated critical values. We also derive a C(α) statistic CML and develop a likelihood ratio statistic, LR, two modified likelihood ratio statistics M and MLB and a quadratic statistic Q. The distribution of each of the statistics CML, LR, M, MLB and Q is asymptotically chi-square with L - 1 degrees of freedom. We then conducted a Monte Carlo simulation study to compare the performance of the statistics ESP, LR, M, MLB, CML and Q in terms of size and power. The statistics LR, M, MLB and Q are in general liberal and do not show power advantage over other statistics. The statistic CML, based on its asymptotic chi-square distribution, in general, holds nominal level well. It is most powerful or nearly most powerful in most situations and is simple to use. Hence, we recommend the statistic CML for use in general. For better power the statistic ESP, based on its empirical distribution, is recommended for the special situation for which there is evidence in the data that λ1 < … < λL and n1 < … < nL, where λ1, …, λL are the scale parameters and n1, …, nL are the sample sizes.

17.
The Shapiro–Wilk statistic and modified statistics are widely used test statistics for normality. They are based on regression and correlation. The statistics for the complete data can be easily generalized to the censored data. In this paper, the distribution theory for the modified Shapiro–Wilk statistic is investigated when it is generalized to Type II right censored data. As a result, it is shown that the limit distribution of the statistic can be represented as the integral of a Brownian bridge. Also, a power comparison with another procedure is performed.

18.
Given a probability measure on the unit square, the measure of the region under an empirical P–P plot defines a two-sample rank statistic. Instances include trimmed and censored versions of the Mann–Whitney–Wilcoxon statistic and a class of statistics with applications in the analysis of receiver operating characteristic (ROC) curves. A large sample distribution for such a statistic is obtained, which is valid under sampling from general populations. Explicit results are presented for comparing arbitrary quantile segments of two populations. The results are not restricted to continuous data and incorporate adjustments for tied values in the discrete case. A multivariate version of the large sample distribution extends the class of tractable statistics in ROC analysis and facilitates the use of methods based on partial areas when the data are discrete.

19.
The aim of this paper is to investigate the robustness properties of likelihood inference with respect to rounding effects. Attention is focused on exponential families and on inference about a scalar parameter of interest, also in the presence of nuisance parameters. A summary value of the influence function of a given statistic, the local-shift sensitivity, is considered. It accounts for small fluctuations in the observations. The main result is that the local-shift sensitivity is bounded for the usual likelihood-based statistics, i.e. the directed likelihood, the Wald and score statistics. It is also bounded for the modified directed likelihood, which is a higher-order adjustment of the directed likelihood. The practical implication is that likelihood inference is expected to be robust with respect to rounding effects. Theoretical analysis is supplemented and confirmed by a number of Monte Carlo studies, performed to assess the coverage probabilities of confidence intervals based on likelihood procedures when data are rounded. In addition, simulations indicate that the directed likelihood is less sensitive to rounding effects than the Wald and score statistics. This provides another criterion for choosing among first-order equivalent likelihood procedures. The modified directed likelihood shows the same robustness as the directed likelihood, so that its gain in inferential accuracy does not come at the price of an increase in instability with respect to rounding.

20.
Power-divergence goodness-of-fit statistics have asymptotically a chi-squared distribution. Asymptotic results may not apply in small-sample situations, and the exact significance of a goodness-of-fit statistic may potentially be over- or under-stated by the asymptotic distribution. Several correction terms have been proposed to improve the accuracy of the asymptotic distribution, but their performance has only been studied for the equiprobable case. We extend that research to skewed hypotheses. Results are presented for one-way multinomials involving k = 2 to 6 cells with sample sizes N = 20, 40, 60, 80 and 100 and nominal test sizes α = 0.1, 0.05, 0.01 and 0.001. Six power-divergence goodness-of-fit statistics were investigated, and five correction terms were included in the study. Our results show that skewness itself does not affect the accuracy of the asymptotic approximation, which depends only on the magnitude of the smallest expected frequency (whether this comes from a small sample with the equiprobable hypothesis or a large sample with a skewed hypothesis). Throughout the conditions of the study, the accuracy of the asymptotic distribution seems to be optimal for Pearson's X² statistic (the power-divergence statistic of index λ = 1) when k > 3 and the smallest expected frequency is as low as between 0.1 and 1.5 (depending on the particular k, N and nominal test size), but a computationally inexpensive improvement can be obtained in these cases by using a moment-corrected χ² distribution. If the smallest expected frequency is even smaller, a normal correction yields accurate tests through the log-likelihood-ratio statistic G² (the power-divergence statistic of index λ = 0).
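
For reference, the family of statistics discussed in this abstract can be sketched in a few lines, assuming the standard Cressie–Read form 2 / (index · (index + 1)) · Σ O · ((O / E)^index - 1), with the index 0 case defined by its limit. The parameter name `lam`, the function name, and the example counts are introduced here for illustration; none of the correction terms studied in the paper are implemented.

```python
import math


def power_divergence(observed, expected, lam):
    """Power-divergence goodness-of-fit statistic of index `lam`.

    lam = 1 gives Pearson's chi-square statistic and lam = 0 gives the
    log-likelihood-ratio statistic (taken as the limit of the general form).
    Observed and expected counts are assumed to have the same total.
    For negative lam every observed count must be positive.
    """
    if lam == -1:
        raise ValueError("index -1 needs a separate limiting form, not sketched here")
    pairs = list(zip(observed, expected))
    if lam == 0:
        # Cells with a zero observed count contribute nothing in the limit.
        return 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    return 2 / (lam * (lam + 1)) * sum(o * ((o / e) ** lam - 1) for o, e in pairs)


# Hypothetical one-way multinomial with three equiprobable cells, N = 60.
obs = [10, 20, 30]
exp = [20, 20, 20]
print(power_divergence(obs, exp, 1))            # Pearson's statistic, 10.0
print(round(power_divergence(obs, exp, 0), 3))  # log-likelihood ratio, about 10.465
```

Each value would then be referred to a chi-squared distribution with k - 1 degrees of freedom (here 2), which is the asymptotic approximation whose small-sample accuracy the abstract examines.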
