Similar literature
 20 similar articles found (search time: 171 ms)
1.
Cohen's kappa coefficient is traditionally used to quantify the degree of agreement between two raters on a nominal scale. Correlated kappas occur in many settings (e.g., repeated agreement by raters on the same individuals, concordance between diagnostic tests and a gold standard) and often need to be compared. While different techniques are now available to model correlated kappa coefficients, they are generally not easy to implement in practice. The present paper describes a simple alternative method based on the bootstrap for comparing correlated kappa coefficients. The method is illustrated by examples, and its type I error is studied using simulations. The method is also compared with second-order generalized estimating equations and with weighted least-squares methods.

2.
The kappa coefficient is a widely used measure for assessing agreement on a nominal scale. Weighted kappa is an extension of Cohen's kappa that is commonly used for measuring agreement on an ordinal scale. In this article, it is shown that weighted kappa can be computed as a function of unweighted kappas. These unweighted kappas correspond to smaller contingency tables obtained by merging categories.

3.
Cohen's kappa statistic is the conventional and widely used method for measuring agreement between two categorical responses. In this article, we develop a fixed-effects modeling approach to Cohen's kappa for bivariate multinomial data. It reduces to Cohen's kappa under certain conditions and hence can be considered a generalization of the conventional Cohen's kappa. The method can also easily be adapted as a generalization of Cohen's weighted kappa. Properties of the proposed method are provided. Large-sample performance is investigated through bootstrap simulation studies, followed by two illustrative examples.

4.
In this paper, we examine the performance of Anderson's classification statistic with covariate adjustment in comparison with the usual Anderson's classification statistic without covariate adjustment in a two-population normal covariate classification problem. The same problem has been investigated by other authors using different methods of comparison (see the bibliography). The aim of this paper is to give a direct comparison based upon the asymptotic probabilities of misclassification. It is shown that, for large training samples of equal size from each population, Anderson's classification statistic with covariate adjustment and a cut-off point equal to zero has better performance.

5.
The authors show how Kendall's tau can be adapted to test against serial dependence in a univariate time series context. They provide formulas for the mean and variance of circular and noncircular versions of this statistic, and they prove its asymptotic normality under the hypothesis of independence. They also present a Monte Carlo study comparing the power and size of a test based on Kendall's tau with the power and size of competing procedures based on alternative parametric and nonparametric measures of serial dependence. In particular, their simulations indicate that Kendall's tau outperforms Spearman's rho in detecting first-order autoregressive dependence, despite the fact that these two statistics are asymptotically equivalent under the null hypothesis, as well as under local alternatives.

6.
It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics, such as the proportion of concordance, the maximum attainable kappa, and the prevalence and bias adjusted kappa, should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted based on the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.

7.
Cohen's kappa, a special case of the weighted kappa, is a chance-corrected index used extensively to quantify inter-rater agreement in validation and reliability studies. In this paper, it is shown that, for inter-rater agreement in 2 × 2 tables where the two raters have the same number of opposite ratings, the weighted kappa, Cohen's kappa, and the Peirce, Yule, Maxwell and Pilliner, and Fleiss indices are identical. This implies that the weights in the weighted kappa are less important under such assumptions. Equivalently, it is shown that for two partitions of the same data set, resulting from two clustering algorithms having the same number of clusters with equal cluster sizes, these similarity indices are identical. Hence, an important characterisation is formulated relating equal numbers of clusters with the same cluster sizes to the presence/absence of a trait in a reliability study. Two numerical examples that illustrate the implication of this relationship are presented.

8.
A simple test based on Gini's mean difference is proposed to test the hypothesis of equality of population variances. Using 2000 replicated samples and empirical distributions, we show that the test compares favourably with Bartlett's and Levene's tests for normal populations. It is also more powerful than Bartlett's and Levene's tests under some alternative hypotheses for some non-normal distributions, and more robust than the other two tests for large sample sizes under some alternative hypotheses. We also give an approximate distribution for the test statistic to enable one to calculate the nominal levels and P-values.

9.
Dichotomization of continuous variables to discriminate a dichotomous outcome is often useful in statistical applications. If a true threshold for a continuous variable exists, the challenge is identifying it. This paper examines common methods for dichotomization to identify which ones recover a true threshold. We provide mathematical and numerical proofs demonstrating that maximizing the odds ratio, Youden's statistic, the Gini Index, the chi-square statistic, the relative risk and the kappa statistic all theoretically recover a true threshold. A simulation study evaluating the ability of these statistics to recover a threshold when sampling from a population indicates that maximizing the chi-square statistic or the Gini Index gives the smallest bias and variability when the probability of being larger than the threshold is small, while maximizing the kappa or Youden's statistic is best when this probability is larger. Maximizing the odds ratio is the most variable and biased of the methods.

10.
Kappa and B assess agreement between two observers independently classifying N units into k categories. We study their behavior under zero cells in the contingency table and unbalanced asymmetric marginal distributions. Zero cells arise when a cross-classification is never endorsed by both observers; biased marginal distributions occur when some categories are preferred differently between the observers. Simulations studied the distributions of the unweighted and weighted statistics for k=4, under fixed proportions of diagonal agreement and different off-diagonal patterns, with various sample sizes, and under various zero cell count scenarios. Marginal distributions were first uniform and homogeneous, and then unbalanced and asymmetric. Results for the unweighted kappa and B statistics were comparable to the work of Muñoz and Bangdiwala, even with zero cells. Slightly increased variation was observed as the sample size decreased. Weighted statistics did show greater variation as the number of zero cells increased, with weighted kappa increasing substantially more than weighted B. Under biased marginal distributions, weighted kappa with Cicchetti weights was higher than with squared weights. Both statistics for observer agreement behaved well under zero cells. The weighted B was less variable than the weighted kappa under similar circumstances and different weights. In general, B's performance and graphical interpretation make it preferable to kappa under the studied scenarios.

11.
Two statistics are suggested for testing the equality of two normal percentiles where population means and variances are unknown. The first is based on the generalized likelihood ratio test (LRT), the second on Cochran's statistic used in the Behrens-Fisher problem. Size and power comparisons are made by using simulation and asymptotic theory.

12.
In this paper, multivariate two-sample testing problems were examined based on the Jurečková–Kalina's ranks of distances. The multivariate two-sample rank test based on the modified Baumgartner statistic for the two-sided alternative was proposed. The proposed statistic was a randomized statistic. Simulations were used to investigate the power of the suggested statistic for various population distributions.

13.
Bartlett's test (1937) for equality of variances is based on a χ² distribution approximation. This approximation deteriorates either when the sample size is small (particularly < 4) or when the number of populations is large. In a simulation investigation, we find a similar varying trend for the mean differences between the empirical distributions of Bartlett's statistics and their χ² approximations. Using the mean differences to represent the distribution departures, a simple adjustment of Bartlett's statistic is proposed on the basis of the equal mean principle. The performance before and after adjustment is extensively investigated under equal and unequal sample sizes, with the number of populations varying from 3 to 100. Compared with the traditional Bartlett's statistic, the adjusted statistic follows the χ² distribution more closely for homogeneous samples from normal populations. The type I error is well controlled and the power is a little higher after adjustment. The adjustment is therefore recommended for small samples and a large number of populations when the underlying distribution is normal.

14.
The authors discuss a graph-based approach for testing spatial point patterns. This approach falls under the category of data-random graphs, which have been introduced and used for statistical pattern recognition in recent years. The authors address specifically the problem of testing complete spatial randomness against spatial patterns of segregation or association between two or more classes of points on the plane. To this end, they use a particular type of parameterized random digraph called a proximity catch digraph (PCD) which is based on relative positions of the data points from various classes. The statistic employed is the relative density of the PCD, which is a U-statistic when scaled properly. The authors derive the limiting distribution of the relative density, using the standard asymptotic theory of U-statistics. They evaluate the finite-sample performance of their test statistic by Monte Carlo simulations and assess its asymptotic performance via Pitman's asymptotic efficiency, thereby yielding the optimal parameters for testing. They further stress that their methodology remains valid for data in higher dimensions.

15.
In this paper, a hypothesis test for heteroscedasticity is proposed in a nonparametric regression model. The test statistic, which uses the residuals from a nonparametric fit of the mean function, is based on an adaptation of the well-known Levene's test. Using the recent theory for analysis of variance when the number of factor levels goes to infinity, the asymptotic distribution of the test statistic is established under the null hypothesis of homoscedasticity and under local alternatives. Simulations suggest that the proposed test performs well in several situations, especially when the variance is a nonlinear function of the predictor.

16.
We derive two C(α) statistics and the likelihood-ratio statistic for testing the equality of several correlation coefficients, from k ≥ 2 independent random samples from bivariate normal populations. The asymptotic relationship of the C(α) tests, the likelihood-ratio test, and a statistic based on the normality assumption of Fisher's Z-transform of the sample correlation coefficient is established. A comparative performance study, in terms of size and power, is then conducted by Monte Carlo simulations. The likelihood-ratio statistic is often too liberal, and the statistic based on Fisher's Z-transform is conservative. The performance of the two C(α) statistics is identical. They maintain the significance level well and have almost the same power as the other statistics when empirically calculated critical values of the same size are used. The C(α) statistic based on a noniterative estimate of the common correlation coefficient (based on Fisher's Z-transform) is recommended.

17.
We encountered a problem in which a study's experimental design called for the use of paired data, but the pairing between subjects had been lost during the data collection procedure. Thus we were presented with a data set consisting of pre and post responses but with no way of determining the dependencies between our observed pre and post values. The aim of the study was to assess whether an intervention called Self-Revelatory Performance had an impact on participants' perceptions of Alzheimer's disease. The participants' responses were measured on an Affect Grid before the intervention and on a separate grid after. To address the underlying question in light of the lost pairing, we utilized a modified bootstrap approach to create a null hypothesized distribution for our test statistic, which was the distance between the two Affect Grids' Centers of Mass. Using this approach, we were able to reject our null hypothesis and conclude that there was evidence the intervention influenced perceptions about the disease.

18.
This study examines extensions of McNemar's test to multinomial responses, and proposes a linear weighting scheme, based on the distance of the response change, that is applied to one of these extensions (Bowker's test). This weighted version of Bowker's test is then appropriate for ordinal response variables. A Monte Carlo simulation was conducted to examine the Type I error rate of the weighted Bowker's test for a cross-classification table based on a five-category ordinal response scale. The weighted Bowker's test was also applied to a data set involving change in student attitudes towards mathematics. The results of the weighted Bowker's test were compared with the results of Bowker's test applied to the same set of data.

19.
The weighted kappa coefficient of a binary diagnostic test is a measure of the beyond-chance agreement between the diagnostic test and the gold standard, and is a measure that allows us to assess and compare the performance of binary diagnostic tests. In the presence of partial disease verification, the comparison of the weighted kappa coefficients of two or more binary diagnostic tests cannot be carried out ignoring the individuals with an unknown disease status, since the estimators obtained would be affected by verification bias. In this article, we propose a global hypothesis test based on the chi-square distribution to simultaneously compare the weighted kappa coefficients in the presence of partial disease verification when the missing data mechanism is ignorable. Simulation experiments have been carried out to study the type I error and the power of the global hypothesis test. The results have been applied to the diagnosis of coronary disease.

20.
The performance of Anderson's classification statistic based on a post-stratified random sample is examined. It is assumed that the training sample is a random sample from a stratified population consisting of two strata with unknown stratum weights. The sample is first segregated into the two strata by post-stratification. The unknown parameters for each of the two populations are then estimated and used in the construction of the plug-in discriminant. Under this procedure, it is shown that additional estimation of the stratum weight will not seriously affect the performance of Anderson's classification statistic. Furthermore, our discriminant enjoys a much higher efficiency than the procedure based on an unclassified sample from a mixture of normals investigated by Ganesalingam and McLachlan (1978).
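
Several of the abstracts listed above (entries 1, 2, 6 and 7) concern Cohen's kappa, and entry 6 mentions the prevalence and bias adjusted kappa (PABAK) as a companion statistic. As a reader's aid, and not as code taken from any of the listed papers, the following minimal Python sketch computes both from a square table of counts using the standard textbook definitions. The function names `cohen_kappa` and `pabak` and the example table are illustrative assumptions, and the multi-category PABAK form used here, (k·p_o − 1)/(k − 1), is the commonly cited generalization of the 2 × 2 formula 2·p_o − 1.

```python
from typing import Sequence


def observed_agreement(table: Sequence[Sequence[float]]) -> float:
    """Proportion of units on the diagonal (both raters agree)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohen_kappa(table: Sequence[Sequence[float]]) -> float:
    """Unweighted Cohen's kappa from a square contingency table of counts.

    Rows index rater A's category, columns index rater B's category.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = observed_agreement(table)
    # Marginal proportions for each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the raters were independent.
    p_e = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)


def pabak(table: Sequence[Sequence[float]]) -> float:
    """Prevalence and bias adjusted kappa (assumed form: (k*p_o - 1)/(k - 1))."""
    k = len(table)
    return (k * observed_agreement(table) - 1) / (k - 1)


# Hypothetical 2 x 2 table with strongly skewed marginals.
example = [[40, 5], [5, 0]]
print(cohen_kappa(example), pabak(example))
```

In the hypothetical table the raters agree on 80% of units, yet chance agreement computed from the skewed marginals is 82%, so kappa comes out slightly negative while PABAK is 0.6. This is the kind of sensitivity to marginal totals that entry 6 warns about.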

