Judging scholarly posters poses the challenge of assigning judges efficiently. If there are many posters and few reviews per judge, the commonly used balanced incomplete block design is not a feasible option. An additional challenge is that the number of judges is unknown before the event. We propose two connected near-balanced incomplete block designs that both satisfy the requirements of our setting: one that generates a connected assignment and balances the treatments and another one that further balances pairs of treatments. We describe both fixed and random effects models to estimate the population marginal means of the poster scores and rationalize the use of the random effects model. We evaluate the estimation accuracy and efficiency, especially the winning chance of the truly best posters, of the two designs in comparison with a random assignment via simulation studies. The two proposed designs both demonstrate accuracy and efficiency gain over the random assignment.

Relationships between concordance measures and the number of matches are established when there are n objects to be ranked by each of k judges under the hypothesis of independent and random ordering. These relationships generalize earlier results of Barton and David, who dealt with the case k = 2.

In several research areas such as psychology, social science, and medicine, studies are conducted in which objects should be ranked by different judges/raters and the concordance of the different rankings is then analyzed. In such studies, it is also frequently of interest to compare the rankings between different groups of judges, e.g. female vs. male judges or judges from different professions. In the two-group case, the two-group concordance test of Schucany & Frawley can be employed for such a comparison. In this article, we propose an extension of this test enabling the comparison of rankings from more than two groups of judges. This test aims to detect disagreement in the average rankings of the objects between k groups with an at least moderate intra-group concordance. We evaluate this test in an extensive simulation study and in an application to data from an aesthetics study. This simulation study shows that the proposed test is able to detect differences between average rankings and performs well even in situations in which the disagreement is comparably small or the intra-group concordance is inhomogeneous.

An assumed hypothetical consensus category corresponding to a case being classified provides a basis for assessing the reliability of judges. Equivalent judges are characterised by the joint probability distribution of the judge assignment and the consensus category. Estimates of the conditional probabilities of judge assignment given consensus category and of consensus category given judge assignments are indices of reliability. All parameters can be estimated if data include classifications of a number of cases by 3 or more judges. Restrictive assumptions are imposed to obtain models for data from classifications by two judges. Maximum likelihood estimation is discussed and illustrated by example for the 3 or more judges case.

When analysing ranking data from one or more groups of judges one may wish to allow for the possibility that the judges have paid more attention to the allocation of the extreme ranks, rather than to the intermediate ranks. In some cases they may have only worried about assigning the top ranks (1 and 2, say) while randomly allocating the remaining ones.

In another context, the analyst may wish to take account only of the agreement among judges with respect to extreme ranks (top or bottom, or both).

In such situations an analysis of concordance within, and between groups, if appropriate, should be able to deal with extreme ranks specifically. We propose a data analytic approach, related to an analysis of diversity, which actually permits an analysis of concordance for each rank separately.

A definition of concordance is derived from Rao's concept of a perfect diversity measure in order to identify aspects about which two populations of judges agree. In the case where each judge independently ranks a fixed set of items, the class of concordance measures based on the marginal distributions of the ranks is characterized by bi-affine similarity functions that measure how well pairs of judges tend to agree. This class contains population versions of several familiar indices of concordance, including Kendall's W. Concordance between two populations, referred to as intergroup concordance, is also scaled against its corresponding intragroup measures. Small sample properties of estimators of the ratio of inter-to-intra group concordance are investigated in a Monte Carlo study. An example is given to illustrate components of concordance attributable to subsets of items, and to contrast the proposed methods with previous analyses.
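As a point of reference for the index named above, the classical sample version of Kendall's W for k judges who each give a complete, tie-free ranking of n items can be sketched as follows. This is the textbook formula, not the population-level or inter/intra-group measures developed in the paper; the function name `kendalls_w` and the input layout (one list of ranks per judge) are assumptions made for illustration.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for complete rankings without ties.

    `rankings` is a list with one entry per judge; each entry lists the ranks
    1..n that the judge gave to items 0..n-1 (hypothetical layout).
    """
    k = len(rankings)          # number of judges
    n = len(rankings[0])       # number of items
    if any(len(r) != n for r in rankings):
        raise ValueError("every judge must rank the same number of items")
    # Total rank received by each item across all judges.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = k * (n + 1) / 2
    # Sum of squared deviations of the rank totals from their mean.
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (k ** 2 * (n ** 3 - n))
```

W equals 1 when all judges give identical rankings and 0 when the rank totals of all items coincide.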

In numerous situations, rank data are used to express the preferences of a group of respondents towards a set of items. While assigning ranks, judges may consider several factors contributing to the overall ranks of items. In this study, an attempt is made to model the factors influencing the judges’ evaluations of items through mixture models for preference data. Both the probabilistic features of the mixture distribution and the inferential and computational issues arising from maximum likelihood estimation are addressed. Moreover, empirical evidence from an observed dataset is provided, confirming the plausibility of the proposed model for preference data.

In the method of paired comparisons (PCs), treatments are compared on the basis of the qualitative characteristics they possess, in the light of sensory evaluations made by judges. However, situations may arise where, in addition to qualitative merits/worths, judges assign quantitative weights to specify the relative importance of the treatments. In this study, an attempt is made to reconcile the qualitative and the quantitative PCs by assigning quantitative weights to treatments having qualitative merits, extending the Bradley–Terry (BT) model. Behaviors of the existing BT model and the proposed weighted BT model are studied through a goodness-of-fit test. Experimental and simulated data sets are used for illustration.
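For orientation, the standard (unweighted) Bradley–Terry model gives each treatment a positive worth and takes the probability that treatment i is preferred to treatment j as worth_i / (worth_i + worth_j). The sketch below shows only this baseline; the weighted extension proposed in the paper is not specified in the abstract and is not reproduced here. The function name `bt_preference_prob` is hypothetical.

```python
def bt_preference_prob(worth_i, worth_j):
    """Baseline Bradley-Terry probability that treatment i is preferred to j.

    Both worths must be strictly positive; only their ratio matters.
    """
    if worth_i <= 0 or worth_j <= 0:
        raise ValueError("worth parameters must be positive")
    return worth_i / (worth_i + worth_j)
```

For example, a treatment with twice the worth of its competitor is preferred with probability 2/3, and the two probabilities for a pair always sum to 1.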

Ranking objects by a panel of judges is commonly used in situations where objective attributes cannot easily be measured or interpreted. Under the assumption that the judges independently arrive at their rankings by making pairwise comparisons among the objects in an attempt to reproduce a common baseline ranking w0, we develop and explore confidence regions and Bayesian highest posterior density credible regions for w0 with emphasis on very small sample sizes.

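Dong Yi (董毅). Statistical Education (《统计教育》), 2004, (5): 23-25
Because the arithmetic mean is easily affected by extreme values, the current method of scoring collectively by a panel of judges often fails to reflect the opinion of the majority of the judges. This paper analyzes the problems in the current scoring method and proposes three improved methods.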
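The sensitivity of the arithmetic mean to a single extreme score can be seen in a small sketch. The abstract does not name its three proposed methods, so the trimmed mean and the median used below are common robust alternatives chosen purely for illustration, and the scores are hypothetical.

```python
from statistics import mean, median

def trimmed_mean(scores, drop=1):
    """Mean after discarding the `drop` lowest and `drop` highest scores."""
    if len(scores) <= 2 * drop:
        raise ValueError("not enough scores to trim")
    kept = sorted(scores)[drop:len(scores) - drop]
    return mean(kept)

# Hypothetical panel: four judges agree on 9.0, one gives 4.0.
scores = [9.0, 9.0, 9.0, 9.0, 4.0]
plain = mean(scores)            # pulled down by the single low score
trimmed = trimmed_mean(scores)  # ignores the highest and lowest score
middle = median(scores)         # middle score of the sorted panel
```

Here the plain mean is 8.0, while the trimmed mean and the median both stay at 9.0, the value that four of the five judges gave.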

A right-censored ranking is what results when a judge ranks only the “top K” of M objects. Complete uncensored rankings constitute a special case. We present two measures of concordance among the rankings of N ≥ 2 such judges, both based on Spearman's footrule. One measure is unweighted, while the other gives greatest weight to the first rank, less to the second, and so on. We consider methods for calculating or estimating the P-values of the corresponding tests of the hypothesis of random ranking.
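Spearman's footrule, on which both measures are based, is the sum of absolute rank differences between two rankings. The sketch below covers only the basic case of two complete rankings; how the paper handles censored ranks, weighting, and N judges is not stated in the abstract. The function name `footrule` is an assumption.

```python
def footrule(ranks_a, ranks_b):
    """Spearman's footrule distance between two complete rankings.

    Each argument lists the rank given to items 0..M-1 by one judge.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    return sum(abs(a - b) for a, b in zip(ranks_a, ranks_b))
```

Identical rankings give a distance of 0; for four items, a fully reversed ranking gives 3 + 1 + 1 + 3 = 8.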

Preference decisions will usually depend on the characteristics of both the judges and the objects being judged. In the analysis of paired comparison data concerning European universities and students' characteristics, it is demonstrated how to incorporate subject-specific information into Bradley–Terry-type models. Using this information it is shown that preferences for universities and therefore university rankings are dramatically different for different groups of students. A log-linear representation of a generalized Bradley–Terry model is specified which allows simultaneous modelling of subject- and object-specific covariates and interactions between them. A further advantage of this approach is that standard software for fitting log-linear models, such as GLIM, can be used.

Nowadays, sensory properties of materials are receiving growing attention from both a hedonic and a utilitarian point of view. Hence, formulating the foundations of an instrumental metrological approach that allows the characterization of visual similarities between textures of the same type has become a challenge for research in the domain of perception. In this paper, our specific objective is to link an instrumental approach to the metrology of visual texture assessment with a metrology approach based on a softcopy experiment performed by human judges. The experiment consisted of ranking isochromatic colored textures according to visual contrast. A fixed effects additive model is considered for the analysis of the rank data collected from the softcopy experiment. The model is fitted to the data using a least-squares criterion. The resulting data analysis gives rise to a sensory scale that shows a non-linear correlation and a monotonic functional relationship with the physical attribute on which the ranking experiment is based. Furthermore, the capacity of the judges to discriminate the textures according to visual contrast varies with the color ranges and the texture types.


The presence of a maverick judge, one whose rankings differ greatly from the other members of a panel, can result in incorrect rankings and a sense of unfairness among contestants. We develop and explore the properties of a likelihood ratio test, assuming a Mallows type distribution, for the presence of a maverick judge when each judge selects his or her best k out of n objects, k ≤ n. Detection of a maverick judge, who may be viewed as a multivariate outlier, turns out to be very difficult unless the judges are very consistent and there are repeat observations on the panel.

David (1963) and Davidson & Farquhar (1976) contain extensive bibliographies of proposed approaches to problems involving paired comparisons. However, each of the discussed methods that is based on a hypothesis test relies heavily on the assumption that all paired comparisons are made independently. In this paper we eliminate this assumption and develop a new procedure based on an adaptation of a statistic considered by Kendall & Babington Smith (1940). We show that their original test procedure substantially underestimates the true significance level if the comparisons are not made independently. Our modification utilizes the approach developed in Costello & Wolfe (1985) for the problem of agreement between two groups of judges and relies heavily on computer-generated tables.

Various procedures, mainly graphical, are presented for analyzing large sets of ranking data in which the permutations are not equally likely. One method is based on box plots; the others are motivated by a model originally proposed by Mallows. The model is characterised by two parameters corresponding to location and dispersion. Graphical methods based on Q-Q plots are also discussed for comparing two groups of judges. The proposed methods are illustrated on an empirical data set.

The criminal courts in England and Wales may request the probation service to submit pre-sentence reports which are considered by magistrates and judges before making their sentencing decision. Pre-sentence reports must include an assessment of the risk of reoffending and the risk of harm to the public which the convicted offender presents. The offender group reconviction scale is a statistical aid to such risk assessment. We describe the scale and the statistical analysis on which it is based, and we discuss some statistical aspects of its interpretation and use.

In psychology, marketing research and sensory analysis, paired comparisons, which require judges to evaluate the trade-off between two alternatives, constitute a popular method of data collection. For this situation we present optimal designs in a discrete setting when the alternatives are specified by an analysis of variance model with main effects only. We employ combinatorial tools to achieve optimal designs which have sufficiently small sample sizes. Moreover, optimal designs are constructed when the number of factors presented is restricted for each pair of alternatives.

This paper presents a study on symmetry of repeated bi-phased data signals, in particular, on quantification of the deviation between the two parts of the signal. Three symmetry scores are defined using functional data techniques such as smoothing and registration. One score is related to the L2-distance between the two parts of the signal, whereas the other two are constructed to specifically measure differences in amplitude and phase. Moreover, symmetry scores based on functional principal component analysis (PCA) are examined. The scores are applied to acceleration signals from a study on equine gait. The scores turn out to be highly associated with lameness, and their applicability for lameness quantification and detection is investigated. Four classification approaches turn out to give similar results. The scores describing amplitude and phase variation turn out to outperform the PCA scores when it comes to the classification of lameness.

The trend test is often used for the analysis of 2×K ordered categorical data, in which K pre-specified increasing scores are used. There have been discussions on how to assign these scores and the impact of the outcomes on different scores. The scores are often assigned based on the data-generating model. When this model is unknown, using the trend test is not robust. We discuss the weighted average of a trend test over all scientifically plausible choices of scores or models. This approach is more computationally efficient than a commonly used robust test MAX when K is large. Our discussion is for any ordered 2×K table, but simulation and applications to real data are focused on case-control genetic association studies. Although there is no single test optimal for all choices of scores, our numerical results show that some score averaging tests can achieve the performance of MAX.
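To make the role of the scores concrete, here is a minimal sketch of a trend test statistic for a 2×K case-control table under one fixed choice of scores, assuming the usual Cochran-Armitage form with the N (rather than N − 1) variance convention. The abstract does not give its exact formula, and the score-averaging and MAX procedures it discusses are not implemented here; the function name `trend_test_z` and the example counts are hypothetical.

```python
from math import sqrt

def trend_test_z(cases, controls, scores):
    """Cochran-Armitage-type trend statistic Z for a 2xK table.

    `cases[i]` and `controls[i]` are the counts in ordered category i,
    and `scores[i]` is the pre-specified score for that category.
    Assumes the table is non-degenerate (variance strictly positive).
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")
    totals = [r + c for r, c in zip(cases, controls)]   # column totals
    n = sum(totals)                                     # grand total
    p = sum(cases) / n                                  # overall case proportion
    # Score-weighted deviation of observed cases from their expected counts.
    u = sum(s * (r - m * p) for s, r, m in zip(scores, cases, totals))
    s1 = sum(m * s for m, s in zip(totals, scores))
    s2 = sum(m * s * s for m, s in zip(totals, scores))
    var_u = p * (1 - p) * (s2 - s1 * s1 / n)
    return u / sqrt(var_u)

# Hypothetical counts with an increasing case proportion across categories.
z = trend_test_z([10, 20, 30], [30, 20, 10], [0, 1, 2])
```

Changing `scores` (for example to `[0, 0, 1]` or `[0, 1, 1]`) changes the statistic for the same table, which is the sensitivity that motivates averaging over plausible score choices.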
