Similar articles
 20 similar articles found (search time: 125 ms)
1.
Judging scholarly posters poses the challenge of assigning the judges efficiently. If there are many posters and few reviews per judge, the commonly used balanced incomplete block design is not a feasible option. An additional challenge is that the number of judges is unknown before the event. We propose two connected near-balanced incomplete block designs that both satisfy the requirements of our setting: one that generates a connected assignment and balances the treatments, and another that further balances pairs of treatments. We describe both fixed and random effects models to estimate the population marginal means of the poster scores and rationalize the use of the random effects model. We evaluate the estimation accuracy and efficiency, especially the winning chance of the truly best posters, of the two designs in comparison with a random assignment via simulation studies. The two proposed designs both demonstrate accuracy and efficiency gains over the random assignment.
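The abstract does not reproduce the two proposed designs; as a simple point of comparison, a connected, near-balanced assignment can be generated by cycling through the poster list as judges arrive, which also handles an unknown number of judges. This is a hypothetical sketch, not the paper's designs:

```python
def cyclic_assignment(n_posters, reviews_per_judge, n_judges):
    """Assign each arriving judge a block of posters by cycling through
    the poster list. Review counts per poster differ by at most one, and
    consecutive blocks overlap at the wrap-around, keeping the design
    connected. (Illustrative only; not the paper's two designs.)"""
    out, pos = [], 0
    for _ in range(n_judges):
        block = [(pos + i) % n_posters for i in range(reviews_per_judge)]
        out.append(block)
        pos = (pos + reviews_per_judge) % n_posters
    return out

# 3 judges, 2 reviews each, 5 posters
print(cyclic_assignment(5, 2, 3))  # [[0, 1], [2, 3], [4, 0]]
```

Because judges are assigned lazily, the scheme needs no advance knowledge of how many judges will show up.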

2.
A definition of concordance is derived from Rao's concept of a perfect diversity measure in order to identify aspects about which two populations of judges agree. In the case where each judge independently ranks a fixed set of items, the class of concordance measures based on the marginal distributions of the ranks is characterized by bi-affine similarity functions that measure how well pairs of judges tend to agree. This class contains population versions of several familiar indices of concordance, including Kendall's W. Concordance between two populations, referred to as intergroup concordance, is also scaled against its corresponding intragroup measures. Small sample properties of estimators of the ratio of inter- to intra-group concordance are investigated in a Monte Carlo study. An example is given to illustrate components of concordance attributable to subsets of items, and to contrast the proposed methods with previous analyses.
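Kendall's W, named above among the familiar indices of concordance, has a simple sample version computed from an n-judges by k-items rank matrix; a minimal sketch of that standard statistic (not the population version developed in the paper):

```python
def kendalls_w(ranks):
    """Kendall's coefficient of concordance for a list of n rankings,
    each a permutation of 1..k (no ties). Returns W in [0, 1]."""
    n, k = len(ranks), len(ranks[0])
    col_sums = [sum(r[j] for r in ranks) for j in range(k)]  # rank sum per item
    mean_sum = n * (k + 1) / 2            # expected rank sum under no agreement
    s = sum((c - mean_sum) ** 2 for c in col_sums)  # squared deviations
    return 12 * s / (n ** 2 * (k ** 3 - k))

# Two judges in perfect agreement -> W = 1
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4]]))  # 1.0
```

W = 1 indicates identical rankings across all judges; W = 0 indicates rank sums exactly equal to their chance expectation.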

3.
In several research areas such as psychology, social science, and medicine, studies are conducted in which objects should be ranked by different judges/raters and the concordance of the different rankings is then analyzed. In such studies, it is also frequently of interest to compare the rankings between different groups of judges, e.g. female vs. male judges or judges from different professions. In the two-group case, the two-group concordance test of Schucany & Frawley can be employed for such a comparison. In this article, we propose an extension of this test enabling the comparison of rankings from more than two groups of judges. This test aims to detect disagreement in the average rankings of the objects between k groups with an at least moderate intra-group concordance. We evaluate this test in an extensive simulation study and in an application to data from an aesthetics study. This simulation study shows that the proposed test is able to detect differences between average rankings and performs well even in situations in which the disagreement is comparably small or the intra-group concordance is inhomogeneous.

4.
A right-censored ranking is what results when a judge ranks only the “top K” of M objects. Complete uncensored rankings constitute a special case. We present two measures of concordance among the rankings of N ≥ 2 such judges, both based on Spearman's footrule. One measure is unweighted, while the other gives greatest weight to the first rank, less to the second, and so on. We consider methods for calculating or estimating the P-values of the corresponding tests of the hypothesis of random ranking.
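A hedged sketch of an unweighted footrule-style concordance for such "top K" rankings: the convention of giving every unranked object the average of the remaining ranks is an assumption made here for illustration, and the paper's exact definitions may differ.

```python
import itertools

def censored_ranks(top_list, m):
    """Full rank vector for objects 0..m-1 given a 'top K' list of object
    indices; unranked objects share the mean of ranks K+1..m (an assumed
    convention, not necessarily the paper's)."""
    k = len(top_list)
    tail = (k + 1 + m) / 2
    r = [tail] * m
    for pos, obj in enumerate(top_list, start=1):
        r[obj] = pos
    return r

def avg_footrule(rankings, m):
    """Mean pairwise Spearman footrule distance sum |r_i - r_j| over all
    judge pairs; 0 means the censored rankings coincide."""
    full = [censored_ranks(r, m) for r in rankings]
    pairs = list(itertools.combinations(full, 2))
    d = sum(sum(abs(a - b) for a, b in zip(u, v)) for u, v in pairs)
    return d / len(pairs)

# Two judges agreeing on the same top 2 of 4 objects -> distance 0
print(avg_footrule([[2, 0], [2, 0]], m=4))  # 0.0
```

Smaller values indicate higher concordance; the weighted variant described in the abstract would multiply each absolute difference by a rank-dependent weight.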

5.
Abstract

The presence of a maverick judge, one whose rankings differ greatly from the other members of a panel, can result in incorrect rankings and a sense of unfairness among contestants. We develop and explore the properties of a likelihood ratio test, assuming a Mallows type distribution, for the presence of a maverick judge when each judge selects his or her best k out of n objects, k ≤ n. Detection of a maverick judge, who may be viewed as a multivariate outlier, turns out to be very difficult unless the judges are very consistent and there are repeat observations on the panel.

6.
Research on Judging the Relative Merits of Multiple Comprehensive Evaluation Methods
I. Proposing a criterion for judging the merits of evaluation methods. For comprehensive evaluation with multiple indicators and multiple objects, many different methods have been proposed. Traditional methods include the comprehensive scoring method, the standardized scoring method, and the efficacy coefficient method; multi-objective decision methods include the precedence method, the induced-ordering method, the double-base-point method, and the projection method; modern methods include principal component analysis, factor analysis, the entropy method, fuzzy evaluation, grey relational analysis, and the analytic hierarchy process. These methods are widely applied, but because they differ in how they preprocess the raw data, determine weights, set their own evaluation standards, and perform the computations, their results differ. This raises the question of how to judge the relative merits of the various comprehensive evaluation methods. We believe…

7.
Abstract

An aspect of cluster analysis which has been widely studied in recent years is the weighting and selection of variables. Procedures have been proposed which are able to identify the cluster structure present in a data matrix when that structure is confined to a subset of variables. Other methods assess the relative importance of each variable as revealed by a suitably chosen weight. But when a cluster structure is present in more than one subset of variables and is different from one subset to another, those solutions as well as standard clustering algorithms can lead to misleading results. Some very recent methodologies for finding consensus classifications of the same set of units can be useful also for the identification of cluster structures in a data matrix, but each one seems to be only partly satisfactory for the purpose at hand. Therefore a new more specific procedure is proposed and illustrated by analyzing two real data sets; its performances are evaluated by means of a simulation experiment.

8.
Ranking objects by a panel of judges is commonly used in situations where objective attributes cannot easily be measured or interpreted. Under the assumption that the judges independently arrive at their rankings by making pairwise comparisons among the objects in an attempt to reproduce a common baseline ranking w0, we develop and explore confidence regions and Bayesian highest posterior density credible regions for w0, with emphasis on very small sample sizes.

9.

Engineers who conduct reliability tests need to choose the sample size when designing a test plan. The model parameters and quantiles are the typical quantities of interest. The large-sample procedure relies on the property that the distribution of the t-like quantities is close to the standard normal in large samples. In this paper, we use a new procedure based on both simulation and asymptotic theory to determine the sample size for a test plan. Unlike the complete data case, the t-like quantities are not pivotal quantities in general when data are time censored. However, we show that the distribution of the t-like quantities depends only on the expected proportion failing, and we obtain the distributions by simulation for both the complete and time-censored cases when the data follow a Weibull distribution. We find that the large-sample procedure usually underestimates the sample size, even when it is said to be 200 or more. The sample size given by the proposed procedure ensures the requested nominal accuracy and confidence of the estimation when the test plan results in complete or time-censored data. Some useful figures displaying the required sample size for the new procedure are also presented.

10.
In a Poisson process, it is well-known that the forward and backward recurrence times at a given time point t are independent random variables. In a renewal process, although the joint distribution of these quantities is known (asymptotically), it seems that very few results regarding their covariance function exist. In the present paper, we study this covariance and, in particular, we state both necessary and sufficient conditions for it to be positive, zero or negative in terms of reliability classifications and the coefficient of variation of the underlying inter-renewal and the associated equilibrium distribution. Our results apply either for an ordinary renewal process in the steady state or for a stationary process.

11.
When analysing ranking data from one or more groups of judges, one may wish to allow for the possibility that the judges have paid more attention to the allocation of the extreme ranks than to the intermediate ranks. In some cases they may have only worried about assigning the top ranks (1 and 2, say) while randomly allocating the remaining ones. In another context, the analyst may wish to take account only of the agreement among judges with respect to extreme ranks (top or bottom, or both). In such situations, an analysis of concordance within, and between, groups, if appropriate, should be able to deal with extreme ranks specifically. We propose a data analytic approach, related to an analysis of diversity, which actually permits an analysis of concordance for each rank separately.

12.
The authors consider the situation of incomplete rankings in which n judges independently rank ki ∈ {2, …, t} objects. They wish to test the null hypothesis that each judge picks the ranking at random from the space of ki! permutations of the integers 1, …, ki. The statistic considered is a generalization of the Friedman test in which the ranks assigned by each judge are replaced by real‐valued functions a(j, ki), 1 ≤ j ≤ ki ≤ t, of the ranks. The authors define a measure of pairwise similarity between complete rankings based on such functions, and use averages of such similarities to construct measures of the level of concordance of the judges' rankings. In the complete ranking case, the resulting statistics coincide with those defined by Hájek & Šidák (1967, p. 118) and Sen (1968). These measures of similarity are extended to the situation of incomplete rankings. A statistic is derived in this more general situation and its properties are investigated.

13.
ABSTRACT

Both philosophically and in practice, statistics is dominated by frequentist and Bayesian thinking. Under those paradigms, our courses and textbooks talk about the accuracy with which true model parameters are estimated or the posterior probability that they lie in a given set. In nonparametric problems, they talk about convergence to the true function (density, regression, etc.) or the probability that the true function lies in a given set. But the usual paradigms' focus on learning the true model and parameters can distract the analyst from another important task: discovering whether there are many sets of models and parameters that describe the data reasonably well. When we discover many good models we can see in what ways they agree. Points of agreement give us more confidence in our inferences, but points of disagreement give us less. Further, the usual paradigms' focus seduces us into judging and adopting procedures according to how well they learn the true values. An alternative is to judge models and parameter values, not procedures, and judge them by how well they describe data, not how close they come to the truth. The latter is especially appealing in problems without a true model.

14.
The problem of n judges ranking r objects is considered in the situation where ties are permitted. Asymptotic distributions under the null hypothesis of complete randomness in the rankings are derived for the test statistics of average rank correlations between all pairs of rankings, where the rank correlations are measured either by Spearman's rho or Kendall's tau. The relative efficiencies of these average rank correlations are derived using approximate Bahadur slope and limiting Pitman efficiency, and in both cases the Kendall statistic is shown to be more efficient. Some interpretations of these and related results are also given.

15.
Storage reliability, which measures the ability of products in a dormant state to keep their required functions, is studied in this paper. Unlike operational reliability, storage reliability for certain types of products may not be 100% at the beginning of storage, because of possible initial failures that are normally neglected in storage reliability models. In this paper, a new combined approach is proposed for estimating and predicting storage reliability with possible initial failures: a nonparametric measure for estimating the number of failed products and the current reliability at each testing time in storage, and a parametric measure for estimating the initial reliability and the failure rate based on the exponential reliability function. The proposed method takes into account that initial failure and reliability testing data, before and during the storage process, are available to provide more accurate estimates of both the initial failure probability and the probability of storage failures. For storage reliability prediction, the main concern in this field, the nonparametric estimates of failure numbers can be used in the parametric models for the failure process in storage. For the case of exponential models, an assessment and prediction method for storage reliability is provided. Finally, numerical examples are given to illustrate the method, together with a detailed comparison between the proposed method and the traditional method that examines the rationality of the assessment and prediction. The results should be useful for planning a storage environment, for decision-making concerning the maximum length of storage, and for identifying production quality.

16.
Suppose it is desired to partition a distribution into k groups (classes) using squared error or absolute error as the measure of information retained. An algorithm to obtain the optimal boundaries (or class probabilities) is given. For the case of squared error, optimal class probabilities were obtained for k = 2 to 15 for beta (for various values of the parameters), chi-square (12 d.f.), exponential, normal, and uniform distributions. Results obtained are compared and analysed in light of existing papers. Special attention is given to the case k = 5, corresponding to the assignment of the letter grades A, B, C, D, F in courses, and to the case k = 9, corresponding to stanines.
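For sample data, the squared-error version of this partitioning problem can be solved exactly by a one-dimensional dynamic program over sorted observations. This is an illustrative stand-in for the paper's algorithm, which works on the distribution itself rather than on a sample:

```python
def best_partition(xs, k):
    """Optimal contiguous partition of 1-D data into k classes minimizing
    within-class sum of squared errors. Returns (total SSE, list of
    (start, end) index pairs into the sorted data)."""
    xs = sorted(xs)
    n = len(xs)
    # prefix sums for O(1) segment SSE
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(i, j):  # SSE of segment xs[i:j] about its mean
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    INF = float("inf")
    cost = [[INF] * (k + 1) for _ in range(n + 1)]
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, n + 1):
        for g in range(1, min(j, k) + 1):
            for i in range(g - 1, j):   # last class is xs[i:j]
                c = cost[i][g - 1] + sse(i, j)
                if c < cost[j][g]:
                    cost[j][g], cut[j][g] = c, i
    bounds, j = [], n                    # recover class boundaries
    for g in range(k, 0, -1):
        i = cut[j][g]
        bounds.append((i, j))
        j = i
    return cost[n][k], bounds[::-1]

c, b = best_partition([1, 2, 10, 11, 12], 2)
print(b)  # [(0, 2), (2, 5)]
```

The optimal classes are always contiguous in the sorted order, which is what makes the dynamic program exact.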

17.
Frequently in clinical and epidemiologic studies, the event of interest is recurrent (i.e., can occur more than once per subject). When the events are not of the same type, an analysis which accounts for the fact that events fall into different categories will often be more informative. Often, however, although event times may always be known, information through which events are categorized may potentially be missing. Complete‐case methods (whose application may require, for example, that events be censored when their category cannot be determined) are valid only when event categories are missing completely at random. This assumption is rather restrictive. The authors propose two multiple imputation methods for analyzing multiple‐category recurrent event data under the proportional means/rates model. The use of a proper or improper imputation technique distinguishes the two approaches. Both methods lead to consistent estimation of regression parameters even when the missingness of event categories depends on covariates. The authors derive the asymptotic properties of the estimators and examine their behaviour in finite samples through simulation. They illustrate their approach using data from an international study on dialysis.

18.
This article analyzes scores given by judges of figure skating at the 1980 Winter Olympics. Judges' scores are found to be highly correlated, with little evidence of scoring along political lines. However, an analysis of variance shows a small but consistent “patriotic” bias; judges tend to give higher scores to contestants from their own country. The influence of such effects on final placings is estimated.

19.
In market research and some other areas, it is common that a sample of n judges (consumers, evaluators, etc.) are asked to independently rank a series of k objects or candidates. It is usually difficult to obtain the judges' full cooperation to completely rank all k objects. A practical way to overcome this difficulty is to give each judge the freedom to choose the number of top candidates he is willing to rank. A frequently encountered question in this type of survey is how to select the best object or candidate from the incompletely ranked data. This paper proposes a subset selection procedure which constructs a random subset of all the k objects involved in the survey such that the best object is included in the subset with a prespecified confidence. It is shown that the proposed subset selection procedure is distribution-free over a very broad class of underlying distributions. An example from a market research study is used to illustrate the proposed procedure.

20.
Cohen’s kappa is a weighted average
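Only the title of this article is shown. The classical statistic it refers to can be computed from a square two-rater agreement table as kappa = (p_o − p_e) / (1 − p_e); this sketch shows the standard formula only, not the paper's weighted-average decomposition:

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater 1's
    categories, columns: rater 2's): chance-corrected agreement."""
    c = len(table)
    total = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(c)) / total        # observed agreement
    row = [sum(r) for r in table]                           # rater 1 marginals
    col = [sum(table[i][j] for i in range(c)) for j in range(c)]  # rater 2 marginals
    p_e = sum(r * s for r, s in zip(row, col)) / total ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa([[20, 5], [10, 15]]))  # ≈ 0.4
```

Kappa is 1 for perfect agreement and 0 when observed agreement equals that expected under independent marginals.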


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号