Similar Literature
 20 similar articles were retrieved.
1.
Summary. In the psychosocial and medical sciences, some studies are designed to assess the agreement between different raters and/or different instruments. Often the same sample will be used to compare the agreement between two or more assessment methods for simplicity and to take advantage of the positive correlation of the ratings. Although sample size calculations have become an important element in the design of research projects, such methods for agreement studies are scarce. We adapt the generalized estimating equations approach for modelling dependent κ-statistics to estimate the sample size that is required for dependent agreement studies. We calculate the power based on a Wald test for the equality of two dependent κ-statistics. The Wald test statistic has a non-central χ²-distribution with a non-centrality parameter that can be estimated with minimal assumptions. The method proposed is useful for agreement studies with two raters and two instruments, and is easily extendable to multiple raters and multiple instruments. Furthermore, the method proposed allows for rater bias. Power calculations for binary ratings under various scenarios are presented. Analyses of two biomedical studies are used for illustration.
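A minimal sketch of the power and sample-size calculation described above, assuming the per-subject non-centrality of the Wald test has already been estimated (the GEE variance step is not shown); the function names and the numerical values are illustrative, not the authors':

```python
# Minimal sketch: power of a 1-df Wald test for equality of two dependent
# kappa statistics, given an estimated non-centrality parameter (ncp).
# The ncp would come from the GEE-based variance of kappa1 - kappa2;
# here it is simply treated as an input.
from scipy.stats import chi2, ncx2

def wald_power(ncp, alpha=0.05, df=1):
    """Power of a chi-square Wald test with the given non-centrality."""
    crit = chi2.ppf(1 - alpha, df)      # rejection threshold under H0
    return ncx2.sf(crit, df, ncp)       # P(reject | H1)

def required_n(ncp_per_subject, target_power=0.80, alpha=0.05):
    """Smallest n reaching the target power, assuming the non-centrality
    grows linearly with the sample size (the usual Wald-test assumption)."""
    n = 1
    while wald_power(n * ncp_per_subject, alpha) < target_power:
        n += 1
    return n

if __name__ == "__main__":
    print(wald_power(100 * 0.04))   # power at n = 100, illustrative ncp
    print(required_n(0.04))         # n needed for 80% power
```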

2.
Abstract. Hazard rate estimation is an alternative to density estimation for positive variables that is of interest when variables are times to event. In particular, it is here shown that hazard rate estimation is useful for seismic hazard assessment. This paper suggests a simple, but flexible, Bayesian method for non-parametric hazard rate estimation, based on building the prior hazard rate as the convolution mixture of a Gaussian kernel with an exponential jump-size compound Poisson process. Conditions are given for a compound Poisson process prior to be well-defined and to select smooth hazard rates, an elicitation procedure is devised to assign a constant prior expected hazard rate while controlling prior variability, and a Markov chain Monte Carlo approximation of the posterior distribution is obtained. Finally, the suggested method is validated in a simulation study, and some Italian seismic event data are analysed.
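A sketch of what a draw from such a prior looks like: a compound Poisson process with exponential jump sizes smoothed by a Gaussian kernel. The grid, rate, jump mean and bandwidth below are illustrative assumptions, not the paper's elicited values:

```python
# Illustrative sketch (not the paper's elicitation): draw a prior hazard rate
# h(t) = sum_j Y_j * k(t - T_j), where (T_j, Y_j) are the points and exponential
# jump sizes of a compound Poisson process and k is a Gaussian kernel.
import numpy as np

def sample_prior_hazard(t_grid, rate=2.0, jump_mean=0.5, bandwidth=0.3, rng=None):
    rng = np.random.default_rng(rng)
    t_max = t_grid.max()
    n_jumps = rng.poisson(rate * t_max)                # number of Poisson points on [0, t_max]
    locs = rng.uniform(0.0, t_max, size=n_jumps)       # point locations T_j
    sizes = rng.exponential(jump_mean, size=n_jumps)   # exponential jump sizes Y_j
    # Gaussian kernel smoothing of the jumps gives a smooth, nonnegative hazard
    diffs = (t_grid[:, None] - locs[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernel @ sizes

if __name__ == "__main__":
    t = np.linspace(0, 10, 200)
    h = sample_prior_hazard(t, rng=42)
    print(h[:5])  # one draw from the prior hazard rate on the grid
```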

3.
Sample covariance matrices play a central role in numerous popular statistical methodologies, for example principal components analysis, Kalman filtering and independent component analysis. However, modern random matrix theory indicates that, when the dimension of a random vector is not negligible with respect to the sample size, the sample covariance matrix demonstrates significant deviations from the underlying population covariance matrix. There is an urgent need to develop new estimation tools in such cases with high-dimensional data to recover the characteristics of the population covariance matrix from the observed sample covariance matrix. We propose a novel solution to this problem based on the method of moments. When the parametric dimension of the population spectrum is finite and known, we prove that the proposed estimator is strongly consistent and asymptotically Gaussian. Otherwise, we combine the first estimation method with a cross-validation procedure to select the unknown model dimension. Simulation experiments demonstrate the consistency of the proposed procedure. We also indicate possible extensions of the proposed estimator to the case where the population spectrum has a density.
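A small simulation illustrating the distortion the abstract refers to (sample eigenvalues spreading over the Marchenko-Pastur support even when the population covariance is the identity); this is not the authors' moment estimator, and the dimensions chosen are arbitrary:

```python
# Illustration only: with identity population covariance and p comparable to n,
# the sample eigenvalues spread out instead of concentrating at 1.
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 200                      # dimension not negligible vs. sample size
X = rng.standard_normal((n, p))      # population covariance = identity
S = X.T @ X / n                      # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

c = p / n
mp_support = ((1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2)
print("population eigenvalues: all 1.0")
print("sample eigenvalue range:", round(eigs.min(), 3), round(eigs.max(), 3))
print("Marchenko-Pastur support:", tuple(round(x, 3) for x in mp_support))
```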

4.
ABSTRACT

Online consumer product ratings data are increasing rapidly. While most of the current graphical displays mainly represent the average ratings, Ho and Quinn proposed an easily interpretable graphical display based on an ordinal item response theory (IRT) model, which successfully accounts for systematic interrater differences. Conventionally, the discrimination parameters in IRT models are constrained to be positive, particularly in the modeling of scored data from educational tests. In this article, we use real-world ratings data to demonstrate that such a constraint can have a great impact on the parameter estimation. This impact on estimation was explained through rater behavior. We also discuss correlation among raters and assess the prediction accuracy for both the constrained and the unconstrained models. The results show that the unconstrained model performs better when a larger fraction of rater pairs exhibit negative correlations in ratings.

5.
"This article demonstrates the value of microdata for understanding the effect of wages on life cycle fertility dynamics. Conventional estimates of neoclassical economic fertility models obtained from linear aggregate time series regressions are widely criticized for being nonrobust when adjusted for serial correlation. Moreover, the forecasting power of these aggregative neoclassical models has been shown to be inferior when compared with conventional time series models that assign no role to wages. This article demonstrates that, when neoclassical models of fertility are estimated on microdata using methods that incorporate key demographic restrictions and when they are properly aggregated, they have considerable forecasting power." Data are from the 1981 Swedish Fertility Survey.  相似文献   

6.
Partial specification of a prior distribution can be appealing to an analyst, but there is no conventional way to update a partial prior. In this paper, we show how a framework for Bayesian updating with data can be based on the Dirichlet(a) process. Within this framework, partial information predictors generalize standard minimax predictors and have interesting multiple-point shrinkage properties. Approximations to partial-information estimators for squared error loss are defined straightforwardly, and an estimate of the mean shrinks the sample mean. The proposed updating of the partial prior is a consequence of four natural requirements when the Dirichlet parameter a is continuous. Namely, the updated partial posterior should be calculable from knowledge of only the data and partial prior, it should be faithful to the full posterior distribution, it should assign positive probability to every observed event {Xi}, and it should not assign probability to unobserved events not included in the partial prior specification.
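A sketch of the familiar Dirichlet-process shrinkage the abstract alludes to ("an estimate of the mean shrinks the sample mean"): the posterior expected mean is a precision-weighted average of the base-measure mean and the sample mean. The concentration parameter, prior mean and data below are illustrative:

```python
# Sketch of the standard Dirichlet-process shrinkage of the mean: with a
# DP(a, G0) prior, the posterior expected mean is a weighted average of the
# base-measure mean and the sample mean.  Values below are illustrative only.
import numpy as np

def dp_posterior_mean(x, a, prior_mean):
    """E[mean | data] under a DP(a, G0) prior with E_G0[X] = prior_mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return (a * prior_mean + n * x.mean()) / (a + n)

if __name__ == "__main__":
    data = [2.1, 1.8, 2.4, 2.0]
    print(dp_posterior_mean(data, a=3.0, prior_mean=0.0))  # shrinks toward 0
```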

7.
Summary. An authentic food is one that is what it purports to be. Food processors and consumers need to be assured that, when they pay for a specific product or ingredient, they are receiving exactly what they pay for. Classification methods are an important tool in food authenticity studies where they are used to assign food samples of unknown type to known types. A classification method is developed where the classification rule is estimated by using both the labelled and the unlabelled data, in contrast with many classical methods which use only the labelled data for estimation. This methodology models the data as arising from a Gaussian mixture model with parsimonious covariance structure, as is done in model-based clustering. A missing data formulation of the mixture model is used and the models are fitted by using the EM and classification EM algorithms. The methods are applied to the analysis of spectra of food-stuffs recorded over the visible and near infra-red wavelength range in food authenticity studies. A comparison of the performance of model-based discriminant analysis and the method of classification proposed is given. The classification method proposed is shown to yield very good misclassification rates. The correct classification rate was observed to be as much as 15% higher than the correct classification rate for model-based discriminant analysis.
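A one-dimensional, two-component sketch of the core idea (labelled points keep fixed class memberships while unlabelled points receive E-step responsibilities); the paper itself works with multivariate spectra and parsimonious covariance structures, and all data and settings below are toy values:

```python
# Minimal 1-D sketch of semi-supervised estimation in a two-component Gaussian
# mixture: labelled points keep fixed (0/1) responsibilities, unlabelled points
# get E-step responsibilities.  Only the core EM idea is shown.
import numpy as np
from scipy.stats import norm

def semi_supervised_em(x_lab, y_lab, x_unlab, n_iter=100):
    x = np.concatenate([x_lab, x_unlab])
    # responsibilities: fixed for labelled data, initialised at 0.5 otherwise
    r = np.concatenate([y_lab.astype(float), np.full(len(x_unlab), 0.5)])
    for _ in range(n_iter):
        # M-step: weighted means, standard deviations and mixing proportion
        pi1 = r.mean()
        mu0, mu1 = np.average(x, weights=1 - r), np.average(x, weights=r)
        sd0 = np.sqrt(np.average((x - mu0) ** 2, weights=1 - r))
        sd1 = np.sqrt(np.average((x - mu1) ** 2, weights=r))
        # E-step: update responsibilities of the unlabelled points only
        p0 = (1 - pi1) * norm.pdf(x_unlab, mu0, sd0)
        p1 = pi1 * norm.pdf(x_unlab, mu1, sd1)
        r[len(x_lab):] = p1 / (p0 + p1)
    return (mu0, sd0), (mu1, sd1), pi1

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_lab = np.concatenate([rng.normal(0, 1, 20), rng.normal(3, 1, 20)])
    y_lab = np.concatenate([np.zeros(20), np.ones(20)])
    x_unlab = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
    print(semi_supervised_em(x_lab, y_lab, x_unlab))
```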

8.
In a previous paper, it was demonstrated that distinctly different prediction methods when applied to 2435 American college and professional football games resulted in essentially the same fraction of correct selections of the winning team and essentially the same average absolute error for predicting the margin of victory. These results are now extended to 1446 Australian rules football games. Two distinctly different prediction methods are applied. A least-squares method provides a set of ratings. The predicted margin of victory in the next contest is less than the rating difference, corrected for home-ground advantage, while a 0.75 power method shrinks the ratings compared with those found by the least-squares technique and then performs predictions based on the rating difference and home-ground advantage. Both methods operate upon past margins of victory corrected for home advantage to obtain the ratings. It is shown that both methods perform similarly, based on the fraction of correct selections of the winning team and the average absolute error for predicting the margin of victory. That is, differing predictors using the same information tend to converge to a limiting level of accuracy. The least-squares approach also provides estimates of the accuracy of each prediction. The home advantage is evaluated for all teams collectively and also for individual teams. The data permit comparisons with other sports in other countries. The home team appears to have an advantage (the visiting team has a disadvantage) due to three factors: travel fatigue suffered by the visiting team; crowd intimidation by the home team's fans; and the visiting team's lack of familiarity with the playing conditions.
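A sketch of the least-squares rating idea: solve for team ratings and a common home advantage that best reproduce past home-minus-away margins. The shrinkage, the prediction adjustment and the accuracy estimates described above are not shown, and the teams and margins are made-up toy values:

```python
# Sketch: least-squares ratings from past margins of victory,
# margin ≈ r_home - r_away + h, with a single home-advantage term h.
import numpy as np

def ls_ratings(games, teams):
    idx = {t: i for i, t in enumerate(teams)}
    A = np.zeros((len(games), len(teams) + 1))   # last column = home advantage
    y = np.zeros(len(games))
    for g, (home, away, margin) in enumerate(games):
        A[g, idx[home]] = 1.0
        A[g, idx[away]] = -1.0
        A[g, -1] = 1.0
        y[g] = margin
    # ratings are only identified up to a constant, so pin their mean at 0
    A = np.vstack([A, np.append(np.ones(len(teams)), 0.0)])
    y = np.append(y, 0.0)
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    return dict(zip(teams, sol[:-1])), sol[-1]

if __name__ == "__main__":
    teams = ["A", "B", "C"]
    games = [("A", "B", 12), ("B", "C", 3), ("C", "A", -20), ("A", "C", 15)]
    ratings, home_adv = ls_ratings(games, teams)
    print(ratings, home_adv)
```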

9.
Summary. The peer review of grant proposals is very important to academics from all disciplines. Although there is limited research on the reliability of assessments for grant proposals, previously reported single-rater reliabilities have been disappointingly low (between 0.17 and 0.37). We found that the single-rater reliability of the overall assessor rating for Australian Research Council grants was 0.21 for social science and humanities (2870 ratings, 1928 assessors and 687 proposals) and 0.19 for science (7153 ratings, 4295 assessors and 1644 proposals). We used a multilevel, cross-classification approach (level 1, assessor and proposal cross-classification; level 2, field of study), taking into account that 34% of the assessors evaluated more than one proposal. Researcher-nominated assessors (those chosen by the authors of the research proposal) gave higher ratings than panel-nominated assessors chosen by the Australian Research Council, and proposals from more prestigious universities received higher ratings. In the social sciences and humanities, the status of Australian universities had significantly more effect on Australian assessors than on overseas assessors. In science, ratings were higher when assessors rated fewer proposals and apparently had a more limited frame of reference for making such ratings and when researchers were professors rather than non-professors. In particular, the methodology of this large-scale study is applicable to other forms of peer review (publications, job interviews, awarding of prizes and election to prestigious societies) in which it is employed as a selection process.

10.
We consider fitting the so-called Emax model to continuous response data from clinical trials designed to investigate the dose–response relationship for an experimental compound. When there is insufficient information in the data to estimate all of the parameters because of the high dose asymptote being ill defined, maximum likelihood estimation fails to converge. We explore the use of either bootstrap resampling or the profile likelihood to make inferences about effects and doses required to give a particular effect, using limits on the parameter values to obtain the value of the maximum likelihood when the high dose asymptote is ill defined. The results obtained show these approaches to be comparable with or better than some others that have been used when maximum likelihood estimation fails to converge and that the profile likelihood method outperforms the method of bootstrap resampling used. Copyright © 2014 John Wiley & Sons, Ltd.
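A sketch of fitting the Emax model f(d) = E0 + Emax·d/(ED50 + d) with bounds on the parameters, in the spirit of the "limits on the parameter values" idea when the high-dose asymptote is poorly determined; the doses, responses and bound choices are illustrative, and the profile-likelihood and bootstrap steps are not shown:

```python
# Sketch: bounded nonlinear least-squares fit of the Emax dose-response model.
import numpy as np
from scipy.optimize import curve_fit

def emax(d, e0, emax_, ed50):
    return e0 + emax_ * d / (ed50 + d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dose = np.repeat([0.0, 5.0, 25.0, 50.0, 100.0], 10)     # toy design
    resp = emax(dose, 1.0, 8.0, 40.0) + rng.normal(0, 1.5, dose.size)

    # Bound Emax and ED50 so the optimiser cannot run off toward an
    # ill-defined asymptote; the upper limits here are arbitrary choices.
    p0 = [resp[dose == 0].mean(), resp.max() - resp.min(), np.median(dose[dose > 0])]
    bounds = ([-np.inf, 0.0, 1e-6], [np.inf, 100.0, 10 * dose.max()])
    est, cov = curve_fit(emax, dose, resp, p0=p0, bounds=bounds)
    print("E0, Emax, ED50 estimates:", est.round(2))
```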

11.
In many clinical studies more than one observer may be rating a characteristic measured on an ordinal scale. For example, a study may involve a group of physicians rating a feature seen on a pathology specimen or a computer tomography scan. In clinical studies of this kind, the weighted κ coefficient is a popular measure of agreement for ordinally scaled ratings. Our research stems from a study in which the severity of inflammatory skin disease was rated. The investigators wished to determine and evaluate the strength of agreement between a variable number of observers taking into account patient-specific (age and gender) as well as rater-specific (whether board certified in dermatology) characteristics. This suggested modelling κ as a function of these covariates. We propose the use of generalized estimating equations to estimate the weighted κ coefficient. This approach also accommodates unbalanced data which arise when some subjects are not judged by the same set of observers. Currently an estimate of overall κ for a simple unbalanced data set without covariates involving more than two observers is unavailable. In the inflammatory skin disease study none of the covariates were significantly associated with κ, thus enabling the calculation of an overall weighted κ for this unbalanced data set. In the second motivating example (multiple sclerosis), geographic location was significantly associated with κ. In addition we also compared the results of our method with current methods of testing for heterogeneity of weighted κ coefficients across strata (geographic location) that are available for balanced data sets.
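A sketch of the two-rater weighted κ that the GEE approach extends to many raters, unbalanced data and covariates; the ratings below are toy values on a 1-4 severity scale:

```python
# Sketch: linear- and quadratic-weighted kappa for two raters on an ordinal scale.
from sklearn.metrics import cohen_kappa_score

rater1 = [1, 2, 2, 3, 4, 4, 3, 2, 1, 3]
rater2 = [1, 2, 3, 3, 4, 3, 3, 2, 2, 3]

print("linear-weighted kappa:   ", cohen_kappa_score(rater1, rater2, weights="linear"))
print("quadratic-weighted kappa:", cohen_kappa_score(rater1, rater2, weights="quadratic"))
```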

12.
A method is presented for the sequential analysis of experiments involving two treatments to which response is dichotomous. Composite hypotheses about the difference in success probabilities are tested, and covariate information is utilized in the analysis. The method is based upon a generalization of Bartlett’s (1946) procedure for using the maximum likelihood estimate of a nuisance parameter in a Sequential Probability Ratio Test (SPRT). Treatment assignment rules studied include pure randomization, randomized blocks, and an adaptive rule which tends to assign the superior treatment to the majority of subjects. It is shown that the use of covariate information can result in important reductions in the expected sample size for specified error probabilities, and that the use of covariate information is essential for the elimination of bias when adaptive assignment rules are employed. Designs of the type presented are easily generated, as the termination criterion is the same as for a Wald SPRT of simple hypotheses.
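A sketch of the plain Wald SPRT for simple hypotheses on a single success probability, i.e. the termination criterion the abstract refers to; the generalization with a nuisance-parameter MLE and two treatments is not shown, and the data and error rates are illustrative:

```python
# Sketch of a Wald SPRT for H0: p = p0 vs H1: p = p1 with binary observations.
import math

def sprt(observations, p0, p1, alpha=0.05, beta=0.20):
    upper = math.log((1 - beta) / alpha)     # accept H1 when llr >= upper
    lower = math.log(beta / (1 - alpha))     # accept H0 when llr <= lower
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(observations)

if __name__ == "__main__":
    data = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1]   # toy successes/failures
    print(sprt(data, p0=0.5, p1=0.8))
```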

13.
We study bias arising from rounding categorical variables following multivariate normal (MVN) imputation. This task has been well studied for binary variables, but not for more general categorical variables. Three methods that assign imputed values to categories based on fixed reference points are compared using 25 specific scenarios covering variables with k=3, …, 7 categories, and five distributional shapes, and for each k=3, …, 7, we examine the distribution of bias arising over 100,000 distributions drawn from a symmetric Dirichlet distribution. We observed, on both empirical and theoretical grounds, that one method (projected-distance-based rounding) is superior to the other two methods, and that the risk of invalid inference with the best method may be too high at sample sizes n≥150 at 50% missingness, n≥250 at 30% missingness and n≥1500 at 10% missingness. Therefore, these methods are generally unsatisfactory for rounding categorical variables (with up to seven categories) following MVN imputation.
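A sketch of one simple fixed-reference-point rounding rule (assign each imputed continuous value to the category with the nearest reference point); this is generic illustration of the class of methods being compared, not necessarily the paper's projected-distance-based method, and the values are toy data:

```python
# Sketch: round imputed continuous values to categories via fixed reference points.
import numpy as np

def round_to_categories(imputed, reference_points):
    imputed = np.asarray(imputed, dtype=float)
    refs = np.asarray(reference_points, dtype=float)
    # distance from each imputed value to each category's reference point
    dist = np.abs(imputed[:, None] - refs[None, :])
    return dist.argmin(axis=1) + 1          # categories labelled 1..k

if __name__ == "__main__":
    # k = 4 categories coded 1..4, so the natural reference points are 1, 2, 3, 4
    imputed_values = [0.4, 1.7, 2.2, 3.9, 5.1]
    print(round_to_categories(imputed_values, reference_points=[1, 2, 3, 4]))
```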

14.
I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, with surrogate measurements of the latent variables on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In such situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. Three circumstances where analytic calculations of the optimal design are possible are considered: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed, but the outcome is binary. In each of these cases, it is often possible to considerably improve the cost efficiency of the design by appropriate selection of the sampling fractions. More complex situations arise when the data are spatially distributed: the spatial correlation can be exploited to improve exposure assignment for unmeasured locations using available measurements on neighboring locations; some approaches for informative selection of the measurement sample using location and/or exposure predictor data are considered.

15.
Instead of using traditional separate phase I and II trials, in this article, we propose using a parallel three-stage phase I/II design, incorporating a dose expansion approach to flexibly evaluate the safety and efficacy of dose levels, and to select the optimal dose. In the proposed design, both the toxicity and efficacy responses are binary endpoints. A 3+3-based procedure is used for the initial period of dose escalation at stage 1; at this level, the dose can be expanded to stage 2 for exploratory efficacy studies of phase IIa, while simultaneously, the safety testing can advance to a higher dose level. A beta-binomial model is used to model the efficacy responses. There are two placebo-controlled randomization interim monitoring analyses at stage 2 to select the promising doses to be recommended to stage 3 for further efficacy studies of phase IIb. An adaptive randomization approach is used to assign more patients to doses with higher efficacy levels at stage 3. We examine the properties of the proposed design through extensive simulation studies using the R programming language, and also compare the new design with the conventional design and a competing adaptive Bayesian design. The simulation results show that our design can efficiently assign more patients to doses with higher efficacy levels and is superior to the two competing designs in terms of total sample size reduction.
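A sketch of one common way to implement beta-binomial adaptive randomization: allocation probabilities proportional to each dose's posterior probability of being the most efficacious, computed by Monte Carlo from Beta posteriors. The prior, number of draws and interim counts below are illustrative assumptions, not the paper's exact design:

```python
# Sketch: adaptive randomisation weights from beta-binomial posteriors.
import numpy as np

def adaptive_allocation(successes, patients, a0=1.0, b0=1.0, n_draws=10000, rng=None):
    rng = np.random.default_rng(rng)
    successes = np.asarray(successes)
    patients = np.asarray(patients)
    # posterior draws of each dose's response rate
    draws = rng.beta(a0 + successes, b0 + patients - successes,
                     size=(n_draws, len(successes)))
    # estimated probability that each dose has the highest efficacy
    best = (draws.argmax(axis=1)[:, None] == np.arange(len(successes))).mean(axis=0)
    return best / best.sum()                 # randomisation probabilities

if __name__ == "__main__":
    # three doses with interim efficacy counts (toy numbers)
    print(adaptive_allocation(successes=[3, 6, 9], patients=[15, 15, 15], rng=0))
```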

16.
We present a methodology for rating in real time the creditworthiness of public companies in the U.S. from the prices of traded assets. Our approach uses asset pricing data to impute a term structure of risk neutral survival functions or default probabilities. Firms are then clustered into ratings categories based on their survival functions using a functional clustering algorithm. This allows all public firms whose assets are traded to be directly rated by market participants. For firms whose assets are not traded, we show how they can be indirectly rated by matching them to firms that are traded based on observable characteristics. We also show how the resulting ratings can be used to construct loss distributions for portfolios of bonds. Finally, we compare our ratings to Standard & Poor's and find that, over the period 2005 to 2011, our ratings lead theirs for firms that ultimately default.

17.
In constructing a scorecard, we partition each characteristic variable into a few attributes and assign weights to those attributes. For the task, a simulated annealing algorithm has been proposed. A drawback of simulated annealing is that the number of cutpoints separating each characteristic variable into attributes is required as an input. We introduce a scoring method, called a classification spline machine (CSM), which determines cutpoints automatically via a stepwise basis selection. In this paper, we compare performances of CSM and simulated annealing on simulated datasets. The results indicate that the CSM can be useful in the construction of scorecards.

18.
ABSTRACT

In this study, Monte Carlo simulation experiments were employed to examine the performance of four statistical two-group classification methods when the data distributions are skewed and misclassification costs are unequal, conditions frequently encountered in business and economic applications. The classification methods studied are linear and quadratic parametric, nearest neighbor and logistic regression methods. It was found that when skewness is moderate, the parametric methods tend to give best results. Depending on the specific data condition, when skewness is high, either the linear parametric, logistic regression, or the nearest-neighbor method gives the best results. When misclassification costs differ widely across groups, the linear parametric method is favored over the other methods for many of the data conditions studied.

19.
The traditional and readily available multivariate analysis of variance (MANOVA) tests such as Wilks' Lambda and the Pillai–Bartlett trace start to suffer from low power as the number of variables approaches the sample size. Moreover, when the number of variables exceeds the number of available observations, these statistics are not available for use. Ridge regularisation of the covariance matrix has been proposed to allow the use of MANOVA in high-dimensional situations and to increase its power when the sample size approaches the number of variables. In this paper two forms of ridge regression are compared to each other and to a novel approach based on lasso regularisation, as well as to more traditional approaches based on principal components and the Moore-Penrose generalised inverse. The performance of the different methods is explored via an extensive simulation study. All the regularised methods perform well; the best method varies across the different scenarios, with margins of victory being relatively modest. We examine a data set of soil compaction profiles at various positions relative to a ridgetop, and illustrate how our results can be used to inform the selection of a regularisation method.
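A sketch of the general idea of ridge-regularised MANOVA: add a ridge term to the total sum-of-squares-and-cross-products matrix before inverting, so a Pillai-type trace remains computable when the number of variables approaches (or exceeds) the sample size. The specific regularisations and calibration compared in the paper differ; the penalty value and simulated data here are arbitrary:

```python
# Sketch: a ridge-regularised Pillai-type trace, tr(H (H + E + lambda*I)^{-1}).
import numpy as np

def ridge_pillai_trace(X, groups, lam=1.0):
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    H = np.zeros((X.shape[1], X.shape[1]))    # between-groups SSCP
    E = np.zeros_like(H)                      # within-groups SSCP
    for g in np.unique(groups):
        Xg = X[groups == g]
        diff_mean = (Xg.mean(axis=0) - grand_mean)[:, None]
        H += len(Xg) * diff_mean @ diff_mean.T
        resid = Xg - Xg.mean(axis=0)
        E += resid.T @ resid
    T_reg = H + E + lam * np.eye(X.shape[1])  # regularised total SSCP
    return np.trace(H @ np.linalg.inv(T_reg))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    p, n_per_group = 60, 20                   # p large relative to total n
    X = np.vstack([rng.normal(0, 1, (n_per_group, p)),
                   rng.normal(0.5, 1, (n_per_group, p))])
    groups = np.repeat([0, 1], n_per_group)
    print(ridge_pillai_trace(X, groups, lam=1.0))
```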

20.
A system for calculating relative playing strengths of tiddlywinks players is described. The method can also be used for other sports. It is specifically designed to handle cases where the number of games played in a season varies greatly between players, and thus the confidence that one can have in an assigned rating also varies greatly between players. In addition, the method is designed to handle situations in which some games in the tournament are played as individuals ("singles"), while others are played with a partner ("pairs"). These factors make some statistical treatments, such as the Elo rating system used in chess, difficult to apply. The new method characterizes each player's ability by a numerical rating together with an associated uncertainty in that player's rating. After each tournament, a "tournament rating" is calculated for each player based on how many points the player achieved and the relative strength of partner(s) and opponent(s). Statistical analysis is then used to estimate the likely error in the calculated tournament rating. Both the tournament rating and its estimated error are used in the calculation of new ratings. The method has been applied to calculate tiddlywinks world ratings based on over 13,000 national tournament games in Britain and the USA going back to 1985.
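A sketch of the core update idea described above: a player's existing rating (with its uncertainty) and the latest tournament rating (with its estimated error) are combined, here by generic inverse-variance weighting. The actual system's formulas are not given in the abstract, and the numbers below are illustrative:

```python
# Sketch: precision-weighted combination of a prior rating and a tournament rating.
def update_rating(rating, sigma, tournament_rating, tournament_sigma):
    w_prior = 1.0 / sigma**2
    w_tourn = 1.0 / tournament_sigma**2
    new_rating = (w_prior * rating + w_tourn * tournament_rating) / (w_prior + w_tourn)
    new_sigma = (1.0 / (w_prior + w_tourn)) ** 0.5
    return new_rating, new_sigma

if __name__ == "__main__":
    # an established player (small sigma) moves little after one tournament;
    # a newcomer (large sigma) moves a lot
    print(update_rating(1500.0, 30.0, 1650.0, 80.0))
    print(update_rating(1500.0, 200.0, 1650.0, 80.0))
```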
