Similar Literature
20 similar documents found.
1.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 2944-2958
The focus of this article is on the choice of suitable prior distributions for item parameters within item response theory (IRT) models. In particular, the use of empirical prior distributions for item parameters is proposed. First, regression trees are implemented in order to build informative empirical prior distributions. Second, model estimation is conducted within a fully Bayesian approach through the Gibbs sampler, which keeps estimation feasible even for increasingly complex models. The main results show that item parameter recovery is improved by the introduction of empirical prior information about item parameters, even when only a small sample is available.
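The tree-based empirical priors can be sketched in a few lines. The Python fragment below is only an illustration of the idea, not the authors' implementation: the item covariates, the provisional difficulty estimates, and the shallow-tree settings are all hypothetical, and the resulting leaf means and spreads would parameterize normal priors on the difficulties inside a Gibbs sampler.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical item covariates (e.g., word count, content code) and
# provisional difficulty estimates from a calibration sample.
X_items = rng.normal(size=(40, 2))
b_provisional = 0.8 * X_items[:, 0] + rng.normal(scale=0.5, size=40)

# Fit a shallow regression tree; its leaf means become empirical
# prior means for the item difficulties.
tree = DecisionTreeRegressor(max_depth=2).fit(X_items, b_provisional)
prior_mean_b = tree.predict(X_items)

# The within-leaf standard deviation gives an empirical prior spread
# (falling back to 1.0 for degenerate leaves).
leaf = tree.apply(X_items)
prior_sd_b = np.array([b_provisional[leaf == l].std() or 1.0 for l in leaf])
print(prior_mean_b[:5].round(2), prior_sd_b[:5].round(2))
```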

2.
The maximum likelihood (MLE), weighted maximum likelihood (WMLE), and maximum a posteriori (MAP or BMLE) estimators have been widely used to estimate ability parameters in item response theory (IRT), and their precision and bias have been studied and compared. Multidimensional IRT (MIRT) has been shown to provide better subscore estimates in both paper-and-pencil and computer adaptive tests; it is therefore very important to have accurate score estimates for the MIRT model. The purpose of this article is to compare the performance of the three estimation methods in the MIRT framework for tests of mixed item types that contain both dichotomously and polytomously scored items, and for tests of mixed item structures (simple and complex). All three methods are found to perform well under all conditions. For all models studied (one-, two-, three-, and four-dimensional), WMLE has smaller bias and higher reliabilities, but larger RMSE and SE. WMLE and MLE are closer to each other than to BMLE. For higher dimensions, however, BMLE is recommended, especially when the dimensions are correlated.
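For readers unfamiliar with the difference between these estimators, the unidimensional 2PL case makes it concrete. The sketch below, with made-up item parameters and responses, is not the article's code; the multidimensional case replaces the scalar theta with a vector and the prior with a multivariate normal.

```python
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.0, 1.5, 0.8, 1.2])   # discriminations (hypothetical)
b = np.array([-0.5, 0.0, 0.5, 1.0])  # difficulties (hypothetical)
y = np.array([1, 1, 0, 0])           # observed responses

def neg_loglik(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

mle = minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x

# MAP (BMLE) adds a standard normal prior on theta to the log-likelihood.
map_ = minimize_scalar(lambda t: neg_loglik(t) + 0.5 * t**2,
                       bounds=(-4, 4), method="bounded").x
print(f"MLE: {mle:.3f}  MAP: {map_:.3f}")  # MAP is shrunk toward 0
```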

3.
Item response theory (IRT) models are commonly used in educational and psychological testing to assess the (latent) ability of examinees and the effectiveness of the test items in measuring this underlying trait. The focus of this paper is on the assessment of item fit for unidimensional IRT models for dichotomous items using a Bayesian method. The paper illustrates and compares the effectiveness of several discrepancy measures, used within the posterior predictive model check procedure, in detecting misfitting items. The effectiveness of the different discrepancy measures is illustrated in a simulation study using artificially altered simulated data. Using the best discrepancy measure among those studied, the method was applied to real data from a mathematics placement exam.
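The posterior predictive model check procedure is easy to sketch. In the fragment below, the posterior draws and item parameters are simulated stand-ins rather than output from a fitted model, and the discrepancy measure is deliberately the simplest one imaginable; the article compares several more refined choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_persons = 200, 500
theta = rng.normal(size=(n_draws, n_persons))    # posterior draws (stand-in)
a_j, b_j = 1.2, 0.3                              # item-j parameters (stand-in)
y_j = rng.binomial(1, 0.5, size=n_persons)       # observed item-j responses

def discrepancy(y, p):
    # One simple discrepancy: observed-minus-expected proportion correct.
    return abs(y.mean() - p.mean())

extreme = 0
for d in range(n_draws):
    p = 1 / (1 + np.exp(-a_j * (theta[d] - b_j)))
    y_rep = rng.binomial(1, p)                   # replicated data
    if discrepancy(y_rep, p) >= discrepancy(y_j, p):
        extreme += 1
# A posterior predictive p-value near 0 or 1 flags potential item misfit.
print("posterior predictive p-value:", extreme / n_draws)
```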

4.
This research examines the statistical methodology used to estimate the parameters in item response models. An integral part of an item response model is the normalization rule used to identify the distributional parameters. The main result shown here is that only Verhelst–Glas normalizations, which arbitrarily set one difficulty and one dispersion parameter to unity, are consistent with the basic assumptions underlying the two-parameter logistic model. Failure to employ this type of normalization will lead to scores that depend on the item composition of the test, and differential item functioning (DIF) will compromise the validity of the estimated ability scores when different groups are compared. It is also shown that some of the tests for DIF fail when the data are generated by an IRT model with a random effect. Most of the results are based on simulations of a four-item model. Because the data-generating mechanism is known, it is possible to determine the effect on ability scores and parameter estimates when different normalizations or different distribution parameter values are used.
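To make the identification issue concrete, recall the two-parameter logistic model and its location-scale indeterminacy. This is textbook material, shown in a generic parameterization rather than the Verhelst–Glas one the article adopts (where one difficulty and one dispersion parameter are set to unity):

```latex
% 2PL response probability:
P(x_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}
% For any c and any s > 0, the reparameterization
\theta_i^{*} = \frac{\theta_i - c}{s}, \qquad
b_j^{*} = \frac{b_j - c}{s}, \qquad
a_j^{*} = s\,a_j
% leaves every response probability unchanged, so the latent metric must be
% pinned down by a normalization on the item parameters or on the trait
% distribution.
```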

5.
Markov chain Monte Carlo (MCMC) algorithms have been shown to be useful for estimating complex item response theory (IRT) models. Although an MCMC algorithm can be very useful, it also requires care in use and in the interpretation of results. In particular, MCMC algorithms generally make extensive use of priors on model parameters. In this paper, MCMC estimation is illustrated using a simple mixture IRT model, the mixture Rasch model (MRM), to demonstrate how the algorithm operates and how results may be affected by some commonly used priors. Priors on the mixture probabilities, label switching, model selection, metric anchoring, and implementation of the MCMC algorithm in WinBUGS are described, and their effects on parameter recovery are illustrated for practical testing situations. In addition, an example is presented in which an MRM is fitted to a set of educational test data using the MCMC algorithm, and the results are compared with those from three existing maximum likelihood estimation methods.
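Label switching, mentioned above, is often handled by relabeling the posterior draws after sampling. The toy fragment below uses simulated stand-in draws (not MRM output) and the simplest identifiability constraint, ordering the classes by their mean ability in every draw:

```python
import numpy as np

rng = np.random.default_rng(6)
n_draws, n_classes = 1000, 2
# Stand-in posterior draws of (class mean ability, class probability).
mu = rng.normal(loc=[[-1.0, 1.0]], scale=0.3, size=(n_draws, n_classes))
pi = rng.dirichlet([5, 5], size=n_draws)

# Randomly permute labels in half the draws to mimic label switching.
flip = rng.random(n_draws) < 0.5
mu[flip] = mu[flip][:, ::-1]
pi[flip] = pi[flip][:, ::-1]

# Relabel: sort each draw by mu so "class 1" is always the lower-mean class.
order = np.argsort(mu, axis=1)
mu = np.take_along_axis(mu, order, axis=1)
pi = np.take_along_axis(pi, order, axis=1)
print(mu.mean(axis=0).round(2), pi.mean(axis=0).round(2))
```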

6.
We propose Bayesian parameter estimation in a multidimensional item response theory model using the Gibbs sampling algorithm. We apply this approach to dichotomous responses to a questionnaire on sleep quality. The analysis helps determine the underlying dimensions.

7.
Because spatial regression models incorporate spatial-geographic information, their parameter estimation becomes complicated; since maximum likelihood is the dominant method, it is commonly believed that least squares has no place in estimating spatial regression models. An analysis of the estimation techniques for spatial regression models shows that least squares and maximum likelihood are in fact used to estimate different parameters of the model, and only by combining the two can the full set of parameters be estimated quickly and efficiently. Mathematical argument shows that the least-squares estimator of the regression parameters in a spatial regression model is the best linear unbiased estimator. Significance tests of the regression parameters can be carried out under normality of the estimators, whereas the spatial-effect parameters cannot be tested in this way.
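This division of labor can be illustrated with the spatial lag model y = ρWy + Xβ + ε (one common spatial regression specification, used here as an assumption): given the spatial parameter ρ, the regression coefficients β have a closed-form least-squares solution, while ρ itself must be found by maximizing the concentrated likelihood. The sketch below uses simulated data and a randomly generated, row-standardized weight matrix purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n = 50
W = rng.random((n, n))
np.fill_diagonal(W, 0)
W /= W.sum(1, keepdims=True)                 # row-standardized weights
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, rho_true = np.array([1.0, 2.0]), 0.4
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + rng.normal(size=n))

def beta_ols(rho):
    # Given rho, beta is ordinary least squares on the filtered response.
    return np.linalg.lstsq(X, (np.eye(n) - rho * W) @ y, rcond=None)[0]

def neg_profile_loglik(rho):
    A = np.eye(n) - rho * W
    e = A @ y - X @ beta_ols(rho)
    _, logdet = np.linalg.slogdet(A)
    return n / 2 * np.log(e @ e / n) - logdet  # concentrated likelihood

rho_hat = minimize_scalar(neg_profile_loglik, bounds=(-0.9, 0.9),
                          method="bounded").x
print("rho:", round(rho_hat, 3), "beta:", beta_ols(rho_hat).round(3))
```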

8.
In this paper, multidimensional item response theory models for dichotomous data, developed in the fields of psychometrics and ability assessment, are discussed in connection with the problem of evaluating customer satisfaction. These models allow us to take into account latent constructs at various degrees of complexity and provide interesting new perspectives for services quality assessment. Markov chain Monte Carlo techniques are considered for estimation. An application to a real data set is also presented.

9.
This article proposes a test to determine whether "big data" nowcasting methods, which have become an important tool for many public and private institutions, are monotonically improving as new information becomes available. The test is the first to formalize existing evaluation procedures from the nowcasting literature. We place particular emphasis on models involving estimated factors, since factor-based methods are a leading case in the high-dimensional empirical nowcasting literature, although our test is also applicable to small-dimensional set-ups such as bridge equations and MIDAS models. Our approach extends a recent methodology for testing many moment inequalities to the case of nowcast monotonicity testing, which allows the number of inequalities to grow with the sample size. We provide results showing the conditions under which both parameter estimation error and factor estimation error can be accommodated in this high-dimensional setting when using the pseudo out-of-sample approach. The finite-sample performance of our test is illustrated using a wide range of Monte Carlo simulations, and we conclude with an empirical application to nowcasting U.S. real gross domestic product (GDP) growth and five GDP sub-components. Our test results confirm monotonicity for all but one sub-component (government spending), suggesting that the factor-augmented model may be misspecified for this GDP constituent. Supplementary materials for this article are available online.
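A drastically simplified illustration of the monotonicity idea: as more intra-period information arrives, pseudo out-of-sample squared errors should not increase. The article's test treats many such inequalities jointly with high-dimensional methods; the toy fragment below, with simulated nowcast errors, only forms the pairwise loss-differential t-statistics.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 120  # evaluation sample length (hypothetical)

# Nowcast errors at three successive data releases; later releases use
# more information, so their errors should be (weakly) smaller.
releases = {"early": 1.0, "mid": 0.7, "late": 0.5}
sq_err = {k: rng.normal(scale=s, size=T) ** 2 for k, s in releases.items()}

for older, newer in [("early", "mid"), ("mid", "late")]:
    d = sq_err[older] - sq_err[newer]         # squared-loss differential
    t = d.mean() / (d.std(ddof=1) / np.sqrt(T))
    print(f"{older} -> {newer}: t = {t:.2f}  (monotonicity: mean(d) >= 0)")
```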

10.
In psychometric research, as in educational assessment, it is often necessary to analyze item responses from clustered respondents. The multiple-group item response theory (IRT) model proposed by Bock and Zimowski [12] provides a useful framework for analyzing this type of data. In this model, the selected groups of respondents are of specific interest, so group-specific population distributions need to be defined. The usual assumption for parameter estimation in this model, namely that the latent traits are random variables following different symmetric normal distributions, has been questioned in many works in the IRT literature, and misleading inference can result when it does not hold. In this paper, we assume that the latent traits of each group follow different skew-normal distributions under the centered parameterization, and we call the resulting model the skew multiple-group IRT model. This modeling extends the works of Azevedo et al. [4], Bazán et al. [11] and Bock and Zimowski [12] with respect to the latent trait distribution. Our approach ensures that the model is identifiable. We propose and compare, with regard to convergence, two Markov chain Monte Carlo (MCMC) algorithms for parameter estimation. A simulation study was performed to evaluate parameter recovery for the proposed model and to assess the convergence of the selected algorithm. The results reveal that the proposed algorithm recovers all model parameters properly. Furthermore, we analyzed a real data set that exhibits asymmetry in the latent trait distribution; the results obtained with our approach confirmed the presence of negative asymmetry for some latent trait distributions.
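A small sketch of the centered parameterization mentioned above: a group's mean, standard deviation, and skewness are converted into the skew-normal's direct (location, scale, shape) parameters, from which latent traits can be drawn. The conversion uses the standard skew-normal moment identities; the particular numbers are arbitrary, and this is not the authors' estimation code.

```python
import numpy as np
from scipy.stats import skewnorm

def centered_to_direct(mean, sd, skew):
    # Valid only for |skew| below the skew-normal bound (about 0.995).
    t = (2 * abs(skew) / (4 - np.pi)) ** (2 / 3)
    mu_z = np.sign(skew) * np.sqrt(t / (1 + t))       # E[z] of standardized SN
    delta = mu_z * np.sqrt(np.pi / 2)
    alpha = delta / np.sqrt(1 - delta ** 2)           # shape
    omega = sd / np.sqrt(1 - 2 * delta ** 2 / np.pi)  # scale
    xi = mean - omega * delta * np.sqrt(2 / np.pi)    # location
    return xi, omega, alpha

# A group with standardized latent traits and negative asymmetry.
xi, omega, alpha = centered_to_direct(mean=0.0, sd=1.0, skew=-0.6)
theta = skewnorm.rvs(alpha, loc=xi, scale=omega, size=5000, random_state=3)
print(theta.mean().round(2), theta.std().round(2))    # ~0.00, ~1.00
```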

11.
We discuss the use of latent variable models with observed covariates for computing response propensities for sample respondents. A response propensity score is often used to weight item and unit responders to account for item and unit non-response and to obtain adjusted means and proportions. In the context of attitude scaling, we discuss computing response propensity scores by using latent variable models for binary or nominal polytomous manifest items with covariates. Our models allow the response propensity scores to be found for several different items without refitting. They allow any pattern of missing responses for the items. If one prefers, it is possible to estimate population proportions directly from the latent variable models, so avoiding the use of propensity scores. Artificial data sets and a real data set extracted from the 1996 British Social Attitudes Survey are used to compare the various methods proposed.
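The weighting step itself is standard inverse-propensity adjustment. The sketch below substitutes an ordinary logistic regression for the latent variable model the paper actually develops (which would supply the propensities instead), and all data are simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 2000
covars = rng.normal(size=(n, 2))                     # observed covariates
p_respond = 1 / (1 + np.exp(-(0.5 + covars @ np.array([1.0, -0.5]))))
responded = rng.random(n) < p_respond
attitude = (rng.random(n) < 0.4 + 0.1 * covars[:, 0]).astype(float)

# Estimate propensities, then weight respondents by their inverse.
phat = LogisticRegression().fit(covars, responded).predict_proba(covars)[:, 1]
w = 1 / phat[responded]
adj = np.sum(w * attitude[responded]) / np.sum(w)    # adjusted proportion
print(f"naive: {attitude[responded].mean():.3f}  adjusted: {adj:.3f}")
```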

12.
The paper deals with the introduction of empirical prior information into the estimation of a candidate's ability within computerized adaptive testing (CAT). CAT is generally applied to improve the efficiency of test administration. In this paper, it is shown how the inclusion of background variables, in both the initialization and the ability estimation, improves the accuracy of ability estimates. In particular, a Gibbs sampler scheme is proposed for the interim and final ability estimation phases. Using both simulated and real data, it is shown that the method produces more accurate ability estimates, especially for short tests and for abilities near the boundary of the scale. This implies that the operational problems of CAT related to weak measurement precision under particular conditions can be reduced as well. In the empirical examples, the methods were applied to CAT for intelligence testing in personnel selection and to educational measurement. Other promising applications lie in the medical world, where testing efficiency is also of paramount importance.
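The initialization idea can be sketched as follows: regress calibration-sample abilities on the background variables, then use the fitted value and residual spread as an examinee-specific normal prior in place of the default N(0, 1). The variables and coefficients below are hypothetical, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(9)
bg = rng.normal(size=(500, 3))                     # background variables
theta = bg @ np.array([0.5, 0.3, -0.2]) + rng.normal(scale=0.8, size=500)

# Least-squares fit of ability on background variables (with intercept).
design = np.c_[np.ones(500), bg]
coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
resid = theta - design @ coef
prior_sd = resid.std(ddof=bg.shape[1] + 1)

new_bg = np.array([0.2, -1.0, 0.5])                # a new examinee
prior_mean = coef[0] + new_bg @ coef[1:]
print(f"empirical prior: N({prior_mean:.2f}, {prior_sd:.2f}^2)")
```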

13.
Building on the general testlet response model, we construct a testlet discrimination parameter and propose a two-parameter normal ogive testlet response model that incorporates it. Within a Bayesian framework, based on data augmentation and using Gibbs sampling, we study parameter estimation for the model as well as its adaptability. Simulation results show that, in terms of parameter recovery, the proposed model outperforms, to a certain extent, the corresponding existing item response and testlet response models.
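For orientation, the standard two-parameter normal ogive testlet model is shown below, together with one plausible reading of the proposed extension, in which a testlet discrimination parameter scales the person-by-testlet effect; the abstract does not spell out the exact parameterization, so the second line is an assumption.

```latex
% Standard two-parameter normal-ogive testlet model, with d(j) the testlet
% containing item j and \gamma_{i\,d(j)} the person-by-testlet effect:
P(y_{ij} = 1 \mid \theta_i) = \Phi\!\big(a_j \theta_i - b_j - \gamma_{i\,d(j)}\big)
% One reading of the proposed extension adds a testlet discrimination
% parameter a^{(t)}_{d(j)} that scales the testlet effect:
P(y_{ij} = 1 \mid \theta_i) = \Phi\!\big(a_j \theta_i - b_j - a^{(t)}_{d(j)}\,\gamma_{i\,d(j)}\big)
```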

14.
To collect sensitive data, survey statisticians have designed many strategies to reduce nonresponse rates and social desirability response bias. In recent years, the item count technique has gained considerable popularity and credibility as an alternative mode of indirect questioning, and several variants of this technique have been proposed as new needs and challenges arise. The item sum technique (IST), introduced by Chaudhuri and Christofides (Indirect questioning in sample surveys, Springer-Verlag, Berlin, 2013) and Trappmann et al. (J Surv Stat Methodol 2:58–77, 2014), is one such variant, used to estimate the mean of a sensitive quantitative variable. In this approach, sampled units are asked to respond to two lists of items, one of which contains a sensitive question related to the study variable along with various innocuous, nonsensitive questions. To the best of our knowledge, very few theoretical and applied papers have addressed the IST. In this article, therefore, we present certain methodological advances as a contribution to appraising the use of the IST in real-world surveys. In particular, we employ a generic sampling design to examine how to improve the estimates of the sensitive mean when auxiliary information on the population under study is available and is used at the design and estimation stages. A Horvitz–Thompson-type estimator and a calibration-type estimator are proposed and their efficiency is evaluated by means of an extensive simulation study. The simulation experiments show that estimates obtained by the IST are nearly equivalent to those obtained using "true data" and that in general they outperform the estimates provided by a competing randomized response method. Moreover, variance estimation may be considered satisfactory. These results open up new perspectives for academics, researchers and survey practitioners and could justify the use of the IST as a valid alternative to traditional direct questioning survey modes.
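The IST difference estimator is simple to write down. In the sketch below, one sample reports the sum of the innocuous items plus the sensitive item, the other the sum of the innocuous items only, and the sensitive mean is estimated by the difference of the two Horvitz–Thompson means. The population size, inclusion probabilities, and distributions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5000  # population size (hypothetical)

# Long-list group reports innocuous-items total PLUS the sensitive item;
# short-list group reports the innocuous-items total only.
z_long = rng.normal(10, 2, size=300) + rng.gamma(2.0, 1.0, size=300)
z_short = rng.normal(10, 2, size=300)
pi_long = np.full(300, 300 / N)      # inclusion probabilities
pi_short = np.full(300, 300 / N)

mu_long = np.sum(z_long / pi_long) / N     # HT mean of long-list totals
mu_short = np.sum(z_short / pi_short) / N  # HT mean of short-list totals

# The sensitive mean is the difference of the two HT means.
print("estimated sensitive mean:", round(mu_long - mu_short, 3))
# True sensitive mean in this simulation is E[Gamma(2, 1)] = 2.
```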

15.
Online consumer product ratings data are increasing rapidly. While most current graphical displays mainly represent the average ratings, Ho and Quinn proposed an easily interpretable graphical display based on an ordinal item response theory (IRT) model, which successfully accounts for systematic interrater differences. Conventionally, the discrimination parameters in IRT models are constrained to be positive, particularly in the modeling of scored data from educational tests. In this article, we use real-world ratings data to demonstrate that such a constraint can have a great impact on parameter estimation, and we explain this impact through rater behavior. We also discuss correlation among raters and assess the prediction accuracy of both the constrained and the unconstrained models. The results show that the unconstrained model performs better when a larger fraction of rater pairs exhibit negatively correlated ratings.

16.
In an expert knowledge elicitation exercise, experts face a carefully constructed list of questions that they answer according to their knowledge. The elicitation process concludes when a probability distribution is found that adequately captures the experts' beliefs in the light of those answers. In many situations, it is very difficult to create a set of questions that will efficiently capture the experts' knowledge, since experts might not be able to make precise probabilistic statements about the parameter of interest. We present an approach for capturing expert knowledge based on item response theory, in which a set of binary response questions is posed to the expert, trying to elicit responses directly related to the quantity of interest. As a result, the posterior distribution of the parameter of interest represents the elicited prior distribution, which does not assume any particular parametric form. The method is illustrated by a simulated example and by an application involving the elicitation of rain prophets' predictions for the rainy season in the north-east of Brazil.
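A toy version of the scheme helps fix ideas: each binary answer ("is the quantity greater than q?") is treated as a noisy logistic signal about the quantity of interest, and a grid posterior accumulated over the answers becomes the elicited, nonparametric prior. The thresholds, answers, and response slope below are invented, not taken from the paper.

```python
import numpy as np

grid = np.linspace(0, 100, 501)        # candidate values (e.g., mm of rain)
post = np.ones_like(grid) / grid.size  # flat starting distribution

thresholds = [30, 50, 70]              # hypothetical question cutoffs
answers = [1, 1, 0]                    # expert: >30 yes, >50 yes, >70 no
slope = 0.2                            # "discrimination" of each answer

for q, ans in zip(thresholds, answers):
    p_yes = 1 / (1 + np.exp(-slope * (grid - q)))
    post *= p_yes if ans else (1 - p_yes)   # Bayes update on the grid
    post /= post.sum()

print(f"elicited prior mean: {(grid * post).sum():.1f}")
```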

17.
Estimation of the intercept parameter of a simple linear regression model involves the slope estimator. In this article, we consider the estimation of the intercept parameters of two linear regression models with normal errors when it is a priori suspected, but not certain, that the two regression lines are parallel. We also introduce a coefficient of distrust as a measure of the degree of lack of trust in the uncertain prior information about the equality of the two slopes. Three different estimators of the intercept parameters are defined using the sample data, the non-sample uncertain prior information, an appropriate test statistic, and the coefficient of distrust. The relative performances of the unrestricted, shrinkage restricted, and shrinkage preliminary test estimators are investigated through analyses of their bias and risk functions under quadratic loss. If the prior information is precise and the coefficient of distrust is small, the shrinkage preliminary test estimator outperforms the other estimators. An example based on a medical study illustrates the method.
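The three estimators can be sketched for a single intercept as follows. The exact weighting in the article may differ from this schematic: here a hypothetical distrust coefficient d blends the unrestricted and restricted estimates, and the preliminary test decides whether the blend is used at all. All data are simulated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x1, x2 = rng.normal(size=40), rng.normal(size=40)
y1 = 1.0 + 1.5 * x1 + rng.normal(scale=0.8, size=40)
y2 = 2.0 + 1.4 * x2 + rng.normal(scale=0.8, size=40)  # nearly parallel lines

r1, r2 = stats.linregress(x1, y1), stats.linregress(x2, y2)
a1_u = r1.intercept                               # unrestricted intercept

# Restricted: impose a common slope via pooled centered regression,
# then recompute the intercept of line 1 under that slope.
xc = np.r_[x1 - x1.mean(), x2 - x2.mean()]
yc = np.r_[y1 - y1.mean(), y2 - y2.mean()]
b_common = (xc @ yc) / (xc @ xc)
a1_r = y1.mean() - b_common * x1.mean()

# Pretest of slope equality using the slopes' standard errors.
z = (r1.slope - r2.slope) / np.hypot(r1.stderr, r2.stderr)
reject = abs(z) > 1.96

d = 0.2  # distrust of the prior information "the slopes are equal"
a1_s = d * a1_u + (1 - d) * a1_r                  # shrinkage restricted
a1_spt = a1_u if reject else a1_s                 # shrinkage preliminary test
print(round(a1_u, 3), round(a1_s, 3), round(a1_spt, 3))
```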

18.
Our main interest is parameter estimation using maximum entropy methods for the prediction of future events in homogeneous Poisson processes when the distribution governing the parameters is unknown. We use empirical Bayes techniques and the maximum entropy principle to model the prior information. This approach is also motivated by the success of the gamma prior for this problem, since it is well known that the gamma maximizes Shannon entropy under appropriately chosen constraints. As an alternative, however, we propose to estimate the parameters of the maximum entropy prior by moment matching, that is, by maximizing the entropy subject to the constraint that the first two moments equal the empirical ones; the solution is the truncated normal distribution (truncated below at the origin). We also use maximum likelihood estimation (MLE) to estimate the parameters of the truncated normal distribution in this case. These two solutions, the gamma and the truncated normal, which maximize the entropy under different constraints, are tested for their effectiveness in predicting future events of homogeneous Poisson processes by measuring their coverage probabilities, the suitably normalized lengths of their prediction intervals, and their goodness of fit as measured by the Kullback–Leibler criterion and a discrepancy measure. The estimators obtained by these methods are compared in an extensive simulation study to each other as well as to the estimators obtained using the completely noninformative Jeffreys' prior and the usual frequency methods. We also consider the problem of choosing, when faced with data, between the two maximum entropy priors proposed here, the gamma and the truncated normal (estimated both by matching the first two moments and by maximum likelihood), and we advocate the use of the sample skewness and kurtosis. The methods are illustrated on two examples: one concerning the occurrence of mammary tumors in laboratory animals taking part in a carcinogenicity experiment, and the other a warranty dataset from the automobile industry.
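The moment-matching step is a two-equation root-finding problem: choose the pre-truncation mean and standard deviation of a normal truncated below at zero so that its truncated mean and variance match the empirical ones. A minimal sketch with invented empirical moments:

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.optimize import fsolve

emp_mean, emp_var = 2.0, 1.5   # empirical moments of the rate (hypothetical)

def moment_gap(params):
    mu, sigma = params
    a = (0.0 - mu) / sigma     # standardized left truncation point
    m, v = truncnorm.stats(a, np.inf, loc=mu, scale=sigma, moments="mv")
    return [m - emp_mean, v - emp_var]

mu_hat, sigma_hat = fsolve(moment_gap, x0=[emp_mean, np.sqrt(emp_var)])
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
# The fitted truncated normal then serves as the maximum entropy prior
# for the Poisson rate, to be compared with the conjugate gamma.
```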

19.
Differences in type I error and power rates for majority and minority groups are investigated when differential item functioning (DIF) contamination in a test is unbalanced. Typically, type I error and power rates are aggregated across groups; however, cumulative results can be misleading if subgroups are affected differently by study conditions. With unbalanced DIF contamination, type I error and power rates are reduced for groups with more DIF items favoring them, and increased for groups with less DIF contamination. Even when aggregated impacts appear small, differing subgroup impacts can result in a larger proportional bias than in the original data.

20.
The aim of the article is to propose Bayesian estimation, through Markov chain Monte Carlo, of a multidimensional item response theory model for graded responses with an additive structure and correlated latent traits. A simulation study is conducted to evaluate parameter recovery under different conditions (sample size, test and subtest length, number of response categories, and correlation structure). The results show that the parameters are well reproduced when the sample size is sufficiently large (n = 1,000), while the worst recovery is observed for a small sample size (n = 500) combined with four response categories and a small number of test items.
