Similar Articles
20 similar articles found.
1.
ABSTRACT

In modern test theory, differential item functioning (DIF) occurs when respondents from two different groups have the same ability but different probabilities of responding to an item correctly. When items that genuinely favour one group induce apparent DIF in other items, which then seem to favour the other group, the phenomenon is called artificial differential item functioning (A-DIF). This paper examines the effect of different factors that cause A-DIF under the Rasch model for dichotomous responses. A simulation study was conducted to explore how total sample size, the proportion of individuals in the focal and reference groups, the percentage of items exhibiting real DIF, and DIF magnitude affect real DIF and the expected proportion of simultaneous A-DIF, for item sets of 10 and 20 items. It is concluded that DIF magnitude is the most influential factor for A-DIF in both item sets, followed by the percentage of items exhibiting real DIF.
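
As a rough illustration of the kind of data-generating process such a simulation relies on, here is a minimal sketch of simulating dichotomous Rasch responses with real DIF in a subset of items; the group sizes, item difficulties, number of DIF items and the 0.6-logit shift are assumed for illustration and are not the authors' design values:

import numpy as np

rng = np.random.default_rng(1)
n_ref, n_foc, n_items = 500, 500, 10            # assumed sample split and test length
b = rng.uniform(-1.5, 1.5, n_items)             # item difficulties
dif_shift = np.zeros(n_items)
dif_shift[:2] = 0.6                             # first two items exhibit real DIF against the focal group

theta = rng.normal(0, 1, n_ref + n_foc)         # equal ability distributions in both groups
focal = np.r_[np.zeros(n_ref), np.ones(n_foc)]  # group indicator

# Rasch probability of a correct response; DIF items are harder for the focal group only
logit = theta[:, None] - (b[None, :] + np.outer(focal, dif_shift))
p = 1 / (1 + np.exp(-logit))
responses = rng.binomial(1, p)                  # dichotomous item responses

print(responses.shape, responses.mean(axis=0))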

2.
ABSTRACT

This research examines the statistical methodology used to estimate the parameters in item response models. An integral part of an item response model is the normalization rule used to identify the distributional parameters. The main result shown here is that only Verhelst–Glas normalizations, which arbitrarily set one difficulty and one dispersion parameter to unity, are consistent with the basic assumptions underlying the two-parameter logistic model. Failure to employ this type of normalization leads to scores that depend on the item composition of the test, and differential item difficulty (DIF) will then compromise the validity of the estimated ability scores when different groups are compared. It is also shown that some tests for DIF fail when the data are generated by an IRT model with a random effect. Most of the results are based on simulations of a four-item model. Because the data-generating mechanism is known, it is possible to determine the effect on ability scores and parameter estimates when different normalizations or different distributional parameter values are used.
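
For reference, the two-parameter logistic model in its standard textbook parameterization (not necessarily the exact notation or normalization the authors use); the need for a normalization rule comes from the fact that the response probabilities are unchanged under a joint affine rescaling of abilities and item parameters:

P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}},
\qquad
a_j(\theta_i - b_j) \;=\; \frac{a_j}{c}\bigl[(c\,\theta_i + d) - (c\,b_j + d)\bigr], \quad c > 0,

so the transformation \theta_i \mapsto c\theta_i + d, b_j \mapsto c b_j + d, a_j \mapsto a_j / c leaves the likelihood unchanged, and at least one location and one scale constraint (for example, fixing one difficulty and one dispersion parameter) is needed to identify the parameters.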

3.
Differences in type I error and power rates between majority and minority groups are investigated when differential item functioning (DIF) contamination in a test is unbalanced. Typically, type I error and power rates are aggregated across groups; however, aggregated results can be misleading if subgroups are affected differently by the study conditions. With unbalanced DIF contamination, type I error and power rates are reduced for groups with more DIF items favoring them, and increased for groups with less DIF contamination. Even when the aggregated impact appears small, differing subgroup impacts can result in a larger proportional bias than in the original data.

4.
Abstract

Experiments in various countries with “last week” and “last month” reference periods for reporting households’ food consumption have generally found that “week”-based estimates are higher. In India, the National Sample Survey (NSS) has consistently found that week-based estimates are higher than month-based estimates for a majority of food item groups. But why? It has long been believed that the reason must be recall lapse, inherent in a long reporting period such as a month. But is household consumption of a habitually consumed item “recalled” in the same way as that of an item of infrequent consumption? And why does memory lapse not cause over-reporting (over-assessment) as often as under-reporting? In this paper, we propose an alternative hypothesis, involving a “quantity floor effect” in reporting behavior, under which the “week” reference period may cause over-reporting for many items. We design a test to detect the effect postulated by this hypothesis and carry it out on NSS 68th round HCES data. The results strongly suggest that our hypothesis explains the difference between week-based and month-based estimates better than the recall-lapse theory.

5.

The RESET test for functional misspecification is generalised to cover systems of equations, and the properties of seven versions are studied using Monte Carlo methods. The Rao F-test clearly exhibits the best performance as regards correct size, whilst the commonly used LRT (uncorrected for degrees of freedom) and the LM and Wald tests (both corrected and uncorrected) behave badly even in single equations. The Rao test retains correct size even in ten-equation systems, which is better than has been reported in previous research on autocorrelation tests. The power of the test is low, however, when the number of equations grows and the correlation between the omitted variables and the RESET proxies is small.
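
As a reminder of the basic mechanics being generalised, here is a minimal single-equation RESET sketch (an assumed toy example, not the authors' systems version): fit the model, augment it with powers of the fitted values as proxies, and test whether the proxy coefficients are jointly zero with an F test.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(size=n)       # true relationship is nonlinear in x

X = sm.add_constant(x)
restricted = sm.OLS(y, X).fit()

# augment with squared and cubed fitted values (the RESET proxies)
fitted = restricted.fittedvalues
X_aug = np.column_stack([X, fitted**2, fitted**3])
unrestricted = sm.OLS(y, X_aug).fit()

# F test that the two proxy coefficients are jointly zero
df_resid = n - X_aug.shape[1]
f_stat = ((restricted.ssr - unrestricted.ssr) / 2) / (unrestricted.ssr / df_resid)
print("RESET F statistic:", f_stat, "p-value:", stats.f.sf(f_stat, 2, df_resid))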

6.
Abstract

In this article, we develop a model for a convertible item (or product) whose initial form is converted into another product at the expense of both conversion cost and conversion time. After a further period it is converted again into a new product of a different nature, so the item undergoes a sequential conversion from its initial form into two other products across states. The demand pattern and deterioration rate differ in each converted state. An inventory model is developed for such a sequentially convertible item. Expressions for the total cost and the other state-specific costs are derived, and the optimal times at which to convert the product between states are calculated under the model assumptions. A numerical example is included in support of the theoretical findings and validates the strength of the model.

7.
ABSTRACT

Background: Instrumental variables (IVs) have become much easier to find in the “Big data era”, which has increased the number of applications of the two-stage least squares (TSLS) model. With the increased availability of IVs, the possibility that these IVs are weak has also increased. Prior work has suggested a ‘rule of thumb’ that IVs with a first-stage F statistic of at least ten avoid a relative bias in point estimates greater than 10%. We investigated whether this threshold is also an effective guarantee of low false rejection rates for the null hypothesis test in TSLS applications with many IVs.

Objective: To test how the ‘rule of thumb’ for weak instruments performs in predicting low false rejection rates in the TSLS model when the number of IVs is large.

Method: We used a Monte Carlo approach to create 28 original data sets for different models, with the number of IVs varying from 3 to 30. For each model, we generated 2,000 observations per iteration and ran 50,000 iterations to reach convergence in the rejection rates. The true parameter value was set to 0, and the probability of rejecting this null hypothesis was recorded for each model as a measure of the false rejection rate. The relationship between the endogenous variable and the IVs was calibrated so that the first-stage F statistic equalled ten, matching the ‘rule of thumb.’

Results: We found that the false rejection rate (type I error) increased as the number of IVs in the TSLS model increased while the first-stage F statistic was held at 10. The false rejection rate exceeded 10% when the TSLS model had 24 IVs and exceeded 15% when it had 30 IVs.

Conclusion: When more instrumental variables are included in the model, the ‘rule of thumb’ is no longer an effective guarantee of good performance in hypothesis testing. A stricter threshold for the first-stage F statistic is recommended in place of the ‘rule of thumb’, especially when the number of instrumental variables is large.
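
A stripped-down sketch of one iteration of this kind of Monte Carlo design, assuming a single endogenous regressor, a chosen number of IVs, and an arbitrary instrument-strength coefficient (the authors instead tune the strength so the expected first-stage F equals ten; all numeric values here are illustrative):

import numpy as np

def tsls_rejects(n=2000, k_iv=24, beta=0.0, pi=0.03, rho=0.5, seed=0):
    """One TSLS iteration: returns True if H0: beta = 0 is rejected at the 5% level."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, k_iv))                          # instruments
    e = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    x = Z @ np.full(k_iv, pi) + e[:, 0]                     # endogenous regressor, weak first stage
    y = beta * x + e[:, 1]

    # first stage: project x on Z; second stage: regress y on the projection
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    b_hat = (x_hat @ y) / (x_hat @ x_hat)
    resid = y - b_hat * x                                   # 2SLS residuals use the original x
    se = np.sqrt(resid @ resid / (n - 1) / (x_hat @ x_hat))
    return abs(b_hat / se) > 1.96

# empirical false rejection rate over repeated samples (far fewer iterations than the paper's 50,000)
print(np.mean([tsls_rejects(seed=s) for s in range(500)]))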

8.
ABSTRACT

The procedure for online control by attribute consists of inspecting a single item for every m items produced (m ≥ 2). At each inspection, it is determined whether the fraction of conforming items produced has decreased. If the inspected item is classified as non-conforming, the production process is adjusted so that the conforming fraction returns to its original level. A generalization found in the literature considers inspection errors and a varying inspection interval. This study extends that model by allowing the inspected item to be rated independently r (r ≥ 1) times, with the process adjusted whenever the number of conforming classifications is less than a, 1 ≤ a ≤ r. The method uses the properties of an ergodic Markov chain to obtain an expression for the average cost of this control system, and a genetic algorithm is used to search for the optimal parameters that minimize the expected cost. The procedure is illustrated by a numerical example.

9.

The problem of comparing several samples to decide whether their means and/or variances are significantly different is considered. It is shown that with very non-normal distributions even a very robust test for comparing means has poor properties when the distributions have different variances, and therefore a new testing scheme is proposed. The scheme starts with an exact randomization test for any significant difference (in means or variances) between the samples. If a non-significant result is obtained, testing stops. Otherwise, an approximate randomization test for mean differences (allowing for variance differences) is carried out, together with a bootstrap procedure to assess whether this test is reliable. A randomization version of Levene's test is also carried out for differences in variation between samples. The five possible conclusions are that (i) there is no evidence of any differences, (ii) there is evidence for mean differences only, (iii) evidence for variance differences only, (iv) evidence for both mean and variance differences, or (v) evidence for some indeterminate differences. A simulation experiment to assess the properties of the proposed scheme is described. It is concluded that the scheme is useful as a robust, conservative method for comparing samples that may come from very non-normal distributions.
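
A minimal sketch of the first stage of such a scheme, a two-sample randomization (permutation) test; the test statistic, the number of permutations, and the non-normal toy data are assumptions for illustration, and the follow-up bootstrap and Levene steps of the authors' full procedure are omitted:

import numpy as np

def randomization_test(x, y, n_perm=9999, seed=0):
    """Permutation p-value for a difference in means between samples x and y."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += abs(perm[:len(x)].mean() - perm[len(x):].mean()) >= observed
    return (count + 1) / (n_perm + 1)           # include the observed arrangement

x = np.random.default_rng(1).exponential(1.0, 30)    # deliberately non-normal samples
y = np.random.default_rng(2).exponential(1.5, 25)
print("randomization p-value:", randomization_test(x, y))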

10.
The aim of this article is to propose Bayesian estimation, via Markov chain Monte Carlo, of a multidimensional item response theory model for graded responses with an additive structure and correlated latent traits. A simulation study is conducted to evaluate parameter recovery under different conditions (sample size, test and subtest length, number of response categories, and correlation structure). The results show that the parameters are well recovered when the sample size is sufficiently large (n = 1,000), while the worst recovery is observed for the small sample size (n = 500) with four response categories and a small number of test items.

11.

In this article we examine the effect that logarithmic and power transformations have on the order of integration of raw time series. For this purpose, we use a version of the tests of Robinson (1994) that permits us to test I(d) statistical models. The results, obtained via Monte Carlo, show that these transformations have no effect on the degree of dependence of the series, and they are therefore useful devices when a more plausible economic interpretation of the data is required.

12.
Using randomly censored data, we develop a test of the null hypothesis that a new item has stochastically the same residual life length as a used item of specified age t0, versus the alternative hypothesis that a new item has stochastically greater residual life length than a used item of age t0. We also compare our test with a related test developed for a complete-data model, in order to study the loss of efficiency due to censoring.

13.
ABSTRACT

The problem of estimating the regression coefficients in a multiple regression model is considered under multicollinearity, when it is suspected that the regression coefficients may be restricted to a subspace. The objective of this paper is to compare the usual preliminary test estimator and the preliminary test ridge regression estimator in the sense of the dispersion matrix of one dominating that of the other. In particular, we prove two results giving necessary and sufficient conditions for the superiority of the preliminary test ridge regression estimator over the preliminary test estimator in the cases δ = 0 (or Δ = 0) and δ ≠ 0 (or Δ ≠ 0).

14.

A sign test based on median ranked set samples (MRSS) is introduced and investigated. We show that this test is more powerful than the sign tests based on a simple random sample (SRS) or a ranked set sample (RSS) for finite sample sizes. It is found that when the set size of the MRSS is odd, the null distribution of the MRSS sign test is the same as that of the SRS sign test. The exact null distributions and power functions of these tests are derived for finite sample sizes, and the asymptotic distribution of the MRSS sign test is also derived. A numerical comparison of the power of the MRSS sign test with that of the SRS and RSS sign tests is given. The procedure is illustrated using a real data set of bilirubin levels in jaundiced babies in neonatal intensive care.
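
A minimal sketch of drawing a median ranked set sample and applying a sign test to it (here the ranking is done on the measured values themselves, whereas in practice ranking is by judgment or a concomitant variable; the set size, number of cycles, hypothesized median, and data distribution are all assumptions for illustration):

import numpy as np
from scipy.stats import binomtest

def mrss(population_draw, m, cycles, rng):
    """Median ranked set sample: from each set of m draws keep the median-ranked unit."""
    sample = []
    for _ in range(cycles):
        for _ in range(m):                      # m sets per cycle
            s = np.sort(population_draw(m, rng))
            sample.append(s[(m - 1) // 2])      # median rank (m assumed odd)
    return np.array(sample)

rng = np.random.default_rng(0)
draw = lambda size, rng: rng.normal(loc=0.3, scale=1.0, size=size)   # true median 0.3

x = mrss(draw, m=3, cycles=10, rng=rng)
theta0 = 0.0                                    # hypothesized median
n_pos = int(np.sum(x > theta0))                 # sign statistic
print(binomtest(n_pos, n=len(x), p=0.5, alternative="greater"))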

15.
ABSTRACT

Online consumer product ratings data are increasing rapidly. While most current graphical displays mainly represent average ratings, Ho and Quinn proposed an easily interpretable graphical display based on an ordinal item response theory (IRT) model, which successfully accounts for systematic interrater differences. Conventionally, the discrimination parameters in IRT models are constrained to be positive, particularly when modeling scored data from educational tests. In this article, we use real-world ratings data to demonstrate that such a constraint can have a large impact on parameter estimation, and we explain this impact through rater behavior. We also discuss correlation among raters and assess the prediction accuracy of both the constrained and the unconstrained models. The results show that the unconstrained model performs better when a larger fraction of rater pairs exhibit negative correlations in their ratings.

16.
The approach to preliminary test estimation based on comparing the weighted quadratic risk function of two competing estimators of β under the linear regression model {y, Xβ, σ²I} is extended to the case when a given vector of parametric functions κ = Kβ is to be estimated under the general Gauss–Markov model.

17.

Suppose that an order restriction is imposed among several p-variate normal mean vectors. We are interested in the problems of estimating these mean vectors and testing their homogeneity under this restriction. These problems are multivariate extensions of those of Bartholomew (1959). For the bivariate case, they have been studied by Sasabuchi et al. (1983, 1998), among others. In the present paper we examine the convergence of an iterative algorithm for computing the maximum likelihood estimator when p is larger than two. We also study some test procedures for testing homogeneity when p is larger than two.

18.
ABSTRACT

Factor analysis (FA) is the most commonly used pattern recognition methodology in social and health research. One technique that may help to retrieve truer information from FA is rotation of the factor axes. The main goal here is to test the reliability of results derived through FA and to identify the best rotation method under various scenarios. The simulations show that applying non-orthogonal (oblique) rotation produced more repeatable results than applying orthogonal rotation or no rotation at all.
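
For concreteness, a minimal numpy implementation of one orthogonal rotation (varimax) applied to a toy loading matrix; this is the standard textbook algorithm, included only to illustrate what rotating the loadings means, and it does not cover the oblique (non-orthogonal) rotations favoured in the article:

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a p x k loading matrix to (approximately) maximize the varimax criterion."""
    p, k = loadings.shape
    R = np.eye(k)
    last = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        )
        R = u @ vt                               # update the orthogonal rotation matrix
        if s.sum() < last * (1 + tol):
            break
        last = s.sum()
    return loadings @ R

# toy two-factor loading matrix before and after rotation (values are made up)
A = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.1, 0.7]])
print(np.round(varimax(A), 2))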

19.
ABSTRACT

Suppose F and G are two life distribution functions. F is said to be more IFRA (increasing failure rate average) than G (written F ≤* G) if G⁻¹F(x) is star-shaped on (0, ∞). In this paper, the problem of testing H0: F =* G against H1: F ≤* G and F ≠* G is considered, both when G is known and when G is unknown. We propose a new test based on U-statistics and obtain the asymptotic distribution of the test statistics. The new test is compared with some well-known tests from the literature. In addition, we apply our test to a real data set in the context of reliability.

20.
Abstract

In this article, we extend the concept of univariate frailty to the bivariate case in order to quantify and visualize the loss of efficiency of the log-rank test when a dependence structure between failure and censoring times is ignored. We assume that one unobservable frailty influences the risk of failure and another affects the risk of censoring, and that the two frailties are correlated. Under the benchmark model, the dependence structure between failure and censoring times is assumed to be completely observed. Under the model in which the log-rank test is constructed without accounting for the dependency between failure and censoring times, the unobservable dependence structure is assumed to have been absorbed into the baseline distributions. In our particular example, the loss of efficiency is minimal under the proportional hazards model, even when the correlation between potential failure and censoring times is strong, unless the dependent censoring induces severe non-proportionality.
