Similar Articles
1.
ABSTRACT

The impact of class size on student achievement remains an open question despite hundreds of empirical studies and the perception among parents, teachers, and policymakers that larger classes are a significant detriment to student development. This study sheds new light on this ambiguity by utilizing nonparametric tests for stochastic dominance to analyze unconditional and conditional test score distributions across students facing different class sizes. Analyzing the conditional distributions of test scores (purged of observables using class-size specific returns), we find that there is little causal effect of marginal reductions in class size on test scores within the range of 20 or more students. However, reductions in class size from above 20 students to below 20 students, as well as marginal reductions in classes with fewer than 20 students, increase test scores for students below the median, but decrease test scores above the median. This nonuniform impact of class size suggests that compensatory school policies, whereby lower-performing students are placed in smaller classes and higher-performing students are placed in larger classes, improve the academic achievement of not just the lower-performing students but also the higher-performing students.
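The stochastic-dominance comparison at the heart of this abstract can be illustrated with a small sketch. The data below are simulated (not from the study), and the simple everywhere-comparison of empirical CDFs is only a stand-in for the authors' formal nonparametric dominance tests.

```python
import numpy as np

# Minimal sketch (not the authors' test): check first-order stochastic
# dominance between two empirical score distributions.  Hypothetical data.
rng = np.random.default_rng(0)
scores_small = rng.normal(52, 10, size=400)   # classes below 20 students (assumed)
scores_large = rng.normal(50, 10, size=400)   # classes of 20 or more (assumed)

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

grid = np.linspace(min(scores_small.min(), scores_large.min()),
                   max(scores_small.max(), scores_large.max()), 200)
F_small, F_large = ecdf(scores_small, grid), ecdf(scores_large, grid)

# small-class scores first-order dominate if their CDF lies below everywhere
dominates = np.all(F_small <= F_large + 1e-12)
print("small-class distribution first-order dominates:", dominates)
```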

2.
Selection of a parsimonious model as a basis for statistical inference from capture-recapture data is critical, especially when using open models in the analysis of multiple, interrelated data sets (e.g. males and females, with two to three age classes, over three to five areas and 10-15 years). The global (i.e. most general) model for such data sets might contain hundreds of survival and recapture parameters. Here, we focus on a series of nested models of the Cormack-Jolly-Seber type wherein the likelihood arises from products of multinomial distributions whose cell probabilities are reparameterized in terms of survival (φ) and mean capture (p) probabilities. This paper presents numerical results on two information-theoretic methods for model selection when the capture probabilities are heterogeneous over individual animals: Akaike's Information Criterion (AIC) and a dimension-consistent criterion (CAIC), derived from a Bayesian viewpoint. Quality of model selection was evaluated based on the relative Euclidean distance between the standardized estimate θ̂ and the true θ (the vector-valued parameter θ contains the survival (φ) and mean capture (p) probabilities); this quantity, RSS = Σ_i {(θ̂_i − θ_i)/θ_i}², is a sum of squared bias and variance. Thus, the quality of inference (RSS) was judged by comparing the performance of the two information criteria and the use of the true model (used to generate the data), in relation to the model that provided the smallest RSS. We found that heterogeneity in the capture probabilities had a negligible effect on model selection using AIC or CAIC. Model size increased as sample size increased with both AIC- and CAIC-selected models.
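For readers unfamiliar with the quality measure and the two criteria, here is a minimal sketch. The parameter values and log-likelihood are hypothetical, and CAIC is taken in its usual dimension-consistent form −2 log L + K(ln n + 1).

```python
import numpy as np

# Minimal sketch of the model-quality measure and the two criteria mentioned
# in the abstract; theta_hat, theta and the log-likelihood are hypothetical.
def rss(theta_hat, theta):
    """Relative squared distance: sum_i ((theta_hat_i - theta_i)/theta_i)^2."""
    theta_hat, theta = np.asarray(theta_hat), np.asarray(theta)
    return np.sum(((theta_hat - theta) / theta) ** 2)

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def caic(loglik, k, n):
    # dimension-consistent criterion: penalty grows with sample size
    return -2.0 * loglik + k * (np.log(n) + 1.0)

theta_true = np.array([0.80, 0.75, 0.30, 0.25])   # (phi_1, phi_2, p_1, p_2), assumed
theta_hat  = np.array([0.78, 0.79, 0.28, 0.27])
print(rss(theta_hat, theta_true))
print(aic(-520.3, k=4), caic(-520.3, k=4, n=300))
```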

3.
On a multiple choice test in which each item has r alternative options, a given number c of which are correct, various scoring models have been proposed. In one case the test-taker is allowed to choose a solution subset of any size and is graded according to whether the subset is small and according to how many correct answers the subset contains. In a second case the test-taker is allowed to select only solution subsets of a prespecified maximum size and is graded as above. The first case is analogous to the situation where the test-taker is given a set of r options with each question; each question calls for a solution which consists of selecting that subset of the r responses which he/she believes to be correct. In the second case, when the prespecified solution subset is restricted to be of size at most one, the resulting scoring model corresponds to the usual model, referred to below as standard. The number c of correct options per item is usually known to the test-taker in this case.

Scoring models are evaluated according to how well they correctly identify the total scores of the individuals in the class of test-takers. Loss functions are constructed which penalize scoring models resulting in student scores which are not associated with the student's true (or average) total score on the exam. Scoring models are compared on the basis of cross-validated assessments of the loss incurred by using each of the given models. It is shown that in many cases the assessment of the loss for scoring models which allow students the opportunity to choose more than one option for each question is smaller than the assessment of the loss for the standard scoring model.

4.
Probability forecasting models can be estimated using weighted score functions that (by definition) capture the performance of the estimated probabilities relative to arbitrary “baseline” probability assessments, such as those produced by another model, by a bookmaker or betting market, or by a human probability assessor. Maximum likelihood estimation (MLE) is interpretable as just one such method of optimum score estimation. We find that when MLE-based probabilities are themselves treated as the baseline, forecasting models estimated by optimizing any of the proper power and pseudospherical families of economic score functions yield the very same probabilities as MLE. The finding that probabilities estimated by optimum score estimation respond to MLE-baseline probabilities by mimicking them supports reliance on MLE as the default form of optimum score estimation.

5.
Conventional computations use real numbers as input and produce real numbers as results without any indication of the accuracy. Interval analysis, instead, uses interval elements throughout the computation and produces intervals as output with the guarantee that the true results are contained in them. One major use of interval analysis in statistics is computing high-dimensional multivariate probabilities. By decreasing the length of the intervals that contain the theoretically true answers, results can be obtained to any arbitrary accuracy, as demonstrated with multivariate normal and multivariate t integrations. This is an advantage over the approximation methods that are currently in use. Since interval analysis is more computationally intensive than traditional computing, a MasPar parallel computer is used in this research to improve performance.
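The basic idea of computing with enclosing intervals can be sketched in a few lines. A production interval library would round the endpoints outward at every operation; this toy class ignores floating-point rounding and only illustrates the enclosure property.

```python
# Minimal sketch of interval arithmetic (a real implementation would round
# endpoints outward; this illustration ignores floating-point rounding error).
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def width(self):
        return self.hi - self.lo

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# every real x in [1.0, 1.1] and y in [2.0, 2.2] satisfies x*y + x in the result
x, y = Interval(1.0, 1.1), Interval(2.0, 2.2)
print(x * y + x, (x * y + x).width())
```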

6.
In a seeded knockout tournament, where teams have some preassigned strength, do we have any assurance that the best team has in fact won? Is there some insight to be gained by considering which teams beat which other teams, examining only the seeds? We propose an answer to these questions by using the difference in the seeds of the two players as the basis for a test statistic. We offer several models for the underlying probability structure to examine the null distribution and power functions, and determine these for small tournaments (fewer than five teams). One structure each for 8 teams and 16 teams is examined, and we conjecture an asymptotic normal distribution for the test statistic.
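A seed-difference statistic of this flavour is easy to study by Monte Carlo. The statistic below (sum over all matches of seed of loser minus seed of winner) and the observed value are purely illustrative assumptions, not the paper's definition; the null model treats every match as a fair coin flip in a standard 8-team bracket.

```python
import numpy as np

# Illustrative sketch only: one natural statistic is the sum over matches of
# seed(loser) - seed(winner), which is large when low seeds keep winning.
# Null model: every match is a fair coin flip; bracket 1v8, 4v5, 3v6, 2v7.
rng = np.random.default_rng(1)

def play_bracket(seeds):
    """Play one knockout tournament; return the seed-difference statistic."""
    stat, current_round = 0, list(seeds)
    while len(current_round) > 1:
        next_round = []
        for a, b in zip(current_round[::2], current_round[1::2]):
            winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
            stat += loser - winner
            next_round.append(winner)
        current_round = next_round
    return stat

null_draws = np.array([play_bracket([1, 8, 4, 5, 3, 6, 2, 7]) for _ in range(20000)])
observed = 17                                   # hypothetical observed value
p_value = np.mean(null_draws >= observed)       # upper-tail p-value
print(p_value)
```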

7.
We study the estimation of the strength of signals corresponding to the high-valued observations in multivariate binary data. These problems can arise in a variety of areas, such as mass spectrometry or functional magnetic resonance imaging (fMRI), where the underlying signals could be interpreted as a proxy for biochemical or physiological response to a condition or treatment. More specifically, the problem we consider involves estimating the sum of a collection of binomial probabilities corresponding to large values of the associated binomial random variables. We emphasize the case where the dimension is much greater than the sample size, and most of the probabilities of the events of interest are close to zero. Two estimation approaches are proposed: conditional maximum likelihood and nonparametric empirical Bayes. We use these estimators to construct a test of homogeneity for two groups of high-dimensional multivariate binary data. Simulation studies on the size and power of the proposed tests are given, and the tests are demonstrated using mass spectrometry data from a breast cancer study.

8.
In this paper, the dependence of transition probabilities on covariates and a test procedure for covariate-dependent Markov models are examined. The nonparametric test for the role of waiting time proposed by Jones and Crowley [M. Jones, J. Crowley, Nonparametric tests of the Markov model for survival data, Biometrika 79 (3) (1992) 513–522] is extended here to transitions and reverse transitions. The limitation of the Jones and Crowley method is that it does not account for other covariates that might be associated with the transition probabilities. A simple test procedure is proposed that can be employed for testing: (i) the significance of association between covariates and transition probabilities, and (ii) the impact of waiting time on the transition probabilities. The procedure is illustrated using panel data on hospitalization of the elderly population in the USA from the Health and Retirement Survey (HRS).
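A common way to make transition probabilities covariate-dependent is a logistic model for each transition, tested with a likelihood ratio. The sketch below is not the paper's exact procedure; the two-state chain, the covariates (age, waiting time) and the data are all simulated assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Sketch: model the probability of a 0 -> 1 transition as logistic in the
# covariates and waiting time, then test the covariates with an LR test.
rng = np.random.default_rng(2)
n = 500
age = rng.normal(75, 6, n)                 # hypothetical covariate
wait = rng.exponential(2.0, n)             # waiting time in current state
p = 1 / (1 + np.exp(-(-4 + 0.05 * age + 0.1 * wait)))
transition = rng.binomial(1, p)            # 1 = moved to the other state

X_full = sm.add_constant(np.column_stack([age, wait]))
X_null = np.ones((n, 1))
full = sm.Logit(transition, X_full).fit(disp=0)
null = sm.Logit(transition, X_null).fit(disp=0)

lr = 2 * (full.llf - null.llf)             # likelihood-ratio statistic
p_value = stats.chi2.sf(lr, df=2)
print(lr, p_value)
```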

9.
The use of Monte Carlo methods to generate exam datasets is nowadays a well-established practice among econometrics and statistics examiners all over the world. Its advantages are well known: providing each student a different data set ensures that estimates are actually computed individually, rather than copied from someone sitting nearby. The method, however, has a major fault: initial “random errors,” such as mistakes in downloading the assigned dataset, might generate downward bias in student evaluation. We propose a set of calibration algorithms, typical of indirect estimation methods, that solve the issue of initial “random errors” and reduce evaluation bias. By ensuring round initial estimates of the parameters for each individual dataset, our calibration procedures allow the students to determine whether they have started the exam correctly. When initial estimates are not round numbers, this random error in the initial stage of the exam can be corrected immediately, thus reducing evaluation bias. The procedure offers the further advantage of easing markers' workload by allowing them to check only round-number answers, rather than lists of numbers with many decimal digits.
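One simple way to guarantee round initial estimates in a regression exam dataset, not necessarily the authors' calibration algorithm, is to project the noise onto the orthogonal complement of the design matrix, so the OLS coefficients come out exactly equal to the chosen round values. All numbers below are illustrative.

```python
import numpy as np

# Sketch: give each student y = X @ beta_round + e_perp, where e_perp is noise
# projected off the column space of X, so the fitted OLS coefficients equal
# beta_round exactly.  Seeding by student ID is an assumption.
rng = np.random.default_rng(123)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_round = np.array([2.0, -1.0, 0.5])     # round numbers the student should see

eps = rng.normal(scale=3.0, size=n)
H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix X (X'X)^{-1} X'
e_perp = eps - H @ eps                      # (I - H) eps, orthogonal to X
y = X @ beta_round + e_perp

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(beta_hat, 10))               # exactly [2.0, -1.0, 0.5]
```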

10.
Wilcoxon's signed rank sum test, Wilcoxon's rank sum test and the Ansari-Bradley rank test are three well-known distribution-free tests. When the sample size is large enough, the lower tail probabilities P0{Tn ≤ x}, P0{Wm,n ≤ x} and P0{Am,n ≤ x} may be easily computed, under H0, using normal approximations. When the sample sizes are too small, these normal approximations become inadequate. Therefore, the main goal of our work is to find fast algorithms which compute the exact lower tail probabilities P0{Tn ≤ x}, P0{Wm,n ≤ x} and P0{Am,n ≤ x} when the normal approximation is inadequate.
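For the signed rank statistic the exact lower-tail probability has a classical counting form: under H0 each rank 1..n enters the sum independently with probability 1/2, so P0{Tn ≤ x} follows from a small dynamic program (the other two statistics admit analogous recursions). This is a generic textbook recursion, not necessarily the algorithm proposed in the paper.

```python
import numpy as np

# Exact lower tail of Wilcoxon's signed rank statistic T_n by subset counting.
def signed_rank_lower_tail(n, x):
    max_sum = n * (n + 1) // 2
    counts = np.zeros(max_sum + 1, dtype=object)   # exact integer arithmetic
    counts[0] = 1
    for r in range(1, n + 1):                      # add rank r to the subsets
        counts[r:] = counts[r:] + counts[:-r]
    return sum(counts[: x + 1]) / 2 ** n

print(signed_rank_lower_tail(10, 8))               # P0{T_10 <= 8}
```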

11.
WILCOXON-TYPE RANK-SUM PRECEDENCE TESTS
This paper introduces Wilcoxon-type rank-sum precedence tests for testing the hypothesis that two life-time distribution functions are equal. They extend the precedence life-test first proposed by Nelson in 1963. The paper proposes three Wilcoxon-type rank-sum precedence test statistics (the minimal, maximal and expected rank-sum statistics) and derives their null distributions. Critical values are presented for some combinations of sample sizes, and the exact power function is derived under the Lehmann alternative. The paper examines the power properties of the Wilcoxon-type rank-sum precedence tests under a location-shift alternative through Monte Carlo simulations, and it compares the power of the precedence test, the maximal precedence test and the Wilcoxon rank-sum test (based on complete samples). Two examples are presented for illustration.

12.
This article considers the use of sports board games to introduce or illustrate a wide variety of probability concepts to introductory statistics students in an integrated manner. We demonstrate the use of a single game (Strat-O-Matic® Baseball) to introduce probability distributions, sample spaces, the laws of addition and multiplication of probabilities, independence, mutual exclusivity, randomization, conditional probability, and Bayes' Theorem. Empirical and anecdotal evidence suggests that student comprehension and retention are enhanced by use of examples constructed from the simple and interesting contexts provided by a sports board game.
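A tiny worked example in the spirit of the article: in the game, a die roll decides whether an at-bat is resolved from the batter's card or the pitcher's card, which makes total probability and Bayes' theorem very concrete. All probabilities below are hypothetical, not taken from the game's actual cards.

```python
# Hypothetical card probabilities for one batter/pitcher match-up.
p_batter_card = 0.5                  # P(outcome read from batter's card)
p_hit_given_batter = 0.35            # P(hit | batter's card)
p_hit_given_pitcher = 0.22           # P(hit | pitcher's card)

# law of total probability
p_hit = (p_batter_card * p_hit_given_batter
         + (1 - p_batter_card) * p_hit_given_pitcher)

# Bayes' theorem: which card was used, given that the at-bat was a hit?
p_batter_given_hit = p_batter_card * p_hit_given_batter / p_hit
print(p_hit, p_batter_given_hit)
```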

13.
In survival analysis, it is routine to test equality of two survival curves, which is often conducted by using the log-rank test. Although it is optimal under the proportional hazards assumption, the log-rank test is known to have little power when the survival or hazard functions cross. To test the overall homogeneity of hazard rate functions, we propose a group of partitioned log-rank tests. By partitioning the time axis and taking the supremum of the sum of two partitioned log-rank statistics over different partitioning points, the proposed test gains enormous power for cases with crossing hazards. On the other hand, when the hazards are indeed proportional, our test still maintains high power close to that of the optimal log-rank test. Extensive simulation studies are conducted to compare the proposed test with existing methods, and three real data examples are used to illustrate the commonality of crossing hazards and the advantages of the partitioned log-rank tests.
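A rough sketch of the construction: accumulate the usual per-event-time log-rank contributions, split them at a cut point, form a chi-square-type statistic on each piece, and take the supremum of their sum over cut points. The data are simulated with crossing hazards, censoring is omitted for brevity, and critical values/p-values are not addressed here.

```python
import numpy as np

def logrank_pieces(time, event, group):
    """Per-event-time (O1 - E1, V) contributions of group 1."""
    out = []
    for t in np.sort(np.unique(time[event == 1])):
        at_risk = time >= t
        n1, n = np.sum(at_risk & (group == 1)), np.sum(at_risk)
        d1 = np.sum((time == t) & (event == 1) & (group == 1))
        d = np.sum((time == t) & (event == 1))
        if n > 1:
            e1 = d * n1 / n
            v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
            out.append((d1 - e1, v))
    return out

def partitioned_logrank(time, event, group):
    pieces = logrank_pieces(time, event, group)
    def chisq(subset):
        u = sum(p[0] for p in subset)
        v = sum(p[1] for p in subset)
        return u * u / v if v > 0 else 0.0
    # supremum over cut points of the sum of the two partitioned statistics
    return max(chisq(pieces[:k]) + chisq(pieces[k:])
               for k in range(1, len(pieces)))

rng = np.random.default_rng(3)
time = np.concatenate([rng.weibull(0.7, 80), rng.weibull(1.8, 80)])
event = np.ones_like(time, dtype=int)                 # no censoring, for brevity
group = np.concatenate([np.ones(80, int), np.zeros(80, int)])
print(partitioned_logrank(time, event, group))
```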

14.
Data Driven Rank Test for Two-Sample Problem
Traditional linear rank tests are known to possess low power for a large spectrum of alternatives. In this paper we introduce a new rank test possessing a considerably larger range of sensitivity than linear rank tests. The new test statistic is a sum of squares of some linear rank statistics, while the number of summands is chosen via a data-based selection rule. Simulations show that the new test possesses high and stable power in situations when linear rank tests completely break down, while simultaneously it has almost the same power under alternatives which can be detected by standard linear rank tests. Our approach is illustrated by some practical examples. Theoretical support is given by deriving the asymptotic null distribution of the test statistic and proving consistency of the new test under essentially any alternative.
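The general construction can be sketched as follows: squares of standardized linear rank statistics with smooth (here Legendre-type) score functions are accumulated, and the number of components is picked by a Schwarz-type penalty. The exact scores, standardization and penalty used in the paper may differ; this is an illustrative variant.

```python
import numpy as np
from numpy.polynomial import legendre

def data_driven_rank_stat(x, y, max_components=10):
    m, n = len(x), len(y)
    N = m + n
    ranks = np.empty(N)
    ranks[np.argsort(np.concatenate([x, y]))] = np.arange(1, N + 1)
    rx = ranks[:m]                                  # ranks of the first sample
    u_all = np.arange(1, N + 1) / (N + 1)

    components = []
    for k in range(1, max_components + 1):
        scores = legendre.Legendre.basis(k)(2 * u_all - 1)   # score a_k(j)
        a_bar = scores.mean()
        var = m * n / (N * (N - 1)) * np.sum((scores - a_bar) ** 2)
        L = np.sum(scores[(rx - 1).astype(int)])    # linear rank statistic
        components.append((L - m * a_bar) ** 2 / var)

    cum = np.cumsum(components)                     # W_1 <= W_2 <= ...
    penalized = cum - np.arange(1, max_components + 1) * np.log(N)
    S = int(np.argmax(penalized)) + 1               # data-based number of terms
    return cum[S - 1], S

rng = np.random.default_rng(4)
x = rng.normal(size=60)
y = rng.normal(size=60) ** 2 - 1                    # same mean, different shape
print(data_driven_rank_stat(x, y))
```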

15.
We propose a Bayesian hierarchical model for multiple comparisons in mixed models where the repeated measures on subjects are described with subject random effects. The model facilitates inference by parameterizing the successive differences of the population means, and for these differences we choose independent prior distributions that are mixtures of a normal distribution and a discrete distribution with its entire mass at zero. For the other parameters, we choose conjugate or vague priors. The performance of the proposed hierarchical model is investigated on simulated data and two real data sets, and the results illustrate that the proposed hierarchical model can effectively conduct a global test and pairwise comparisons using the posterior probability that any two means are equal. A simulation study is performed to analyze the type I error rate, the familywise error rate, and the test power. The Gibbs sampler procedure is used to estimate the parameters and to calculate the posterior probabilities.

16.
ABSTRACT

In this paper, we investigate the performance of cumulative sum (CUSUM) stopping rules for the online detection of an unknown change point in a time-homogeneous Markov chain. Under the condition that the post-change transition probabilities are unknown, we propose two CUSUM-type schemes for the detection. The first scheme is based on the maximum likelihood estimates of the post-change transition probabilities. This scheme is limited by its computational burden, which is mitigated by a second scheme based on reference transition probabilities selected from a region known a priori. We give bounds on the mean delay time and the mean time between false alarms to illustrate the effectiveness of the proposed schemes. Simulation results also demonstrate the feasibility of the proposed schemes.
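The second kind of scheme can be sketched as a standard CUSUM recursion driven by the log-likelihood ratio between a reference post-change transition matrix Q and the in-control matrix P. The matrices, threshold and simulated chain below are all hypothetical choices, not values from the paper.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # in-control transition probabilities (assumed)
Q = np.array([[0.6, 0.4],
              [0.5, 0.5]])          # reference post-change probabilities (assumed)
h = 5.0                             # alarm threshold (assumed)

rng = np.random.default_rng(5)

def simulate_chain(T, change_point):
    x = [0]
    for t in range(1, T):
        M = P if t < change_point else Q
        x.append(rng.choice(2, p=M[x[-1]]))
    return x

x = simulate_chain(300, change_point=200)

W, alarm = 0.0, None
for t in range(1, len(x)):
    llr = np.log(Q[x[t - 1], x[t]] / P[x[t - 1], x[t]])
    W = max(0.0, W + llr)           # CUSUM recursion
    if W >= h:
        alarm = t
        break
print("alarm raised at step:", alarm)
```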

17.
In this paper we outline and illustrate an easy-to-use inference procedure for directly calculating the approximate bootstrap percentile-type p-value for the one-sample median test; i.e., we calculate the bootstrap p-value without resampling, by using a fractional-order-statistics-based approach. The method parallels earlier work on fractional-order-statistics-based non-parametric bootstrap percentile-type confidence intervals for quantiles. Monte Carlo simulation studies are performed, which illustrate that the fractional-order-statistics-based approach to the one-sample median test has accurate type I error control for small samples over a wide range of distributions, is easy to calculate, and is preferable to the sign test in terms of type I error control and power. Furthermore, the fractional-order-statistics-based median test is easily generalized to testing that any quantile has some hypothesized value; for example, tests for the upper or lower quartile may be performed using the same framework.

18.
The performance of Box-Cox power transformations in classification using Hinkley's (1975) method is studied. Misclassification probabilities before and after transformation are compared. It is found that the use of Box-Cox transformations can sometimes substantially reduce the error probabilities. Estimates of error probabilities are obtained and certain properties are derived. Examples for a number of distributions are given.
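The flavour of the comparison can be sketched quickly; Hinkley's procedure is more involved than this, and the data, the single pooled Box-Cox lambda and the midpoint classification rule below are simplifying assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
c0 = rng.lognormal(mean=0.0, sigma=0.6, size=200)     # hypothetical class 0
c1 = rng.lognormal(mean=0.8, sigma=0.6, size=200)     # hypothetical class 1

def apparent_error(a, b):
    """Midpoint-of-means rule; assumes class `a` has the smaller mean and equal priors."""
    cut = (a.mean() + b.mean()) / 2.0
    return 0.5 * (np.mean(a > cut) + np.mean(b <= cut))

pooled = np.concatenate([c0, c1])
transformed, lam = stats.boxcox(pooled)               # one lambda for both classes
t0, t1 = transformed[:200], transformed[200:]

print("lambda:", lam)
print("apparent error before:", apparent_error(c0, c1))
print("apparent error after :", apparent_error(t0, t1))
```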

19.
After recalling the framework of minimum-contrast estimation, its consistency and its asymptotic normality, we highlight the fact that these results do not require any stationarity or ergodicity assumptions. The asymptotic distribution of the underlying contrast difference test is a weighted sum of independent chi-square variables having one degree of freedom each. We illustrate these results in three contexts: (1) a nonhomogeneous Markov chain with likelihood contrast; (2) a Markov field with coding, pseudolikelihood or likelihood contrasts; (3) a not necessarily Gaussian time series with Whittle's contrast. In contexts (2) and (3), we compare experimentally the power of the likelihood-ratio test with those of other contrast-difference tests.

20.
Bayesian inclusion probabilities have become a popular tool for variable assessment. From a frequentist perspective, it is often difficult to evaluate these probabilities because typically no Type I error rates are considered, nor is the power of the methods explored. This paper considers how a frequentist may evaluate Bayesian inclusion probabilities for screening predictors. This evaluation looks at both unrestricted and restricted model spaces and develops a framework in which a frequentist can utilize inclusion probabilities that preserve Type I error rates. Furthermore, this framework is applied to an analysis of Arabidopsis thaliana with respect to determining quantitative trait loci associated with cotyledon opening angle.
