期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Experience Simpson's Paradox in the Classroom

Jiangtao Gou Fengqing Zhang 《The American statistician》2017,71(1):61-66

Simpson's paradox is a challenging topic to teach in an introductory statistics course. To motivate students to understand this paradox both intuitively and statistically, this article introduces several new ways to teach Simpson's paradox. We design a paper toss activity between instructors and students in class to engage students in the learning process. We show that Simpson's paradox widely exists in basketball statistics, and thus instructors may consider looking for Simpson's paradox in their own school basketball teams as examples to motivate students’ interest. A new probabilistic explanation of Simpson's paradox is provided, which helps foster students’ statistical understanding. Supplementary materials for this article are available online. 相似文献

2.

Re-parameterization of multinomial distributions and diversity indices

Zhiyi Zhang Jun Zhou 《Journal of statistical planning and inference》2010

It is shown in this paper that the parameters of a multinomial distribution may be re-parameterized as a set of generalized Simpson's diversity indices. There are two important elements in the generalization: (1) Simpson's diversity index is extended to populations with infinite species; (2) weighting schemes are incorporated. A class of unbiased estimators for the generalized Simpson's biodiversity indices is proposed. Asymptotic normality is established for the estimators. Both the unbiasedness and the asymptotic normality of the estimators hold for all three cases of the number of species in the population: infinite, finite and known, and finite but unknown. In the case of a population with a finite number of species, known or unknown, it is also established that the proposed estimators are uniformly minimum variance unbiased and are asymptotically efficient. 相似文献

3.

Assessing noninferiority: Evaluating efficacy of a new treatment without complete data

Ping Gao Katherine Odem‐Davis 《Pharmaceutical statistics》2019,18(5):546-554

The FDA released the final guidance on noninferiority trials in November 2016. In noninferiority trials, validity of the assessment of the efficacy of the test treatment depends on the control treatment's efficacy. Therefore, it is critically important that there be a reliable estimate of the control treatment effect—which is generally obtained from historical trials, and often assumed to hold in the current setting (the assay constancy assumption). Validating the constancy assumption requires clinical data, which are typically lacking. The guidance acknowledges that “lack of constancy can occur for many reasons.” We clarify the objectives of noninferiority trials. We conclude that correction for bias, rather than assay constancy, is critical to conducting valid noninferiority trials. We propose that assay constancy not be assumed and discounting or thresholds be used to address concern about loss of historical efficacy. Examples are provided for illustration. 相似文献

4.

Asymptotic Properties of Adaptive Likelihood Weights by Cross-Validation

Xiaogang Wang 《统计学通讯:理论与方法》2013,42(7):1257-1270

For clinical trials on neurodegenerative diseases such as Parkinson's or Alzheimer's, the distributions of psychometric measures for both placebo and treatment groups are generally skewed because of the characteristics of the diseases. Through an analytical, but computationally intensive, algorithm, we specifically compare power curves between 3- and 7-category ordinal logistic regression models in terms of the probability of detecting the treatment effect, assuming a symmetric distribution or skewed distributions for the placebo group. The proportional odds assumption under the ordinal logistic regression model plays an important role in these comparisons. The results indicate that there is no significant difference in the power curves between 3-category and 7-category response models where a symmetric distribution is assumed for the placebo group. However, when the skewness becomes more extreme for the placebo group, the loss of power can be substantial. 相似文献

5.

Characterization of probability models by an odds ratio in prospective and retrospective studies

Takashi Yanagawa Kentaro Nomakuchi 《统计学通讯:理论与方法》2013,42(8):2347-2353

It is shown that, when exposure variables are continuous, the odds ratios are functions of exposure differences if and only if Cox's binary logistic models hold in a prospective framework, and if and only if the underlying distribution belongs to a family of exponential type distributions in a retrospective framework. 相似文献

6.

Performance of risk-adjusted cumulative sum charts when some assumptions are not met

A. Hussein A. Kasem S. Nkurunziza S. Campostrini 《统计学通讯:模拟与计算》2017,46(2):823-830

Monitoring health care performance outcomes such as post-operative mortality rates has recently become more common, spurring new statistical methodologies designed for this purpose. One such methodology is the Risk-adjusted Cumulative Sum chart (RA-CUSUM) for monitoring binary outcomes such as mortality after cardiac surgery. When building RA-CUSUMs, independence and model correctness are assumed. We carry out a simulation study to examine the effect of violating these two assumptions on the chart's performance. 相似文献

7.

Identification of the Direction of a Causal Effect by Instrumental Variables

Brendan Kline 《商业与经济统计学杂志》2016,34(2):176-184

This article provides a strategy to identify the existence and direction of a causal effect in a generalized nonparametric and nonseparable model identified by instrumental variables. The causal effect concerns how the outcome depends on the endogenous treatment variable. The outcome variable, treatment variable, other explanatory variables, and the instrumental variable can be essentially any combination of continuous, discrete, or “other” variables. In particular, it is not necessary to have any continuous variables, none of the variables need to have large support, and the instrument can be binary even if the corresponding endogenous treatment variable and/or outcome is continuous. The outcome can be mismeasured or interval-measured, and the endogenous treatment variable need not even be observed. The identification results are constructive, and can be empirically implemented using standard estimation results. 相似文献

8.

Identification of Local Causal Effects with Missing Outcome Values and an Instrument for Non Response

Alessandra Mattei Fabrizia Mealli Barbara Pacini 《统计学通讯:理论与方法》2014,43(4):815-825

Even in randomized experiments the identification of causal effects is often threatened by the presence of missing outcome values, with missingness possibly being non ignorable. We provide sufficient conditions under which the availability of a binary instrument for non response allows us to non parametrically point identify average causal effects in some latent subgroups of units, named Principal Strata, defined by their non response behavior in all possible combinations of treatment and instrument. Examples are provided as possible scenarios where our assumptions may be plausible. 相似文献

9.

Coordinate-free analysis of trends in British social mobility

Anna Klimova Tamás Rudas 《Journal of applied statistics》2012,39(8):1681-1691

This paper is intended to make a contribution to the ongoing debate about declining social mobility in Great Britain by analyzing mobility tables based on data from the 1991 British Household Panel Survey and the 2005 General Household Survey. The models proposed here generalize Hauser's levels models and allow for a semi-parametric analysis of change in social mobility. The cell frequencies are assumed to be equal to the product of three effects: the effect of the father's position for the given year, the effect of the son's position for the given year, and the mobility effect related to the difference between the father's and the son's positions. A generalization of the iterative proportional fitting procedure is proposed and applied to computing the maximum likelihood estimates of the cell frequencies. The standard errors of the estimated parameters are computed under the product-multinomial sampling assumption. The results indicate opposing trends of mobility between the two timepoints. Fewer steps up or down in the society became less likely, while more steps became somewhat more likely. 相似文献

10.

Unconditional Quantile Treatment Effects Under Endogeneity

Markus Frölich Blaise Melly 《商业与经济统计学杂志》2013,31(3):346-357

This article develops estimators for unconditional quantile treatment effects when the treatment selection is endogenous. We use an instrumental variable (IV) to solve for the endogeneity of the binary treatment variable. Identification is based on a monotonicity assumption in the treatment choice equation and is achieved without any functional form restriction. We propose a weighting estimator that is extremely simple to implement. This estimator is root n consistent, asymptotically normally distributed, and its variance attains the semiparametric efficiency bound. We also show that including covariates in the estimation is not only necessary for consistency when the IV is itself confounded but also for efficiency when the instrument is valid unconditionally. An application of the suggested methods to the effects of fertility on the family income distribution illustrates their usefulness. Supplementary materials for this article are available online. 相似文献

11.

A Monte Carlo comparison of alternative methods of maximum likelihood ranking in racing sports

Aaron Anderson 《Journal of applied statistics》2015,42(8):1740-1756

Applications of maximum likelihood techniques to rank competitors in sports are commonly based on the assumption that each competitor's performance is a function of a deterministic component that represents inherent ability and a stochastic component that the competitor has limited control over. Perhaps based on an appeal to the central limit theorem, the stochastic component of performance has often been assumed to be a normal random variable. However, in the context of a racing sport, this assumption is problematic because the resulting model is the computationally difficult rank-ordered probit. Although a rank-ordered logit is a viable alternative, a Thurstonian paired-comparison model could also be applied. The purpose of this analysis was to compare the performance of the rank-ordered logit and Thurstonian paired-comparison models given the objective of ranking competitors based on ability. Monte Carlo simulations were used to generate race results based on a known ranking of competitors, assign rankings from the results of the two models, and judge performance based on Spearman's rank correlation coefficient. Results suggest that in many applications, a Thurstonian model can outperform a rank-ordered logit if each competitor's performance is normally distributed. 相似文献

12.

MONOTONE BOUNDS ON THE CHI-SQUARED APPROXIMATION TO THE DISTRIBUTION OF PEARSON'S XSTATISTICS12

D. R. Jensen 《Australian & New Zealand Journal of Statistics》1973,15(2):65-70

Monotone bounds, depending on the underlying multinomial probabilities and the sample size, are given for the chi-squared approximation to the distribution of Pearson's statistic for goodness of fit for simple hypotheses. These bounds apply to the distribution of a single statistic and to the joint distribution of two statistics associated with the margins of a two-way table, in both the central and non-central cases. 相似文献

13.

Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm

Christophe Biernacki Julien Jacques 《Statistics and Computing》2016,26(5):929-943

We design a probability distribution for ordinal data by modeling the process generating data, which is assumed to rely only on order comparisons between categories. Contrariwise, most competitors often either forget the order information or add a non-existent distance information. The data generating process is assumed, from optimality arguments, to be a stochastic binary search algorithm in a sorted table. The resulting distribution is natively governed by two meaningful parameters (position and precision) and has very appealing properties: decrease around the mode, shape tuning from uniformity to a Dirac, identifiability. Moreover, it is easily estimated by an EM algorithm since the path in the stochastic binary search algorithm can be considered as missing values. Using then the classical latent class assumption, the previous univariate ordinal model is straightforwardly extended to model-based clustering for multivariate ordinal data. Parameters of this mixture model are estimated by an AECM algorithm. Both simulated and real data sets illustrate the great potential of this model by its ability to parsimoniously identify particularly relevant clusters which were unsuspected by some traditional competitors. 相似文献

14.

Optimal sampling for repeated binary measurements

Fernando A. Quintana Peter Müller 《Revue canadienne de statistique》2004,32(1):73-84

The authors consider the optimal design of sampling schedules for binary sequence data. They propose an approach which allows a variety of goals to be reflected in the utility function by including deterministic sampling cost, a term related to prediction, and if relevant, a term related to learning about a treatment effect To this end, they use a nonparametric probability model relying on a minimal number of assumptions. They show how their assumption of partial exchangeability for the binary sequence of data allows the sampling distribution to be written as a mixture of homogeneous Markov chains of order k. The implementation follows the approach of Quintana & Müller (2004), which uses a Dirichlet process prior for the mixture. 相似文献

15.

Comparison of several imputation methods for missing baseline data in propensity scores analysis of binary outcome

Brenda J. Crowe Ilya A. Lipkovich Ouhong Wang 《Pharmaceutical statistics》2010,9(4):269-279

We performed a simulation study comparing the statistical properties of the estimated log odds ratio from propensity scores analyses of a binary response variable, in which missing baseline data had been imputed using a simple imputation scheme (Treatment Mean Imputation), compared with three ways of performing multiple imputation (MI) and with a Complete Case analysis. MI that included treatment (treated/untreated) and outcome (for our analyses, outcome was adverse event [yes/no]) in the imputer's model had the best statistical properties of the imputation schemes we studied. MI is feasible to use in situations where one has just a few outcomes to analyze. We also found that Treatment Mean Imputation performed quite well and is a reasonable alternative to MI in situations where it is not feasible to use MI. Treatment Mean Imputation performed better than MI methods that did not include both the treatment and outcome in the imputer's model. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

16.

Asymptotic normality for plug-in estimators of diversity indices on countable alphabets

Michael Grabchak Zhiyi Zhang 《Journal of nonparametric statistics》2018,30(3):774-795

The plug-in estimator is one of the most popular approaches to the estimation of diversity indices. In this paper, we study its asymptotic distribution for a large class of diversity indices on countable alphabets. In particular, we give conditions for the plug-in estimator to be asymptotically normal, and in the case of uniform distributions, where asymptotic normality fails, we give conditions for the asymptotic distribution to be chi-squared. Our results cover some of the most commonly used indices, including Simpson's index, Reńyi's entropy and Shannon's entropy. 相似文献

17.

Reporting cumulative proportion of subjects with an adverse event based on data from multiple studies

Christy Chuang‐Stein Mohan Beltangady 《Pharmaceutical statistics》2011,10(1):3-7

Experience has shown us that when data are pooled from multiple studies to create an integrated summary, an analysis based on naïvely‐pooled data is vulnerable to the mischief of Simpson's Paradox. Using the proportions of patients with a target adverse event (AE) as an example, we demonstrate the Paradox's effect on both the comparison and the estimation of the proportions. While meta analytic approaches have been recommended and increasingly used for comparing safety data between treatments, reporting proportions of subjects experiencing a target AE based on data from multiple studies has received little attention. In this paper, we suggest two possible approaches to report these cumulative proportions. In addition, we urge that regulatory guidelines on reporting such proportions be established so that risks can be communicated in a scientifically defensible and balanced manner. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

18.

Bayes methods in the ecological fallacy cntext:estimation of individual correlation from aggregate data

Robert B. Bendel Bradley P. Carlin 《统计学通讯:理论与方法》2013,42(7):2595-2623

The ecological fallacy is related to Simpson's paradox (1951) where relationships among group means may be counterintuitive and substantially different from relationships within groups, where the groups are usually geographic entities such as census tracts. We consider the problem of estimating the correlation between two jointly normal random variables where only ecological data (group means) are available. Two empirical Bayes estimators and one fully Bayesian estimator are derived and compared with the usual ecological estimator, which is simply the Pearson correlation coefficient of the group sample means. We simulate the bias and mean squared error performance of these estimators, and also give an example employing a dataset where the individual level data are available for model checking. The results indicate superiority of the empirical Bayes estimators in a variety of practical situations where, though we lack individual level data, other relevant prior information is available. 相似文献

19.

Simulations to improve experimental designs for u-shaped dose-response modeling

《Journal of Statistical Computation and Simulation》2012,82(9):737-746

This paper investigates the results of simulations from which clustered binary dose-response data are generated. This data mimics the type of discrete data collected from experiments conducted in developmental toxicity studies on animals. In particular one assumption used in the design of these simulations is that hormesis exists, as evidenced by the dose-response pattern of the generated data. This implies the existence of a threshold level, as hormesis, if it exists, would exist below this level. Below the threshold level, no adverse effects above the response at the control dose level should exist. While hormesis implies several dose-response patterns below threshold, in this paper, the hormetic pattern is assumed to be U-shaped. Improving upon the design of current and past developmental studies, these simulations also include designs in which dose levels and litters (clusters of animals) are allocated in a way that increases the power for detecting hormesis, assuming it exists. The beta-binomial distribution is used to model the clustered binary data that results from responses of animals in the same litter. The results of these simulations will indicate that by altering current designs of developmental studies, this improves the ability to detect hormesis. 相似文献

20.

Inflated statistical significance of student's t test associated with small intersubject correlation

《Journal of Statistical Computation and Simulation》2012,82(9):691-696

The independence assumption in statistical significance testing becomes increasingly crucial and unforgiving as sample size increases. Seemingly, inconsequential violations of this assumption can substantially increase the probability of a Type I error if sample sizes are large. In the case of Student's t test, it is found that correlations within samples in a range from 0.01 to 0.05 can lead to rejection of a true null hypothesis with high probability, if N is 50, 100 or larger. 相似文献