期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

The cross-product ratio in bivariate lognormal and gamma distributions,with an application to non-randomized trials

Paul D. Baxter Paul R. Marchant 《Journal of applied statistics》2010,37(4):529-536

Non-randomized trials can give a biased impression of the effectiveness of any intervention. We consider trials in which incidence rates are compared in two areas over two periods. Typically, one area receives an intervention, whereas the other does not. We outline and illustrate a method to estimate the bias in such trials under two different bivariate models. The illustrations use data in which no particular intervention is operating. The purpose is to illustrate the size of the bias that could be observed purely due to regression towards the mean (RTM). The illustrations show that the bias can be appreciably different from zero, and even when centred on zero, the variance of the bias can be large. We conclude that the results of non-randomized trials should be treated with caution, as interventions which show small effects could be explained as artefacts of RTM. 相似文献

2.

On the Estimation Accuracy of Causal Effects using Supplementary Variables

下载免费PDF全文

Manabu Kuroki Takahiro Hayashi 《Scandinavian Journal of Statistics》2016,43(2):505-519

This paper focuses on a situation in which a set of treatments is associated with a response through a set of supplementary variables in linear models as well as discrete models. Under the situation, we demonstrate that the causal effect can be estimated more accurately from the set of supplementary variables. In addition, we show that the set of supplementary variables can include selection variables and proxy variables as well. Furthermore, we propose selection criteria for supplementary variables based on the estimation accuracy of causal effects. From graph structures based on our results, we can judge certain situations under which the causal effect can be estimated more accurately by supplementary variables and reliably evaluate the causal effects from observed data. 相似文献

3.

Split Hamiltonian Monte Carlo

Babak Shahbaba Shiwei Lan Wesley O. Johnson Radford M. Neal 《Statistics and Computing》2014,24(3):339-349

We show how the Hamiltonian Monte Carlo algorithm can sometimes be speeded up by “splitting” the Hamiltonian in a way that allows much of the movement around the state space to be done at low computational cost. One context where this is possible is when the log density of the distribution of interest (the potential energy function) can be written as the log of a Gaussian density, which is a quadratic function, plus a slowly-varying function. Hamiltonian dynamics for quadratic energy functions can be analytically solved. With the splitting technique, only the slowly-varying part of the energy needs to be handled numerically, and this can be done with a larger stepsize (and hence fewer steps) than would be necessary with a direct simulation of the dynamics. Another context where splitting helps is when the most important terms of the potential energy function and its gradient can be evaluated quickly, with only a slowly-varying part requiring costly computations. With splitting, the quick portion can be handled with a small stepsize, while the costly portion uses a larger stepsize. We show that both of these splitting approaches can reduce the computational cost of sampling from the posterior distribution for a logistic regression model, using either a Gaussian approximation centered on the posterior mode, or a Hamiltonian split into a term that depends on only a small number of critical cases, and another term that involves the larger number of cases whose influence on the posterior distribution is small. 相似文献

4.

Using order statistics to assess the sampling variability of personnel selection utility estimates

Wilfried De Corte 《Journal of applied statistics》2000,27(6):703-713

Virtually all models for the utility of personnel selection are based on the average criterion score of the predictor selected applicants. This paper indicates how standard results from the theory on order statistics can be used to determine the expected value, the standard error and the sampling distribution of the average criterion score statistic when a finite number of employees is selected. Exact as well as approximate results are derived and it is shown how these results can be used to construct intervals that will contain, with a given probability 1 - f , the average criterion score associated with a particular implementation of the personnel selection. These interval estimates are particularly helpful to the selection practitioner because they can be used to state the confidence level with which the selection payoff will be above a specific value. In addition, for most realistic selection scenarios, it is found that the corresponding utility interval estimate is quite large. For situations in which multiple selections are performed over time, the utility intervals are, however, smaller. 相似文献

5.

Multivariate Quality Control Chart for Autocorrelated Processes 总被引：4，自引：1，他引：3

A. A. Kalgonda S. R. Kulkarni 《Journal of applied statistics》2004,31(3):317-327

Traditional multivariate statistical process control (SPC) techniques are based on the assumption that the successive observation vectors are independent. In recent years, due to automation of measurement and data collection systems, a process can be sampled at higher rates, which ultimately leads to autocorrelation. Consequently, when the autocorrelation is present in the data, it can have a serious impact on the performance of classical control charts. This paper considers the problem of monitoring the mean vector of a process in which observations can be modelled as a first-order vector autoregressive VAR (1) process. We propose a control chart called Z-chart which is based on the single step finite intersection test (Timm, 1996). An important feature of the proposed method is that it not only detects an out of control status but also helps in identifying variable(s) responsible for the out of control situation. The proposed method is illustrated with the help of suitable illustrations. 相似文献

6.

Exact Non-parametric Confidence Intervals for Quantiles with Progressive Type-II Censoring 总被引：1，自引：0，他引：1

Olivier Guilbaud 《Scandinavian Journal of Statistics》2001,28(4):699-713

It is shown how various exact non-parametric inferences based on order statistics in one or two random samples can be generalized to situations with progressive type-II censoring, which is a kind of evolutionary right censoring. Ordinary type-II right censoring is a special case of such progressive censoring. These inferences include confidence intervals for a given parent quantile, prediction intervals for a given order statistic of a future sample, and related two-sample inferences based on exceedance probabilities. The proposed inferences are valid for any parent distribution with continuous distribution function. The key result is that each observable uncensored order statistic that becomes available with progressive type-II censoring can be represented as a mixture with known weights of underlying ordinary order statistics. The importance of this mixture representation lies in that various properties of such observable order statistics can be deduced immediately from well-known properties of ordinary order statistics. 相似文献

7.

Alleviating linear ecological bias and optimal design with subsample data

Adam N. Glynn Jon Wakefield Mark S. Handcock Thomas S. Richardson 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(1):179-202

Summary. We illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides two main benefits. First, by including the individual level subsample data, the biases that are associated with linear ecological inference can be eliminated. Second, available ecological data can be used to design optimal subsampling schemes that maximize information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree on wages, showing that small, optimally chosen subsamples can be combined with ecological data to generate precise estimates relative to a simple random subsample. 相似文献

8.

Uniform Ergodicity of the Particle Gibbs Sampler

下载免费PDF全文

Fredrik Lindsten Randal Douc Eric Moulines 《Scandinavian Journal of Statistics》2015,42(3):775-797

The particle Gibbs sampler is a systematic way of using a particle filter within Markov chain Monte Carlo. This results in an off‐the‐shelf Markov kernel on the space of state trajectories, which can be used to simulate from the full joint smoothing distribution for a state space model in a Markov chain Monte Carlo scheme. We show that the particle Gibbs Markov kernel is uniformly ergodic under rather general assumptions, which we will carefully review and discuss. In particular, we provide an explicit rate of convergence, which reveals that (i) for fixed number of data points, the convergence rate can be made arbitrarily good by increasing the number of particles and (ii) under general mixing assumptions, the convergence rate can be kept constant by increasing the number of particles superlinearly with the number of observations. We illustrate the applicability of our result by studying in detail a common stochastic volatility model with a non‐compact state space. 相似文献

9.

Simulation of right and left truncated gamma distributions by mixtures

PHILIPPE ANNE 《Statistics and Computing》1997,7(3):173-181

We study the properties of truncated gamma distributions and we derive simulation algorithms which dominate the standard algorithms for these distributions. For the right truncated gamma distribution, an optimal accept–reject algorithm is based on the fact that its density can be expressed as an infinite mixture of beta distribution. For integer values of the parameters, the density of the left truncated distributions can be rewritten as a mixture which can be easily generated. We give an optimal accept–reject algorithm for the other values of the parameter. We compare the efficiency of our algorithm with the previous method and show the improvement in terms of minimum acceptance probability. The algorithm proposed here has an acceptance probability which is superior to e/4. 相似文献

10.

The Perils of Balance Testing in Experimental Design: Messy Analyses of Clean Data

Diana C. Mutz Robin Pemantle Philip Pham 《The American statistician》2019,73(1):32-42

Widespread concern over the credibility of published results has led to scrutiny of statistical practices. We address one aspect of this problem that stems from the use of balance tests in conjunction with experimental data. When random assignment is botched, due either to mistakes in implementation or differential attrition, balance tests can be an important tool in determining whether to treat the data as observational versus experimental. Unfortunately, the use of balance tests has become commonplace in analyses of “clean” data, that is, data for which random assignment can be stipulated. Here, we show that balance tests can destroy the basis on which scientific conclusions are formed, and can lead to erroneous and even fraudulent conclusions. We conclude by advocating that scientists and journal editors resist the use of balance tests in all analyses of clean data. Supplementary materials for this article are available online 相似文献

11.

An Investigation of Agreement Between the Item Difficulty Coefficient Calculated in Accordance With Classical Test Theory and Item Response Theory With Bland–Altman Method

Tülin Acar 《统计学通讯:理论与方法》2013,42(21):4614-4621

The purpose of this study is to investigate agreement between item difficulty coefficients calculated relying on classical test theory and item response theory with Bland–Altman method. According to results, although there is a high correlation between Pj and b coefficient estimated with HGLM (hierarchical generalized linear model), 1P, and 3P models, it can be said that there is no agreement between two methods and cannot be used interchangeably. It is observed that the confidence limit is wide according to Bland–Altman graphics. Therefore, it can be said that there is no agreement between item difficulty values obtained from two methods. Bland–Altman method which is used in clinical studies mostly is suggested to be used in the comparison of methods used especially in the evaluation of student performance in education, in agreement studies among specialist considerations especially in terms of providing additional information to the studies in which correlation coefficient is calculated. 相似文献

12.

Principal component analysis of landmarks from reversible images 总被引：1，自引：0，他引：1

C. M. Theobald C. A. Glasbey G. W. Horgan C. D. Robinson 《Journal of the Royal Statistical Society. Series C, Applied statistics》2004,53(1):163-175

相似文献

13.

Finding 'superclassifications' with an acceptable misclassification rate

Paul C. Taylor David J. Hand 《Journal of applied statistics》1999,26(5):579-590

Cluster analysis methods are based on measures of 'distance' between objects. Sometimes the objects have an internal structure, and use of this can be made when defining such distances. This leads to non-standard cluster analysis methods. We illustrate with an application in which the objects are themselves classes and the aim is to produce clusters of classes which minimize the error rate of a supervised classification rule. For supervised classification problems with more than a handful of classes, there may exist groups of classes which are well separated from other groups, even though individual classes are not all well separated. In such cases, the overall misclassification rate is a crude measure of performance and more subtle measures, taking note of subgroup separation, are desirable. The fact that points can be assigned accurately to groups, if not to individual classes, can sometimes be practically useful. 相似文献

14.

Kernel Density-Based Linear Regression Estimate

Weixin Yao Zhibiao Zhao 《统计学通讯:理论与方法》2013,42(24):4499-4512

For linear regression models with non normally distributed errors, the least squares estimate (LSE) will lose some efficiency compared to the maximum likelihood estimate (MLE). In this article, we propose a kernel density-based regression estimate (KDRE) that is adaptive to the unknown error distribution. The key idea is to approximate the likelihood function by using a nonparametric kernel density estimate of the error density based on some initial parameter estimate. The proposed estimate is shown to be asymptotically as efficient as the oracle MLE which assumes the error density were known. In addition, we propose an EM type algorithm to maximize the estimated likelihood function and show that the KDRE can be considered as an iterated weighted least squares estimate, which provides us some insights on the adaptiveness of KDRE to the unknown error distribution. Our Monte Carlo simulation studies show that, while comparable to the traditional LSE for normal errors, the proposed estimation procedure can have substantial efficiency gain for non normal errors. Moreover, the efficiency gain can be achieved even for a small sample size. 相似文献

15.

Evaluation of probabilities for the noncentral distributions and the difference of two T-variables with a desk calculator

《Journal of Statistical Computation and Simulation》2012,82(3-4):199-206

It is demonstrated that integrals of the noncentral chi-square, noncentral F and noncentral T distributions can be evaluated on desk calculators. The same procedure can be used to compute probabilities for the distribution of the difference of two T-variables with equal degrees of freedom. The proposed method of computation can be used with any computer which yields probabilities for the chi-square and F distributions. 相似文献

16.

Inference for Epidemics with Three Levels of Mixing: Methodology and Application to a Measles Outbreak

TOM BRITTON THEODORE KYPRAIOS PHILIP D. O'NEILL 《Scandinavian Journal of Statistics》2011,38(3):578-599

Abstract. A stochastic epidemic model is defined in which each individual belongs to a household, a secondary grouping (typically school or workplace) and also the community as a whole. Moreover, infectious contacts take place in these three settings according to potentially different rates. For this model, we consider how different kinds of data can be used to estimate the infection rate parameters with a view to understanding what can and cannot be inferred. Among other things we find that temporal data can be of considerable inferential benefit compared with final size data, that the degree of heterogeneity in the data can have a considerable effect on inference for non‐household transmission, and that inferences can be materially different from those obtained from a model with only two levels of mixing. We illustrate our findings by analysing a highly detailed dataset concerning a measles outbreak in Hagelloch, Germany. 相似文献

17.

Mantel test for spatial functional data

Ramón Giraldo William Caballero Jesús Camacho-Tamayo 《AStA Advances in Statistical Analysis》2018,102(1):21-39

Statistics for spatial functional data is an emerging field in statistics which combines methods of spatial statistics and functional data analysis to model spatially correlated functional data. Checking for spatial autocorrelation is an important step in the statistical analysis of spatial data. Several statistics to achieve this goal have been proposed. The test based on the Mantel statistic is widely known and used in this context. This paper proposes an application of this test to the case of spatial functional data. Although we focus particularly on geostatistical functional data, that is functional data observed in a region with spatial continuity, the test proposed can also be applied with functional data which can be measured on a discrete set of areas of a region (areal functional data) by defining properly the distance between the areas. Based on two simulation studies, we show that the proposed test has a good performance. We illustrate the methodology by applying it to an agronomic data set. 相似文献

18.

Role of statistics in quality and productivity improvement

George E. P. Box 《Journal of applied statistics》1996,23(1):3-20

The role of statistics in quality and productivity improvement depends on certain philosophical issues that the author believes have been inadequately addressed. Three such issues are as follows: (1) what is the role of statistics in the process of investigation and discovery; (2) how can we extrapolate results from the particular to the general; and (3) how can we evaluate possible management changes so that they truly benefit an organization? Therefore, statistical methods appropriate to investigation and discovery are discussed as distinct from those appropriate to the testing of an already discovered solution. It is shown how the manner in which the tentative solution has been arrived at determines the assurance with which experimental conclusions can be extrapolated to the application in mind. Whether or not statistical methods and training can have any impact depends on the system of management. A vector representation which can help predict the consequences of changes in management strategy is discussed. This can help to realign policies so that members of an organization can better work together for the benefit of the organization. 相似文献

19.

Statistical modelling and saddle-point approximation of tail probabilities for accumulated splice loss in fibre-optic networks

J. Tyrcha P. Sundberg B. Lindskog Sundstrom 《Journal of applied statistics》2000,27(2):245-256

Tail probabilities are calculated by saddle-point approximation in a probabilistic-statistical model for the accumulated splice loss that results from a number of fusion splices in the installation of fibre-optic networks. When these probabilities, representing the risk of exceeding a specified total loss, can be controlled and kept low, the requirements on the individual losses can be substantially relaxed from their customary settings. As a consequence, it should be possible to save considerable installation time and cost. The probabilistic model, which can be theoretically motivated, states that the individual loss is basically exponentially distributed, but with a Gaussian contribution added and truncated at a set value, and that the loss is additive over splices. An extensive set of installation data fitted well with this model, except for occasional high losses. Therefore, the model described was extended to allow for a frequency of unspecified high losses of this sort. It is also indicated how the model parameters can be estimated from data. 相似文献

20.

Using Markov Chains for Non-perennial Daily Streamflow Data Generation

Hafzullah Aksoy 《Journal of applied statistics》2004,31(9):1083-1094

The use of Markov chains to simulate non-perennial streamflow data is considered. A non-perennial stream may be thought as having three states, namely zero flow, increasing flow and decreasing flow, for which a three-state Markov chain can be constructed. Alternatively, two two-state Markov chains can be used, the first of which represents the existence and non-existence of flow, whereas the second deals with the increment and decrement in the flow for periods with flow. Probabilistic relationships between the two alternatives are derived. Their performances in simulating the state of the stream are compared on the basis of data from two different geographical regions in Turkey. It is concluded that both alternatives are capable of simulating the state of the stream. 相似文献