1.

All Data are Wrong, but Some are Useful? Advocating the Need for Data Auditing

Sitsofe Tsagbey Miguel de Carvalho Garritt L. Page 《The American statistician》2017,71(3):231-235

In a recent article from the Annals of Applied Statistics, Cox discussed the main phases of applied statistical research ranging from clarifying study objectives to final data analysis and interpreting results. As an incidental remark to these main phases, we advocate that beyond cleaning and preprocessing the data, it is a good practice to audit the data to determine if they can be trusted at all. A case study based on Ghanaian Official Fishery Statistics is used to illustrate this need, with Benford's law being the tool used to carry out the data audit. Supplementary materials for this article are available online.

2.

Not the First Digit! Using Benford's Law to Detect Fraudulent Scientific Data

Andreas Diekmann 《Journal of applied statistics》2007,34(3):321-329

Digits in statistical data produced by natural or social processes are often distributed in a manner described by ‘Benford's law’. Recently, a test against this distribution was used to identify fraudulent accounting data. This test is based on the supposition that first, second, third, and other digits in real data follow the Benford distribution while the digits in fabricated data do not. Is it possible to apply Benford tests to detect fabricated or falsified scientific data as well as fraudulent financial data? We approached this question in two ways. First, we examined the use of the Benford distribution as a standard by checking the frequencies of the nine possible first and ten possible second digits in published statistical estimates. Second, we conducted experiments in which subjects were asked to fabricate statistical estimates (regression coefficients). The digits in these experimental data were scrutinized for possible deviations from the Benford distribution. There were two main findings. First, both digits of the published regression coefficients were approximately Benford distributed or at least followed a pattern of monotonic decline. Second, the experimental results yielded new insights into the strengths and weaknesses of Benford tests. Surprisingly, first digits of faked data also exhibited a pattern of monotonic decline, while second, third, and fourth digits were distributed less in accordance with Benford's law. At least in the case of regression coefficients, there were indications that checks for digit-preference anomalies should focus less on the first (i.e. leftmost) and more on later digits.
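
As a rough illustration of the first- and second-digit checks described in this abstract, the sketch below computes the Benford probabilities for the nine first digits and ten second digits and a Pearson chi-square statistic against them. The function names and the chi-square choice are assumptions made for illustration; the article's own test procedure is not reproduced here.

```python
import math
from collections import Counter


def benford_first_digit_probs():
    """P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_second_digit_probs():
    """P(second digit = d) = sum over first digits k of log10(1 + 1/(10k + d))."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def leading_digits(x):
    """Return the first and second significant digits of a nonzero number.

    Assumption: a one-digit value such as 7 is treated as 7.0, so its
    second significant digit is reported as 0.
    """
    mantissa = f"{abs(x):.15e}"  # e.g. '1.230000000000000e-02'
    return int(mantissa[0]), int(mantissa[2])


def chi_square_statistic(digits, probs):
    """Pearson chi-square statistic of observed digits against expected probabilities."""
    n = len(digits)
    counts = Counter(digits)
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in probs.items())
```

With a list of estimates `values`, one would collect `leading_digits(v)[0]` (or `[1]` for second digits) over the nonzero entries and compare the resulting statistic with a chi-square distribution on 8 (or 9) degrees of freedom.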

3.

The Moment Bound is Tighter than Chernoff's Bound for Positive Tail Probabilities

Thomas K. Philips Randolph Nelson 《The American statistician》2013,67(2):175-178

Chernoff's bound on P[X ≥ t] is used almost universally when a tight bound on tail probabilities is required. In this article we show that for all positive t and for all distributions, the moment bound is tighter than Chernoff's bound. By way of example, we demonstrate that the improvement is often substantial.
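
To make the comparison concrete, here is a minimal sketch for one distribution chosen purely for illustration, a unit-rate exponential variable X, which is not necessarily an example used in the article. For this X the moments are E[X^k] = k! and the moment generating function is 1/(1 − θ) for θ < 1, so both bounds have simple forms: the moment bound is the minimum over integers k ≥ 0 of k!/t^k, and Chernoff's bound is t·e^(1−t) for t > 1.

```python
import math


def chernoff_bound_exp1(t):
    """Chernoff bound on P[X >= t] for X ~ Exponential(1).

    Minimizing exp(-theta*t) / (1 - theta) over 0 <= theta < 1 gives
    theta = 1 - 1/t when t > 1, and the trivial bound 1 otherwise.
    """
    if t <= 1:
        return 1.0
    return t * math.exp(1 - t)


def moment_bound_exp1(t, max_k=200):
    """Moment bound min_k E[X^k] / t^k for X ~ Exponential(1), where E[X^k] = k!."""
    best = 1.0   # k = 0 term
    term = 1.0
    for k in range(1, max_k + 1):
        term *= k / t          # term is now k! / t^k
        best = min(best, term)
    return best


if __name__ == "__main__":
    for t in (2.0, 5.0, 10.0):
        print(t, math.exp(-t), moment_bound_exp1(t), chernoff_bound_exp1(t))
```

The inequality seen here follows because the Chernoff ratio E[e^(θX)]/e^(θt) is a Poisson-weighted average of the terms E[X^k]/t^k, so it cannot be smaller than their minimum.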

4.

Statistics of Primes (and Probably Twin Primes) Satisfy Taylor's Law from Ecology

Joel E. Cohen 《The American statistician》2013,67(4):399-404

Taylor's law, which originated in ecology, states that, in sets of measurements of population density, the sample variance is approximately proportional to a power of the sample mean. Taylor's law has been verified for many species ranging from bacterial to human. Here, we show that the variance V(x) and the mean M(x) of the primes not exceeding a real number x obey Taylor's law asymptotically for large x. Specifically, V(x) ~ (1/3)(M(x))² as x → ∞. This apparently new fact about primes shows that Taylor's law may arise in the absence of biological processes, and that patterns discovered in biological data can suggest novel questions in number theory. If the Hardy-Littlewood twin primes conjecture is true, then the identical Taylor's law holds also for twin primes. Taylor's law holds in both instances because the primes (and the twin primes, given the conjecture) not exceeding x are asymptotically uniformly distributed on the integers in [2, x]. Hence, asymptotically M(x) ~ x/2, V(x) ~ x²/12. Higher-order moments of the primes (twin primes) not exceeding x satisfy a generalized Taylor's law. The 11,078,937 primes and 813,371 twin primes not exceeding 2 × 10⁸ illustrate these results.
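
The relation V(x) ~ (1/3)(M(x))² can be explored numerically with a short sketch like the one below, which sieves the primes up to x and returns the ratio V(x)/M(x)². The function names are hypothetical, and the population form of the variance (dividing by the number of primes) is an assumption; the abstract does not say which form the article uses.

```python
def primes_upto(n):
    """Sieve of Eratosthenes: all primes p with 2 <= p <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Mark every multiple of i from i*i onward as composite.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def taylor_ratio(x):
    """Return V(x) / M(x)^2 for the primes not exceeding x."""
    primes = primes_upto(x)
    count = len(primes)
    mean = sum(primes) / count
    variance = sum((p - mean) ** 2 for p in primes) / count
    return variance / mean ** 2


if __name__ == "__main__":
    for x in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
        print(x, taylor_ratio(x))
```

The statement is asymptotic, so the ratio at moderate x should not be expected to sit exactly at 1/3; the sketch is only meant to show the direction of the trend as x grows.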

5.

A first-digit anomaly in the 2009 Iranian presidential election

Boudewijn F. Roukema 《Journal of applied statistics》2014,41(1):164-199

A local bootstrap method is proposed for the analysis of electoral vote-count first-digit frequencies, complementing the Benford's Law limit. The method is calibrated on five presidential-election first rounds (2002–2006) and applied to the 2009 Iranian presidential-election first round. Candidate K has a highly significant (p < 0.15%) excess of vote counts starting with the digit 7. This leads to other anomalies, two of which are individually significant at p ~ 0.1% and one at p ~ 1%. Independently, Iranian pre-election opinion polls significantly reject the official results unless the five polls favouring candidate A are considered alone. If the latter represent normalised data and a linear, least-squares, equal-weighted fit is used, then either candidates R and K suffered a sudden, dramatic (70% ± 15%) loss of electoral support just prior to the election, or the official results are rejected (p ~ 0.01%).

6.

Distributional Studies and the Computer: An Analysis of Durbin's Rank Test

Richard F. Fawcett Kathleen C. Salter 《The American statistician》2013,67(1):81-83

A study of the distribution of a statistic involves two major steps: (a) working out its asymptotic, large n, distribution, and (b) making the connection between the asymptotic results and the distribution of the statistic for the sample sizes used in practice. This crucial second step is not included in many studies. In this article, the second step is applied to Durbin's (1951) well-known rank test of treatment effects in balanced incomplete block designs (BIB's). We found that asymptotic, χ², distributions do not provide adequate approximations in most BIB's. Consequently, we feel that several of Durbin's recommendations should be altered.

7.

A Data-Analytic Look at Skewness and Elongation in Common-Stock-Return Distributions

S. G. Badrinath Sangit Chatterjee 《Journal of Business & Economic Statistics》2013,31(2):223-233

This article explores the nature of skewness and elongation in daily common-stock-return distributions of individual firms using estimates of g (for skewness) and h (for elongation) obtained from Tukey's g and h distributions. Both parametric and nonparametric (bootstrap) estimates of standard errors of the g estimates are computed and compared. Daily return distributions are first examined cross-sectionally over a large sample of firms. The estimates of the skewness parameter exhibit variation across individual firms, but some general trends are evident across industry groups and firm sizes. Return distributions typically seem to be more elongated than the Gaussian distribution. From a time series perspective, both skewness and elongation are persistent in the return distributions of individual firms and vary over a finite range. First-order autocorrelation coefficients of monthly g and h estimates are large and suggest a certain degree of predictability.

8.

Tests for Differences in Dispersion Based on Quantiles

Lewis H. Shoemaker 《The American statistician》2013,67(2):179-182

It is commonly known that the validity of the F test for testing differences in variability is highly sensitive to the assumption that the population distributions are normal. Hence there is a need for nonparametric tests that do not rely on the assumption of normal population distributions. Several nonparametric tests for testing differences in dispersion have been developed in the past 40 years. These include Mood's test, Klotz's test, and the Siegel-Tukey test. Unfortunately, many of these tests do not have a natural or easily calculated measure of dispersion associated with them. This article introduces a test for differences in dispersion based on quantiles that is easy to compute and readily comprehended by the casual user of statistics.

9.

Exploring the distribution for the estimator of Rosenthal's ‘fail-safe’ number of unpublished studies in meta-analysis

Konstantinos C. Fragkos Michail Tsagris Christos C. Frangos 《Communications in Statistics - Theory and Methods》2017,46(11):5672-5684

The present article discusses the statistical distribution for the estimator of Rosenthal's ‘file-drawer’ number N_R, which is an estimator of unpublished studies in meta-analysis. We calculate the probability distribution function of N_R. This is achieved based on the central limit theorem and the proposition that certain components of the estimator N_R follow a half-normal distribution, derived from the standard normal distribution. Our proposed distributions are supported by simulations and investigation of convergence.
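
For orientation, the point estimate whose distribution is studied here is usually written as N_R = (ΣZ_i)² / z_α² − k, where the Z_i are the standard normal deviates of the k published studies and z_α is the one-tailed critical value. The sketch below computes only this textbook form; the function name and the default α = 0.05 are assumptions, and the article's distributional results are not reproduced.

```python
from statistics import NormalDist


def rosenthal_fail_safe_n(z_scores, alpha=0.05):
    """Textbook form of Rosenthal's fail-safe number.

    z_scores: standard normal deviates of the k published studies.
    alpha:    one-tailed significance level (0.05 is an assumed default).
    """
    k = len(z_scores)
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = 0.05
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


if __name__ == "__main__":
    # Hypothetical input: ten studies, each with Z = 2.0.
    print(rosenthal_fail_safe_n([2.0] * 10))
```

The result is read as the number of unpublished null-result studies that would have to exist to pull the combined result down to the α threshold.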

10.

Likelihood-Based Inference for Weak Exogeneity in I(2) Cointegrated VAR Models

Takamitsu Kurita 《Econometric Reviews》2013,32(3):325-360

This article develops limit theory for likelihood analysis of weak exogeneity in I(2) cointegrated vector autoregressive (VAR) models incorporating deterministic terms. Conditions for weak exogeneity in I(2) VAR models are reviewed, and the asymptotic properties of conditional maximum likelihood estimators and a likelihood-based weak exogeneity test are then investigated. It is demonstrated that weak exogeneity in I(2) VAR models allows us to conduct asymptotic conditional inference based on mixed Gaussian distributions. It is then proved that a log-likelihood ratio test statistic for weak exogeneity in I(2) VAR models is asymptotically χ² distributed. The article also presents an empirical illustration of the proposed test for weak exogeneity using Japan's macroeconomic data.

11.

Comparison of Generalized Lambda Distribution (GLD) and Response Modeling Methodology (RMM) as General Platforms for Distribution Fitting

Haim Shore 《Communications in Statistics - Theory and Methods》2013,42(15):2805-2819

Distribution fitting is widely practiced in all branches of engineering and applied science, yet only a few studies have examined the relative capability of various parameter-rich families of distributions to represent a wide spectrum of diversely shaped distributions. In this article, two such families of distributions, Generalized Lambda Distribution (GLD) and Response Modeling Methodology (RMM), are compared. For a sample of some commonly used distributions, each family is fitted to each distribution, using two methods: fitting by minimization of the L₂ norm (minimizing density function distance) and nonlinear regression applied to a sample of exact quantile values (minimizing quantile function distance). The resultant goodness-of-fit is assessed by four criteria: the optimized value of the L₂ norm, and three additional criteria, relating to quantile function matching. Results show that RMM is uniformly better than GLD. An additional study includes Shore's quantile function (QF) and again RMM is the best performer, followed by Shore's QF and then GLD.

12.

On the Breakdown Properties of Some Multivariate M‐Functionals*

LUTZ DÜMBGEN DAVID E. TYLER 《Scandinavian Journal of Statistics》2005,32(2):247-264

Abstract. For probability distributions on ℝ^q, a detailed study of the breakdown properties of some multivariate M‐functionals related to Tyler's [Ann. Statist. 15 (1987) 234] ‘distribution‐free’ M‐functional of scatter is given. These include a symmetrized version of Tyler's M‐functional of scatter, and the multivariate t M‐functionals of location and scatter. It is shown that for ‘smooth’ distributions, the (contamination) breakdown point of Tyler's M‐functional of scatter and of its symmetrized version are 1/q and 1 − √(1 − 1/q), respectively. For the multivariate t M‐functional which arises from the maximum likelihood estimate for the parameters of an elliptical t distribution on ν ≥ 1 degrees of freedom the breakdown point at smooth distributions is 1/(q + ν). Breakdown points are also obtained for general distributions, including empirical distributions. Finally, the sources of breakdown are investigated. It turns out that breakdown can only be caused by contaminating distributions that are concentrated near low‐dimensional subspaces.

13.

On Some Aspects of Unbiased Estimation of Parameters in Quasi-Binomial Distributions

Bikas K. Sinha Sujay K. Mukhoti 《Communications in Statistics - Theory and Methods》2013,42(19):3023-3028

In this article, an attempt has been made to settle the question of the existence of an unbiased estimator of the key parameter p of the quasi-binomial distributions of Type I (QBD I) and of Type II (QBD II), with or without any knowledge of the other parameter φ appearing in the expressions for the probability functions of the QBD's. This is studied with reference to a single observation, a random sample of finite size m, and also samples drawn by suitably defined sequential sampling rules.

14.

THE GENERALIZED SECANT HYPERBOLIC DISTRIBUTION AND ITS PROPERTIES

《Communications in Statistics - Theory and Methods》2013,42(2):219-238

ABSTRACT

The properties of a family of distributions generalizing the secant hyperbolic are developed. This family consists of symmetric distributions, with kurtosis ranging from 1.8 to infinity, and includes the logistic as a special case, the uniform as a limiting case, and closely approximates the normal and Student's t-distributions with corresponding kurtosis. A significant difference between this family and Student's t is that for any member of the generalized secant hyperbolic family, all moments are finite. Further, technical difficulties associated with evaluating moments of Student's t (especially for fractional degrees of freedom) are not present with this family. The properties of the maximum likelihood and modified maximum likelihood estimates of the location and scale parameters for complete samples are considered. Examples illustrate the methods developed in this work.

15.

The distance random variable and its applications

Elzbieta Trybus Ginter Trybus 《Communications in Statistics - Theory and Methods》2013,42(3):1085-1098

This article presents and discusses the so-called distance random variable D¹, its distributions and moments. α-density traces are introduced and the results of a simulation study are presented. The main assumption is that the underlying distributions are continuous.

16.

Non Uniform Bounds on Geometric Approximation Via Stein's Method and w-Functions

K. Teerapabolarn 《Communications in Statistics - Theory and Methods》2013,42(1):145-158

In this article, we use Stein's method and w-functions to give uniform and non uniform bounds in the geometric approximation of a non negative integer-valued random variable. We give some applications of the results of this approximation concerning the beta-geometric, Pólya, and Poisson distributions.

17.

A NEW TEST FOR EQUALITY OF VARIANCES FOR k NORMAL POPULATIONS

《Communications in Statistics - Simulation and Computation》2013,42(4):567-587

ABSTRACT

A simple test based on Gini's mean difference is proposed to test the hypothesis of equality of population variances. Using 2000 replicated samples and empirical distributions, we show that the test compares favourably with Bartlett's and Levene's test for the normal population. Also, it is more powerful than Bartlett's and Levene's tests for some alternative hypotheses for some non-normal distributions and more robust than the other two tests for large sample sizes under some alternative hypotheses. We also give an approximate distribution to the test statistic to enable one to calculate the nominal levels and P-values.

18.

A more powerful unconditional exact test of homogeneity for 2 × c contingency table analysis

Louis Ehwerhemuepha Heng Sok Cyril Rakovski 《Journal of applied statistics》2019,46(14):2572-2582

The classical unconditional exact p-value test can be used to compare two multinomial distributions with small samples. This general hypothesis requires parameter estimation under the null which makes the test severely conservative. A similar property has been observed for Fisher's exact test, with Barnard and Boschloo providing distinct adjustments that produce more powerful testing approaches. In this study, we develop a novel adjustment for the conservativeness of the unconditional multinomial exact p-value test that produces a nominal type I error rate and increased power in comparison to all alternative approaches. We used a large simulation study to empirically estimate the 5th percentiles of the distributions of the p-values of the exact test over a range of scenarios and implemented a regression model to predict the values for two-sample multinomial settings. Our results show that the new test is uniformly more powerful than Fisher's, Barnard's, and Boschloo's tests with gains in power as large as several hundred percent in certain scenarios. Lastly, we provide a real-life data example where the unadjusted unconditional exact test wrongly fails to reject the null hypothesis and the corrected unconditional exact test rejects the null appropriately.

19.

The predictive distributions of thinning‐based count processes

Yang Lu 《Scandinavian Journal of Statistics》2021,48(1):42-67

This paper shows that the term structure of conditional (i.e. predictive) distributions allows for closed form expression in a large family of (possibly higher order or infinite order) thinning‐based count processes such as INAR(p), INARCH(p), NBAR(p), and INGARCH(1,1). Such predictive distributions are currently often deemed intractable by the literature and existing approximation methods are usually time consuming and induce approximation errors. In this paper, we propose a Taylor's expansion algorithm for these predictive distributions, which is both exact and fast. Through extensive simulation exercises, we demonstrate its advantages with respect to existing methods in terms of the computational gain and/or precision.

20.

Asymptotic Theory for the QMLE in GARCH-X Models With Stationary and Nonstationary Covariates

Heejoon Han Dennis Kristensen 《Journal of Business & Economic Statistics》2014,32(3):416-429

This article investigates the asymptotic properties of the Gaussian quasi-maximum-likelihood estimators (QMLE’s) of the GARCH model augmented by including an additional explanatory variable, the so-called GARCH-X model. The additional covariate is allowed to exhibit any degree of persistence as captured by its long-memory parameter d_x; in particular, we allow for both stationary and nonstationary covariates. We show that the QMLE’s of the parameters entering the volatility equation are consistent and mixed-normally distributed in large samples. The convergence rates and limiting distributions of the QMLE’s depend on whether the regressor is stationary or not. However, standard inferential tools for the parameters are robust to the level of persistence of the regressor, with t-statistics following standard Normal distributions in large samples irrespective of whether the regressor is stationary or not. Supplementary materials for this article are available online.