
1.

Imputation procedures for categorical data: their effects on the goodness-of-fit chi-square statistic

Phyllis A. Gimotty Morton B. Brown 《Communications in Statistics - Simulation and Computation》2013,42(2):681-703

An imputation procedure is a procedure by which each missing value in a data set is replaced (imputed) by an observed value using a predetermined resampling procedure. The distribution of a statistic computed from a data set consisting of observed and imputed values, called a completed data set, is affected by the imputation procedure used. In a Monte Carlo experiment, three imputation procedures are compared with respect to the empirical behavior of the goodness-of-fit chi-square statistic computed from a completed data set. The results show that each imputation procedure affects the distribution of the goodness-of-fit chi-square statistic in a different manner. However, when the empirical behavior of the goodness-of-fit chi-square statistic is compared to its appropriate asymptotic distribution, there are no substantial differences between these imputation procedures.

2.

Exact and approximate distributions of the chi-square statistic for equiprobability

Paul J. Smith Donald S. Rae Ronald W. Manderscheid Sam Silbergeld 《Communications in Statistics - Simulation and Computation》2013,42(2):131-149

The distribution of the chi-square goodness-of-fit statistic is studied in the equiprobable case. Tables of exact critical values are given for α = .1, .05, .01, .005; k = 2(1)4, N = 26(1)50; k = 5, N = 26(1)40; k = 6(1)10, N = 26(1)30, where α is the desired significance level, k is the number of cells and N is the sample size. Methods of fitting the true distribution are compared. If k > 3, it is found that a simple additive adjustment to the asymptotic chi-square fit leads to high accuracy even for N between 10 and 20. For k = 2, the Yates corrected chi-square statistic is very accurately fitted by the usual chi-square distribution.
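As an illustration of the two statistics named in this abstract, the following minimal sketch computes the Pearson chi-square statistic for equiprobable cells and the Yates continuity-corrected version for k = 2. The function names are hypothetical, and the additive adjustment for k > 3 that the paper studies is not reproduced here because the abstract does not give its form.

```python
def pearson_chi_square_equiprobable(counts):
    """Pearson goodness-of-fit statistic when all k cells are equally likely."""
    n = sum(counts)
    expected = n / len(counts)  # N / k in every cell
    return sum((c - expected) ** 2 for c in counts) / expected


def yates_chi_square_two_cells(counts):
    """Yates continuity-corrected statistic for k = 2 equiprobable cells."""
    if len(counts) != 2:
        raise ValueError("the Yates correction is applied here only for k = 2")
    expected = sum(counts) / 2
    # Shrink each absolute deviation by 0.5 (never below zero) before squaring.
    return sum(max(abs(c - expected) - 0.5, 0.0) ** 2 for c in counts) / expected


if __name__ == "__main__":
    observed = [10, 20]
    print(pearson_chi_square_equiprobable(observed))
    print(yates_chi_square_two_cells(observed))
```

Either value would then be referred to a chi-square distribution with k - 1 degrees of freedom.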

3.

Approximations to the chi-square distribution

《Journal of Statistical Computation and Simulation》2012,82(4):267-277

The chi-square distribution arises frequently in applied statistics. Associated with the chi-square random variable with v degrees of freedom are two interdependent variables: the probability integral and the percentage point. Given one of these variables, the other can be obtained from chi-square tables for selected values. In order to overcome the inconvenience of statistical tables and interpolation, many approximations have been suggested. The computational difficulty and accuracy of various approximations are compared.
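The abstract does not say which approximations the paper compares. As one well-known example of the kind of approximation it describes, the sketch below uses the Wilson-Hilferty cube-root formula to get a percentage point from a probability without tables; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_hilferty_quantile(p, v):
    """Approximate the p-th quantile of a chi-square variable with v degrees of freedom.

    Wilson-Hilferty: chi2_p ~ v * (1 - 2/(9v) + z_p * sqrt(2/(9v)))**3,
    where z_p is the standard normal quantile.
    """
    if v <= 0:
        raise ValueError("degrees of freedom must be positive")
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * v)
    return v * (1.0 - a + z * sqrt(a)) ** 3


if __name__ == "__main__":
    # The tabulated 95% point for v = 10 is about 18.307.
    print(wilson_hilferty_quantile(0.95, 10))
```

The approximation tends to be less accurate for very small v, which is the kind of trade-off between computational difficulty and accuracy the abstract refers to.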

4.

Biplots of compositional data  Total citations: 6, self-citations: 0, citations by others: 6

John Aitchison Michael Greenacre 《Journal of the Royal Statistical Society. Series C, Applied statistics》2002,51(4):375-392

Summary. The singular value decomposition and its interpretation as a linear biplot have proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to the special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is applied to a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.

5.

Asymptotic expansions of the distributions of the chi-square statistic based on the asymptotically distribution-free theory in covariance structures

Haruhiko Ogasawara 《Journal of statistical planning and inference》2009

An asymptotic expansion of the null distribution of the chi-square statistic based on the asymptotically distribution-free theory for general covariance structures is derived under non-normality. The added higher-order term in the approximate density is given by a weighted sum of those of the chi-square distributed variables with different degrees of freedom. A formula for the corresponding Bartlett correction is also shown without using the above asymptotic expansion. Under a fixed alternative hypothesis, the Edgeworth expansion of the distribution of the standardized chi-square statistic is given up to order O(1/n). From the intermediate results of the asymptotic expansions for the chi-square statistics, asymptotic expansions of the joint distributions of the parameter estimators both under the null and fixed alternative hypotheses are derived up to order O(1/n).

6.

Fitting Kent models to compositional data with small concentration

J. L. Scealy A. H. Welsh 《Statistics and Computing》2014,24(2):165-179

Compositional data can be transformed to directional data by the square root transformation and then modelled by using the Kent distribution. The current approach for estimating the parameters in the Kent model for compositional data relies on a large concentration assumption which assumes that the majority of the transformed data is not distributed too close to the boundaries of the positive orthant. When the data is distributed close to the boundaries with large variance, significant folding may result. To treat this case we propose new estimators of the parameters derived based on the actual folded Kent distribution which are obtained via the EM algorithm. We show that these new estimators significantly reduce the bias in the current estimators when both the sample size and amount of folding are moderately large. We also propose using a saddlepoint density approximation for the Kent distribution normalising constant in order to more accurately estimate the shape parameters when the concentration is small or only moderately large.
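The square root transformation mentioned in this abstract can be sketched in a few lines: taking the componentwise square root of a composition with unit sum gives a point on the unit sphere in the positive orthant. The function name and tolerance below are illustrative assumptions; fitting the Kent model itself is not shown.

```python
from math import sqrt


def sqrt_transform(composition, tol=1e-9):
    """Map a composition (non-negative parts summing to 1) to a unit vector."""
    if any(part < 0 for part in composition):
        raise ValueError("parts of a composition must be non-negative")
    if abs(sum(composition) - 1.0) > tol:
        raise ValueError("parts of a composition must sum to 1")
    # sum(sqrt(x_i)**2) = sum(x_i) = 1, so the result has unit Euclidean norm.
    return [sqrt(part) for part in composition]


if __name__ == "__main__":
    direction = sqrt_transform([0.2, 0.3, 0.5])
    print(direction, sum(v * v for v in direction))
```

Parts equal to zero map onto the boundary of the positive orthant, which is the region where the abstract says folding becomes a concern.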

7.

Partitioning Anderson's statistic for tied data

《Journal of statistical planning and inference》2004,121(1):93-111

Anderson (Biometrics 15 (1959) 582) proposed a χ²-type statistic for the nonparametric analysis of a randomized blocks design with no ties in the data. In this paper, we propose an Anderson statistic that allows for ties in the data. We show that the asymptotic distribution of the statistic under the null hypothesis of no treatment effect is a χ² distribution. Under weak assumptions on the tie structure it is shown that the degrees of freedom for the asymptotic distribution is unchanged compared to the untied case. An extended analysis based on a partition of the statistic into independent components is suggested. The first component is shown to equal the Friedman rank statistic corrected for ties. The subsequent components allow for the detection of dispersion effects, higher order effects and differences in distribution. A simulation study is given and the new analysis is applied to a sensory evaluation data set.

8.

Incremental modelling for compositional data streams

Yuan Wei Huiwen Wang Gilbert Saporta 《Communications in Statistics - Simulation and Computation》2013,42(8):2229-2243

Incremental modelling of data streams is of great practical importance, as shown by its applications in advertising and financial data analysis. We propose two incremental covariance matrix decomposition methods for a compositional data type. The first method, exact incremental covariance decomposition of compositional data (C-EICD), gives an exact decomposition result. The second method, covariance-free incremental covariance decomposition of compositional data (C-CICD), is an approximate algorithm that can efficiently compute high-dimensional cases. Based on these two methods, many frequently used compositional statistical models can be incrementally calculated. We take multiple linear regression and principal component analysis as examples to illustrate the utility of the proposed methods via extensive simulation studies.

9.

Bayesian modelling of spatial compositional data  Total citations: 1, self-citations: 0, citations by others: 1

Håkon Tjelmeland Kjetill Vassmo Lund 《Journal of applied statistics》2003,30(1):87-100

Compositional data are vectors of proportions, specifying fractions of a whole. Aitchison (1986) defines logistic normal distributions for compositional data by applying a logistic transformation and assuming the transformed data to be multinormal distributed. In this paper we generalize this idea to spatially varying logistic data and thereby define logistic Gaussian fields. We consider the model in a Bayesian framework and discuss appropriate prior distributions. We consider both complete observations and observations of subcompositions or individual proportions, and discuss the resulting posterior distributions. In general, the posterior cannot be analytically handled, but the Gaussian base of the model allows us to define efficient Markov chain Monte Carlo algorithms. We use the model to analyse a data set of sediments in an Arctic lake. These data have previously been considered, but then without taking the spatial aspect into account.
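To make the logistic transformation concrete, here is a minimal sketch assuming the additive log-ratio form with the last part as reference, which is the form usually associated with Aitchison's logistic normal family; the abstract itself does not spell out the variant used. The function names are hypothetical, and the spatial model and MCMC sampler are not shown.

```python
from math import exp, log


def alr(composition):
    """Additive log-ratio: y_i = log(x_i / x_D) for i = 1..D-1 (all parts > 0)."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    reference = composition[-1]
    return [log(part / reference) for part in composition[:-1]]


def alr_inverse(y):
    """Additive logistic transformation: map real coordinates back to proportions."""
    denom = 1.0 + sum(exp(v) for v in y)
    return [exp(v) / denom for v in y] + [1.0 / denom]


if __name__ == "__main__":
    x = [0.2, 0.3, 0.5]
    y = alr(x)  # unconstrained real vector of length D - 1
    print(y, alr_inverse(y))
```

Under the logistic normal model, it is the vector returned by `alr` that is assumed to be multivariate normal.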

10.

A mean deviation-based approach to monitor process variability

《Journal of Statistical Computation and Simulation》2012,82(10):1173-1193

The study proposes a Shewhart-type control chart, namely an MD chart, based on average absolute deviations taken from the median, for monitoring changes (especially moderate and large changes – a major concern of Shewhart control charts) in process dispersion assuming normality of the quality characteristic to be monitored. The design structure of the proposed MD chart is developed and its comparison is made with those of two well-known dispersion control charts, namely the R and S charts. Using power curves as a performance measure, it has been observed that the design structure of the proposed MD chart is more powerful than that of the R chart and is a very close competitor to that of the S chart, in terms of discriminatory power for detecting shifts in the process dispersion. The non-normality effect is also examined on design structures of the three charts, and it has been observed that the design structure of the proposed MD chart is least affected by departure from normality.
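The plotting statistic described in this abstract, the average absolute deviation from the median, can be sketched for a single subgroup as below. The function name is hypothetical, and the control-limit constants of the MD chart are not given in the abstract, so they are deliberately left out.

```python
from statistics import median


def mean_deviation_from_median(sample):
    """Average absolute deviation of a subgroup from its own median."""
    if not sample:
        raise ValueError("sample must not be empty")
    center = median(sample)
    return sum(abs(x - center) for x in sample) / len(sample)


if __name__ == "__main__":
    subgroup = [1, 2, 3, 4, 10]
    # median = 3, absolute deviations = 2, 1, 0, 1, 7
    print(mean_deviation_from_median(subgroup))
```

On a chart, one such value per subgroup would be plotted against limits derived under the normality assumption that the abstract mentions.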

11.

A modified chi-square test for Bertholon model with censored data

Sana Chouia Nacira Seddik-Ameur 《Communications in Statistics - Simulation and Computation》2017,46(1):593-602

In this work, we propose the construction of a chi-squared goodness-of-fit test, in the censored data case, for the Bertholon model, which can analyse various competing risks of failure or death. This test is based on a modification of the Nikulin-Rao-Robson (NRR) statistic proposed by Bagdonavicius and Nikulin (2011a, 2011b) for censored data [Bagdonavicius, V., Nikulin, M. (2011a). Chi-squared tests for general composite hypotheses from censored samples. Comptes Rendus Mathématiques: Series I 349(3–4):219–223; Bagdonavicius, V., Nikulin, M. (2011b). Chi-squared goodness-of-fit test for right censored data. International Journal of Applied Mathematics and Statistics 24:30–50]. We applied this test to numerical examples from simulated samples and real data.

12.

Distribution of the C statistic with applications to the sample mean of Poisson data

Massimiliano Bonamente 《Journal of applied statistics》2020,47(11):2044

The C statistic, also known as the Cash statistic, is often used in astronomy for the analysis of low-count Poisson data. The main advantage of this statistic, compared to the more commonly used χ² statistic, is its applicability without the need to combine data points. This feature has made the C statistic a very useful method to analyze Poisson data that have small (or even null) counts in each resolution element. One of the challenges of the C statistic is that its probability distribution, under the null hypothesis that the data follow a parent model, is not known exactly. This paper presents an effort towards improving our understanding of the C statistic by studying (a) the distribution of C statistic for a fully specified model, (b) the distribution of C_min resulting from a maximum-likelihood fit to a simple one-parameter constant model, i.e. a model that represents the sample mean of N Poisson measurements, and (c) the distribution of the associated ΔC statistic that is used for parameter estimation. The results confirm the expectation that, in the high-count limit, both C statistic and C_min have the same mean and variance as a χ² statistic with same number of degrees of freedom. It is also found that, in the low-count regime, the expectation of the C statistic and C_min can be substantially lower than for a χ² distribution. The paper makes use of recent X-ray observations of the astronomical source PG 1116+215 to illustrate the application of the C statistic to Poisson data.
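For readers unfamiliar with the statistic, the sketch below assumes the commonly used form C = 2 Σ [m_i − n_i + n_i ln(n_i / m_i)], where n_i are observed counts and m_i are model expectations, with the logarithmic term taken as zero when n_i = 0. It also evaluates C_min for the one-parameter constant model, whose maximum-likelihood estimate is the sample mean. Function names are hypothetical; the distributional results of the paper are not reproduced.

```python
from math import log


def cash_statistic(counts, model):
    """C = 2 * sum(m - n + n * ln(n / m)); the log term is 0 when n == 0."""
    if len(counts) != len(model):
        raise ValueError("counts and model must have the same length")
    total = 0.0
    for n, m in zip(counts, model):
        if m <= 0:
            raise ValueError("model expectations must be positive")
        total += m - n
        if n > 0:
            total += n * log(n / m)
    return 2.0 * total


def c_min_constant_model(counts):
    """C evaluated at the ML estimate of a constant model (the sample mean)."""
    mean = sum(counts) / len(counts)
    return cash_statistic(counts, [mean] * len(counts))


if __name__ == "__main__":
    data = [0, 1, 2, 3]  # low-count data, including a null count
    print(c_min_constant_model(data))
```

No binning of the counts is required, which is the advantage over χ² that the abstract highlights.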

13.

A robust Parafac model for compositional data

M. A. Di Palma P. Filzmoser M. Gallo K. Hron 《Journal of applied statistics》2018,45(8):1347-1369

Compositional data are characterized by values containing relative information, and thus the ratios between the data values are of interest for the analysis. Due to specific features of compositional data, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. It is discussed how three-way compositional data can be analyzed with the Parafac model. When data are contaminated by outliers, robust estimates for the Parafac model parameters should be employed. It is demonstrated how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real data example from macroeconomics underlines the usefulness of this approach.

14.

A folded model for compositional data analysis

Michail Tsagris Connie Stewart 《Australian & New Zealand Journal of Statistics》2020,62(2):249-277

A folded type model is developed for analysing compositional data. The proposed model involves an extension of the α‐transformation for compositional data and provides a new and flexible class of distributions for modelling data defined on the simplex sample space. Despite its rather seemingly complex structure, employment of the EM algorithm guarantees efficient parameter estimation. The model is validated through simulation studies and examples which illustrate that the proposed model performs better in terms of capturing the data structure, when compared to the popular logistic normal distribution, and can be advantageous over a similar model without folding.

15.

A note on score statistic for grouped data

Hyo-Il Park 《Journal of the Korean Statistical Society》2009,38(4):331-337

In this paper, we consider modifying the score statistic proposed by Prentice and Gloeckler [Prentice, R. L., & Gloeckler, L. A. (1978). Regression analysis of grouped data with applications to breast cancer data. Biometrics, 34, 57–67] for the grouped data under the proportional hazards model. For this matter, we apply the likelihood method and derive the scores without re-parameterization as a discrete model. Then we illustrate the test with an example and compare its efficiency with that of the test based on Prentice and Gloeckler’s statistic by obtaining empirical powers through a simulation study. We also discuss some possible extensions and estimated variances of the score statistic as concluding remarks.

16.

On projection-based tests for directional and compositional data

Juan A. Cuesta-Albertos Antonio Cuevas Ricardo Fraiman 《Statistics and Computing》2009,19(4):367-380

A new class of nonparametric tests, based on random projections, is proposed. They can be used for several null hypotheses of practical interest, including uniformity for spherical (directional) and compositional data, sphericity of the underlying distribution and homogeneity in two-sample problems on the sphere or the simplex. The proposed procedures have a number of advantages, mostly associated with their flexibility (for example, they also work to test “partial uniformity” in a subset of the sphere), computational simplicity and ease of application even in high-dimensional cases.

17.

Perfect aggregation of Bayesian analysis on compositional data

Tzu-Tsung Wong 《Statistical Papers》2007,48(2):265-282

Sufficiency is a widely used concept for reducing the dimensionality of a data set. Collecting data for a sufficient statistic is generally much easier and less expensive than collecting all of the available data. When the posterior distributions of a quantity of interest given the aggregate and disaggregate data are identical, perfect aggregation is said to hold, and in this case the aggregate data is a sufficient statistic for the quantity of interest. In this paper, the conditions for perfect aggregation are shown to depend on the functional form of the prior distribution. When the quantity of interest is the sum of some parameters in a vector having either a generalized Dirichlet or a Liouville distribution for analyzing compositional data, necessary and sufficient conditions for perfect aggregation are also established.

18.

A semi-analytical solution to the maximum-likelihood fit of Poisson data to a linear model using the Cash statistic

Massimiliano Bonamente David Spence 《Journal of applied statistics》2022,49(3):522

The Cash statistic, also known as the C statistic, is commonly used for the analysis of low-count Poisson data, including data with null counts for certain values of the independent variable. The use of this statistic is especially attractive for low-count data that cannot be combined, or re-binned, without loss of resolution. This paper presents a new maximum-likelihood solution for the best-fit parameters of a linear model using the Poisson-based Cash statistic. The solution presented in this paper provides a new and simple method to measure the best-fit parameters of a linear model for any Poisson-based data, including data with null counts. In particular, the method enforces the requirement that the best-fit linear model be non-negative throughout the support of the independent variable. The method is summarized in a simple algorithm to fit Poisson counting data of any size and counting rate with a linear model, by-passing entirely the use of the traditional χ² statistic.

19.

The asymptotic distribution of the one-sided Kolmogorov-Smirnov statistic for truncated data

H.M. Schey 《Communications in Statistics - Theory and Methods》2013,42(14):1361-1366

The asymptotic distribution function for the one-sided Kolmogorov-Smirnov statistic is derived in the case of truncated data. A comparison is made of the one-sided percentage points and the two-sided percentage points of Koziol and Byar.

20.

A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data

Tang LL Balakrishnan N 《Journal of statistical planning and inference》2011,141(1):335-344

The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.
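As background for this abstract, the sketch below computes the ordinary fixed-sample-size Wilcoxon-Mann-Whitney statistic with the usual tie convention (a tied pair counts one half). This is the standard statistic, not the random-sum version proposed in the paper, whose definition the abstract does not give; the function name is hypothetical.

```python
def mann_whitney_u(x, y):
    """Count pairs with x_i < y_j, adding 0.5 for each tied pair."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


if __name__ == "__main__":
    group_a = [1, 2, 3]
    group_b = [2, 4]
    u = mann_whitney_u(group_a, group_b)
    # Dividing by the number of pairs estimates P(X < Y) + 0.5 * P(X = Y),
    # which is the area under the empirical ROC curve.
    print(u, u / (len(group_a) * len(group_b)))
```

The normalized value is why the statistic appears in ROC analysis, the application area named in the abstract.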