Similar articles
1.
An imputation procedure is a procedure by which each missing value in a data set is replaced (imputed) by an observed value using a predetermined resampling procedure. The distribution of a statistic computed from a data set consisting of observed and imputed values, called a completed data set, is affected by the imputation procedure used. In a Monte Carlo experiment, three imputation procedures are compared with respect to the empirical behavior of the goodness-of-fit chi-square statistic computed from a completed data set. The results show that each imputation procedure affects the distribution of the goodness-of-fit chi-square statistic in a different manner. However, when the empirical behavior of the goodness-of-fit chi-square statistic is compared to its appropriate asymptotic distribution, there are no substantial differences between these imputation procedures.

2.
The distribution of the chi-square goodness-of-fit statistic is studied in the equiprobable case. Tables of exact critical values are given for α = .1, .05, .01, .005; k = 2(1)4, N = 26(1)50; k = 5, N = 26(1)40; k = 6(1)10, N = 26(1)30, where α is the desired significance level, k is the number of cells and N is the sample size. Methods of fitting the true distribution are compared. If k > 3, it is found that a simple additive adjustment to the asymptotic chi-square fit leads to high accuracy even for N between 10 and 20. For k = 2, the Yates corrected chi-square statistic is very accurately fitted by the usual chi-square distribution.

3.
The chi-square distribution arises frequently in applied statistics. Associated with the chi-square random variable with v degrees of freedom are two interdependent variables: the probability integral and the percentage point. Given one of these variables, the other can be obtained from chi-square tables for selected values. In order to overcome the inconvenience of statistical tables and interpolation, many approximations have been suggested. The computational difficulty and accuracy of various approximations are compared.

4.
Biplots of compositional data   Total citations: 6, self-citations: 0, citations by others: 6
Summary. The singular value decomposition and its interpretation as a linear biplot have proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to the special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is applied to a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.

5.
An asymptotic expansion of the null distribution of the chi-square statistic based on the asymptotically distribution-free theory for general covariance structures is derived under non-normality. The added higher-order term in the approximate density is given by a weighted sum of those of the chi-square distributed variables with different degrees of freedom. A formula for the corresponding Bartlett correction is also shown without using the above asymptotic expansion. Under a fixed alternative hypothesis, the Edgeworth expansion of the distribution of the standardized chi-square statistic is given up to order O(1/n). From the intermediate results of the asymptotic expansions for the chi-square statistics, asymptotic expansions of the joint distributions of the parameter estimators both under the null and fixed alternative hypotheses are derived up to order O(1/n).

6.
Compositional data can be transformed to directional data by the square root transformation and then modelled by using the Kent distribution. The current approach for estimating the parameters in the Kent model for compositional data relies on a large concentration assumption which assumes that the majority of the transformed data is not distributed too close to the boundaries of the positive orthant. When the data is distributed close to the boundaries with large variance, significant folding may result. To treat this case we propose new estimators of the parameters derived based on the actual folded Kent distribution which are obtained via the EM algorithm. We show that these new estimators significantly reduce the bias in the current estimators when both the sample size and amount of folding are moderately large. We also propose using a saddlepoint density approximation for the Kent distribution normalising constant in order to more accurately estimate the shape parameters when the concentration is small or only moderately large.
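
The square root transformation mentioned in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `sqrt_transform` and the example composition are hypothetical, and the sketch assumes that a composition is a vector of non-negative parts that is rescaled to unit sum before the transformation. Because the parts sum to 1, their square roots have unit Euclidean norm, so the result lies on the unit sphere within the positive orthant.

```python
import math


def sqrt_transform(composition):
    """Map a composition to a point on the unit sphere (positive orthant).

    Hypothetical helper: the parts are first rescaled to sum to 1, then
    each part is replaced by its square root.
    """
    total = sum(composition)
    if total <= 0 or any(part < 0 for part in composition):
        raise ValueError("parts must be non-negative with a positive sum")
    closed = [part / total for part in composition]  # enforce unit sum
    return [math.sqrt(part) for part in closed]


# Made-up three-part composition, used only for illustration.
point = sqrt_transform([0.2, 0.3, 0.5])
squared_norm = sum(coord * coord for coord in point)  # equals 1 up to rounding
```

Parts close to zero map to points near the boundary of the positive orthant, which is the region where the abstract says folding becomes significant.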

7.
Anderson (Biometrics 15 (1959) 582) proposed a χ²-type statistic for the nonparametric analysis of a randomized blocks design with no ties in the data. In this paper, we propose an Anderson statistic that allows for ties in the data. We show that the asymptotic distribution of the statistic under the null hypothesis of no treatment effect is a χ² distribution. Under weak assumptions on the tie structure it is shown that the degrees of freedom for the asymptotic distribution is unchanged compared to the untied case. An extended analysis based on a partition of the statistic into independent components is suggested. The first component is shown to equal the Friedman rank statistic corrected for ties. The subsequent components allow for the detection of dispersion effects, higher order effects and differences in distribution. A simulation study is given and the new analysis is applied to a sensory evaluation data set.

8.
Incremental modelling of data streams is of great practical importance, as shown by its applications in advertising and financial data analysis. We propose two incremental covariance matrix decomposition methods for a compositional data type. The first method, exact incremental covariance decomposition of compositional data (C-EICD), gives an exact decomposition result. The second method, covariance-free incremental covariance decomposition of compositional data (C-CICD), is an approximate algorithm that can efficiently compute high-dimensional cases. Based on these two methods, many frequently used compositional statistical models can be incrementally calculated. We take multiple linear regression and principal component analysis as examples to illustrate the utility of the proposed methods via extensive simulation studies.

9.
Bayesian modelling of spatial compositional data   Total citations: 1, self-citations: 0, citations by others: 1
Compositional data are vectors of proportions, specifying fractions of a whole. Aitchison (1986) defines logistic normal distributions for compositional data by applying a logistic transformation and assuming the transformed data to be multi-normal distributed. In this paper we generalize this idea to spatially varying logistic data and thereby define logistic Gaussian fields. We consider the model in a Bayesian framework and discuss appropriate prior distributions. We consider both complete observations and observations of subcompositions or individual proportions, and discuss the resulting posterior distributions. In general, the posterior cannot be analytically handled, but the Gaussian base of the model allows us to define efficient Markov chain Monte Carlo algorithms. We use the model to analyse a data set of sediments in an Arctic lake. These data have previously been considered, but then without taking the spatial aspect into account.
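
The logistic transformation referred to in this abstract can be illustrated with a short sketch. It assumes the additive log-ratio form commonly associated with Aitchison's logistic normal distribution, in which each of the first D-1 parts is divided by the last part and the logarithm is taken. The abstract does not spell out the exact form used, so this choice, the function names `alr` and `alr_inverse`, and the example values are all assumptions made for illustration.

```python
import math


def alr(composition):
    """Additive log-ratio coordinates: log(x_i / x_D) for i = 1..D-1.

    Assumes all parts are strictly positive; the last part is the reference.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    *head, last = composition
    return [math.log(part / last) for part in head]


def alr_inverse(coords):
    """Additive logistic transformation: map real coordinates back to a composition."""
    expd = [math.exp(y) for y in coords]
    denom = 1.0 + sum(expd)
    return [e / denom for e in expd] + [1.0 / denom]


# Made-up three-part composition, used only to show the round trip.
x = [0.2, 0.3, 0.5]
y = alr(x)             # two unconstrained real coordinates
back = alr_inverse(y)  # recovers x up to rounding
```

The transformed coordinates are unconstrained real numbers, which is what makes a multivariate normal (and, spatially, a Gaussian field) a natural model for them.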

10.
The study proposes a Shewhart-type control chart, namely an MD chart, based on average absolute deviations taken from the median, for monitoring changes (especially moderate and large changes – a major concern of Shewhart control charts) in process dispersion assuming normality of the quality characteristic to be monitored. The design structure of the proposed MD chart is developed and its comparison is made with those of two well-known dispersion control charts, namely the R and S charts. Using power curves as a performance measure, it has been observed that the design structure of the proposed MD chart is more powerful than that of the R chart and is a very close competitor to that of the S chart, in terms of discriminatory power for detecting shifts in the process dispersion. The non-normality effect is also examined on design structures of the three charts, and it has been observed that the design structure of the proposed MD chart is least affected by departure from normality.
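
The subgroup statistic underlying the MD chart, the average absolute deviation from the median, can be sketched as below. The function name `mean_abs_dev_from_median` and the sample values are hypothetical. The control-limit constants of the chart are not given in the abstract and are deliberately left out rather than guessed.

```python
import statistics


def mean_abs_dev_from_median(sample):
    """Average absolute deviation of a subgroup from its own median."""
    if not sample:
        raise ValueError("sample must not be empty")
    med = statistics.median(sample)
    return sum(abs(x - med) for x in sample) / len(sample)


# Made-up subgroup of five measurements: the median is 3, the absolute
# deviations are 2, 1, 0, 1 and 7, so the statistic is 11 / 5.
md = mean_abs_dev_from_median([1, 2, 3, 4, 10])
```

In a Shewhart-type chart this value would be computed for each subgroup and plotted against control limits derived under the normality assumption stated in the abstract.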

11.
In this work, we propose the construction of a chi-squared goodness-of-fit test for the Bertholon model in the censored-data case; the model can analyse various competing risks of failure or death. This test is based on a modification of the Nikulin-Rao-Robson (NRR) statistic proposed by Bagdonavicius and Nikulin (2011a, 2011b) for censored data. We applied this test to numerical examples from simulated samples and real data.

12.
The C statistic, also known as the Cash statistic, is often used in astronomy for the analysis of low-count Poisson data. The main advantage of this statistic, compared to the more commonly used χ² statistic, is its applicability without the need to combine data points. This feature has made the C statistic a very useful method to analyze Poisson data that have small (or even null) counts in each resolution element. One of the challenges of the C statistic is that its probability distribution, under the null hypothesis that the data follow a parent model, is not known exactly. This paper presents an effort towards improving our understanding of the C statistic by studying (a) the distribution of the C statistic for a fully specified model, (b) the distribution of C_min resulting from a maximum-likelihood fit to a simple one-parameter constant model, i.e. a model that represents the sample mean of N Poisson measurements, and (c) the distribution of the associated ΔC statistic that is used for parameter estimation. The results confirm the expectation that, in the high-count limit, both the C statistic and C_min have the same mean and variance as a χ² statistic with the same number of degrees of freedom. It is also found that, in the low-count regime, the expectation of the C statistic and C_min can be substantially lower than for a χ² distribution. The paper makes use of recent X-ray observations of the astronomical source PG 1116+215 to illustrate the application of the C statistic to Poisson data.
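
As an illustration of the quantities discussed in this abstract, the sketch below computes the C statistic for a fully specified model, and C_min for a constant model fitted by maximum likelihood. It assumes the commonly quoted form C = 2 Σ [m_i − n_i + n_i ln(n_i / m_i)], where n_i are the observed counts, m_i the model expectations, and the logarithmic term is taken as zero when n_i = 0. The abstract itself does not state the formula, so this form, the function names, and the example counts are assumptions.

```python
import math


def cash_statistic(counts, model):
    """C = 2 * sum(m - n + n * ln(n / m)), with the log term dropped when n == 0."""
    if len(counts) != len(model):
        raise ValueError("counts and model must have the same length")
    total = 0.0
    for n, m in zip(counts, model):
        if m <= 0:
            raise ValueError("model expectations must be positive")
        term = m - n
        if n > 0:  # n * ln(n / m) tends to 0 as n tends to 0
            term += n * math.log(n / m)
        total += term
    return 2.0 * total


def cmin_constant(counts):
    """Fit a constant Poisson model; its maximum-likelihood value is the sample mean.

    Raises ValueError (via cash_statistic) if every count is zero, because
    the fitted mean would then be 0.
    """
    mean = sum(counts) / len(counts)
    return mean, cash_statistic(counts, [mean] * len(counts))


# Made-up low-count data including a null count.
counts = [3, 0, 5]
c_fixed = cash_statistic(counts, [3.0, 1.0, 5.0])  # fully specified model
best_mean, c_min = cmin_constant(counts)           # one-parameter constant model
```

No binning or combining of data points is needed here, which is the advantage over the χ² statistic that the abstract highlights for bins with zero counts.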

13.
Compositional data are characterized by values containing relative information, and thus the ratios between the data values are of interest for the analysis. Due to specific features of compositional data, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. It is discussed how three-way compositional data can be analyzed with the Parafac model. When data are contaminated by outliers, robust estimates for the Parafac model parameters should be employed. It is demonstrated how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real data example from macroeconomics underlines the usefulness of this approach.

14.
A folded type model is developed for analysing compositional data. The proposed model involves an extension of the α‐transformation for compositional data and provides a new and flexible class of distributions for modelling data defined on the simplex sample space. Despite its seemingly rather complex structure, employment of the EM algorithm guarantees efficient parameter estimation. The model is validated through simulation studies and examples which illustrate that the proposed model performs better in terms of capturing the data structure, when compared to the popular logistic normal distribution, and can be advantageous over a similar model without folding.

15.
In this paper, we consider modifying the score statistic proposed by Prentice and Gloeckler [Prentice, R. L., & Gloeckler, L. A. (1978). Regression analysis of grouped data with applications to breast cancer data. Biometrics, 34, 57–67] for grouped data under the proportional hazards model. To this end, we apply the likelihood method and derive the scores without re-parameterization as a discrete model. We then illustrate the test with an example and compare its efficiency with that of the test based on Prentice and Gloeckler’s statistic by obtaining empirical powers through a simulation study. We also discuss some possible extensions and estimated variances of the score statistic as concluding remarks.

16.
A new class of nonparametric tests, based on random projections, is proposed. They can be used for several null hypotheses of practical interest, including uniformity for spherical (directional) and compositional data, sphericity of the underlying distribution and homogeneity in two-sample problems on the sphere or the simplex. The proposed procedures have a number of advantages, mostly associated with their flexibility (for example, they also work to test “partial uniformity” in a subset of the sphere), computational simplicity and ease of application even in high-dimensional cases.

17.
Sufficiency is a widely used concept for reducing the dimensionality of a data set. Collecting data for a sufficient statistic is generally much easier and less expensive than collecting all of the available data. When the posterior distributions of a quantity of interest given the aggregate and disaggregate data are identical, perfect aggregation is said to hold, and in this case the aggregate data is a sufficient statistic for the quantity of interest. In this paper, the conditions for perfect aggregation are shown to depend on the functional form of the prior distribution. When the quantity of interest is the sum of some parameters in a vector having either a generalized Dirichlet or a Liouville distribution for analyzing compositional data, necessary and sufficient conditions for perfect aggregation are also established.

18.
The Cash statistic, also known as the C statistic, is commonly used for the analysis of low-count Poisson data, including data with null counts for certain values of the independent variable. The use of this statistic is especially attractive for low-count data that cannot be combined, or re-binned, without loss of resolution. This paper presents a new maximum-likelihood solution for the best-fit parameters of a linear model using the Poisson-based Cash statistic. The solution presented in this paper provides a new and simple method to measure the best-fit parameters of a linear model for any Poisson-based data, including data with null counts. In particular, the method enforces the requirement that the best-fit linear model be non-negative throughout the support of the independent variable. The method is summarized in a simple algorithm to fit Poisson counting data of any size and counting rate with a linear model, by-passing entirely the use of the traditional χ² statistic.

19.
The asymptotic distribution function for the one-sided Kolmogorov-Smirnov statistic is derived in the case of truncated data. A comparison is made of the one-sided percentage points and the two-sided percentage points of Koziol and Byar.

20.
The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.
