Similar literature
 20 similar articles found
1.
Using rounded data to estimate moments and regression coefficients typically biases the estimates. We explore the bias-inducing effects of rounding, thereby reviewing widely dispersed and often half-forgotten results in the literature. Under appropriate conditions, these effects can be approximately rectified by versions of Sheppard’s correction formula. We discuss the conditions under which these approximations are valid and also investigate the efficiency loss caused by rounding. The rounding error, which corresponds to the measurement error of a measurement error model, has a marginal distribution that can be approximated by the uniform distribution, but it is not independent of the true value. In order to take account of rounding preferences (heaping), we generalize the concept of simple rounding to that of asymmetric rounding and consider its effect on the mean and variance of a distribution.

2.
This paper looks at the effects of rounding data sampled from the exponential distribution. It examines the nature of the rounded distribution, together with the resulting error distribution. The influence of these distributions on estimates and tests of hypothesis is investigated. The results indicate that even a moderate degree of rounding can cause the bias in an estimator to increase, whereas in hypothesis tests the level of significance is altered.

3.
The aim of this paper is to investigate the robustness properties of likelihood inference with respect to rounding effects. Attention is focused on exponential families and on inference about a scalar parameter of interest, also in the presence of nuisance parameters. A summary value of the influence function of a given statistic, the local-shift sensitivity, is considered. It accounts for small fluctuations in the observations. The main result is that the local-shift sensitivity is bounded for the usual likelihood-based statistics, i.e. the directed likelihood, the Wald and score statistics. It is also bounded for the modified directed likelihood, which is a higher-order adjustment of the directed likelihood. The practical implication is that likelihood inference is expected to be robust with respect to rounding effects. Theoretical analysis is supplemented and confirmed by a number of Monte Carlo studies, performed to assess the coverage probabilities of confidence intervals based on likelihood procedures when data are rounded. In addition, simulations indicate that the directed likelihood is less sensitive to rounding effects than the Wald and score statistics. This provides another criterion for choosing among first-order equivalent likelihood procedures. The modified directed likelihood shows the same robustness as the directed likelihood, so that its gain in inferential accuracy does not come at the price of an increase in instability with respect to rounding.

4.
Self-reported income information particularly suffers from an intentional coarsening of the data, which is called heaping or rounding. If it does not occur completely at random – which is usually the case – heaping and rounding have detrimental effects on the results of statistical analysis. Conventional statistical methods do not consider this kind of reporting bias, and thus might produce invalid inference. We describe a novel statistical modeling approach that allows us to deal with self-reported heaped income data in an adequate and flexible way. We suggest modeling heaping mechanisms and the true underlying model in combination. To describe the true net income distribution, we use the zero-inflated log-normal distribution. Heaping points are identified from the data by applying a heuristic procedure comparing a hypothetical income distribution and the empirical one. To determine heaping behavior, we employ two distinct models: either we assume piecewise constant heaping probabilities, or heaping probabilities are considered to increase steadily with proximity to a heaping point. We validate our approach by some examples. To illustrate the capacity of the proposed method, we conduct a case study using income data from the German National Educational Panel Study.

5.
When rounded data are used in place of the true values to compute the variance of a variable or a regression line, the results will be distorted. Under suitable smoothness conditions on the distribution of the variable(s) involved, however, this bias can be corrected with very high precision by using the well-known Sheppard’s correction. In this paper, Sheppard’s correction is generalized to cover more general forms of rounding procedures than just simple rounding, viz., probabilistic rounding, which includes asymmetric rounding and mixture rounding.
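
As a rough illustration of the simple-rounding case only (not the generalized correction proposed in the paper), the classical Sheppard correction subtracts h²/12 from the variance of data rounded to a grid of width h. The sketch below assumes normally distributed true values with mean 10 and standard deviation 2, a rounding interval h = 1, and hypothetical helper names; all of these are illustrative choices, not values from the abstract.

```python
import random

def sample_variance(values):
    """Unbiased sample variance computed in plain floating point."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def round_to_grid(x, h):
    """Simple rounding: map x to the nearest multiple of the interval h."""
    return h * round(x / h)

def sheppard_variance(rounded, h):
    """Variance of the rounded data with Sheppard's correction h**2 / 12 removed."""
    return sample_variance(rounded) - h * h / 12.0

# Illustrative assumptions: normal true values, rounding interval h = 1.
rng = random.Random(0)
h = 1.0
true_values = [rng.gauss(10.0, 2.0) for _ in range(100_000)]
rounded_values = [round_to_grid(x, h) for x in true_values]

var_true = sample_variance(true_values)
var_rounded = sample_variance(rounded_values)            # inflated by roughly h**2 / 12
var_corrected = sheppard_variance(rounded_values, h)     # close to var_true again

print(var_true, var_rounded, var_corrected)
```

Under these assumptions the uncorrected variance should overshoot by about 1/12, and the corrected value should sit much closer to the variance of the unrounded sample. The correction relies on the smoothness conditions mentioned in the abstract, so it may perform worse for coarse grids or very skewed distributions.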

6.
In a previously published study, the effects of rounding on the significance and power of four test statistics were considered when the parent population was normal. Here we investigate how these tests will perform for rounded non-normal data. Guidelines are given on how the degree of precision recommended for normal populations can be applied when the population is non-normal.

7.
This paper is concerned with how standard estimation procedures perform in terms of efficiency for non-normal rounded data. Previous research has shown that the loss in efficiency due to rounding normal data is small. However, evidence from the non-normal distribution considered in this paper suggests that, if rounding is coarse or the distribution is very skewed, the loss in efficiency due to rounding can be considerable.

8.
When analyzing data on subjective expectations of continuous outcomes, researchers have access to a limited number of reported probabilities for each respondent from which to construct complete distribution functions. Moreover, reported probabilities may be rounded and thus not equal to true beliefs. Using survival expectations elicited from a representative sample from the Netherlands, we investigate what can be learned if we take these two sources of missing information into account and expectations are therefore only partially identified. We find novel evidence for rounding by checking whether reported expectations are consistent with a hazard of death that increases weakly with age. Only 39% of reported beliefs are consistent with this under the assumption that all probabilities are reported precisely, while 92% are if we allow for rounding. Using the available information to construct bounds on subjective life expectancy, we show that the data alone are not sufficiently informative to allow for useful inference in partially identified linear models, even in the absence of rounding. We propose to improve precision by interpolation between rounded probabilities. Interpolation in combination with a limited amount of rounding does yield informative intervals.

9.
Multiple imputation has emerged as a widely used model-based approach in dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings which include a mix of continuous and discrete variables, correct specification of the imputation model could be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with accessibility to software packages that are capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes, and subsequently to round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets.
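
The final conversion step described above might be sketched as follows. This is only an illustration of nearest-indicator matching: the abstract does not specify the indicator coding, so the sketch assumes K − 1 dummy variables with an all-zero reference level, and the function names are hypothetical. The multivariate normal imputation itself is not shown; the input stands in for one row of already imputed indicator values.

```python
import math

def indicator_rows(num_levels):
    """Candidate indicator vectors (assumed coding): level 0 is the all-zero
    reference row; level k > 0 has a single 1 in position k - 1."""
    rows = []
    for level in range(num_levels):
        row = [0.0] * (num_levels - 1)
        if level > 0:
            row[level - 1] = 1.0
        rows.append(row)
    return rows

def nearest_level(imputed, num_levels):
    """Return the ordinal level whose indicator row has the smallest
    Euclidean distance to the imputed (continuous) indicator values."""
    candidates = indicator_rows(num_levels)
    distances = [math.dist(imputed, row) for row in candidates]
    return distances.index(min(distances))

# One imputed row for a 4-level ordinal variable (three indicator columns).
print(nearest_level([0.1, 0.8, -0.2], 4))
print(nearest_level([0.2, 0.1, 0.3], 4))
```

Compared with rounding a single imputed score to the nearest category, matching on the whole indicator vector uses all imputed columns at once, which appears to be the point of the distance-based approach.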

10.
Summary: Responses to income questions in surveys are often rounded by the respondents. Though this is widely ignored, rounding can have detrimental effects on the results of a statistical analysis, especially with respect to the consistency of estimates. This paper deals with the analysis of data from the Finnish sub-sample of the European Community Household Panel (ECHP) with respect to factors that influence rounding of personal gross wage and earnings. The finding is that the propensity to observe rounded values can be related to factors like the interview mode, the wage level, and personal characteristics like gender and job type. (Work financed by the European Commission under contract number IST–1999–11101.)

11.
In previous literature, the effects of rounding on fixed sample hypothesis tests have been considered. However, sequential tests have received little attention. In this paper the robustness of these tests under rounding is considered. The results indicate that in terms of significance level and power the sequential tests were less affected by rounding than the fixed sample tests.

12.
A basic assumption of many statistical tests is that the samples are drawn from normal populations. However, in practice the data are often subject to rounding. Here we consider the effect of rounded normal data on the significance level of four test statistics. Guidance is given on what is an appropriate degree of precision when using these tests on normal rounded data. The results indicate that less precision is required of the recorded data than that which is usually given.

13.
14.
In a previous study, the effect of rounding on classical statistical techniques was considered. Here, we consider how rounded data may affect the posterior distribution and, thus, any Bayesian inferences made. The results in this paper indicate that Bayesian inferences can be sensitive to the rounding process.

15.
Abstract

Quetelet’s data on Scottish chest girths are analyzed with eight normality tests. In contrast to Quetelet’s conclusion that the data are fit well by what is now known as the normal distribution, six of eight normality tests provide strong evidence that the chest circumferences are not normally distributed. Using corrected chest circumferences from Stigler, the χ² test no longer provides strong evidence against normality, but five commonly used normality tests do. The D’Agostino–Pearson K² and Jarque–Bera tests, based only on skewness and kurtosis, find that both Quetelet’s original data and the Stigler-corrected data are consistent with the hypothesis of normality. The major reason causing most normality tests to produce low p-values, indicating that Quetelet’s data are not normally distributed, is that the chest circumferences were reported in whole inches and rounding of large numbers of observations can produce many tied values that strongly affect most normality tests. Users should be cautious using many standard normality tests if data have ties, are rounded, and the ratio of the standard deviation to rounding interval is small.
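
To give a feel for how heavily tied such data become, the sketch below simulates a normal sample and rounds it to whole units. The sample size, mean, and standard deviation are illustrative assumptions rather than Quetelet's figures, and no normality test is run here; the code only counts ties.

```python
import random

def distinct_share(values):
    """Fraction of observations that carry a distinct value (1.0 means no ties)."""
    return len(set(values)) / len(values)

# Illustrative assumptions: n = 5000, mean 40, sd 2, rounding interval h = 1,
# so the ratio of the standard deviation to the rounding interval is 2.
rng = random.Random(1)
n, sd, h = 5000, 2.0, 1.0
raw = [rng.gauss(40.0, sd) for _ in range(n)]
rounded = [h * round(x / h) for x in raw]

share_raw = distinct_share(raw)          # essentially no ties before rounding
share_rounded = distinct_share(rounded)  # only a handful of distinct values remain
largest_tie = max(rounded.count(v) for v in set(rounded))

print(share_raw, share_rounded, largest_tie)
```

With a ratio this small, a large share of the sample collapses onto the few values nearest the mean, which is the situation the abstract warns about for tests that are sensitive to ties.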

16.
Abstract

In the area of goodness-of-fit there is a clear distinction between the problem of testing the fit of a continuous distribution and that of testing a discrete distribution. In all continuous problems the data is recorded with a limited number of decimals, so in theory one could say that the problem is always of a discrete nature, but it is a common practice to ignore discretization and proceed as if the data is continuous. It is therefore an interesting question whether in a given problem of test of fit, the “limited resolution” in the observed recorded values may or may not be of concern, if the analysis done ignores this implied discretization. In this article, we address the problem of testing the fit of a continuous distribution with data recorded with a limited resolution. A measure for the degree of discretization is proposed which involves the size of the rounding interval, the dispersion in the underlying distribution and the sample size. This measure is shown to be a key characteristic which allows comparison, in different problems, of the amount of discretization involved. Some asymptotic results are given for the distribution of the EDF (empirical distribution function) statistics that explicitly depend on the above mentioned measure of degree of discretization. The results obtained are illustrated with some simulations for testing normality when the parameters are known and also when they are unknown. The asymptotic distributions are shown to be an accurate approximation for the true finite n distribution obtained by Monte Carlo. A real example from image analysis is also discussed. The conclusion drawn is that in the cases where the value of the measure for the degree of discretization is not “large”, the practice of ignoring discreteness is of no concern. However, when this value is “large”, the effect of ignoring discreteness leads to an excessive number of rejections of the distribution tested, as compared to what would be the number of rejections if no rounding is taken into account. The error made in the number of rejections might be huge.

17.
In dealing with ties in failure time data the mechanism by which the data are observed should be considered. If the data are discrete, the process is relatively simple and is determined by what is actually observed. With continuous data, ties are not supposed to occur, but they do because the data are grouped into intervals (even if only rounding intervals). In this case there is actually a non-identifiability problem which can only be resolved by modelling the process. Various reasonable modelling assumptions are investigated in this paper. They lead to better ways of dealing with ties between observed failure times and censoring times of different individuals. The current practice is to assume that the censoring times occur after all the failures with which they are tied.

18.
It is often assumed in statistics that the random variables under consideration come from a continuous distribution. However, real data is always given in a rounded (discretized) form. The rounding errors become serious when the sample size is large. In this paper, we consider the situation where the mesh of discretization tends to zero as the sample size tends to infinity, and give some sets of sufficient conditions under which the rounding errors can be asymptotically ignored, in the context of Z-estimation. It is theoretically proved that the mid-point discretization is preferable.

19.
We study bias arising from rounding categorical variables following multivariate normal (MVN) imputation. This task has been well studied for binary variables, but not for more general categorical variables. Three methods that assign imputed values to categories based on fixed reference points are compared using 25 specific scenarios covering variables with k=3, …, 7 categories, and five distributional shapes, and for each k=3, …, 7, we examine the distribution of bias arising over 100,000 distributions drawn from a symmetric Dirichlet distribution. We observed, on both empirical and theoretical grounds, that one method (projected-distance-based rounding) is superior to the other two methods, and that the risk of invalid inference with the best method may be too high at sample sizes n≥150 at 50% missingness, n≥250 at 30% missingness and n≥1500 at 10% missingness. Therefore, these methods are generally unsatisfactory for rounding categorical variables (with up to seven categories) following MVN imputation.

20.
Rounding errors have a considerable impact on statistical inferences, especially when the data size is large. The finite normal mixture model is very important in many applied statistical problems, such as those in bioinformatics. In this article, we investigate the statistical impact of rounding errors on the finite normal mixture model with a known number of components, and develop a new estimation method to obtain consistent and asymptotically normal estimates for the unknown parameters based on rounded data drawn from this kind of model.
