Similar Literature
20 similar documents found.
1.
The factors affecting statistical data quality are many. Improving data quality requires not only better statistical methods and techniques but also a sound social environment that provides favorable conditions for statistical work, so the problem of "building a statistical ecological environment" deserves attention. "Statistical ecological environment building" encompasses an ideological environment conducive to ensuring data quality, the institutional environment of government statistics, a social credit system that covers data authenticity, and a legal environment that safeguards statistical data quality.

2.
The self-updating process (SUP) is a clustering algorithm that takes the viewpoint of the data points and simulates the process by which they move and perform self-clustering. It is an iterative process on the sample space and allows for both time-varying and time-invariant operators. Through simulations and comparisons, this paper shows that SUP is particularly competitive for clustering (i) data with noise, (ii) data with a large number of clusters, and (iii) unbalanced data. When noise is present, SUP isolates the noisy data points while performing clustering simultaneously. The local-updating property enables SUP to handle data with a large number of clusters and data of various structures. We also show that the blurring mean-shift is a static SUP, so our discussion of the strengths of SUP applies to the blurring mean-shift as well.
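The blurring mean-shift that the abstract identifies as a static SUP is easy to sketch. A minimal Python illustration with a Gaussian kernel; the bandwidth `h`, iteration cap, and tolerance are illustrative choices, not values from the paper:

```python
import numpy as np

def blurring_mean_shift(X, h=1.0, n_iter=50, tol=1e-6):
    """Blurring mean-shift: each point is replaced by the Gaussian-kernel
    weighted mean of all current points, so the sample clusters itself.
    In the paper's terminology this is a static (time-invariant) SUP."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        W = np.exp(-d2 / (2.0 * h ** 2))                     # kernel weights
        X_new = (W @ X) / W.sum(axis=1, keepdims=True)       # weighted means
        if np.abs(X_new - X).max() < tol:                    # points stopped moving
            return X_new
        X = X_new
    return X  # points that (numerically) coincide form one cluster
```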

3.
洪兴建 《统计研究》2010,27(2):83-86
Because many income surveys publish only relatively coarse grouped data, estimating the range of the sample Gini coefficient from such incomplete grouped data is an important problem. Depending on whether the income range of each group and each group's mean income are known, this paper derives the attainable range of the sample Gini coefficient from several angles and gives the corresponding estimation formulas. Finally, using grouped data on the incomes of urban and rural residents in China, it empirically analyzes the range of the urban and rural income Gini coefficients.
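The paper's own bound formulas are not reproduced in the abstract, but the standard lower bound on a Gini coefficient computed from grouped data, which assumes perfect equality within each group, gives a sense of the calculation. A sketch with hypothetical group figures:

```python
import numpy as np

def gini_lower_bound(pop_shares, group_means):
    """Lower bound on the Gini from grouped data: assume perfect equality
    within each group, then apply the trapezoidal Lorenz-curve formula
    G = 1 - sum_i p_i * (L_i + L_{i-1})."""
    p = np.asarray(pop_shares, dtype=float)
    mu = np.asarray(group_means, dtype=float)
    order = np.argsort(mu)                  # groups sorted by mean income
    p, mu = p[order], mu[order]
    share = p * mu / np.sum(p * mu)         # each group's share of total income
    L = np.concatenate(([0.0], np.cumsum(share)))  # Lorenz ordinates
    return 1.0 - np.sum(p * (L[1:] + L[:-1]))

# five hypothetical quintiles of mean income
print(gini_lower_bound([0.2] * 5, [800, 1500, 2400, 3800, 9000]))
```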

4.
李金昌 《统计研究》2020,37(2):119-128
Big data exists as an important data resource; both the information value it contains and the objective fact that it has become an organic part of the data human society needs compel us to keep expanding its application. However, because big data is a by-product of information technology, its complexity, uncertainty, and emergent nature mean that applying it is far from easy: beyond the quality problems shared with traditional data, it raises a number of distinctive new ones. To support better applications of big data, this paper presents a preliminary study of quality control in big data applications, covering three aspects. First, it examines what big data quality is, which factors affect it, and which quality problems may arise. Second, it proposes a basic framework for quality control in big data applications along six lines: laying the theoretical groundwork, establishing quality control schemes, paying attention to small-data research, strengthening big data management, strengthening the training of big data professionals, and strengthening the legal framework for big data. Third, it discusses, with examples, several issues that call for attention in big data applications.

5.
Studies producing longitudinal multinomial data arise in several subject areas. This article suggests a Bayesian approach to the analysis of such data. Rather than imposing a latent model structure, we develop a prior distribution for the multinomial parameters which reflects the longitudinal nature of the observations. This distribution is constructed by modifying the prior that posits independent Dirichlet distributions for the multinomial parameters across time. Posterior analysis, which is implemented using Monte Carlo methods, can then be used to assess the temporal behaviour of the multinomial parameters underlying the observed data. We test this methodology on simulated data, opinion polling data, and data from a study concerning the development of moral reasoning.

6.
Many geophysical regression problems require the analysis of large (more than $10^4$ values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator to downweight data corresponding to extreme regression residuals and removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this, the exact distribution of the hat matrix diagonal elements $p_{ii}$ for complex multivariate Gaussian predictor data is shown to be $\beta(p_{ii}; m, N - m)$, where $N$ is the number of data and $m$ is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study which exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of looking at both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.
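The claimed $\beta(p_{ii}; m, N - m)$ law can be checked by simulation. A rough Monte Carlo sketch with arbitrary sizes `N` and `m`; note that diagonals within one replicate are dependent, so the KS p-value is only indicative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, m, reps = 200, 4, 200          # arbitrary illustrative sizes

pii = []
for _ in range(reps):
    # complex multivariate Gaussian predictors, as in the paper's setting
    X = rng.standard_normal((N, m)) + 1j * rng.standard_normal((N, m))
    H = X @ np.linalg.inv(X.conj().T @ X) @ X.conj().T    # hat matrix
    pii.extend(np.diag(H).real)

# compare the empirical diagonals with the claimed Beta(m, N - m) law
print(stats.kstest(pii, stats.beta(m, N - m).cdf))
```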

7.
Binary-response data arise in teratology and mutagenicity studies in which each treatment is applied to a group of litters. In a large experiment, a contingency table can be constructed to test the treatment × litter size interaction (see Kastenbaum and Lamphiear 1959). In situations in which there is a clumped category, as in the Kastenbaum and Lamphiear mice-depletion data, a clumped binomial model (Koch et al. 1976) or a clumped beta-binomial model (Paul 1979) can be used to analyze these data. When a clumped binomial model is appropriate, the maximum likelihood estimates of the parameters of the model under the hypothesis of no treatment × litter size interaction, as well as under the hypothesis of the said interaction, can be estimated via the EM algorithm for computing maximum likelihood estimates from incomplete data (Dempster et al. 1977). In this article the EM algorithm is described and used to test the treatment × litter size interaction for the Kastenbaum and Lamphiear data and for a set of data given in Luning et al. (1966).

8.
The Application of Structural Equation Modeling in Reliability Testing
Before analyzing scale data, one must first consider whether the measured values are reliable; only when reliability is acceptable can the analysis of the scale data be trusted. Domestic management research relies heavily on the α coefficient for reliability testing, but the α coefficient has many limitations: it requires all scale items to be equally influenced by the latent variable, does not allow correlated errors, and cannot assess the reliability of individual items. With the development and wide adoption of structural equation modeling (SEM), testing a scale's composite reliability and individual item reliability with SEM overcomes these limitations and provides a direct basis for item revision.
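For reference, composite reliability and individual item reliability are simple functions of the standardized loadings from a fitted SEM measurement model. A minimal sketch; the loadings in the example are hypothetical:

```python
import numpy as np

def composite_reliability(loadings):
    """CR = (sum lambda)^2 / ((sum lambda)^2 + sum theta), where lambda are
    standardized loadings and theta = 1 - lambda^2 are error variances."""
    lam = np.asarray(loadings, dtype=float)
    theta = 1.0 - lam ** 2
    return lam.sum() ** 2 / (lam.sum() ** 2 + theta.sum())

def individual_item_reliability(loadings):
    """Each item's squared standardized loading (variance explained by the factor)."""
    return np.asarray(loadings, dtype=float) ** 2

# hypothetical standardized loadings for a four-item scale
lam = [0.82, 0.76, 0.69, 0.71]
print(composite_reliability(lam), individual_item_reliability(lam))
```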

9.
A data set in the form of a 2 × 2 × 2 contingency table is presented and analyzed in detail. For instructional purposes, the analysis of the data can be used to illustrate some basic concepts in the loglinear model approach to the analysis of multidimensional contingency tables.
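A loglinear analysis of a 2 × 2 × 2 table can be reproduced with a Poisson GLM on the flattened cell counts. A sketch using statsmodels; the counts below are hypothetical, not the article's data:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical 2 x 2 x 2 cell counts, one row per cell
df = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
    "count": [42, 18, 25, 30, 17, 24, 21, 39],
})

# loglinear model with all two-way associations but no three-way term;
# its residual deviance (1 df) tests the three-way interaction
fit = smf.glm("count ~ (A + B + C)**2", data=df,
              family=sm.families.Poisson()).fit()
print(fit.deviance, fit.df_resid)
```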

10.
Jacob Shelby 《Serials Review》2017,43(3-4):195-207
Linked data has swept across the library community, making its way into special collections and catalog data. What would linked data look like in a technical services environment? This article will look at the intersection of linked data and technical services. The article will begin with an introduction to linked data concepts. This will be followed by a look at linked data technologies and publishing strategies. The article will close with a discussion of potential and real applications of linked data in technical services, benefits and challenges of linked data, and thoughts on how the library community can contribute to the linked data effort.

11.
This article considers the problem of testing the validity of the assumption that the underlying distribution of life is Pareto. For complete and censored samples, the relationship between the Pareto and the exponential distributions can be of vital importance in testing the validity of this assumption. For grouped uncensored data the classical Pearson χ2 test based on the multinomial model can be used. Attention is confined in this article to grouped data with withdrawals within intervals. Graphical as well as analytical procedures are presented. Maximum likelihood estimators for the parameters of the Pareto distribution based on grouped data are derived.
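The Pareto-exponential relationship the article builds on is that if $X$ follows a Pareto distribution with shape $\alpha$ and scale $\sigma$, then $\log(X/\sigma)$ is exponential with rate $\alpha$, so exponential goodness-of-fit machinery applies to the log-transformed sample. A quick sketch with hypothetical parameter values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, sigma = 2.5, 1.0                           # hypothetical parameters
x = sigma * (1.0 + rng.pareto(alpha, size=500))   # Pareto(shape alpha, scale sigma)

# under the Pareto hypothesis, log(x/sigma) is exponential with rate alpha
y = np.log(x / sigma)
print(stats.kstest(y, stats.expon(scale=1.0 / alpha).cdf))
```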

12.
A common problem in multivariate general linear models is partially missing response data. The simplest method of analysis in the presence of missing data has been to delete all observations on any individual with any missing data (listwise deletion) and utilize a traditional complete-data approach. However, this can result in a great loss of information, and perhaps inconsistencies in the estimation of the variance-covariance matrix. In the generalized multivariate analysis of variance (GMANOVA) model with missing data, Kleinbaum (1973) proposed an estimated generalized least squares approach. In order to apply this, however, a consistent estimate of the variance-covariance matrix is needed. Kleinbaum proposed an estimator which is unbiased and consistent, but it does not take advantage of the fact that the underlying model is GMANOVA and not MANOVA. Using the fact that the underlying model is GMANOVA, we have constructed four other consistent estimators. A Monte Carlo simulation experiment is conducted to further examine how well these estimators compare to the estimator proposed by Kleinbaum.

13.
Many statistical series that are available from official agencies, such as the Office for National Statistics in the UK and the Bureau of Economic Analysis in the USA, are subject to an extensive process of revision and refinement. This feature of the data is often not explicitly recognized by users even though it may be important to their use of the data. The starting-point of this study is to conceptualize and model the data measurement process as it is relevant to the index of production (IOP). The IOP attracts considerable attention because of its timely publication and importance as an indicator of the UK's industrial base. This study shows that there is one common stochastic trend (and one common factor in terms of observable variables) 'driving' 13 vintages of data on the IOP. Necessary and sufficient conditions are derived for the 'final' vintage of data on the IOP to be the permanent component of the series in the Gonzalo–Granger sense, and the revisions to be the transitory components. These conditions are not satisfied for the IOP; hence, the permanent component is a function of all the published vintages.

14.
Surles and Padgett recently considered the two-parameter Burr Type X distribution by introducing a scale parameter and called it the generalized Rayleigh distribution. It is observed that the generalized Rayleigh and log-normal distributions have many common properties and both can be used quite effectively to analyze skewed data sets. In this paper, we mainly compare the Fisher information matrices of the two distributions for complete and censored observations. Although both distributions may provide similar data fits and are quite similar in many respects, the corresponding Fisher information matrices can be quite different. We compute the total information measures of the two distributions for different parameter ranges and also compare the loss of information due to censoring. A real data analysis is performed for illustrative purposes.
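For reference, the two-parameter Burr Type X (generalized Rayleigh) distribution of Surles and Padgett has distribution function

$$F(x; \alpha, \lambda) = \left(1 - e^{-(\lambda x)^2}\right)^{\alpha}, \qquad x > 0,\ \alpha, \lambda > 0,$$

where $\alpha$ is the shape parameter and $\lambda$ is the scale parameter they introduced; the density follows by differentiation.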

15.
In many applications in applied statistics, researchers reduce the complexity of a data set by combining a group of variables into a single measure using a factor analysis or an index number. We argue that such compression loses information if the data actually have high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data do not have to be linearised. Applying our ideas to the United Nations Human Development Index, we find that the four variables that are used in its construction have dimension 3 and the index loses information.
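The Takens estimator mentioned here has a closed form: with all pairwise distances $d_i < r$ for a cutoff $r$, the correlation-dimension estimate is $\hat{D} = -\big(\tfrac{1}{M}\sum_i \log(d_i/r)\big)^{-1}$. A compact sketch; the cutoff choice below is an illustrative assumption:

```python
import numpy as np
from scipy.spatial.distance import pdist

def takens_dimension(X, r):
    """Takens ML estimator of correlation dimension:
    D_hat = -1 / mean(log(d / r)) over pairwise distances d < r."""
    d = pdist(np.asarray(X, dtype=float))
    d = d[(d > 0) & (d < r)]
    return -1.0 / np.mean(np.log(d / r))

# sanity check on points filling a 3-D cube: estimate should be near 3
rng = np.random.default_rng(2)
print(takens_dimension(rng.uniform(size=(2000, 3)), r=0.2))
```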

16.
Epidemiology research often entails the analysis of failure times subject to grouping. In large cohorts interval grouping also offers a feasible choice of data reduction to actually facilitate an analysis of the data. Based on an underlying Cox proportional hazards model for the exact failure times one may deduce a grouped-data version of this model which may then be used to analyse the data. The model bears a lot of resemblance to a generalized linear model, yet due to the nature of the data one also needs to incorporate censoring. In the case of non-trivial censoring this precludes model checking procedures based on ordinary residuals, as calculation of these requires knowledge of the censoring distribution. In this paper, we represent interval-grouped data in a dynamical way using a counting process approach. This enables us to identify martingale residuals which can be computed without knowledge of the censoring distribution. We use these residuals to construct graphical as well as numerical model checking procedures. An example from epidemiology is provided.

17.
The paper addresses the problem of using LANDSAT data to obtain estimates of crop areas at the county level. LANDSAT data are used to supplement ground data collected in a nationwide agricultural survey. The paper extends the Battese-Fuller estimation model to a stratified sample design, and the resulting estimator is evaluated on a six-county area in South Dakota.

18.
廖明球 《统计研究》1996,13(3):10-12
In the author's opinion, macro-analysis using data from the new System of National Accounts requires regrouping the data structure of the accounting system. This paper also addresses the basic theories underlying this regrouping and related ideas.

19.
Research on the Compilation of a Coal Big Data Index and an Empirical Mode Decomposition Model
A big data index compiled from open data sources and continuously observed multivariate data differs from a traditional survey-based statistical index not only in the unbounded expansion of the data itself but also in the compilation methods and in the rules and models used for decomposition. Against the big data background, this paper makes a first attempt to define the big data index and state its data assumptions; it introduces an "internet big data index" into the coal transaction price index to compile a Taiyuan coal transaction big data index that reflects the movement of coal prices, and it imports an empirical mode decomposition (EMD) model to decompose the compiled coal big data index and compare it with the traditional survey-based index. The study shows that the newly compiled coal price big data index is more sensitive and responsive than the Taiyuan coal transaction price index and better reflects the trend of coal prices. As the "Internet Plus" and big data strategies spread, composite indexes compiled from internet big data will reach more fields, serve as barometers and indicators across economic management and social development, and gradually fuse with, complement, or upgrade traditional survey indexes to become an important component of macroeconomic big data indexes.
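Empirical mode decomposition of an index series can be sketched with the open-source PyEMD package; the series below is synthetic, and the tooling choice is an assumption, not the authors' implementation:

```python
import numpy as np
from PyEMD import EMD   # pip install EMD-signal

t = np.linspace(0.0, 1.0, 500)
# synthetic stand-in for a coal price index: trend + cycle + noise
index = (100 + 8 * t + 3 * np.sin(24 * np.pi * t)
         + np.random.default_rng(3).normal(0.0, 0.5, t.size))

imfs = EMD()(index)   # intrinsic mode functions, fastest first
print(imfs.shape)     # the last row tracks the long-run trend/residual
```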

20.

This paper is concerned with properties (bias, standard deviation, mean square error and efficiency) of twenty-six estimators of the intraclass correlation in the analysis of binary data. Our main interest is to study these properties when data are generated from different distributions. For data generation we considered three over-dispersed binomial distributions, namely, the beta-binomial distribution, the probit-normal binomial distribution and a mixture of two binomial distributions. The findings regarding bias, standard deviation and mean squared error of all these estimators are that (a) in general, the distributions of the biases of most of the estimators are negatively skewed, and the biases are smallest when data are generated from the beta-binomial distribution and largest when data are generated from the mixture distribution; (b) the standard deviations are smallest when data are generated from the beta-binomial distribution; and (c) the mean squared errors are smallest when data are generated from the beta-binomial distribution and largest when data are generated from the mixture distribution. Of the twenty-six, nine estimators, including the maximum likelihood estimator, an estimator based on the optimal quadratic estimating equations of Crowder (1987), and an analysis-of-variance-type estimator, are found to have the least bias, standard deviation and mean squared error. Also, the distributions of the bias, standard deviation and mean squared error for each of these estimators are, in general, more symmetric than those of the other estimators. Our findings regarding efficiency are that the estimator based on the optimal quadratic estimating equations has consistently high efficiency and the least variability in the efficiency results. In the important range in which the intraclass correlation is small (≤ 0.5), this estimator shows the best efficiency performance on average. The analysis-of-variance-type estimator seems to do well for larger values of the intraclass correlation. In general, the estimator based on the optimal quadratic estimating equations shows the best efficiency performance for data from the beta-binomial and probit-normal binomial distributions, and the analysis-of-variance-type estimator does well for data from the mixture distribution.
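Of the estimators compared, the analysis-of-variance-type estimator has a simple closed form and is easy to sketch for binary clustered data; the cluster layout in the example is hypothetical:

```python
import numpy as np

def icc_anova(clusters):
    """ANOVA-type estimator of the intraclass correlation for binary data:
    rho = (MSB - MSW) / (MSB + (n0 - 1) * MSW) from a one-way ANOVA
    on the 0/1 responses, with n0 the standard average cluster size."""
    k = len(clusters)
    n = np.array([len(c) for c in clusters], dtype=float)
    N = n.sum()
    means = np.array([np.mean(c) for c in clusters])
    grand = sum(np.sum(c) for c in clusters) / N
    msb = np.sum(n * (means - grand) ** 2) / (k - 1)
    msw = sum(np.sum((np.asarray(c) - m) ** 2)
              for c, m in zip(clusters, means)) / (N - k)
    n0 = (N - np.sum(n ** 2) / N) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

# hypothetical litters of binary outcomes
print(icc_anova([[1, 1, 0, 1], [0, 0, 0], [1, 0, 1, 1, 1]]))
```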
