Similar Literature
20 similar documents found (search time: 46 ms)
1.
The Wisconsin Epidemiologic Study of Diabetic Retinopathy is a population-based epidemiological study carried out in Southern Wisconsin during the 1980s. The resulting data were analysed by different statisticians and ophthalmologists over the last two decades. Most of the analyses were carried out on the baseline data, although there were two follow-up studies on the same population. A Bayesian analysis of the first follow-up data, taken four years after the baseline study, was carried out by Angers and Biswas [Angers, J.-F. and Biswas, A., 2004, A Bayesian analysis of the four-year follow-up data of the Wisconsin epidemiologic study of diabetic retinopathy. Statistics in Medicine, 23, 601–615.], in which the best model in terms of covariate inclusion was selected and estimates of the associated covariate effects were obtained, using the baseline data to set the prior for the parameters. In the present article we consider a univariate transformation of the bivariate ordinal data, and a parallel analysis with the much simpler univariate data is carried out. The results are then compared with those of Angers and Biswas (2004). In conclusion, our analyses suggest that the univariate analysis fails to detect features of the data found by the bivariate analysis. Even a univariate transformation of our data with quite high correlation with both left and right eyes is inadequate.

2.
3.
A Review of the Theory and Methods of Statistical Data Preprocessing (cited by 1: 0 self-citations, 1 by others)
Statistical data preprocessing is an important stage for improving data quality, comprising four major steps: data review, data cleaning, data transformation, and data validation. Depending on the characteristics of the data being processed and the goals of each step, statistical data preprocessing can draw on six broad classes of methods: descriptive and exploratory analysis, missing-value handling, outlier handling, data transformation techniques, reliability and validity testing, and macro-data diagnostics. Choosing appropriate preprocessing methods helps ensure that the conclusions of subsequent data analysis are valid and reliable.
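Two of the method classes named above, missing-value handling and outlier handling, can be sketched in a few lines. This is a minimal illustration (mean imputation and IQR winsorizing), not code from the reviewed paper:

```python
def impute_mean(values):
    """Missing-value handling: replace None entries with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def winsorize_iqr(values, k=1.5):
    """Outlier handling: clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the fences."""
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, lo), hi) for v in values]

data = [1.0, 2.0, None, 3.0, 100.0]
cleaned = winsorize_iqr(impute_mean(data))   # the extreme 100.0 is pulled to the fence
```

Real preprocessing would choose among the six method classes based on the data at hand; these two functions only fix the step order: impute first, then treat outliers.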

4.
An imputation procedure is a procedure by which each missing value in a data set is replaced (imputed) by an observed value using a predetermined resampling procedure. The distribution of a statistic computed from a data set consisting of observed and imputed values, called a completed data set, is affected by the imputation procedure used. In a Monte Carlo experiment, three imputation procedures are compared with respect to the empirical behavior of the goodness-of-fit chi-square statistic computed from a completed data set. The results show that each imputation procedure affects the distribution of the goodness-of-fit chi-square statistic in a different manner. However, when the empirical behavior of the goodness-of-fit chi-square statistic is compared to its appropriate asymptotic distribution, there are no substantial differences between these imputation procedures.
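The pipeline the abstract studies — impute to get a completed data set, then compute a goodness-of-fit chi-square statistic from it — can be sketched with one simple resampling-based procedure (random hot-deck). This is a hypothetical illustration, not one of the three procedures compared in the paper:

```python
import random
from collections import Counter

def hot_deck_impute(values, rng):
    """One simple imputation procedure: replace each missing value (None)
    with a randomly drawn observed value from the same variable."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

def chisq_gof(values, expected_probs):
    """Pearson goodness-of-fit chi-square against hypothesised category probabilities."""
    n = len(values)
    counts = Counter(values)
    return sum((counts.get(c, 0) - n * p) ** 2 / (n * p)
               for c, p in expected_probs.items())

rng = random.Random(0)
data = ["A", "B", None, "A", None, "B", "A", "B"]
completed = hot_deck_impute(data, rng)            # the "completed data set"
stat = chisq_gof(completed, {"A": 0.5, "B": 0.5})
```

Repeating this over many simulated data sets, as in the paper's Monte Carlo experiment, yields the empirical distribution of `stat` under a given imputation procedure.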

5.
This article introduces principal component analysis for multidimensional sparse functional data, utilizing Gaussian basis functions. Our multidimensional model is estimated by maximizing a penalized log-likelihood function, while previous mixed-type models were estimated by maximum likelihood methods for one-dimensional data. The penalized estimation performs well for our multidimensional model, while maximum likelihood methods yield unstable parameter estimates and some of the parameter estimates are infinite. Numerical experiments are conducted to investigate the effectiveness of our method for some types of missing data. The proposed method is applied to handwriting data, which consist of the XY coordinate values of handwriting samples.

6.
When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation model misspecification on the quality of parameter estimates by employing multiple imputation under assumptions of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM to estimates from MI assuming a linear mixed model (MI/LMM) and maximum likelihood estimation to the same incomplete data sets. With 10% missing data (MD), techniques performed similarly for fixed-effects estimates, but variance components were biased with MI/NM. Effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates but performance was negatively affected by increased rates of MD.

7.
Missing data are an important factor affecting the quality of survey questionnaire data, and imputing the missing values in questionnaires can markedly improve data quality. Questionnaire data are predominantly categorical, and classification algorithms from data mining are a common way to handle categorical attributes; among them, the random forest model is one of the more accurate classifiers. This paper introduces the random forest model into the imputation of missing questionnaire data, proposes a random-forest-based imputation method for categorical missing values, and discusses the corresponding imputation steps under different missingness patterns. Empirical simulation comparisons with other methods show that the imputed values produced by the random forest method are more accurate and more reliable.
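The core idea — fit a classifier on the complete cases, then predict the missing categorical answers from the other variables — can be sketched without any machine-learning dependency. Here a trivial "most frequent class per predictor level" rule stands in for the random forest classifier the paper actually uses; the variable names are invented for illustration:

```python
from collections import Counter, defaultdict

def impute_by_classifier(records, target, predictor):
    """Model-based imputation for a categorical variable: learn a rule on
    complete cases, then fill in missing targets by prediction."""
    # "Fit": majority class of `target` for each level of `predictor`.
    by_level = defaultdict(Counter)
    for r in records:
        if r[target] is not None:
            by_level[r[predictor]][r[target]] += 1
    # "Predict": fill missing targets from the fitted rule.
    for r in records:
        if r[target] is None:
            r[target] = by_level[r[predictor]].most_common(1)[0][0]
    return records

rows = [
    {"edu": "high", "ans": "yes"},
    {"edu": "high", "ans": "yes"},
    {"edu": "low",  "ans": "no"},
    {"edu": "high", "ans": None},   # to be imputed
]
rows = impute_by_classifier(rows, target="ans", predictor="edu")
```

In the paper's method, the majority rule would be replaced by a random forest trained on all available covariates, applied separately for each missingness pattern.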

8.
The paper presents a new approach to interrelated two-way clustering of gene expression data. Clustering of genes has been effected using entropy and a correlation measure, whereas the samples have been clustered using the fuzzy C-means. The efficiency of this approach has been tested on two well known data sets: the colon cancer data set and the leukemia data set. Using this approach, we were able to identify the important co-regulated genes and group the samples efficiently at the same time.
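The fuzzy C-means step used for clustering samples alternates two updates: recompute cluster centers from membership-weighted means, then recompute memberships from distances to the centers. A plain sketch (not the paper's implementation, and without the entropy-based gene step):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means: alternate membership and centroid updates.
    U[i, k] is the degree to which point i belongs to cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                             # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))     # inverse-distance memberships
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

# Two well-separated sample groups; FCM should give high within-group membership.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
U, centers = fuzzy_cmeans(X, c=2)
labels = U.argmax(axis=1)
```

Unlike hard k-means, each sample retains graded memberships in `U`, which is what makes the method attractive for samples that sit between expression profiles.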

9.
The paper gives a review of a number of data models for aggregate statistical data which have appeared in the computer science literature in the last ten years. After a brief introduction to the data model in general, the fundamental concepts of statistical data are introduced. These are called statistical objects because they are complex data structures (vectors, matrices, relations, time series, etc.) which may have different possible representations (e.g. tables, relations, vectors, pie-charts, bar-charts, graphs, and so on). For this reason a statistical object is defined by two different types of attribute (a summary attribute, with its own summary type and with its own instances, called summary data, and the set of category attributes, which describe the summary attribute). Some conceptual models of statistical data (CSM, SDM4S), some semantic models of statistical data (SCM, SAM*, OSAM*), and some graphical models of statistical data (SUBJECT, GRASS, STORM) are also discussed.

10.
Most data used to study the durations of unemployment spells come from the Current Population Survey (CPS), which is a point-in-time survey and gives an incomplete picture of the underlying duration distribution. We introduce a new sample of completed unemployment spells obtained from panel data and apply CPS sampling and reporting techniques to replicate the type of data used by other researchers. Predicted duration distributions derived from this CPS-like data are then compared to the actual distribution. We conclude that the best inferences that can be made about unemployment durations by using CPS-like data are seriously biased.

11.
With the advent of the big-data era, functional data analysis has become a hot research topic in recent years, and clustering methods for curves have attracted the attention of the field. This paper presents a curve clustering method: using the L2 distance as the measure of similarity, and with curves expressed in a B-spline basis expansion, both the curves themselves and their variation information are incorporated into the construction of the clustering algorithm, connecting curve clustering with traditional multivariate clustering methods. As an application, the method is validated on urban and rural income curves; the results show that incorporating curve-variation information yields better clustering than using the curve values alone.
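The key ingredient is an L2 distance between curves that also weights their variation (derivative) information, so that curves with similar levels but opposite trends are kept apart. A numerical sketch on sampled curves (the paper works with B-spline coefficients instead; the weight `alpha` is an assumption for illustration):

```python
import numpy as np

def _trapz(y, t):
    """Trapezoidal rule, written out to stay independent of NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def curve_l2(f, g, t, alpha=1.0):
    """L2 distance between two sampled curves; alpha weights the distance
    between their derivatives (the 'curve variation' information)."""
    d0 = _trapz((f - g) ** 2, t)
    df, dg = np.gradient(f, t), np.gradient(g, t)
    d1 = _trapz((df - dg) ** 2, t)
    return float(np.sqrt(d0 + alpha * d1))

t = np.linspace(0.0, 1.0, 101)
rising, falling = t.copy(), 1.0 - t
shifted = t + 0.05                  # same shape as `rising`, small level offset
d_same_shape = curve_l2(rising, shifted, t)
d_diff_shape = curve_l2(rising, falling, t)
```

With `alpha > 0`, the rising and falling curves are far apart even where their values cross, which is exactly the effect the abstract reports: adding variation information improves the clustering.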

12.
Inequality-restricted hypotheses testing methods containing multivariate one-sided testing methods are useful in practice, especially in multiple comparison problems. In practice, multivariate and longitudinal data often contain missing values since it may be difficult to observe all values for each variable. However, although missing values are common for multivariate data, statistical methods for multivariate one-sided tests with missing values are quite limited. In this article, motivated by a dataset in a recent collaborative project, we develop two likelihood-based methods for multivariate one-sided tests with missing values, where the missing data patterns can be arbitrary and the missing data mechanisms may be non-ignorable. Although non-ignorable missing data are not testable based on observed data, statistical methods addressing this issue can be used for sensitivity analysis and might lead to more reliable results, since ignoring informative missingness may lead to biased analysis. We analyse the real dataset in detail under various possible missing data mechanisms and report interesting findings which were previously unavailable. We also derive some asymptotic results and evaluate our new tests using simulations.

13.
Research on Census Frame Errors Based on Data Aggregation (cited by 1: 0 self-citations, 1 by others)
As a form of complete enumeration, the production of census data can be viewed as a process of aggregating individual-level data into totals. To support research on census data quality assessment and control, this paper starts from features common to the implementation of censuses in China, constructs a general form of the census data aggregation model, and uses it to define the census frame and its role, dividing censuses into two types. It also formalizes census frame errors from the perspective of data aggregation, further clarifying the theoretical role of the unit enumeration (preliminary listing) stage in census data aggregation.

14.
For large volumes of complex test-range observational data, this paper constructs initial fitted data and builds a recursive model using B-spline curves. A decision threshold estimated by spline smoothing is used to judge whether the results of a bidirectional test are anomalous, and data satisfying the repair conditions are repaired by refitting; when the two directions of the test disagree, an interpolation model is constructed for further checking. A case study shows that, compared with other methods, the proposed method removes anomalous data more effectively, and that piecewise processing of the data better detects series with possible step jumps, giving the model better stability, wider applicability, and a higher anomaly rejection rate.

15.
The autologistic model, first introduced by Besag, is a popular tool for analyzing binary data in spatial lattices. However, no investigation was found that considers modeling of binary data clustered in uncorrelated lattices. Owing to spatial dependency of responses, the exact likelihood estimation of parameters is not possible. For circumventing this difficulty, many studies have been designed to approximate the likelihood and the related partition function of the model. So, the traditional and Bayesian estimation methods based on the likelihood function are often time-consuming and require heavy computations and recursive techniques. Some investigators have introduced and implemented data augmentation and latent variable models to reduce computational complications in parameter estimation. In this work, the spatially correlated binary data distributed in uncorrelated lattices were modeled using autologistic regression, a Bayesian inference was developed with the contribution of data augmentation, and the proposed models were applied to caries experience of deciduous teeth.

16.
Statistical process control of multi-attribute count data has received much attention with modern data-acquisition equipment and online computers. The multivariate Poisson distribution is often used to monitor multivariate attribute count data. However, little work has been done so far on under- or over-dispersed multivariate count data, which are common in many industrial processes, with positive or negative correlation. In this study, a Shewhart-type multivariate control chart is constructed to monitor such data, namely the multivariate COM-Poisson (MCP) chart, based on the MCP distribution. The performance of the MCP chart is evaluated by the average run length in simulation. The proposed chart generalizes some existing multivariate attribute charts as its special cases. A real-life bivariate process and a simulated trivariate Poisson process are used to illustrate the application of the MCP chart.
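As context for the Shewhart-type construction, here is the familiar univariate c-chart for Poisson counts, of which charts like the MCP chart are multivariate, dispersion-aware generalizations. This is background illustration only, not the MCP chart itself:

```python
import math

def c_chart_limits(mean_count):
    """Shewhart c-chart: 3-sigma control limits for a Poisson count,
    whose variance equals its mean. LCL is truncated at zero."""
    sigma = math.sqrt(mean_count)
    return max(0.0, mean_count - 3.0 * sigma), mean_count + 3.0 * sigma

lcl, ucl = c_chart_limits(9.0)
observed = [7, 12, 25, 8]
signals = [x for x in observed if not (lcl <= x <= ucl)]   # out-of-control points
```

The COM-Poisson family relaxes the Poisson's variance-equals-mean restriction, which is why the MCP chart can handle the under- and over-dispersed processes the abstract describes; its performance is then summarized by the average run length rather than by these fixed limits alone.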

17.
The functional boxplot is an attractive technique to visualize data that come from functions. We propose an alternative to the functional boxplot based on depth measures. Our proposal generalizes the usual construction of the boxplot in one dimension, related to the down-upward orderings of the data, by considering two intuitive pre-orders in the functional context. These orderings are based on the epigraphs and hypographs of the data, which allow a new definition of functional quartiles that is more robust to shape outliers. Simulated and real examples show that this proposal provides a convenient visualization technique with great potential for analyzing functional data, and illustrate its usefulness for detecting outliers that other procedures do not detect.
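One concrete way to order curves via epigraphs is to score each curve by how often the other curves lie on or above it. The sketch below implements such an index in the spirit of the pre-orders the abstract describes; it is an illustration, not the authors' exact definition:

```python
import numpy as np

def epigraph_index(curves):
    """For each curve, the proportion of (curve, time point) pairs in the
    sample lying on or above it: low values mark curves near the top of the
    band, high values curves near the bottom."""
    C = np.asarray(curves)                      # shape (n_curves, n_points)
    return np.array([(C >= C[i]).mean() for i in range(len(C))])

t = np.linspace(0.0, 1.0, 50)
curves = [np.sin(t), np.sin(t) + 1.0, np.sin(t) - 1.0]
idx = epigraph_index(curves)
# Sorting curves by idx gives a center-outward-style ordering from which
# functional "quartile" regions can be read off.
```

Because the index depends on where whole curves sit relative to each other, rather than on pointwise ranks alone, it reacts to shape differences, which is the source of the robustness to shape outliers claimed in the abstract.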

18.
Because the data come from different sources, compiling a social accounting matrix (SAM) inevitably runs into many inconsistencies between data items. Most researchers at home and abroad balance the matrix with methods such as RAS or cross-entropy (CE), which are purely mathematical linear adjustment and balancing procedures that do not consider the real economic meaning of the specific data. In view of this, an "item-by-item balancing method" is designed that, based on the concrete economic meaning of the matrix entries, treats every unbalanced item one by one so as to achieve overall balance of the matrix. Using this balancing method, an actual social accounting matrix for China for 2007 is compiled.
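For context, the RAS method that the abstract contrasts its approach with is the classic biproportional balancing algorithm: alternately rescale rows and columns until they hit the target sums. A minimal sketch (illustrative targets; note that row and column targets must share the same grand total):

```python
import numpy as np

def ras_balance(A, row_targets, col_targets, iters=100):
    """RAS / biproportional fitting: alternately scale the rows and columns
    of a positive matrix until row and column sums match their targets."""
    X = np.asarray(A, dtype=float).copy()
    r = np.asarray(row_targets, dtype=float)
    c = np.asarray(col_targets, dtype=float)
    for _ in range(iters):
        X *= (r / X.sum(axis=1))[:, None]   # fix row sums
        X *= (c / X.sum(axis=0))[None, :]   # fix column sums
    return X

A = np.array([[1.0, 2.0], [3.0, 4.0]])
X = ras_balance(A, row_targets=[4.0, 6.0], col_targets=[5.0, 5.0])
```

RAS preserves the proportional structure of the initial matrix but, as the abstract argues, it adjusts all cells mechanically; the item-by-item method instead edits each unbalanced entry according to its economic meaning.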

19.
The paper considers the property of global log-concavity of the likelihood function in discrete data models in which the data are observed in ‘grouped’ form, meaning that for some observations, while the actual value is unknown, the realisation of the discrete random variable is known to fall within a certain range of values. A typical likelihood contribution in this type of model is a sum of probabilities over a range of realisations. An important issue is whether the property of log-concavity in the ungrouped case carries over to the grouped counterpart; the paper finds, by way of a simple but relevant counter-example, that this is not always the case. However, in two cases of practical interest, namely the Poisson and geometric models, the property of log-concavity is preserved under grouping.

20.
Compositional data are characterized by values containing relative information, and thus the ratios between the data values are of interest for the analysis. Due to specific features of compositional data, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. It is discussed how three-way compositional data can be analyzed with the Parafac model. When data are contaminated by outliers, robust estimates for the Parafac model parameters should be employed. It is demonstrated how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real data example from macroeconomics underlines the usefulness of this approach.
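The "proper coordinate system with respect to an orthonormal basis" mentioned above is standardly obtained with log-ratio transforms: the centred log-ratio (clr) and the isometric log-ratio (ilr, here in pivot-coordinate form). A sketch of both, as commonly defined in the compositional-data literature (not code from the paper):

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logx = np.log(x)
    return logx - logx.mean()

def ilr(x):
    """Isometric log-ratio (pivot coordinates): D-1 orthonormal real
    coordinates for a D-part composition."""
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = np.empty(D - 1)
    for i in range(D - 1):
        gm = np.exp(np.mean(np.log(x[i + 1:])))   # geometric mean of remaining parts
        z[i] = np.sqrt((D - i - 1) / (D - i)) * np.log(x[i] / gm)
    return z

comp = np.array([0.2, 0.3, 0.5])
coords = ilr(comp)
```

Two defining properties make these coordinates suitable for standard methods like Parafac: they depend only on ratios (so rescaling the composition changes nothing), and ilr is an isometry of the clr image, preserving distances.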
