20 similar documents retrieved; search time: 0 ms
1.
Viswanathan Ramakrishnan 《Communications in Statistics: Simulation and Computation》2013,42(3):405-418
In many genetic analyses of dichotomous twin data, odds ratios have been used to test hypotheses on heritability and shared common environment effects of a given disease (Lichtenstein et al., 2000; Ahlbom et al., 1997; Ramakrishnan et al., 1992). However, estimates of these two effects have not been dealt with in the literature. In epidemiology, the attributable fraction (AF), a function of the odds ratio and the prevalence of the risk factor, has been used to describe the contribution of a risk factor to a disease in a given population (Leviton, 1973). In this article, we adapt the AF to quantify the heritability and the shared common environment. Twin data on cancer, gallstone disease and phobia are used to illustrate the applicability of the AF estimate as a measure of heritability.
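For orientation, a minimal sketch of a Levin-style attributable fraction computed from an odds ratio and a risk-factor prevalence; this is only the textbook population-AF formula with the odds ratio standing in for the relative risk, not the twin-specific estimator developed in the article, and the numbers are made up for illustration.

```python
def attributable_fraction(odds_ratio: float, prevalence: float) -> float:
    """Levin-type AF: p*(OR - 1) / (1 + p*(OR - 1)), with OR used as an
    approximation to the relative risk and p the risk-factor prevalence."""
    excess = prevalence * (odds_ratio - 1.0)
    return excess / (1.0 + excess)

# hypothetical inputs, purely for illustration
print(attributable_fraction(odds_ratio=3.2, prevalence=0.15))  # ~0.25
```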
2.
Large-scale sample surveys mostly adopt complex sampling designs, yielding survey data sets with a stratified, nested structure in which missing data are unavoidable; yet imputation strategies for hierarchical data sets with missing values have received little attention so far. This paper applies the Gibbs algorithm to multiple imputation for hierarchical data sets with missing values and studies two imputation methods, one based on a fixed-effects model and one based on a random-effects model. Through theoretical derivation and numerical simulation, the imputation performance of the two methods is compared, in terms of the unbiasedness and efficiency of the resulting parameter estimates, under different intraclass correlation coefficients, group sizes, and proportions of missing data, and recommendations for choosing the imputation model are given. The results show that parameter estimates are more accurate when the random-effects model is used as the imputation model, whereas the fixed-effects imputation model is simpler to operate: when the proportion of missing data is small, the intraclass correlation is large, and the groups are large, the fixed-effects imputation model can be used; otherwise the random-effects imputation model is recommended.
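A compact, purely illustrative sketch of the fixed-effects variant: simulate two-level data, delete 20% of the outcome completely at random, draw a single stochastic imputation from an ordinary regression with group dummies, and refit a random-intercept analysis model. The data-generating values are assumptions, and this is one imputation only, not the Gibbs-based multiple imputation or the random-effects imputation model studied in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
G, n = 30, 20                                        # 30 groups of 20 units
u = rng.normal(0.0, 1.0, G)                          # group effects
g = np.repeat(np.arange(G), n)
x = rng.normal(size=G * n)
y = 1.0 + 0.5 * x + u[g] + rng.normal(0.0, 1.0, G * n)
df = pd.DataFrame({"y": y, "x": x, "g": g})

miss = rng.random(len(df)) < 0.2                     # 20% of y missing completely at random
df.loc[miss, "y"] = np.nan

# fixed-effects imputation model: ordinary regression with group dummies
fe_fit = smf.ols("y ~ x + C(g)", data=df.dropna()).fit()
sigma = np.sqrt(fe_fit.scale)
imputed = df.copy()
imputed.loc[miss, "y"] = (fe_fit.predict(df.loc[miss])
                          + rng.normal(0.0, sigma, miss.sum()))  # one stochastic draw

# analysis model on the imputed data: random-intercept (multilevel) model
fit = smf.mixedlm("y ~ x", data=imputed, groups=imputed["g"]).fit()
print(fit.params)
```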
3.
We first compare correspondence analysis, which uses chi-square distance, and an alternative approach using Hellinger distance, for representing categorical data in a contingency table. We propose a coefficient which globally measures the similarity between these two approaches. This coefficient can be decomposed into several components, one component for each principal dimension, indicating the contribution of the dimensions to the difference between the two representations. We also make comparisons with the logratio approach based on compositional data. These three methods of representation can produce quite similar results. Two illustrative examples are given.
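For concreteness, a small sketch of the two distances computed on the row profiles of a contingency table; these are the standard chi-square and Hellinger definitions, not the authors' similarity coefficient, and the table is invented for illustration.

```python
import numpy as np

# a small contingency table (rows and columns of two categorical variables)
N = np.array([[30, 10, 5],
              [12, 25, 8],
              [ 6,  9, 20]], dtype=float)

P = N / N.sum()                 # correspondence matrix
r = P.sum(axis=1)               # row masses
c = P.sum(axis=0)               # column masses
profiles = P / r[:, None]       # row profiles

def chi2_dist(i, j):
    """Chi-square distance between row profiles i and j (the CA metric)."""
    return np.sqrt((((profiles[i] - profiles[j]) ** 2) / c).sum())

def hellinger_dist(i, j):
    """Hellinger distance between row profiles i and j."""
    return np.sqrt(((np.sqrt(profiles[i]) - np.sqrt(profiles[j])) ** 2).sum())

for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(i, j, round(chi2_dist(i, j), 3), round(hellinger_dist(i, j), 3))
```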
4.
Monia Lupparelli, Giovanni M. Marchetti & Wicher P. Bergsma 《Scandinavian Journal of Statistics》2009,36(3):559-576
Abstract. We discuss two parameterizations of models for marginal independencies for discrete distributions which are representable by bi-directed graph models, under the global Markov property. Such models are useful data analytic tools especially if used in combination with other graphical models. The first parameterization, in the saturated case, is also known as the multivariate logistic transformation, the second is a variant that allows, in some (but not all) cases, variation-independent parameters. An algorithm for maximum likelihood fitting is proposed, based on an extension of the Aitchison and Silvey method.
5.
By entering the data (y_i, x_i) followed by (−y_i, −x_i), one can obtain an intercept-free regression Y = Xβ + ε from a program package that normally uses an intercept term. There is no bias in the resultant regression coefficients, but a minor postanalysis adjustment is needed to the residual variance and standard errors.
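A quick numerical check of the trick (a sketch assuming Python's statsmodels, not code from the paper): fitting an ordinary regression with an intercept to the sign-augmented data reproduces the through-origin slope and gives an intercept of zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=50)
y = 2.0 * x + rng.normal(0.0, 1.0, size=50)     # true model has no intercept

# the trick: append the sign-flipped copies (-y_i, -x_i)
x_aug = np.concatenate([x, -x])
y_aug = np.concatenate([y, -y])

fit_aug = sm.OLS(y_aug, sm.add_constant(x_aug)).fit()   # fit *with* intercept
fit_origin = sm.OLS(y, x).fit()                          # direct no-intercept fit
print(fit_aug.params)      # [~0, slope]
print(fit_origin.params)   # [slope]

# note: residual variance and standard errors from fit_aug still need the
# degrees-of-freedom adjustment mentioned above, since n is artificially doubled
```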
6.
Conditional Prior Proposals in Dynamic Models
Leonhard Knorr-Held 《Scandinavian Journal of Statistics》1999,26(1):129-144
ABSTRACT. Dynamic models extend state space models to non-normal observations. This paper suggests a specific hybrid Metropolis–Hastings algorithm as a simple device for Bayesian inference via Markov chain Monte Carlo in dynamic models. Hastings proposals from the (conditional) prior distribution of the unknown, time-varying parameters are used to update the corresponding full conditional distributions. It is shown through simulated examples that the methodology has optimal performance in situations where the prior is relatively strong compared to the likelihood. Typical examples include smoothing priors for categorical data. A specific blocking strategy is proposed to ensure good mixing and convergence properties of the simulated Markov chain. It is also shown that the methodology is easily extended to robust transition models using mixtures of normals. The applicability is illustrated with an analysis of a binomial and a binary time series known from the literature.
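A minimal single-site sketch of the idea for a binary time series with a first-order random-walk prior: because the proposal is the conditional prior of the state given its neighbours, the acceptance probability reduces to the observation-likelihood ratio. This is illustrative only; the model, parameter values, and single-site scheme are assumptions, and the paper itself recommends block updates rather than single-site moves.

```python
import numpy as np

rng = np.random.default_rng(2)

# simulated binary series driven by a random-walk state (illustrative only)
T, tau = 200, 0.15
alpha_true = np.cumsum(rng.normal(0, tau, T))
y = rng.random(T) < 1 / (1 + np.exp(-alpha_true))

def loglik(a_t, y_t):
    p = 1 / (1 + np.exp(-a_t))
    return np.log(p if y_t else 1 - p)

alpha = np.zeros(T)
for sweep in range(500):                     # single-site conditional-prior MH
    for t in range(1, T - 1):                # interior states only, for brevity
        # conditional prior of alpha_t given its neighbours under an RW(1) prior
        m, s = (alpha[t - 1] + alpha[t + 1]) / 2, tau / np.sqrt(2)
        prop = rng.normal(m, s)
        # proposal = conditional prior, so the ratio is just the likelihood ratio
        if np.log(rng.random()) < loglik(prop, y[t]) - loglik(alpha[t], y[t]):
            alpha[t] = prop
# in practice one would store draws after burn-in; here we just show the last state
print(alpha[::40])
```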
7.
Inverse Gaussian first hitting time regression models sometimes provide an attractive representation of lifetime data. Various authors comment that dependence of both parameters on the same covariate may imply multicollinearity. The frequent appearance of conflicting signs for the two coefficients of the same covariate may be related to this. We carry out simulation studies to examine the reality of this possible multicollinearity. Although there is some dependence between estimates, multicollinearity does not seem to be a major problem. Fitting this model to data generated by a Weibull regression suggests that conflicting signs of estimates may be due to model misspecification.
8.
The potency of antiretroviral agents in AIDS clinical trials can be assessed on the basis of a viral response such as viral decay rate or change in viral load (number of HIV RNA copies in plasma). Linear, nonlinear, and nonparametric mixed-effects models have been proposed to estimate such parameters in viral dynamic models. However, there are two critical questions that stand out: whether these models achieve consistent estimates for viral decay rates, and which model is more appropriate for use in practice. Moreover, one often assumes that a model random error is normally distributed, but this assumption may be unrealistic, obscuring important features of within- and among-subject variations. In this article, we develop a skew-normal (SN) Bayesian linear mixed-effects (SN-BLME) model, an SN Bayesian nonlinear mixed-effects (SN-BNLME) model, and an SN Bayesian semiparametric nonlinear mixed-effects (SN-BSNLME) model that relax the normality assumption by considering model random error to have an SN distribution. We compare the performance of these SN models, and also compare their performance with the corresponding normal models. An AIDS dataset is used to test the proposed models and methods. It was found that there is a significant incongruity in the estimated viral decay rates. The results indicate that the SN-BSNLME model is preferred to the other models, implying that an arbitrary data truncation is not necessary. The findings also suggest that it is important to assume a model with an SN distribution in order to achieve reasonable results when the data exhibit skewness.
9.
There has been much recent work on Bayesian approaches to survival analysis, incorporating features such as flexible baseline hazards, time-dependent covariate effects, and random effects. Some of the proposed methods are quite complicated to implement, and we argue that as good or better results can be obtained via simpler methods. In particular, the normal approximation to the log-gamma distribution yields easy and efficient computational methods in the face of simple multivariate normal priors for baseline log-hazards and time-dependent covariate effects. While the basic method applies to piecewise-constant hazards and covariate effects, it is easy to apply importance sampling to consider smoother functions.
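The approximation this rests on is the classical one: if X ~ Gamma(shape a, rate b), then log X has mean ψ(a) − log b and variance ψ′(a) and is roughly normal for moderate a. A quick numerical check (not the paper's code; the shape and rate values are arbitrary):

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(3)
a, b = 8.0, 2.0                              # shape and rate, illustrative values
x = rng.gamma(shape=a, scale=1.0 / b, size=200_000)
logx = np.log(x)

# normal approximation to log-gamma: mean psi(a) - log(b), variance psi'(a)
print("simulated mean/var :", logx.mean().round(4), logx.var().round(4))
print("approx.   mean/var :", round(digamma(a) - np.log(b), 4),
      round(polygamma(1, a), 4))
```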
10.
Data from complex surveys are being used increasingly to build the same sort of explanatory and predictive models as those used in the rest of statistics. Unfortunately the assumptions underlying standard statistical methods are not even approximately valid for most survey data. The problem of parameter estimation has been largely solved, at least for routine data analysis, through the use of weighted estimating equations, and software for most standard analytical procedures is now available in the major statistical packages. One notable omission from standard software is an analogue of the likelihood ratio test. An exception is the Rao–Scott test for loglinear models in contingency tables. In this paper we show how the Rao–Scott test can be extended to handle arbitrary regression models. We illustrate the process of fitting a model to survey data with an example from NHANES.
11.
In some applications, the failure time of interest is the time from an originating event to a failure event while both event times are interval censored. We propose fitting Cox proportional hazards models to this type of data using a spline-based sieve maximum marginal likelihood, where the time to the originating event is integrated out in the empirical likelihood function of the failure time of interest. This greatly reduces the complexity of the objective function compared with the fully semiparametric likelihood. The dependence of the time of interest on time to the originating event is induced by including the latter as a covariate in the proportional hazards model for the failure time of interest. The use of splines results in a higher rate of convergence of the estimator of the baseline hazard function compared with the usual non-parametric estimator. The computation of the estimator is facilitated by a multiple imputation approach. Asymptotic theory is established and a simulation study is conducted to assess its finite sample performance. It is also applied to analyzing a real data set on AIDS incubation time.
12.
Geert Molenberghs & Els Goetghebeur 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1997,59(2):401-414
A popular approach to estimation based on incomplete data is the EM algorithm. For categorical data, this paper presents a simple expression of the observed data log-likelihood and its derivatives in terms of the complete data for a broad class of models and missing data patterns. We show that using the observed data likelihood directly is easy and has some advantages. One can gain considerable computational speed over the EM algorithm and a straightforward variance estimator is obtained for the parameter estimates. The general formulation treats a wide range of missing data problems in a uniform way. Two examples are worked out in full.
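A toy illustration of maximizing an observed-data categorical likelihood directly, without EM: two binary variables A and B, with B missing for some units, so those units contribute the marginal probability of A. This is only a sketch of the general idea under invented counts, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

# fully observed units (both A and B seen) and units with B missing
full = {(0, 0): 38, (0, 1): 12, (1, 0): 10, (1, 1): 40}
a_only = {0: 15, 1: 20}          # counts of units where only A was observed

def neg_loglik(theta):
    """Observed-data multinomial negative log-likelihood.
    theta holds 3 free logits; the 4 cell probabilities are softmax([0, theta])."""
    z = np.concatenate([[0.0], theta])
    p = np.exp(z - z.max()); p /= p.sum()
    cell = {(0, 0): p[0], (0, 1): p[1], (1, 0): p[2], (1, 1): p[3]}
    ll = sum(n * np.log(cell[c]) for c, n in full.items())               # complete units
    ll += sum(n * np.log(cell[(a, 0)] + cell[(a, 1)])                    # B missing:
              for a, n in a_only.items())                                #   marginal of A
    return -ll

res = minimize(neg_loglik, np.zeros(3), method="BFGS")
z = np.concatenate([[0.0], res.x])
p_hat = np.exp(z - z.max()); p_hat /= p_hat.sum()
print(p_hat.round(3))            # MLE of the four cell probabilities
```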
13.
The problem of two-group classification has implications in a number of fields, such as medicine, finance, and economics. This study aims to compare methods of two-group classification. The minimum sum of deviations and linear programming model, linear discriminant analysis, quadratic discriminant analysis, logistic regression, classification based on the multivariate analysis of variance (MANOVA) test, classification based on the unpooled T-square test, support vector machines, k-nearest neighbor methods, and a combined classification method are compared for data structures exhibiting fat tails and/or skewness. The comparison has been carried out using a simulation procedure designed for various stable distribution structures and sample sizes.
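A rough illustration of this kind of comparison on skewed two-group data, using off-the-shelf classifiers; the log-normal data-generating choices and the particular model list are assumptions for illustration, not those of the study.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
# two groups with skewed (log-normal) features, a stand-in for fat-tailed/skewed structures
n = 200
X = np.vstack([rng.lognormal(0.0, 1.0, (n, 2)),
               rng.lognormal(0.7, 1.0, (n, 2))])
y = np.repeat([0, 1], n)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "logistic": LogisticRegression(max_iter=1000),
    "5-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF)": SVC(),
}
for name, m in models.items():
    acc = cross_val_score(m, X, y, cv=5).mean()     # 5-fold cross-validated accuracy
    print(f"{name:10s} CV accuracy ~ {acc:.3f}")
```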
14.
An Empirical Study of Outlier Detection for Functional Data Based on Graphical Methods
Functional data are inherently complex: their sampling, generation, structure, and degree of dependence all affect how complex the data are and how well they can be described, and in some situations even basic visualization is difficult. Building on the principal component scores of functional data and on Tukey's notions of data depth and density, this paper introduces the bagplot and the boxplot for functional data and, based on these graphical analyses, proposes three methods for detecting outliers in functional data. Compared with existing detection methods, the three proposed methods identify outliers in functional data more readily.
15.
Recently, a least absolute deviations (LAD) estimator for median regression models with doubly censored data was proposed and the asymptotic normality of the estimator was established. However, this result is difficult to use for inference on the regression parameter vectors, because the asymptotic covariance matrices involve conditional densities of the error terms and are therefore hard to estimate reliably. In this article, three methods, based on the bootstrap, random weighting, and empirical likelihood, respectively, and requiring no density estimation, are proposed for making inference in doubly censored median regression models. Simulations are also done to assess the performance of the proposed methods.
16.
This paper is concerned with the analysis of ordinal data through linear models for rank function measures. Primary attention is directed at pairwise Mann-Whitney statistics, for which dimension reduction is managed by use of a Bradley-Terry log-linear structure. The nature of linear models for such quantities is contrasted with that for mean ranks (or ridits). Aspects of application are illustrated with an example for which results of other methods are also given.
17.
Many methods have been developed for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be groups in the data. We propose a method of simultaneously determining groups (as in cluster analysis) and detecting outliers, which are points that are distant from every group. Our method is an adaptation of the BACON algorithm proposed by Billor, Hadi and Velleman for the robust detection of multiple outliers in a single group of multivariate data. There are two versions of our method, depending on whether or not the groups can be assumed to have equal covariance matrices. The effectiveness of the method is illustrated by its application to two real data sets and further shown by a simulation study for different sample sizes and dimensions for 2 and 3 groups, with and without planted outliers in the data. When the number of groups is not known in advance, the algorithm could be used as a robust method of cluster analysis, by running it for various numbers of groups and choosing the best solution.
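For orientation, a much-simplified single-group BACON-style sketch: start from the points closest to the coordinate-wise median, then repeatedly refit mean and covariance on the basic subset and retain all points below a chi-square Mahalanobis cutoff. The grouped extension proposed in the paper, and the original algorithm's small-sample correction factors, are not reproduced; the cutoff and data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def bacon_outliers(X, m=None, alpha=0.01, max_iter=50):
    """Simplified single-group BACON-style outlier flagging."""
    n, p = X.shape
    m = m or 4 * p                                       # initial subset size
    d0 = np.abs(X - np.median(X, axis=0)).sum(axis=1)    # distance from coord-wise median
    subset = np.argsort(d0)[:m]
    cutoff = np.sqrt(stats.chi2.ppf(1 - alpha / n, p))
    for _ in range(max_iter):
        mu = X[subset].mean(axis=0)
        S_inv = np.linalg.inv(np.cov(X[subset], rowvar=False))
        diff = X - mu
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, S_inv, diff))  # Mahalanobis distances
        new_subset = np.where(d < cutoff)[0]
        if set(new_subset) == set(subset):
            break
        subset = new_subset
    return np.setdiff1d(np.arange(n), subset)            # indices flagged as outliers

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(8, 1, (5, 3))])  # 5 planted outliers
print(bacon_outliers(X))   # should flag (roughly) indices 100..104
```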
18.
Laura Ferreira 《Communications in Statistics: Simulation and Computation》2013,42(9):1925-1949
Functional data analysis (FDA), the analysis of data that can be considered a set of observed continuous functions, is an increasingly common class of statistical analysis. One of the most widely used FDA methods is the cluster analysis of functional data; however, little work has been done to compare the performance of clustering methods on functional data. In this article, a simulation study compares the performance of four major hierarchical methods for clustering functional data. The simulated data varied in three ways: the nature of the signal functions (periodic, non-periodic, or mixed), the amount of noise added to the signal functions, and the pattern of the true cluster sizes. The Rand index was used to compare the performance of each clustering method. As a secondary goal, clustering methods were also compared when the number of clusters has been misspecified. To illustrate the results, a real set of functional data was clustered where the true clustering structure is believed to be known. Comparing the clustering methods for the real data set confirmed the findings of the simulation. This study yields concrete suggestions to future researchers to determine the best method for clustering their functional data.
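A small sketch of the kind of experiment described: simulate noisy curves from two cluster-specific signal functions, cluster the discretized curves with several hierarchical linkages, and score each against the true labels. The signal functions, noise level, and use of the adjusted Rand index (as a stand-in for the Rand index) are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(6)
t = np.linspace(0, 1, 60)

def make_cluster(signal, n, noise=0.3):
    """n noisy curves around a common signal function, evaluated on grid t."""
    return signal(t) + rng.normal(0, noise, (n, t.size))

curves = np.vstack([make_cluster(lambda s: np.sin(2 * np.pi * s), 25),
                    make_cluster(lambda s: 2 * s - 1, 25)])
truth = np.repeat([0, 1], 25)

# hierarchical clustering of the discretized curves under four linkages
for method in ["ward", "average", "complete", "single"]:
    labels = fcluster(linkage(curves, method=method), t=2, criterion="maxclust")
    print(f"{method:8s} adjusted Rand index = {adjusted_rand_score(truth, labels):.2f}")
```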
19.
G. Jacob, F. H. C. Marriott & P. A. Robbins 《Journal of the Royal Statistical Society. Series C, Applied statistics》1997,46(2):235-243
Records of gas flow during breathing are cyclical, with the cycles varying in duration. The shape of these cycles may change with the intensity of respiratory stimulation or the development of respiratory disease, but currently research is hampered by the lack of a fully satisfactory technique for determining the shape of a typical cycle. The approach adopted here is to replace the time series by a 'phase diagram', plotting the time integral of flow against flow itself. Principal curves are then fitted. These are curves 'through the middle of the data' which were introduced by Hastie and Stuetzle. The shapes of these curves are compared, either directly or after reconstructing an average cycle corresponding to each fitted curve. This has the advantage that the shape of the waveform is separated from the amplitude, and from the duration of the breath. A disadvantage is that periods of zero flow are lost, and the reconstructed average cycle may show irregularities at points near zero flow as a result. In practice, the methodology showed clear differences in shape between the protocols, gave reasonable average cycles and ordered the waveform shapes according to the hardness of breathing induced by the protocols.
20.
Multivariate survival data arise when each study subject may experience multiple events or when study subjects are clustered into groups. Statistical analyses of such data need to account for the intra-cluster dependence through appropriate modeling. Frailty models are the most popular for such failure time data. However, there are other approaches which model the dependence structure directly. In this article, we compare the frailty models for bivariate data with the models based on bivariate exponential and Weibull distributions. Bayesian methods provide a convenient paradigm for comparing the two sets of models we consider. Our techniques are illustrated using two examples. One simulated example demonstrates model choice methods developed in this paper and the other example, based on a practical data set of onset of blindness among patients with diabetic retinopathy, considers Bayesian inference using different models.