Similar Documents
20 similar documents found.
1.
Clustering gene expression time course data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Statistically, the problem of clustering time course data is a special case of the more general problem of clustering longitudinal data. In this paper, a very general and flexible model-based technique is used to cluster longitudinal data. Mixtures of multivariate t-distributions are utilized, with a linear model for the mean and a modified Cholesky-decomposed covariance structure. Constraints are placed upon the covariance structure, leading to a novel family of mixture models, including parsimonious models. In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters, including the component degrees of freedom, are estimated using an expectation-maximization algorithm, and two different approaches to model selection are considered. The models are applied to simulated data to illustrate their efficacy; this includes a comparison with their Gaussian analogues, whose use with a linear model for the mean is novel in itself. Our family of multivariate t mixture models is then applied to two real gene expression time course data sets and the results are discussed. We conclude with a summary, suggestions for future work, and a discussion about constraining the degrees of freedom parameter.
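The expectation-maximization estimation described above can be illustrated on a deliberately stripped-down case: a two-component univariate Gaussian mixture with equal weights and unit variances held fixed (the paper's multivariate t mixtures with Cholesky-structured covariances are far more general). A minimal sketch:

```python
import math

def em_gmm2(data, mu1, mu2, n_iter=50):
    """EM for a two-component univariate Gaussian mixture
    (equal weights and unit variances kept fixed for brevity)."""
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point
        resp = []
        for x in data:
            d1 = math.exp(-0.5 * (x - mu1) ** 2)
            d2 = math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(d1 / (d1 + d2))
        # M-step: responsibility-weighted means
        w1 = sum(resp)
        w2 = len(data) - w1
        mu1 = sum(r * x for r, x in zip(resp, data)) / w1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / w2
    return mu1, mu2

# two well-separated "gene trajectories" collapsed to single values
data = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
mu1, mu2 = em_gmm2(data, mu1=-1.0, mu2=6.0)
```

With reasonable starting values the component means converge to the two cluster centres within a few iterations.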

2.
We consider varying coefficient models, which extend classical linear regression models in the sense that the regression coefficients are replaced by functions of certain variables (for example, time), and the covariates are also allowed to depend on other variables. Varying coefficient models are popular in longitudinal data and panel data studies, and have been applied in fields such as finance and health sciences. We consider longitudinal data and estimate the coefficient functions by the flexible B-spline technique. An important question in a varying coefficient model is whether an estimated coefficient function is statistically different from a constant (or zero). We develop testing procedures based on the estimated B-spline coefficients by making use of nice properties of a B-spline basis. Our method allows longitudinal data where repeated measurements for an individual can be correlated. We obtain the asymptotic null distribution of the test statistic. The power of the proposed testing procedures is illustrated on simulated data, where we highlight the importance of including the correlation structure of the response variable, and on real data.
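One of the "nice properties" of a B-spline basis that such tests can exploit is the partition of unity: the basis functions sum to one at every point, so a coefficient function is constant exactly when all its B-spline coefficients are equal. A minimal sketch of basis evaluation via the Cox-de Boor recursion (the knot vector and degree below are illustrative):

```python
def bspline(i, k, t, x):
    """Cox-de Boor recursion: value at x of the i-th B-spline
    of degree k on knot vector t."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] > t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * bspline(i + 1, k - 1, t, x)
    return left + right

# clamped cubic knot vector on [0, 1] with one interior knot -> 5 basis functions
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
degree = 3
n_basis = len(knots) - degree - 1
basis_at = lambda x: [bspline(i, degree, knots, x) for i in range(n_basis)]
```

Because the basis sums to one, a spline with all coefficients equal to c is identically c, so testing "all spline coefficients equal" is the same as testing "the coefficient function is constant".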

3.
The commonly used linear mixed-effects model has limitations for modeling longitudinal data with nonlinear relationships, so we extend it: different nonlinear mixed-effects models are built according to the nonlinear relationships between variables, and mixture-distribution models are built according to the distributional features of the response. Based on a set of real insurance loss data, polynomial mixed-effects, truncated polynomial mixed-effects, and B-spline mixed-effects models are fitted. The results show that nonlinear mixed-effects models significantly improve the modeling of insurance loss data and provide a valuable reference for non-life insurance ratemaking.

4.
Joint models for longitudinal data and time-to-event data have recently received considerable attention in clinical and epidemiologic studies. Our interest is in modeling the relationship between event time outcomes and internal time-dependent covariates. In practice, the longitudinal responses often show nonlinear and fluctuating curves. Therefore, the main aim of this paper is to use penalized splines with a truncated polynomial basis to parameterize the nonlinear longitudinal process. Then, the linear mixed-effects model is applied to subject-specific curves and to control the smoothing. The association between the dropout process and longitudinal outcomes is modeled through a proportional hazard model. Two types of baseline risk functions are considered, namely a Gompertz distribution and a piecewise constant model. The resulting models are referred to as penalized spline joint models, an extension of the standard joint models. The expectation conditional maximization (ECM) algorithm is applied to estimate the parameters in the proposed models. To validate the proposed algorithm, extensive simulation studies were implemented, followed by a case study. In summary, the penalized spline joint models provide a new approach for joint models that improves on the existing standard joint models.
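A truncated polynomial basis of the kind used here is easy to construct explicitly: each design row contains the monomials up to the chosen degree plus one truncated power term per knot. A minimal sketch (the degree and knot locations below are illustrative, not the paper's):

```python
def truncated_power_row(t, knots, degree=2):
    """One design-matrix row of the truncated polynomial basis:
    [1, t, ..., t^p, (t - k1)_+^p, ..., (t - kK)_+^p]."""
    row = [t ** d for d in range(degree + 1)]
    row += [max(t - k, 0.0) ** degree for k in knots]
    return row

row = truncated_power_row(0.7, knots=[0.25, 0.5, 0.75])
```

Knots whose truncated terms are never activated contribute zeros, and penalizing the truncated-term coefficients is what turns this basis into a penalized spline.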

5.
Joint models for longitudinal and time-to-event data have been applied in many different fields of statistics and clinical studies. However, the main difficulty these models face is computational: the requirement for numerical integration becomes severe as the dimension of the random effects increases. In this paper, a modified two-stage approach is proposed to estimate the parameters in joint models. In the first stage, linear mixed-effects models and best linear unbiased predictors are applied to estimate parameters in the longitudinal submodel. In the second stage, an approximation of the full joint log-likelihood is proposed using the estimated values of these parameters from the longitudinal submodel. Survival parameters are estimated by maximizing this approximation of the full joint log-likelihood. Simulation studies show that the approach performs well, especially when the dimension of the random effects increases. Finally, we implement this approach on AIDS data.
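The two-stage idea can be caricatured with per-subject least-squares lines in stage one and a one-parameter exponential hazard in stage two. This is only an illustration: the paper uses linear mixed-effects models with BLUPs and a full joint log-likelihood approximation, and the grid search below stands in for a proper optimizer.

```python
import math

def ols_line(ts, ys):
    """Stage 1: per-subject least-squares line (intercept, slope)."""
    n = len(ts)
    tbar, ybar = sum(ts) / n, sum(ys) / n
    slope = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / \
            sum((t - tbar) ** 2 for t in ts)
    return ybar - slope * tbar, slope

# toy longitudinal data: subject -> (measurement times, responses)
subjects = {
    "A": ([0, 1, 2], [1.0, 2.0, 3.0]),   # rising trajectory
    "B": ([0, 1, 2], [1.0, 1.0, 1.0]),   # flat trajectory
}
fits = {s: ols_line(t, y) for s, (t, y) in subjects.items()}

# Stage 2: exponential hazard exp(alpha * m_i(T_i)) driven by the stage-1
# fitted trajectory m_i; maximize the survival log-likelihood over alpha.
events = {"A": (1.5, 1), "B": (2.0, 0)}  # subject -> (time, event indicator)

def loglik(alpha):
    ll = 0.0
    for s, (T, d) in events.items():
        a, b = fits[s]
        m = a + b * T
        ll += d * (alpha * m) - math.exp(alpha * m) * T
    return ll

alpha_hat = max((a / 100 for a in range(-300, 301)), key=loglik)
```

The association parameter is estimated with the stage-1 trajectories held fixed, which is exactly what makes the two-stage approach cheap relative to maximizing the full joint likelihood.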

6.
This paper discusses a general strategy for reducing measurement-error-induced bias in statistical models. It is assumed that the measurement error is unbiased with a known variance, although no other distributional assumptions on the measurement error are employed.

Using a preliminary fit of the model to the observed data, a transformation of the variable measured with error is estimated. The transformation is constructed so that the estimates obtained by refitting the model to the ‘corrected’ data have smaller bias.

Whereas the general strategy can be applied in a number of settings, this paper focuses on the problem of covariate measurement error in generalized linear models. Two estimators are derived and their effectiveness at reducing bias is demonstrated in a Monte Carlo study.
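The paper's transformation-based correction is not reproduced here, but the problem it attacks is the classical attenuation of regression slopes: with a known measurement-error variance, a simple method-of-moments correction divides the naive slope by the reliability ratio. A minimal sketch of that standard correction:

```python
def attenuation_corrected_slope(w, y, sigma_u2):
    """Naive OLS of y on the error-prone covariate w underestimates the
    slope by the reliability ratio; divide it back out using the known
    measurement-error variance sigma_u2 (method-of-moments correction)."""
    n = len(w)
    wbar, ybar = sum(w) / n, sum(y) / n
    s_ww = sum((wi - wbar) ** 2 for wi in w) / n
    s_wy = sum((wi - wbar) * (yi - ybar) for wi, yi in zip(w, y)) / n
    naive = s_wy / s_ww
    reliability = (s_ww - sigma_u2) / s_ww
    return naive, naive / reliability

# toy data: observed covariate variance 2.0, of which 1.0 is error variance
w = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 2.0, 4.0, 6.0, 8.0]
naive, corrected = attenuation_corrected_slope(w, y, sigma_u2=1.0)
```

In the toy data the naive slope is half the corrected slope because measurement error accounts for half of the observed covariate variance.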

7.
Statistical approaches tailored to analyzing longitudinal data that have multiple outcomes with different distributions are scarce. This paucity is due to the non-availability of multivariate distributions that jointly model outcomes with different distributions other than the multivariate normal. A plethora of research has been done on the specific combination of binary-Gaussian bivariate outcomes but a more general approach that allows other mixtures of distributions for multiple longitudinal outcomes has not been thoroughly demonstrated and examined. Here, we study a multivariate generalized linear mixed models approach that jointly models multiple longitudinal outcomes with different combinations of distributions and incorporates the correlations between the various outcomes through separate yet correlated random intercepts. Every outcome is linked to the set of covariates through a proper link function that allows the incorporation and joint modeling of different distributions. A novel application was demonstrated on a cohort study of Type-1 diabetic patients to jointly model a mix of longitudinal cardiovascular outcomes and to explore for the first time the effect of glycemic control treatment, plasma prekallikrein biomarker, gender and age on cardiovascular risk factors collectively.

8.
The omission of important variables is a well-known model specification issue in regression analysis and mixed linear models. The author considers longitudinal data models that are special cases of the mixed linear models; in particular, they are linear models of repeated observations on a subject. Models of omitted variables have origins in both the econometrics and biostatistics literatures. The author describes regression coefficient estimators that are robust to, and that provide the basis for detecting, the influence of certain types of omitted variables. New robust estimators and omitted variable tests are introduced and illustrated with a case study that investigates the determinants of tax liability.

9.
Functional linear models are useful in longitudinal data analysis. They include many classical and recently proposed statistical models for longitudinal data and other functional data. Recently, smoothing spline and kernel methods have been proposed for estimating their coefficient functions nonparametrically, but these methods are either computationally intensive or inefficient in performance. To overcome these drawbacks, in this paper, a simple and powerful two-step alternative is proposed. In particular, the implementation of the proposed approach via local polynomial smoothing is discussed. Methods for estimating standard deviations of the estimated coefficient functions are also proposed. Some asymptotic results for the local polynomial estimators are established. Two longitudinal data sets, one of which involves time-dependent covariates, are used to demonstrate the proposed approach. Simulation studies show that our two-step approach improves the kernel method proposed by Hoover and co-workers in several aspects such as accuracy, computational time and visual appeal of the estimators.
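The local polynomial smoothing used in the second step has a closed form in the local linear case: weighted least squares with kernel weights centred at the evaluation point. A minimal sketch (Gaussian kernel; the bandwidth and data are illustrative):

```python
import math

def local_linear(x0, xs, ys, h):
    """Local linear estimate at x0 with a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    # intercept of the locally weighted least-squares line
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)

xs = [i / 10 for i in range(11)]
ys = [3.0 * x + 1.0 for x in xs]       # a linear "coefficient function"
est = local_linear(0.35, xs, ys, h=0.2)
```

A local linear fit reproduces linear functions exactly, one reason it is generally preferred over the local constant (Nadaraya-Watson) estimator, particularly near boundaries.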

10.
Partial linear varying coefficient models (PLVCM) are often considered for analysing longitudinal data because they strike a good balance between flexibility and parsimony. Existing estimation and variable selection methods for this model are mainly built on the assumption that it is known in advance which subset of variables has a linear effect and which has a varying effect on the response; that is, the model structure is taken as given. In applications, however, this assumption is often unrealistic. In this work, we propose a simultaneous structure estimation and variable selection method, which performs coefficient estimation together with three types of selection: varying-effect selection, constant-effect selection, and relevant-variable selection. It can be easily implemented in one step by employing a penalized M-type regression, which uses a general loss function to treat mean, median, quantile and robust mean regressions in a unified framework. Consistency in the three types of selection and the oracle property in estimation are established as well. Simulation studies and a real data analysis confirm the method.

11.
Penalized regression methods have for quite some time been a popular choice for addressing challenges in high dimensional data analysis. Despite their popularity, their application to time series data has been limited. This paper concerns bridge penalized methods in a linear regression time series model. We first prove consistency, sparsity and asymptotic normality of bridge estimators under a general mixing model. Next, as a special case of mixing errors, we consider bridge regression with autoregressive and moving average (ARMA) error models and develop a computational algorithm that can simultaneously select important predictors and the orders of ARMA models. Simulated and real data examples demonstrate the effective performance of the proposed algorithm and the improvement over ordinary bridge regression.
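The bridge penalty λΣ|β_j|^γ contains the lasso at γ = 1, whose coordinate update has the well-known soft-thresholding form; for an orthonormal design a single pass suffices. A minimal sketch of that special case (the paper's algorithm additionally handles ARMA errors and order selection):

```python
def soft_threshold(z, lam):
    """Lasso (bridge with gamma = 1) coordinate update: shrink z toward 0,
    setting it exactly to 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# orthonormal design: each coefficient is soft-thresholded independently
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
lam = 1.0
z = [sum(X[i][j] * y[i] for i in range(2)) for j in range(2)]  # X'y
beta = [soft_threshold(zj, lam) for zj in z]
```

The second coefficient is set exactly to zero, illustrating the sparsity property proved for bridge estimators with γ ≤ 1.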

12.
The maximum likelihood equations for a multivariate normal model with structured mean and structured covariance matrix may not have an explicit solution. In some cases the model's error term may be decomposed as the sum of two independent error terms, each having a patterned covariance matrix, such that if one of the unobservable error terms is artificially treated as "missing data", the EM algorithm can be used to compute the maximum likelihood estimates for the original problem. Some decompositions produce likelihood equations which do not have an explicit solution at each iteration of the EM algorithm, but within-iteration explicit solutions are shown for two general classes of models including covariance component models used for analysis of longitudinal data.

13.
The author proposes a general method for evaluating the fit of a model for functional data. His approach consists of embedding the proposed model into a larger family of models, assuming the true process generating the data is within the larger family, and then computing a posterior distribution for the Kullback-Leibler distance between the true and the proposed models. The technique is illustrated on biomechanical data reported by Ramsay, Flanagan & Wang (1995). It is developed in detail for hierarchical polynomial models such as those found in Lindley & Smith (1972), and is also generally applicable to longitudinal data analysis where polynomials are fit to many individuals.
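The Kullback-Leibler distance for which the posterior is computed has a closed form when both models are univariate normals, which is convenient for checking intuition. A minimal sketch:

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2) || N(mu2, s2^2) ) in closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

same = kl_normal(0.0, 1.0, 0.0, 1.0)   # identical models -> distance 0
shift = kl_normal(1.0, 1.0, 0.0, 1.0)  # unit mean shift -> distance 0.5
```

The distance is zero only when the two models coincide, and it is asymmetric in its arguments, which is why "distance" is used loosely here.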

14.
In this paper, we consider improved estimating equations for semiparametric partial linear models (PLM) for longitudinal data, or clustered data in general. We approximate the non-parametric function in the PLM by a regression spline, and utilize quadratic inference functions (QIF) in the estimating equations to achieve a more efficient estimation of the parametric part in the model, even when the correlation structure is misspecified. Moreover, we construct a test which is an analogue to the likelihood ratio inference function for inferring the parametric component in the model. The proposed methods perform well in simulation studies and real data analysis conducted in this paper.

15.
Forecasting with longitudinal data has rarely been studied. Most of the available studies consider continuous responses, and all of them consider univariate responses. In this study, we consider forecasting multivariate longitudinal binary data. Five models, ranging from simple univariate and multivariate marginal models to more complex marginally specified models, are studied for forecasting such data. Model forecasting abilities are illustrated via a real-life data set and a simulation study. The simulation study includes model-independent data generation to provide a fair environment for model comparison. Independent variables are forecast along with the dependent ones to best mimic real-life settings. Several accuracy measures are considered to compare model forecasting abilities. Results show that the complex models yield better forecasts.
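Two accuracy measures commonly used to compare binary forecasts are the Brier score and the misclassification rate at a cutoff; the abstract does not list the paper's measures, so these are illustrative choices. A minimal sketch:

```python
def brier_score(p, y):
    """Mean squared error of probability forecasts p against 0/1 outcomes y."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

def misclassification_rate(p, y, cutoff=0.5):
    """Fraction of cases where thresholding p at the cutoff misses y."""
    return sum((pi >= cutoff) != bool(yi) for pi, yi in zip(p, y)) / len(y)

p = [0.9, 0.2, 0.6, 0.1]   # forecast probabilities
y = [1, 0, 0, 0]           # realized binary outcomes
bs = brier_score(p, y)
err = misclassification_rate(p, y)
```

The Brier score rewards well-calibrated probabilities, while the misclassification rate only sees which side of the cutoff each forecast falls on.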

16.
Jointly modeling longitudinal and survival data has been an active research area. Most research focuses on improving estimation efficiency but ignores many data features frequently encountered in practice. In the current study, we develop joint models that concurrently account for longitudinal and survival data with multiple features. Specifically, the proposed model handles skewness, missingness and measurement error in covariates, which are typically observed in the collection of longitudinal survival data in many studies. We employ a Bayesian inferential method to make inference on the proposed model and apply it to a real data study. A few alternative models under different conditions are compared, and extensive simulations are conducted to evaluate how the method works.

17.
Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting of a linear mixed model within steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program.

18.
Model checking with discrete data regressions can be difficult because the usual methods such as residual plots have complicated reference distributions that depend on the parameters in the model. Posterior predictive checks have been proposed as a Bayesian way to average the results of goodness-of-fit tests in the presence of uncertainty in estimation of the parameters. We try this approach using a variety of discrepancy variables for generalized linear models fitted to a historical data set on behavioural learning. We then discuss the general applicability of our findings in the context of a recent applied example on which we have worked. We find that the following discrepancy variables work well, in the sense of being easy to interpret and sensitive to important model failures: structured displays of the entire data set, general discrepancy variables based on plots of binned or smoothed residuals versus predictors and specific discrepancy variables created on the basis of the particular concerns arising in an application. Plots of binned residuals are especially easy to use because their predictive distributions under the model are sufficiently simple that model checks can often be made implicitly. The following discrepancy variables did not work well: scatterplots of latent residuals defined from an underlying continuous model and quantile–quantile plots of these residuals.
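A binned residual plot of the kind recommended is computed by sorting observations by fitted value, splitting them into bins, and averaging the residuals within each bin. A minimal sketch (equal-size bins; the data are illustrative):

```python
def binned_residuals(fitted, observed, n_bins):
    """Average residual (observed - fitted) within bins of sorted fitted values."""
    pairs = sorted(zip(fitted, observed))
    size = len(pairs) // n_bins
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size]
        mean_fit = sum(f for f, _ in chunk) / len(chunk)
        mean_res = sum(o - f for f, o in chunk) / len(chunk)
        bins.append((mean_fit, mean_res))
    return bins

fitted = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # fitted probabilities
observed = [0, 0, 1, 1, 1, 1]             # binary outcomes
bins = binned_residuals(fitted, observed, n_bins=2)
```

Plotting mean residual against mean fitted value per bin averages away the awkward discreteness of individual binary residuals, which is what makes the reference distribution simple enough for implicit checking.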

19.
Clustered observations such as longitudinal data are often analysed with generalized linear mixed models (GLMM). Approximate Bayesian inference for GLMMs with normally distributed random effects can be done using integrated nested Laplace approximations (INLA), which is in general known to yield accurate results. However, INLA is known to be less accurate for GLMMs with binary response. For longitudinal binary response data it is common that patients do not change their health state during the study period. In this case the grouping covariate perfectly predicts a subset of the response, which implies a monotone likelihood with diverging maximum likelihood (ML) estimates for cluster-specific parameters. This is known as quasi-complete separation. In this paper we demonstrate, based on longitudinal data from a randomized clinical trial and two simulations, that the accuracy of INLA decreases with increasing degree of cluster-specific quasi-complete separation. Comparing parameter estimates by INLA, Markov chain Monte Carlo sampling and ML shows that INLA increasingly deviates from the other methods in such a scenario.
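The monotone likelihood caused by quasi-complete separation is easy to exhibit: with perfectly separated data the logistic log-likelihood increases without bound in the slope, so the ML estimate diverges. A minimal sketch (single covariate, no intercept; the data are contrived to separate perfectly):

```python
import math

def logistic_loglik(beta, xs, ys):
    """Log-likelihood of a no-intercept logistic regression at slope beta."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# perfectly separated data: x < 0 always gives y = 0, x > 0 always gives y = 1
xs = [-1.0, -1.0, 1.0, 1.0]
ys = [0, 0, 1, 1]
lls = [logistic_loglik(b, xs, ys) for b in (1.0, 5.0, 10.0)]
```

The log-likelihood approaches but never attains zero as the slope grows, which is precisely the diverging-estimate scenario in which INLA's accuracy is shown to degrade.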

20.
Recent advances in computing make it practical to use complex hierarchical models. However, the complexity makes it difficult to see how features of the data determine the fitted model. This paper describes an approach to diagnostics for hierarchical models, specifically linear hierarchical models with additive normal or t-errors. The key is to express hierarchical models in the form of ordinary linear models by adding artificial 'cases' to the data set corresponding to the higher levels of the hierarchy. The error term of this linear model is not homoscedastic, but its covariance structure is much simpler than that usually used in variance component or random effects models. The re-expression has several advantages. First, it is extremely general, covering dynamic linear models, random effect and mixed effect models, and pairwise difference models, among others. Second, it makes more explicit the geometry of hierarchical models, by analogy with the geometry of linear models. Third, the analogy with linear models provides a rich source of ideas for diagnostics for all the parts of hierarchical models. This paper gives diagnostics to examine candidate added variables, transformations, collinearity, case influence and residuals.
