Similar Articles
20 similar records found.
1.
A robust generalized score test for comparing groups of clustered binary data is proposed. This novel test is asymptotically valid for practically any underlying correlation configuration, including situations where the correlation coefficients vary within or between clusters; such structures generally undermine the usual large-sample properties of maximum likelihood estimation. Simulations and a real data analysis demonstrate the merit of this robust parametric method. Results show that the test is superior to two recently proposed test statistics advocated by other researchers.
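As a rough illustration of testing for a group difference with clustered binary responses, the following sketch uses a generic between-cluster (ratio-estimator) variance rather than the authors' generalized score test; all names, cluster sizes, and settings here are invented for the example:

```python
import numpy as np

def cluster_robust_ztest(clusters_a, clusters_b):
    # Compare success probabilities between two groups of clustered
    # binary data; the variance of each pooled proportion is estimated
    # from between-cluster variation (ratio-estimator form).
    def summarize(clusters):
        y = np.array([c.sum() for c in clusters], float)   # successes per cluster
        n = np.array([len(c) for c in clusters], float)    # cluster sizes
        p = y.sum() / n.sum()                              # pooled proportion
        k = len(clusters)
        resid = y - p * n                                  # cluster-level residuals
        var = k / (k - 1) * (resid ** 2).sum() / n.sum() ** 2
        return p, var
    p1, v1 = summarize(clusters_a)
    p2, v2 = summarize(clusters_b)
    z = (p1 - p2) / np.sqrt(v1 + v2)
    return p1, p2, z

rng = np.random.default_rng(0)
grp_a = [rng.binomial(1, 0.3, rng.integers(3, 8)) for _ in range(40)]
grp_b = [rng.binomial(1, 0.3, rng.integers(3, 8)) for _ in range(40)]
p1, p2, z = cluster_robust_ztest(grp_a, grp_b)   # H0 is true here
```

Because the variance comes from cluster totals rather than a working correlation model, the statistic stays valid even when correlations vary within or between clusters, which is the key point of the abstract.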

2.
This paper presents a new Bayesian clustering approach, based on an infinite mixture model and specifically designed for time-course microarray data. The problem is to group together genes that have "similar" expression profiles, given a set of noisy measurements of their expression levels over a specific time interval. To capture the temporal variation of each curve, a non-parametric regression approach is used: each expression profile is expanded over a set of basis functions, and the coefficient vectors of the curves are then modelled through a Bayesian infinite mixture of Gaussian distributions. The task of finding clusters of genes with similar expression profiles is thus reduced to the problem of grouping together genes whose coefficients are sampled from the same mixture component. A Dirichlet process prior is naturally employed in models of this kind, since it deals automatically with the uncertainty about the number of clusters. Posterior inference is carried out by a split-merge MCMC sampling scheme that integrates out the parameters of the component distributions and updates only the latent vector of cluster memberships. The final configuration is obtained via the maximum a posteriori estimator. The performance of the method is studied using synthetic and real microarray data and compared with that of competing techniques.
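The two-step idea (basis expansion of each curve, then mixture clustering of the coefficient vectors) can be sketched with scikit-learn's truncated Dirichlet process mixture; this is an illustrative stand-in, not the authors' split-merge sampler, and the polynomial basis, profile shapes, and all settings are assumptions:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 12)                       # common time grid
shapes = [np.sin(2 * np.pi * t), 2 * t - 1]     # two underlying profiles
curves = np.vstack([s + rng.normal(0, 0.05, t.size)
                    for s in shapes for _ in range(20)])

# Step 1: expand each noisy curve over a basis (cubic polynomials here).
coefs = np.array([np.polynomial.polynomial.polyfit(t, y, 3) for y in curves])

# Step 2: cluster the coefficient vectors with a truncated DP mixture;
# the DP prior lets the effective number of clusters be inferred.
dpm = BayesianGaussianMixture(
    n_components=8, weight_concentration_prior_type="dirichlet_process",
    random_state=0, max_iter=500).fit(coefs)
labels = dpm.predict(coefs)
```

Clustering in coefficient space rather than on the raw noisy curves is exactly the dimension reduction the abstract describes.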

3.
We investigate mixed analysis of covariance models for the 'one-step' assessment of conditional QT prolongation. Initially, we consider three different covariance structures for the data, in which between-treatment covariance of repeated measures is modelled respectively through random effects, random coefficients, or a combination of the two; in all three models, an unstructured covariance pattern is used to model within-treatment covariance. In a fourth model, proposed earlier in the literature, between-treatment covariance is modelled through random coefficients but the residuals are assumed to be independent and identically distributed (i.i.d.). Finally, we consider a mixed model with a saturated covariance structure. We investigate the precision and robustness of these models by fitting them to a large group of real data sets from thorough QT studies. Our findings suggest: (i) Point estimates of treatment contrasts from all five models are similar. (ii) The random coefficients model with i.i.d. residuals is not robust; it can lead to both under- and overestimation of the standard errors of treatment contrasts and therefore cannot be recommended for the analysis of conditional QT prolongation. (iii) The combined random effects/random coefficients model does not always converge, and when it does converge its precision is generally inferior to that of the other models. (iv) Both the random effects and the random coefficients models are robust. (v) The random effects, random coefficients, and saturated models have similar precision, and all three are suitable for the one-step assessment of conditional QT prolongation.

4.
Values of pharmacokinetic parameters may seem to vary randomly between dosing occasions. An accurate explanation of the pharmacokinetic behaviour of a particular drug within a population therefore requires two major sources of variability to be accounted for, namely interoccasion variability and intersubject variability. A hierarchical model that recognizes these two sources of variation has been developed. Standard Bayesian techniques were applied to this statistical model, and a mathematical algorithm based on a Gibbs sampling strategy was derived. The accuracy of this algorithm's determination of the interoccasion and intersubject variation in pharmacokinetic parameters was evaluated from various population analyses of several sets of simulated data. A comparison of results from these analyses with those obtained from parallel maximum likelihood analyses (NONMEM) showed that, for simple problems, the outputs from the two algorithms agreed well, whereas for more complex situations the NONMEM approach may be less accurate. Statistical analyses of a multioccasion data set of pharmacokinetic measurements on the drug metoprolol (the measurements being of concentrations of drug in blood plasma from human subjects) revealed substantial interoccasion variability for all structural model parameters. For some parameters, interoccasion variability appears to be the primary source of pharmacokinetic variation.
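A minimal sketch of Gibbs sampling in a two-level hierarchy (intersubject effects b_i plus interoccasion noise e_ij) is shown below; this linear Gaussian toy model with weak conjugate priors only illustrates the idea, and is far simpler than a real population pharmacokinetic model:

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_occ = 30, 4
b_true = rng.normal(0, 0.5, n_subj)              # intersubject effects, sd 0.5
y = 1.0 + b_true[:, None] + rng.normal(0, 0.3, (n_subj, n_occ))

def gibbs(y, n_iter=2000, seed=0):
    # Gibbs sampler for y_ij = mu + b_i + e_ij with a flat prior on mu
    # and weak inverse-gamma priors on the two variance components.
    rng = np.random.default_rng(seed)
    n_subj, n_occ = y.shape
    mu, tau2, sig2 = y.mean(), 1.0, 1.0
    keep = []
    for it in range(n_iter):
        # subject effects b_i | rest  (normal full conditional)
        prec = n_occ / sig2 + 1.0 / tau2
        b = rng.normal((y - mu).sum(axis=1) / sig2 / prec, 1.0 / np.sqrt(prec))
        # grand mean mu | rest
        mu = rng.normal((y - b[:, None]).mean(), np.sqrt(sig2 / y.size))
        # variance components | rest  (inverse-gamma full conditionals)
        ss_e = ((y - mu - b[:, None]) ** 2).sum()
        sig2 = 1.0 / rng.gamma(1.0 + y.size / 2.0, 1.0 / (1.0 + ss_e / 2.0))
        tau2 = 1.0 / rng.gamma(1.0 + n_subj / 2.0, 1.0 / (1.0 + (b ** 2).sum() / 2.0))
        if it >= n_iter // 2:                     # discard burn-in
            keep.append((mu, np.sqrt(tau2), np.sqrt(sig2)))
    return np.array(keep)

draws = gibbs(y)
mu_hat, tau_hat, sig_hat = draws.mean(axis=0)     # posterior means
```

Here tau plays the role of intersubject variability and sig the role of interoccasion variability; the sampler separates the two sources of variation from the repeated occasions per subject.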

5.
In the health and social sciences, researchers often encounter categorical data whose complexity arises from a nested hierarchy and/or a cross-classification in the sampling structure. A common feature of such studies is a non-standard data structure with repeated measurements that may exhibit some degree of clustering. In this paper, methodology is presented for the joint estimation of quantities of interest in the context of a stratified two-stage sample with bivariate dichotomous data. These quantities are the mean value π of an observed dichotomous response for a given condition or time point, and a set of correlation coefficients describing intra-cluster association for each condition or time period as well as inter-condition correlation within and among clusters. The methodology uses the cluster means and pairwise joint probability parameters from each cluster, which together provide the information across clusters needed to estimate the correlation coefficients.

6.
Count data are routinely assumed to have a Poisson distribution, especially when there are no straightforward diagnostic procedures for checking this assumption. We reanalyse two data sets from crossover trials of treatments for angina pectoris, in which the outcomes are counts of anginal attacks. Standard analyses focus on treatment effects averaged over subjects; we are also interested in the dispersion of these effects (treatment heterogeneity). We set up a log-Poisson model with random coefficients to estimate the distribution of the treatment effects and show that the analysis is very sensitive to the distributional assumption: the population variance of the treatment effects is confounded with the variance function that relates the conditional variance of the outcomes, given the subject's rate of attacks, to the conditional mean. Diagnostic model checks based on resampling from the fitted distribution indicate that the default choice of the Poisson distribution is poorly supported for the analysed data sets. We propose to augment the data sets with additional observations of the counts, possibly made outside the clinical setting, so that the conditional distribution of the counts can be established.
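A generic version of a resampling-based check of the Poisson assumption can be sketched as follows, comparing the observed variance-to-mean ratio with its distribution under samples drawn from the fitted Poisson; this is an illustration in the spirit of the diagnostic described, not the authors' procedure, and the data are simulated:

```python
import numpy as np

def poisson_dispersion_check(y, n_rep=2000, seed=0):
    # Compare the observed variance/mean ratio with its distribution
    # under repeated samples drawn from the fitted Poisson.
    rng = np.random.default_rng(seed)
    lam = y.mean()
    stat = y.var(ddof=1) / lam
    sims = rng.poisson(lam, (n_rep, y.size))
    sim_stats = sims.var(axis=1, ddof=1) / sims.mean(axis=1)
    p_value = (sim_stats >= stat).mean()
    return stat, p_value

rng = np.random.default_rng(3)
poisson_y = rng.poisson(4.0, 200)                 # genuinely Poisson counts
mixed_y = rng.poisson(rng.gamma(2.0, 2.0, 200))   # gamma-mixed: overdispersed
stat_p, pval_p = poisson_dispersion_check(poisson_y)
stat_m, pval_m = poisson_dispersion_check(mixed_y)
```

For the gamma-mixed counts the ratio is well above 1 and the resampling p-value is tiny, mirroring the abstract's finding that the default Poisson choice can be poorly supported.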

7.
In varying-coefficient models, an important question is whether some of the varying coefficients are actually invariant. This article proposes a penalized likelihood method in the framework of smoothing spline ANOVA models, with a penalty designed to automatically distinguish coefficients that vary from those that do not. Unlike stepwise procedures, the method identifies and estimates the coefficients simultaneously. An efficient algorithm is given and ways of choosing the smoothing parameters are discussed. Simulation results and an analysis of the Boston housing data illustrate the usefulness of the method. The proposed approach is further extended to longitudinal data analysis.

8.
Survival data analysis collects data on the durations that a sample of units spend in a given state, in order to analyse the process of transition to a different state. Applied to social and economic phenomena, survival analysis typically relies on transition data collected, for a sample of units, in one or more follow-up surveys. We explore the effect of misclassification of the transition indicator on parameter estimates in an appropriate statistical model for the duration spent in the origin state. Empirical investigations of the bias induced by ignoring misclassification are reported, and the model is extended to allow the misclassification rate to vary across units according to the values of covariates. Finally, it is shown how a Bayesian approach can be used to obtain parameter estimates.

9.
Summary. In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often the curves are homogeneous, except perhaps for individual-specific regions that exhibit heterogeneous behaviour (e.g. 'damaged' areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, that represents the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters, and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the proposed prior envisions a conceptual hidden factor with k levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance on simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically their behaviour as the number of mixture components goes to ∞ and their connection with Dirichlet process mixtures.

10.
In data sets that consist of a large number of clusters, a frequent goal of the analysis is to detect whether heterogeneity exists between clusters. A standard approach is to model the heterogeneity in the framework of a mixture model and to derive a score test to detect heterogeneity. The likelihood function, from which the score test derives, depends heavily on the assumed density of the response variable. This paper examines the robustness of the heterogeneity test to misspecification of this density function when there is homogeneity and shows that the test size can be far different from the nominal level.

11.
Dynamic models for spatiotemporal data
We propose a model for non-stationary spatiotemporal data. To account for spatial variability, we model the mean function at each time period as a locally weighted mixture of linear regressions. To incorporate temporal variation, we allow the regression coefficients to change through time. The model is cast in a Gaussian state space framework, which allows us to include temporal components such as trends, seasonal effects and autoregressions, and permits a fast implementation and full probabilistic inference for the parameters, interpolations and forecasts. To illustrate the model, we apply it to two large environmental data sets: tropical rainfall levels and Atlantic Ocean temperatures.
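The simplest member of the Gaussian state space family is the local level model, for which the Kalman filter gives fast, fully probabilistic inference; the following minimal sketch (with assumed noise variances q and r and simulated data) illustrates the machinery:

```python
import numpy as np

def kalman_local_level(y, q, r):
    # Kalman filter for the local level model
    #   y_t = mu_t + eps_t,   mu_t = mu_{t-1} + eta_t,
    # with state noise variance q and observation noise variance r.
    n = len(y)
    m = np.empty(n); P = np.empty(n)
    mt, Pt = y[0], r                    # simple initialisation
    for t in range(n):
        Pt = Pt + q                     # predict step
        K = Pt / (Pt + r)               # Kalman gain
        mt = mt + K * (y[t] - mt)       # update with observation y_t
        Pt = (1.0 - K) * Pt
        m[t], P[t] = mt, Pt
    return m, P

rng = np.random.default_rng(7)
level = 5.0 + np.cumsum(rng.normal(0, 0.1, 200))  # slowly drifting true level
y = level + rng.normal(0, 0.5, 200)               # noisy observations
m, P = kalman_local_level(y, q=0.01, r=0.25)
```

Trends, seasonal effects and autoregressions enter the same recursion by enlarging the state vector, which is what makes the state space formulation in the abstract computationally attractive.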

12.
Summary. The literature on multivariate linear regression includes multivariate normal models, models that are used in survival analysis and a variety of models that are used in other areas such as econometrics. The paper considers the class of location–scale models, which includes a large proportion of the preceding models. It is shown that, for complete data, the maximum likelihood estimators for regression coefficients in a linear location–scale framework are consistent even when the joint distribution is misspecified. In addition, gains in efficiency arising from the use of a bivariate model, as opposed to separate univariate models, are studied. A major area of application for multivariate regression models is to clustered, 'parallel' lifetime data, so we also study the case of censored responses. Estimators of regression coefficients are no longer consistent under model misspecification, but we give simulation results that show that the bias is small in many practical situations. Gains in efficiency from bivariate models are also examined in the censored data setting. The methodology in the paper is illustrated by using lifetime data from the Diabetic Retinopathy Study.

13.
Semiparametric Bayesian classification with longitudinal markers
Summary. We analyse data from a study involving 173 pregnant women. The data are observed values of the β human chorionic gonadotropin hormone measured during the first 80 days of gestational age, with between one and six longitudinal responses for each woman. The main objective of the study is to predict normal versus abnormal pregnancy outcomes from data available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized using Markov chain Monte Carlo methods.

14.
Summary. We propose an adaptive varying-coefficient spatiotemporal model for data that are observed irregularly over space and regularly in time. The model is capable of catching possible non-linearity (both in space and in time) and non-stationarity (in space) by allowing the autoregressive coefficients to vary with both spatial location and an unknown index variable. We suggest a two-step procedure to estimate both the coefficient functions and the index variable; it is readily implemented and can be computed even for large spatiotemporal data sets. Our theoretical results indicate that, in the presence of the so-called nugget effect, the errors in estimation may be reduced via spatial smoothing, the second step in the proposed procedure. The simulation results reinforce this finding. As an illustration, we apply the methodology to a data set of sea level pressure in the North Sea.

15.
Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting of a linear mixed model within steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program.

16.
If the observations for fitting a polytomous logistic regression model satisfy certain normality assumptions, the maximum likelihood estimates of the regression coefficients are the discriminant function estimates. This article shows that these estimates, their unbiased counterparts, and associated test statistics for variable selection can be calculated using ordinary least squares regression techniques, thereby providing a convenient method for fitting logistic regression models in the normal case. Evidence is given indicating that the discriminant function estimates and test statistics merit wider use in nonnormal cases, especially in exploratory work on large data sets.
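Under the normality assumption, the two-class discriminant function estimate of the slope has the closed form Σ⁻¹(μ₁ − μ₀), and it should approximately agree with the logistic maximum likelihood estimate; the following sketch checks this on simulated two-class Gaussian data (a large ridge constant C stands in for unpenalized fitting, and all data settings are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 2000
X0 = rng.multivariate_normal([0.0, 0.0], np.eye(2), n)   # class 0
X1 = rng.multivariate_normal([1.0, 0.5], np.eye(2), n)   # class 1 (shifted mean)
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n), np.ones(n)]

# Discriminant function estimate of the slope: Sigma^{-1} (mu1 - mu0),
# using the pooled within-class covariance.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (2 * n - 2)
beta_disc = np.linalg.solve(S, mu1 - mu0)

# (Nearly) unpenalized logistic maximum likelihood estimate.
beta_mle = LogisticRegression(C=1e6, max_iter=1000).fit(X, y).coef_.ravel()
```

The discriminant estimate requires only moments and a linear solve, which is why, as the abstract notes, it can be computed with ordinary least squares machinery and no iterative fitting.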

17.
Least squares regression models are often used to analyze unbalanced fixed effect data sets with u unique cells defined by design or by post-hoc stratification. Constraints exist among the regression coefficients if there are more coefficients than cells. Models with fewer linearly independent regression coefficients than cells or with empty cells impose constraints on estimated cell means. An easy method of determining constraints among the estimated cell means and among the estimated regression coefficients for any model is developed and illustrated using a small data set.

18.
Summary. Multilevel modelling is sometimes used for data from complex surveys involving multistage sampling, unequal sampling probabilities and stratification. We consider generalized linear mixed models and particularly the case of dichotomous responses. A pseudolikelihood approach for accommodating inverse probability weights in multilevel models with an arbitrary number of levels is implemented by using adaptive quadrature. A sandwich estimator is used to obtain standard errors that account for stratification and clustering. When level 1 weights are used that vary between elementary units in clusters, the scaling of the weights becomes important. We point out that not only variance components but also regression coefficients can be severely biased when the response is dichotomous. The pseudolikelihood methodology is applied to complex survey data on reading proficiency from the American sample of the 'Program for international student assessment' 2000 study, using the Stata program gllamm which can estimate a wide range of multilevel and latent variable models. Performance of pseudo-maximum-likelihood with different methods for handling level 1 weights is investigated in a Monte Carlo experiment. Pseudo-maximum-likelihood estimators of (conditional) regression coefficients perform well for large cluster sizes but are biased for small cluster sizes. In contrast, estimators of marginal effects perform well in both situations. We conclude that caution must be exercised in pseudo-maximum-likelihood estimation for small cluster sizes when level 1 weights are used.
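The two scalings of level-1 weights most often discussed in this setting, scaling to the cluster size and to the effective cluster size, can be written down directly; this is a generic sketch of those standard scalings, not code from the study:

```python
import numpy as np

def scale_level1_weights(w, method="size"):
    # Rescale the level-1 weights of one cluster so that they sum to
    # the cluster size ("size") or to the effective cluster size
    # (sum w)^2 / sum w^2 ("effective").
    w = np.asarray(w, dtype=float)
    if method == "size":
        return w * w.size / w.sum()
    if method == "effective":
        return w * w.sum() / (w ** 2).sum()
    raise ValueError(f"unknown method: {method}")

w = [1.0, 2.0, 3.0]                     # raw level-1 weights in one cluster
w_size = scale_level1_weights(w, "size")
w_eff = scale_level1_weights(w, "effective")
```

The choice between the two only matters when the weights vary within clusters, which is exactly the case the abstract flags as sensitive.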

19.
Even though integer-valued time series are common in practice, methods for their analysis have been developed only in the recent past. Several models for stationary processes with discrete marginal distributions have been proposed in the literature; such models assume that the parameters remain constant throughout the time period, which need not be true in practice. In this paper, we introduce non-stationary integer-valued autoregressive (INAR) models with structural breaks, to model situations where the parameters of the INAR process do not remain constant over time. Such models are useful for modelling count data time series with structural breaks. Bayesian and Markov chain Monte Carlo (MCMC) procedures for estimating the parameters and break points of these models are discussed. We illustrate the model and estimation procedure with a simulation study, and apply the proposed model to two real biometrical data sets.
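An INAR(1) process uses binomial thinning, X_t = α ∘ X_{t−1} + ε_t, so a structural break can be mimicked by changing the innovation mean part-way through the series; the sketch below simulates such a series (all parameter values are invented for illustration):

```python
import numpy as np

def simulate_inar1(n, alpha, lam, x0=0, rng=None):
    # INAR(1): X_t = alpha ∘ X_{t-1} + eps_t, where ∘ denotes binomial
    # thinning and eps_t ~ Poisson(lam); stationary mean is lam/(1-alpha).
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n, dtype=int)
    x[0] = x0
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x

rng = np.random.default_rng(5)
# Structural break: the innovation mean jumps from 1 to 4 at t = 300,
# so the stationary mean jumps from 2 to 8.
x = np.r_[simulate_inar1(300, 0.5, 1.0, x0=2, rng=rng),
          simulate_inar1(300, 0.5, 4.0, x0=8, rng=rng)]
mean_before, mean_after = x[:300].mean(), x[300:].mean()
```

Estimating the break location and the segment parameters from such data is the task the abstract addresses with Bayesian MCMC methods.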

20.
A family of partial likelihood logistic models is proposed for clustered survival data that are reported in discrete time and that may be censored. The possible dependence of individual survival times within clusters is modeled, while distinct clusters are assumed to be independent. Two types of clusters are considered. First, all clusters have the same size and are identically distributed. Second, the clusters may vary in size. In both cases our asymptotic results apply to a large number of small independent clusters.
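Discrete-time survival models of this type are commonly fitted by expanding each subject into person-period records and running a logistic regression of the event indicator on covariates and period dummies; the following sketch illustrates that generic construction (not the paper's clustered partial likelihood), with all parameter values invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n, T, beta = 800, 6, 0.7
z = rng.binomial(1, 0.5, n).astype(float)        # one binary covariate
logit0 = np.log(0.15 / 0.85)                     # baseline hazard of 0.15
hazard = 1.0 / (1.0 + np.exp(-(logit0 + beta * z)))

# Expand each subject into one record per period at risk; a subject
# contributes records until the event occurs or censoring at period T.
rows, events = [], []
for i in range(n):
    for t in range(T):
        event = rng.random() < hazard[i]
        rows.append([z[i], t])
        events.append(int(event))
        if event:
            break

X = np.array(rows)
y = np.array(events)
period_dummies = (X[:, 1:2] == np.arange(T)).astype(float)
design = np.hstack([X[:, :1], period_dummies])   # covariate + period intercepts
fit = LogisticRegression(C=1e6, fit_intercept=False, max_iter=1000).fit(design, y)
beta_hat = fit.coef_.ravel()[0]                  # estimated covariate effect
```

Censoring is handled automatically: a censored subject simply contributes T event-free records, and the period dummies give a separate baseline hazard per discrete time point.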


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号