Similar Articles
Found 20 similar articles (search took 15 ms)
1.
Population pharmacokinetics (POPPK) has many important uses at various stages of drug development and approval. At the phase III stage, one of the major uses of POPPK is to identify covariate influences on human pharmacokinetics, which is important for potential dose adjustment and drug labeling. One common analysis approach is nonlinear mixed‐effect modeling, which typically involves time‐consuming extensive search for best fits among a large number of possible models. We propose that the analysis goal can be better achieved with a more standard confirmatory statistical analysis approach, which uses a prespecified primary analysis and additional sensitivity analyses. We illustrate this approach using a phase III study data set and compare the result with that calculated using the common exploratory approach. We argue that the confirmatory approach not only substantially reduces analysis time but also yields more accurate and interpretable results. Some aspects of this confirmatory approach may also be extended to data analysis in earlier stages of clinical drug development, i.e. phase II and phase I. Copyright © 2009 John Wiley & Sons, Ltd.
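The prespecified (confirmatory) covariate analysis described above can be illustrated with a minimal numpy sketch: a single pre-declared covariate is tested against log-transformed clearance by ordinary least squares, with no model search. The data, variable names, and the simple linear model form are hypothetical illustrations, not the paper's actual analysis.

```python
import numpy as np

def covariate_effect(log_cl, covariate):
    # OLS fit of log clearance on one prespecified covariate.
    # Because the covariate and the model form are fixed in advance,
    # no model search is performed (the confirmatory approach).
    X = np.column_stack([np.ones(len(covariate)), covariate])
    beta, *_ = np.linalg.lstsq(X, log_cl, rcond=None)
    resid = log_cl - X @ beta
    s2 = (resid ** 2).sum() / (len(log_cl) - 2)   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)             # coefficient covariance
    return beta[1], float(np.sqrt(cov[1, 1]))     # slope and its SE
```

A sensitivity analysis in this framework would repeat the same prespecified fit under alternative assumptions (e.g. a different covariate transformation), rather than searching over many models.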

2.
The Cox proportional frailty model with a random effect has been proposed for the analysis of right-censored data which consist of a large number of small clusters of correlated failure time observations. For right-censored data, Cai et al. [3] proposed a class of semiparametric mixed-effects models which provides useful alternatives to the Cox model. We demonstrate that the approach of Cai et al. [3] can be used to analyze clustered doubly censored data when both left- and right-censoring variables are always observed. The asymptotic properties of the proposed estimator are derived. A simulation study is conducted to investigate the performance of the proposed estimator.

3.
Model-based clustering for social networks
Summary. Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity, meaning that two actors that have ties to a third actor are more likely to be tied than actors that do not, homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean 'social space', and the actors' locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters that are present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.

4.
The authors consider the optimal design of sampling schedules for binary sequence data. They propose an approach which allows a variety of goals to be reflected in the utility function by including deterministic sampling cost, a term related to prediction, and if relevant, a term related to learning about a treatment effect. To this end, they use a nonparametric probability model relying on a minimal number of assumptions. They show how their assumption of partial exchangeability for the binary sequence of data allows the sampling distribution to be written as a mixture of homogeneous Markov chains of order k. The implementation follows the approach of Quintana & Müller (2004), which uses a Dirichlet process prior for the mixture.

5.
In data sets that consist of a large number of clusters, a frequent goal of the analysis is to detect whether heterogeneity exists between clusters. A standard approach is to model the heterogeneity in the framework of a mixture model and to derive a score test to detect heterogeneity. The likelihood function, from which the score test derives, depends heavily on the assumed density of the response variable. This paper examines the robustness of the heterogeneity test to misspecification of this density function when there is homogeneity and shows that the test size can be far different from the nominal level.

6.
The authors propose a profile likelihood approach to linear clustering which explores potential linear clusters in a data set. For each linear cluster, an errors‐in‐variables model is assumed. The optimization of the derived profile likelihood can be achieved by an EM algorithm. Its asymptotic properties and its relationships with several existing clustering methods are discussed. Methods to determine the number of components in a data set are adapted to this linear clustering setting. Several simulated and real data sets are analyzed for comparison and illustration purposes. The Canadian Journal of Statistics 38: 716–737; 2010 © 2010 Statistical Society of Canada
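The EM optimization mentioned above can be sketched for the simpler, related problem of fitting a mixture of ordinary regression lines. This stand-in only illustrates the E-step/M-step alternation; the paper's errors-in-variables formulation and profile likelihood are more involved, and the initialization scheme and function names here are assumptions.

```python
import numpy as np

def mixture_of_lines_em(x, y, K=2, n_iter=100):
    # EM for a K-component mixture of simple linear regressions:
    # a simplified stand-in for errors-in-variables linear clustering.
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    # Crude initialization: hard-assign points to K bins by sorted response.
    R = np.zeros((n, K))
    for k, idx in enumerate(np.array_split(np.argsort(y), K)):
        R[idx, k] = 1.0
    beta = np.zeros((K, 2))
    sig2 = np.ones(K)
    mix = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # M-step: weighted least squares for each line.
        for k in range(K):
            w = R[:, k] + 1e-12
            XtW = X.T * w
            beta[k] = np.linalg.solve(XtW @ X, XtW @ y)
            resid = y - X @ beta[k]
            sig2[k] = (w * resid ** 2).sum() / w.sum()
            mix[k] = w.mean()
        # E-step: responsibilities from Gaussian component densities.
        dens = np.empty((n, K))
        for k in range(K):
            resid = y - X @ beta[k]
            dens[:, k] = mix[k] * np.exp(-resid ** 2 / (2 * sig2[k])) \
                / np.sqrt(2 * np.pi * sig2[k])
        R = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
    return beta, mix, R
```

Each row of `beta` is one fitted line (intercept, slope), and `R` holds the soft cluster memberships that a method for choosing the number of components would act on.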

7.
An approach to the analysis of time-dependent ordinal quality score data from robust design experiments is developed and applied to an experiment from commercial horticultural research, using concepts of product robustness and longevity that are familiar to analysts in engineering research. A two-stage analysis is used to develop models describing the effects of a number of experimental treatments on the rate of post-sales product quality decline. The first stage uses a polynomial function on a transformed scale to approximate the quality decline for an individual experimental unit using derived coefficients and the second stage uses a joint mean and dispersion model to investigate the effects of the experimental treatments on these derived coefficients. The approach, developed specifically for an application in horticulture, is exemplified with data from a trial testing ornamental plants that are subjected to a range of treatments during production and home-life. The results of the analysis show how a number of control and noise factors affect the rate of post-production quality decline. Although the model is used to analyse quality data from a trial on ornamental plants, the approach developed is expected to be more generally applicable to a wide range of other complex production systems.

8.
ESTIMATION, PREDICTION AND INFERENCE FOR THE LASSO RANDOM EFFECTS MODEL
The least absolute shrinkage and selection operator (LASSO) can be formulated as a random effects model with an associated variance parameter that can be estimated along with the other components of variance. In this paper, estimation of the variance parameters is performed by means of an approximation to the marginal likelihood of the observed outcomes. The approximation is based on an alternative but equivalent formulation of the LASSO random effects model. Predictions can be made using point summaries of the predictive distribution of the random effects given the data, with the parameters set to their estimated values. The standard LASSO method uses the mode of this distribution as the predictor. It is not the only choice, and a number of other possibilities are defined and empirically assessed in this article. The predictive mode is competitive with the predictive mean (best predictor), but no single predictor performs best across all situations. Inference for the LASSO random effects is performed using predictive probability statements, which are more appropriate under the random effects formulation than tests of hypothesis.
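The standard LASSO predictor (the posterior mode under the random-effects formulation) can be computed by coordinate descent; a minimal numpy sketch, with the penalty parameter fixed by hand rather than estimated from an approximate marginal likelihood as in the paper:

```python
import numpy as np

def lasso_mode(X, y, lam, n_iter=200):
    # Coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1.
    # Under the random-effects view of the LASSO this is the posterior
    # mode of the effects, i.e. the standard LASSO estimate.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with its partial residual,
            # then soft-thresholding at lam.
            rho = X[:, j] @ (y - X @ b + X[:, j] * b[j])
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b
```

The predictive mean discussed in the paper would instead average over the predictive distribution of the effects; the mode is simply the cheapest point summary, and the soft-thresholding step is what sets weak effects exactly to zero.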

9.
Abstract. Lasso and other regularization procedures are attractive methods for variable selection, subject to a proper choice of shrinkage parameter. Given a set of potential subsets produced by a regularization algorithm, a consistent model selection criterion is proposed to select the best one among this preselected set. The approach leads to a fast and efficient procedure for variable selection, especially in high‐dimensional settings. Model selection consistency of the suggested criterion is proven when the number of covariates d is fixed. Simulation studies suggest that the criterion still enjoys model selection consistency when d is much larger than the sample size. The simulations also show that our approach for variable selection works surprisingly well in comparison with existing competitors. The method is also applied to a real data set.

10.
Spatial econometric models estimated on big geo-located point data face at least two problems: limited computational capability and inefficient forecasting for new out-of-sample geo-points. This is because the spatial weights matrix W is defined for in-sample observations only, and because of computational complexity. Machine learning models that use kriging for prediction suffer from the same issues, so the problem remains unsolved. The paper presents a novel methodology for estimating spatial models on big data and predicting in new locations. The approach uses bootstrap and tessellation to calibrate both the model and the space. The best bootstrapped model is selected with the PAM (Partitioning Around Medoids) algorithm by classifying the regression coefficients jointly in a non-independent manner. Voronoi polygons for the geo-points used in the best model allow for a representative division of space. New out-of-sample points are assigned to tessellation tiles and linked to the spatial weights matrix as replacements for the original points, which makes it feasible to use the calibrated spatial models as a forecasting tool for new locations. There is no trade-off between forecast quality and computational efficiency in this approach. An empirical example illustrates a model for business locations and firms' profitability.
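Assigning a new out-of-sample point to the Voronoi tile of a calibration point is just a nearest-neighbour lookup in Euclidean distance; a minimal sketch (the coordinates and function name are hypothetical):

```python
import numpy as np

def assign_tiles(calib_xy, new_xy):
    # A Voronoi tile is the set of locations closest to its generating
    # point, so tile membership == index of the nearest calibration point.
    d2 = ((new_xy[:, None, :] - calib_xy[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

Once a new point inherits a tile index, it can reuse that tile's row of the spatial weights matrix W, which is what lets a model calibrated on in-sample points forecast in new locations.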

11.
Modeling clustered categorical data based on extensions of generalized linear model theory has received much attention in recent years. The rapidly increasing number of approaches suitable for categorical data in which clusters are uncorrelated, but correlations exist within a cluster, has caused uncertainty among applied scientists as to their respective merits and demerits. By centering estimation on the solution of an unbiased estimating function for the mean parameters, together with estimation of covariance parameters describing within-cluster or among-cluster heterogeneity, many approaches can easily be related. This contribution describes a series of algorithms and their implementation in detail, based on a classification of inferential procedures for clustered data.

12.
Block clustering with collapsed latent block models
We introduce a Bayesian extension of the latent block model for model-based block clustering of data matrices. Our approach considers a block model where block parameters may be integrated out. The result is a posterior defined over the number of clusters in rows and columns and cluster memberships. The number of row and column clusters need not be known in advance as these are sampled along with cluster memberships using Markov chain Monte Carlo. This differs from existing work on latent block models, where the number of clusters is assumed known or is chosen using some information criteria. We analyze both simulated and real data to validate the technique.

13.
The performance of several test statistics for comparing vectors of proportions from certain survey data was compared. The statistics were used to analyze a subsample of data from the 'High School and Beyond' survey. These tests include the Wald test statistic X²_W and the modified Wald test statistic F_W, the chi-squared test statistic X²_RSB and its modification F_RSB, a test X²_DMB based on a probability model, and a method-of-moments approach, X²_H. Data were also simulated based on a two-stage cluster sampling design, and the type I error level and power of these tests were obtained for selected combinations of parameter values. The statistics X²_DMB, X²_RSB, F_RSB and X²_H performed well both for a small number of clusters and for a small number of units within clusters. The power performance of these tests is quite stable. Approximate intervals were constructed for design effect constants. Methods of estimating these constants based on a normality assumption worked best.
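The design-effect correction behind statistics like X²_RSB can be sketched in its simplest first-order form: the ordinary Pearson statistic is deflated by an estimated design effect to account for within-cluster correlation. The counts and design effect below are hypothetical, and this is only the basic idea, not the exact statistics compared in the paper.

```python
import numpy as np

def pearson_chi2(counts, p0):
    # Ordinary Pearson goodness-of-fit statistic for H0: proportions == p0.
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() * np.asarray(p0, dtype=float)
    return ((counts - expected) ** 2 / expected).sum()

def rao_scott_chi2(counts, p0, deff):
    # First-order Rao-Scott-style correction: divide by the (estimated)
    # mean design effect induced by the cluster sampling design.
    return pearson_chi2(counts, p0) / deff
```

With a design effect of 1 (simple random sampling) the corrected statistic reduces to the ordinary Pearson statistic; design effects above 1 shrink it, reflecting the loss of effective sample size under clustering.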

14.
Multivariate failure time data arise when data consist of clusters in which the failure times may be dependent. A popular approach to such data is the marginal proportional hazards model with estimation under the working independence assumption. In this paper, we consider the Clayton–Oakes model with marginal proportional hazards and use the full model structure to improve on efficiency compared with the independence analysis. We derive a likelihood based estimating equation for the regression parameters as well as for the correlation parameter of the model. We give the large sample properties of the estimators arising from this estimating equation. Finally, we investigate the small sample properties of the estimators through Monte Carlo simulations.

15.
In functional magnetic resonance imaging, spatial activation patterns are commonly estimated using a non-parametric smoothing approach. Significant peaks or clusters in the smoothed image are subsequently identified by testing the null hypothesis of lack of activation in every volume element of the scans. A weakness of this approach is the lack of a model for the activation pattern; this makes it difficult to determine the variance of estimates, to test specific neuroscientific hypotheses or to incorporate prior information about the brain area under study in the analysis. These issues may be addressed by formulating explicit spatial models for the activation and using simulation methods for inference. We present one such approach, based on a marked point process prior. Informally, one may think of the points as centres of activation, and the marks as parameters describing the shape and area of the surrounding cluster. We present an MCMC algorithm for making inference in the model and compare the approach with a traditional non-parametric method, using both simulated and visual stimulation data. Finally we discuss extensions of the model and the inferential framework to account for non-stationary responses and spatio-temporal correlation.

16.
We analyze the multivariate spatial distribution of plant species diversity, distributed across three ecologically distinct land uses, the urban residential, urban non-residential, and desert. We model these data using a spatial generalized linear mixed model. Here plant species counts are assumed to be correlated within and among the spatial locations. We implement this model across the Phoenix metropolis and surrounding desert. Using a Bayesian approach, we utilized the Langevin–Hastings hybrid algorithm. Under a generalization of a spatial log-Gaussian Cox model, the log-intensities of the species count processes follow Gaussian distributions. The purely spatial component corresponding to these log-intensities are jointly modeled using a cross-convolution approach, in order to depict a valid cross-correlation structure. We observe that this approach yields non-stationarity of the model ensuing from different land use types. We obtain predictions of various measures of plant diversity including plant richness and the Shannon–Weiner diversity at observed locations. We also obtain a prediction framework for plant preferences in urban and desert plots.

17.
The latent class model, or multivariate multinomial mixture, is a powerful approach for clustering categorical data. It relies on a conditional independence assumption given the latent class to which a statistical unit belongs. In this paper, we exploit the fact that a fully Bayesian analysis with Jeffreys non-informative prior distributions involves no technical difficulty, and propose an exact expression of the integrated complete-data likelihood, which is known to be a meaningful model selection criterion in a clustering perspective. Similarly, a Monte Carlo approximation of the integrated observed-data likelihood can be obtained in two steps: an exact integration over the parameters is followed by an approximation of the sum over all possible partitions through an importance sampling strategy. The exact and approximate criteria are then compared experimentally with their standard asymptotic BIC approximations for choosing the number of mixture components. Numerical experiments on simulated data and a biological example highlight that the asymptotic criteria are usually dramatically more conservative than the non-asymptotic criteria presented here, not only for moderate sample sizes as expected but also for quite large sample sizes. This research highlights that standard asymptotic criteria can often fail to select interesting structures present in the data.

18.
Clustered (longitudinal) count data arise in many biostatistical practices in which a number of repeated count responses are observed on a number of individuals. The repeated observations may also represent counts over time from a number of individuals. One important problem that arises in practice is to test homogeneity within clusters (individuals) and between clusters (individuals). As data within clusters are observations of repeated responses, the count data may be correlated and/or over-dispersed. For over-dispersed count data with an unknown over-dispersion parameter we derive two score tests by assuming a random intercept model within the framework of (i) the negative binomial mixed effects model and (ii) the double extended quasi-likelihood mixed effects model (Lee and Nelder, 2001). These two statistics are much simpler than a statistic derived by Jacqmin-Gadda and Commenges (1995) under the framework of the over-dispersed generalized linear model. The first statistic incorporates the over-dispersion more directly into the model and is therefore expected to do well when the model assumptions are satisfied, while the other statistic is expected to be robust. Simulations show superior level properties of the statistics derived under the negative binomial and double extended quasi-likelihood model assumptions. A data set is analyzed and a discussion is given.
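A close relative of the score tests above is the classical Poisson-versus-negative-binomial overdispersion score statistic of Dean and Lawless; a numpy sketch, shown only to illustrate the form such a score test takes (it is a simpler statistic than the two derived in the paper):

```python
import numpy as np

def overdispersion_score(y, mu):
    # Score statistic for H0: Poisson against negative binomial
    # overdispersion; approximately N(0, 1) under the Poisson null,
    # with large positive values indicating over-dispersion.
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return ((y - mu) ** 2 - y).sum() / np.sqrt(2.0 * (mu ** 2).sum())
```

Each term contrasts the squared residual with the Poisson variance (which equals the mean), so the statistic accumulates evidence that the variance exceeds what the Poisson model allows.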

19.
In this work, we develop a modeling and estimation approach for the analysis of cross-sectional clustered data with multimodal conditional distributions, where the main interest is in the analysis of subpopulations. We propose to model such data hierarchically, with the conditional distributions viewed as finite mixtures of normal components. With a large number of observations in the lowest-level clusters, a two-stage estimation approach is used. In the first stage, the normal mixture parameters in each lowest-level cluster are estimated using robust methods. Robust alternatives to maximum likelihood estimation are used to provide stable results even for data whose conditional distributions have components that may not quite meet normality assumptions. The lowest-level cluster-specific means and standard deviations are then modeled in a mixed effects model in the second stage. A small simulation study was conducted to compare the performance of finite normal mixture population parameter estimates based on robust and maximum likelihood estimation in stage 1. The proposed modeling approach is illustrated through the analysis of mouse tendon fibril diameter data. The analysis results address genotype differences between corresponding components in the mixtures and demonstrate the advantages of robust estimation in stage 1.

20.
In this article, we propose a Bayesian approach to estimating multiple structural change-points in level and trend when the number of change-points is unknown. Our formulation of the structural-change model involves a binary discrete variable that indicates a structural change. The determination of the number and form of the structural changes is treated as a model selection issue in Bayesian structural-change analysis. We apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo (SAMC) algorithm, to this structural-change model selection problem. SAMC is effective for estimating complex structural-change models because it prevents entrapment in local posterior modes. The model parameters in each regime are estimated using the Gibbs sampler after each change-point is detected. The performance of our proposed method has been investigated on simulated and real data sets: a long time series of US real gross domestic product, US uses of force between 1870 and 1994, and a 1-year time series of temperature in Seoul, South Korea.
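For intuition, the single change-point special case can be solved without MCMC by profiling the Gaussian likelihood over candidate locations; a minimal sketch for one mean shift (the paper's SAMC machinery is what handles the general multiple-change-point problem, so this is only an illustrative simplification):

```python
import numpy as np

def map_changepoint(y):
    # Profile log-likelihood over all single mean-shift change-points.
    # Maximizing the Gaussian likelihood with a common variance is the
    # same as minimizing the two-segment residual sum of squares.
    y = np.asarray(y, dtype=float)
    best_k, best_rss = 1, np.inf
    for k in range(1, len(y)):          # k = index where segment 2 starts
        left, right = y[:k], y[k:]
        rss = ((left - left.mean()) ** 2).sum() \
            + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k
```

In the Bayesian formulation this profiled location corresponds to the posterior mode under a flat prior; with multiple change-points the search space grows combinatorially, which is why a sampler such as SAMC is used instead.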


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) · 京ICP备09084417号