期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Joint modelling of longitudinal binary data and survival data

Yi-Ting Hwang Chia-Hui Huang Chun-Chao Wang Tzu-Yin Lin Yi-Kuan Tseng 《Journal of applied statistics》2019,46(13):2357-2371

The medical costs in an ageing society substantially increase when the incidences of chronic diseases, disabilities and inability to live independently are high. Healthy lifestyles not only affect elderly individuals but also influence the entire community. When assessing treatment efficacy, survival and quality of life should be considered simultaneously. This paper proposes the joint likelihood approach for modelling survival and longitudinal binary covariates simultaneously. Because some unobservable information is present in the model, the Monte Carlo EM algorithm and Metropolis-Hastings algorithm are used to find the estimators. Monte Carlo simulations are performed to evaluate the performance of the proposed model based on the accuracy and precision of the estimates. Real data are used to demonstrate the feasibility of the proposed model. 相似文献

2.

Regression models for cyclic data

Graham J. G. Upton Kenneth A. Hickey Aaron Stallard 《Journal of the Royal Statistical Society. Series C, Applied statistics》2003,52(2):227-235

Summary. Geologists take slices through rock samples at a series of different orientations. Each slice is examined for a phenomenon which may occur in one of two states labelled clockwise and anticlockwise. The probability of an occurrence being in the clockwise state is dependent on the orientation. Motivated by these data, two models are presented that relate the probability of an event to orientation. Each model has two parameters, one identifying the orientation corresponding to the peak probability, and the other controlling the rate of change of probability with orientation. One model is a logistic model, whereas the other involves a power of the sine function. For the given data neither model consistently outperforms the other. 相似文献

3.

Approximate estimation in nonlinear panel data models

Offer Lieberman Laszlo Matyas 《统计学通讯:模拟与计算》2013,42(3):1177-1195

This paper is concerned with the estimation of a general class of nonlinear panel data models in which the conditional distribution of the dependent variable and the distribution of the heterogeneity factors are arbitrary. In general, exact analytical results for this problem do not exist. Here, Laplace and small-sigma appriximations for the marginal likelihood are presented. The computation of the MLE from both approximations is straightforward. It is shown that the accuracy of the Laplace approximation depends on both the sample size and the variance of the individual effects, whereas the accuracy of the small-sigma approximation is 0(1) with respect to the sample size. The results are applied to count, duration and probit panel data models. The accuracy of the approximations is evaluated through a Monte Carlo simulation experiment. The approximations are also applied in an analysis of youth unemployment in Australia. 相似文献

4.

High dimensional multivariate mixed models for binary questionnaire data 总被引：1，自引：0，他引：1

Steffen Fieuws Geert Verbeke Filip Boen Christophe Delecluse 《Journal of the Royal Statistical Society. Series C, Applied statistics》2006,55(4):449-460

Summary. Questionnaires that are used to measure the effect of an intervention often consist of different sets of items, each set possibly measuring another concept. Mixed models with set-specific random effects are a flexible tool to model the different sets of items jointly. However, computational problems typically arise as the number of sets increases. This is especially true when the random-effects distribution cannot be integrated out analytically, as with mixed models for binary data. A pairwise modelling strategy, in which all possible bivariate mixed models are fitted and where inference follows from pseudolikelihood theory, has been proposed as a solution. This approach has been applied to assess the effect of physical activity on psychocognitive functioning, the latter measured by a battery of questionnaires. 相似文献

5.

Microeconometric models and anonymized micro data

Gerd Ronning 《Allgemeines Statistisches Archiv》2006,90(1):153-166

Summary The paper first provides a short review of the most common microeconometric models including logit, probit, discrete choice, duration models, models for count data and Tobit-type models. In the second part we consider the situation that the micro data have undergone some anonymization procedure which has become an important issue since otherwise confidentiality would not be guaranteed. We shortly describe the most important approaches for data protection which also can be seen as creating errors of measurement by purpose. We also consider the possibility of correcting the estimation procedure while taking into account the anonymization procedure. We illustrate this for the case of binary data which are anonymized by ‘post-randomization’ and which are used in a probit model. We show the effect of ‘naive’ estimation, i. e. when disregarding the anonymization procedure. We also show that a ‘corrected’ estimate is available which is satisfactory in statistical terms. This is also true if parameters of the anonymization procedure have to be estimated, too. Research in this paper is related to the project “Faktische Anonymisierung wirtschaftsstatistischer Einzeldaten” financed by German Ministry of Research and Technology. 相似文献

6.

Segmentation of circular data

Douglas M. Hawkins 《Journal of applied statistics》2015,42(1):88-97

Circular data – data whose values lie in the interval [0,2π) – are important in a number of application areas. In some, there is a suspicion that a sequence of circular readings may contain two or more segments following different models. An analysis may then seek to decide whether there are multiple segments, and if so, to estimate the changepoints separating them. This paper presents an optimal method for segmenting sequences of data following the von Mises distribution. It is shown by example that the method is also successful in data following a distribution with much heavier tails. 相似文献

7.

Semiparametric mixed-effects models for clustered doubly censored data

Pao-Sheng Shen 《Journal of applied statistics》2012,39(9):1881-1892

The Cox proportional frailty model with a random effect has been proposed for the analysis of right-censored data which consist of a large number of small clusters of correlated failure time observations. For right-censored data, Cai et al. [3] proposed a class of semiparametric mixed-effects models which provides useful alternatives to the Cox model. We demonstrate that the approach of Cai et al. [3] can be used to analyze clustered doubly censored data when both left- and right-censoring variables are always observed. The asymptotic properties of the proposed estimator are derived. A simulation study is conducted to investigate the performance of the proposed estimator. 相似文献

8.

An extended random-effects approach to modeling repeated, overdispersed count data

Molenberghs G Verbeke G Demétrio CG 《Lifetime data analysis》2007,13(4):513-531

Non-Gaussian outcomes are often modeled using members of the so-called exponential family. The Poisson model for count data falls within this tradition. The family in general, and the Poisson model in particular, are at the same time convenient since mathematically elegant, but in need of extension since often somewhat restrictive. Two of the main rationales for existing extensions are (1) the occurrence of overdispersion, in the sense that the variability in the data is not adequately captured by the model's prescribed mean-variance link, and (2) the accommodation of data hierarchies owing to, for example, repeatedly measuring the outcome on the same subject, recording information from various members of the same family, etc. There is a variety of overdispersion models for count data, such as, for example, the negative-binomial model. Hierarchies are often accommodated through the inclusion of subject-specific, random effects. Though not always, one conventionally assumes such random effects to be normally distributed. While both of these issues may occur simultaneously, models accommodating them at once are less than common. This paper proposes a generalized linear model, accommodating overdispersion and clustering through two separate sets of random effects, of gamma and normal type, respectively. This is in line with the proposal by Booth et al. (Stat Model 3:179-181, 2003). The model extends both classical overdispersion models for count data (Breslow, Appl Stat 33:38-44, 1984), in particular the negative binomial model, as well as the generalized linear mixed model (Breslow and Clayton, J Am Stat Assoc 88:9-25, 1993). Apart from model formulation, we briefly discuss several estimation options, and then settle for maximum likelihood estimation with both fully analytic integration as well as hybrid between analytic and numerical integration. The latter is implemented in the SAS procedure NLMIXED. The methodology is applied to data from a study in epileptic seizures. 相似文献

9.

Residual analysis for parametric models fit to censored data

L.A. Weissfeld H. Schneider 《统计学通讯:理论与方法》2013,42(8):2283-2297

The focus of this paper is on residual analysis for the lognormal and extreme value or Weibull models, although the proposed methods can be applied to any parametric model. Residuals developed by Barlow and Prentice (1988) for the Cox proportional hazards model are extended to the parametric model setting. Three different residuals are proposed based on this approach with two residuals measuring the impact of survival time and one measuring the impact of the covariates included in the model. In addition, a residual derived from the deviations equality presented in Efron and Johnstone (1990) and the residual proposed by Joergensen (1984) for censored data models are discussed. 相似文献

10.

Interval-valued data regression using partial linear model

Yuan Wei Huiwen Wang 《Journal of Statistical Computation and Simulation》2017,87(16):3175-3194

Semi-parametric modelling of interval-valued data is of great practical importance, as exampled by applications in economic and financial data analysis. We propose a flexible semi-parametric modelling of interval-valued data by integrating the partial linear regression model based on the Center & Range method, and investigate its estimation procedure. Furthermore, we introduce a test statistic that allows one to decide between a parametric linear model and a semi-parametric model, and approximate its null asymptotic distribution based on wild Bootstrap method to obtain the critical values. Extensive simulation studies are carried out to evaluate the performance of the proposed methodology and the new test. Moreover, several empirical data sets are analysed to document its practical applications. 相似文献

11.

Bayesian analysis of generalized odds-rate hazards models for survival data

Banerjee T Chen MH Dey DK Kim S 《Lifetime data analysis》2007,13(2):241-260

In the analysis of censored survival data Cox proportional hazards model (1972) is extremely popular among the practitioners. However, in many real-life situations the proportionality of the hazard ratios does not seem to be an appropriate assumption. To overcome such a problem, we consider a class of nonproportional hazards models known as generalized odds-rate class of regression models. The class is general enough to include several commonly used models, such as proportional hazards model, proportional odds model, and accelerated life time model. The theoretical and computational properties of these models have been re-examined. The propriety of the posterior has been established under some mild conditions. A simulation study is conducted and a detailed analysis of the data from a prostate cancer study is presented to further illustrate the proposed methodology. 相似文献

12.

Fast two-stage estimator for clustered count data with overdispersion

Alvaro J. Flórez Geert Molenberghs Geert Verbeke Michael G. Kenward Pavlos Mamouris Bert Vaes 《Journal of Statistical Computation and Simulation》2019,89(14):2678-2693

Clustered count data are commonly analysed by the generalized linear mixed model (GLMM). Here, the correlation due to clustering and some overdispersion is captured by the inclusion of cluster-specific normally distributed random effects. Often, the model does not capture the variability completely. Therefore, the GLMM can be extended by including a set of gamma random effects. Routinely, the GLMM is fitted by maximizing the marginal likelihood. However, this process is computationally intensive. Although feasible with medium to large data, it can be too time-consuming or computationally intractable with very large data. Therefore, a fast two-stage estimator for correlated, overdispersed count data is proposed. It is rooted in the split-sample methodology. Based on a simulation study, it shows good statistical properties. Furthermore, it is computationally much faster than the full maximum likelihood estimator. The approach is illustrated using a large dataset belonging to a network of Belgian general practices. 相似文献

13.

Gaussian copula distributions for mixed data,with application in discrimination

《Journal of Statistical Computation and Simulation》2012,82(9):1643-1659

The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients. 相似文献

14.

Latent variable techniques for categorical data

Lancaster Gillian Green Mick 《Statistics and Computing》2002,12(2):153-161

Two useful statistical methods for generating a latent variable are described and extended to incorporate polytomous data and additional covariates. Item response analysis is not well-known outside its area of application, mainly because the procedures to fit the models are computer intensive and not routinely available within general statistical software packages. The linear score technique is less computer intensive, straightforward to implement and has been proposed as a good approximation to item response analysis. Both methods have been implemented in the standard statistical software package GLIM 4.0, and are compared to determine their effectiveness. 相似文献

15.

Modeling longitudinal count data with dropouts

Mohamed Alosh 《Pharmaceutical statistics》2010,9(1):35-45

This paper explores the utility of different approaches for modeling longitudinal count data with dropouts arising from a clinical study for the treatment of actinic keratosis lesions on the face and balding scalp. A feature of these data is that as the disease for subjects on the active arm improves their data show larger dispersion compared with those on the vehicle, exhibiting an over‐dispersion relative to the Poisson distribution. After fitting the marginal (or population averaged) model using the generalized estimating equation (GEE), we note that inferences from such a model might be biased as dropouts are treatment related. Then, we consider using a weighted GEE (WGEE) where each subject's contribution to the analysis is weighted inversely by the subject's probability of dropout. Based on the model findings, we argue that the WGEE might not address the concerns about the impact of dropouts on the efficacy findings when dropouts are treatment related. As an alternative, we consider likelihood‐based inference where random effects are added to the model to allow for heterogeneity across subjects. Finally, we consider a transition model where, unlike the previous approaches that model the log‐link function of the mean response, we model the subject's actual lesion counts. This model is an extension of the Poisson autoregressive model of order 1, where the autoregressive parameter is taken to be a function of treatment as well as other covariates to induce different dispersions and correlations for the two treatment arms. We conclude with a discussion about model selection. Published in 2009 by John Wiley & Sons, Ltd. 相似文献

16.

A multicollinearity diagnostic for models fit to censored data

Lisa A. Weissfeld 《统计学通讯:理论与方法》2013,42(6):2073-2085

A multicollinearity diagnostic is discussed for parametric models fit to censored data. The models considered include the Weibull, exponential and lognormal models as well as the Cox proportional hazards model. This diagnostic is an extension of the diagnostic proposed by Belsley, Kuh, and Welsch (1980). The diagnostic is based on the condition indicies and variance proportions of the variance covariance matrix. Its use and properties are studied through a series of examples. The effect of centering variables included in model is also discussed. 相似文献

17.

Multivariate regression analysis of panel data with binary outcomes applied to unemployment data

Claudia Czado 《Statistical Papers》2000,41(3):281-304

Summary In panel studies binary outcome measures together with time stationary and time varying explanatory variables are collected over time on the same individual. Therefore, a regression analysis for this type of data must allow for the correlation among the outcomes of an individual. The multivariate probit model of Ashford and Sowden (1970) was the first regression model for multivariate binary responses. However, a likelihood analysis of the multivariate probit model with general correlation structure for higher dimensions is intractable due to the maximization over high dimensional integrals thus severely restricting ist applicability so far. Czado (1996) developed a Markov Chain Monte Carlo (MCMC) algorithm to overcome this difficulty. In this paper we present an application of this algorithm to unemployment data from the Panel Study of Income Dynamics involving 11 waves of the panel study. In addition we adapt Bayesian model checking techniques based on the posterior predictive distribution (see for example Gelman et al. (1996)) for the multivariate probit model. These help to identify mean and correlation specification which fit the data well. C. Czado was supported by research grant OGP0089858 of the Natural Sciences and Engineering Research Council of Canada. 相似文献

18.

Model selection criteria for dual-inflated data

Ting Hsiang Lin Min-Hsiao Tsai 《Journal of Statistical Computation and Simulation》2016,86(13):2663-2672

ABSTRACT

Inflated data are prevalent in many situations and a variety of inflated models with extensions have been derived to fit data with excessive counts of some particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite the common use in statistical applications, there are not too many studies evaluating the performance of IC in inflated models. In this study, we studied the performance of IC for data with dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional inflated models including Poisson regression and zero-inflated Poisson (ZIP) regression were fitted for dual-inflated data and the performance of IC were compared. The effect of sample sizes and the proportions of inflated observations towards selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample size. 相似文献

19.

Piecewise Cox models with right-censored data

George Y. C. Wong Michael P. Osborne Qinggang Diao 《统计学通讯:模拟与计算》2017,46(10):7894-7908

We study a general class of piecewise Cox models. We discuss the computation of the semi-parametric maximum likelihood estimates (SMLE) of the parameters, with right-censored data, and a simplified algorithm for the maximum partial likelihood estimates (MPLE). Our simulation study suggests that the relative efficiency of the PMLE of the parameter to the SMLE ranges from 96% to 99.9%, but the relative efficiency of the existing estimators of the baseline survival function to the SMLE ranges from 3% to 24%. Thus, the SMLE is much better than the existing estimators. 相似文献

20.

An analysis of hourly deposition data

Paul Switzer 《Revue canadienne de statistique》1988,16(1):39-50

A scavenging-type model is propsed for the analysis of hourly data of nitrate or other pollutants in precipitation. The model would, in principle, allow for the separate estimation of the cloud-water pollutant contribution and the air-column contribution. In turn, these may be interpreted as regional and local contributions. As an illustration, the model was fitted to data from four precipitation events. Estimation of the model parameters was done on an event-specific basis except for pooling of the scavenging rate parameter. 相似文献