期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Identification of Salmonella high risk pig-herds in Belgium by using semiparametric quantile regression

Kaatje Bollaerts Marc Aerts Stefaan Ribbens Yves Van der Stede Ides Boone Koen Mintiens 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2008,171(2):449-464

Summary. Consumption of pork that is contaminated with Salmonella is an important source of human salmonellosis world wide. To control and prevent salmonellosis, Belgian pig-herds with high Salmonella infection burden are encouraged to take part in a control programme supporting the implementation of control measures. The Belgian government decided that only the 10% of pig-herds with the highest Salmonella infection burden (denoted high risk herds) can participate. To identify these herds, serological data reported as sample-to-positive ratios (SP-ratios) are collected. However, SP-ratios have an extremely skewed distribution and are heavily subject to confounding seasonal and animal age effects. Therefore, we propose to identify the 10% high risk herds by using semiparametric quantile regression with P -splines. In particular, quantile curves of animal SP-ratios are estimated as a function of sampling time and animal age. Then, pigs are classified into low and high risk animals with high risk animals having an SP-ratio that is larger than the corresponding estimated upper quantile. Finally, for each herd, the number of high risk animals is calculated as well as the beta–binomial p -value reflecting the hypothesis that the Salmonella infection burden is higher in that herd compared with the other herds. The 10% pig-herds with the lowest p -values are then identified as high risk herds. In addition, since high risk herds are supported to implement control measures, a risk factor analysis is conducted by using binomial generalized linear mixed models to investigate factors that are associated with decreased or increased Salmonella infection burden. Finally, since the choice of a specific upper quantile is to a certain extent arbitrary, a sensitivity analysis is conducted comparing different choices of upper quantiles. 相似文献

2.

Modelling forces of infection by using monotone local polynomials

Ziv Shkedy Marc Aerts Geert Molenberghs Philippe Beutels Pierre Van Damme 《Journal of the Royal Statistical Society. Series C, Applied statistics》2003,52(4):469-485

Summary. On the basis of serological data from prevalence studies of rubella, mumps and hepatitis A, the paper describes a flexible local maximum likelihood method for the estimation of the rate at which susceptible individuals acquire infection at different ages. In contrast with parametric models that have been used before in the literature, the local polynomial likelihood method allows this age-dependent force of infection to be modelled without making any assumptions about the parametric structure. Moreover, this method allows for simultaneous nonparametric estimation of age-specific incidence and prevalence. Unconstrained models may lead to negative estimates for the force of infection at certain ages. To overcome this problem and to guarantee maximal flexibility, the local smoother can be constrained to be monotone. It turns out that different parametric and nonparametric estimates of the force of infection can exhibit considerably different qualitative features like location and the number of maxima, emphasizing the importance of a well-chosen flexible statistical model. 相似文献

3.

Conditional and marginal estimates in case-control family data – extensions and sensitivity analyses

《Journal of Statistical Computation and Simulation》2012,82(10):1449-1470

This work considers two specific estimation techniques for the family-specific proportional hazards model and for the population-averaged proportional hazards model. So far, these two estimation procedures were presented and studied under the gamma frailty distribution mainly because of its simple interpretation and mathematical tractability. Modifications of both procedures for other frailty distributions, such as the inverse Gaussian, positive stable and a specific case of discrete distribution, are presented. By extensive simulations, it is shown that under the family-specific proportional hazards model, the gamma frailty model appears to be robust to frailty distribution mis-specification in both bias and efficiency loss in the marginal parameters. The population-averaged proportional hazards model, is found to be robust under the gamma frailty model mis-specification only under moderate or weak dependency within cluster members. 相似文献

4.

A Comparison of Correlation Structure Selection Penalties for Generalized Estimating Equations

Philip M. Westgate Woodrow W. Burchett 《The American statistician》2017,71(4):344-353

Correlated data are commonly analyzed using models constructed using population-averaged generalized estimating equations (GEEs). The specification of a population-averaged GEE model includes selection of a structure describing the correlation of repeated measures. Accurate specification of this structure can improve efficiency, whereas the finite-sample estimation of nuisance correlation parameters can inflate the variances of regression parameter estimates. Therefore, correlation structure selection criteria should penalize, or account for, correlation parameter estimation. In this article, we compare recently proposed penalties in terms of their impacts on correlation structure selection and regression parameter estimation, and give practical considerations for data analysts. Supplementary materials for this article are available online. 相似文献

5.

Flexible Bivariate INAR(1) Processes Using Copulas

Dimitris Karlis Xanthi Pedeli 《统计学通讯:理论与方法》2013,42(4):723-740

Multivariate count time series data occur in many different disciplines. The class of INteger-valued AutoRegressive (INAR) processes has the great advantage to consider explicitly both the discreteness and autocorrelation characterizing this type of data. Moreover, extensions of the simple INAR(1) model to the multi-dimensional space make it possible to model more than one series simultaneously. However, existing models do not offer great flexibility for dependence modelling, allowing only for positive correlation. In this work, we consider a bivariate INAR(1) (BINAR(1)) process where cross-correlation is introduced through the use of copulas for the specification of the joint distribution of the innovations. We mainly emphasize on the parametric case that arises under the assumption of Poisson marginals. Other marginal distributions are also considered. A short application on a bivariate financial count series illustrates the model. 相似文献

6.

A new four-parameter lifetime distribution

《Journal of Statistical Computation and Simulation》2012,82(2):248-263

A new four-parameter distribution is introduced. It appears to be a distribution allowing for and only allowing for monotonically increasing, bathtub-shaped and upside down bathtub-shaped hazard rates. It contains as particular cases many of the known lifetime distributions. Some mathematical properties of the new distribution, including estimation procedures by the method of maximum likelihood are derived. Some simulations are run to assess the performance of the maximum-likelihood estimators. Finally, the flexibility of the new distribution is illustrated using a real data set. 相似文献

7.

Model Order Estimation of a Multivariable Stochastic Process

Joseph Lardies Zaka Ratsimalahelo 《统计学通讯:模拟与计算》2013,42(4):863-877

ABSTRACT

This article presents a procedure allowing us to estimate the minimal order of a state-space representation, for a multivariable stochastic process, from a sequence of observations. The method proposes a statistical rule for testing the rank of a block Hankel matrix of data, since this rank is related to the order of the process. A new information criterion is then developed and used to decide upon the order of the model. In this article we generalize the Aoki C-test. Using two representative data sets as the basis for a Monte Carlo experiment and real data based on Danish economy, we estimate the order of multivariable stochastic processes. 相似文献

8.

A BRANCHING PROCESS MODEL FOR FLOW CYTOMETRY AND BUDDING INDEX MEASUREMENTS IN CELL SYNCHRONY EXPERIMENTS

Orlando DA Iversen ES Hartemink AJ Haase SB 《The annals of applied statistics》2009,3(4):1521-1541

We present a flexible branching process model for cell population dynamics in synchrony/time-series experiments used to study important cellular processes. Its formulation is constructive, based on an accounting of the unique cohorts in the population as they arise and evolve over time, allowing it to be written in closed form. The model can attribute effects to subsets of the population, providing flexibility not available using the models historically applied to these populations. It provides a tool for in silico synchronization of the population and can be used to deconvolve population-level experimental measurements, such as temporal expression profiles. It also allows for the direct comparison of assay measurements made from multiple experiments. The model can be fit either to budding index or DNA content measurements, or both, and is easily adaptable to new forms of data. The ability to use DNA content data makes the model applicable to almost any organism. We describe the model and illustrate its utility and flexibility in a study of cell cycle progression in the yeast Saccharomyces cerevisiae. 相似文献

9.

A finite mixture model for multivariate counts under endogenous selectivity

Marco Alfò Antonello Maruotti Giovanni Trovato 《Statistics and Computing》2011,21(2):185-202

We describe a selection model for multivariate counts, where association between the primary outcomes and the endogenous selection source is modeled through outcome-specific latent effects which are assumed to be dependent across equations. Parametric specifications of this model already exist in the literature; in this paper, we show how model parameters can be estimated in a finite mixture context. This approach helps us to consider overdispersed counts, while allowing for multivariate association and endogeneity of the selection variable. In this context, attention is focused both on bias in estimated effects when exogeneity of selection (treatment) variable is assumed, as well as on consistent estimation of the association between the random effects in the primary and in the treatment effect models, when the latter is assumed endogeneous. The model behavior is investigated through a large scale simulation experiment. An empirical example on health care utilization data is provided. 相似文献

10.

Bivariate frailty model for the analysis of multivariate survival time

Xiaonan Xue Ron Brookmeyer 《Lifetime data analysis》1996,2(3):277-289

Because of limitations of the univariate frailty model in analysis of multivariate survival data, a bivariate frailty model is introduced for the analysis of bivariate survival data. This provides tremendous flexibility especially in allowing negative associations between subjects within the same cluster. The approach involves incorporating into the model two possibly correlated frailties for each cluster. The bivariate lognormal distribution is used as the frailty distribution. The model is then generalized to multivariate survival data with two distinguished groups and also to alternating process data. A modified EM algorithm is developed with no requirement of specification of the baseline hazards. The estimators are generalized maximum likelihood estimators with subject-specific interpretation. The model is applied to a mental health study on evaluation of health policy effects for inpatient psychiatric care. 相似文献

11.

Semiparametric Hierarchical Composite Quantile Regression

Yanliang Chen Man-Lai Tang Maozai Tian 《统计学通讯:理论与方法》2013,42(5):996-1012

In biological, medical, and social sciences, multilevel structures are very common. Hierarchical models that take the dependencies among subjects within the same level are necessary. In this article, we introduce a semiparametric hierarchical composite quantile regression model for hierarchical data. This model (i) keeps the easy interpretability of the simple parametric model; (ii) retains some of the flexibility of the complex non parametric model; (iii) relaxes the assumptions that the noise variances and higher-order moments exist and are finite; and (iv) takes the dependencies among subjects within the same hierarchy into consideration. We establish the asymptotic properties of the proposed estimators. Our simulation results show that the proposed method is more efficient than the least-squares-based method for many non normally distributed errors. We illustrate our methodology with a real biometric data set. 相似文献

12.

A Partial Likelihood Estimator of Vaccine Efficacy 总被引：1，自引：0，他引：1

Paul S.F. Yip & Qizhi Chen 《Australian & New Zealand Journal of Statistics》2000,42(3):367-374

A partial likelihood method is proposed for estimating vaccine efficacy for a general epidemic model. In contrast to the maximum likelihood estimator (MLE) which requires complete observation of the epidemic, the suggested method only requires information on the sequence in which individuals are infected and not the exact infection times. A simulation study shows that the method performs almost as well as the MLE. The method is applied to data on the infectious disease mumps. 相似文献

13.

Cascade model with Dirichlet process for analyzing multiple dyadic matrices

Hongxia Yang Jun Wang Alexsandra Mojslovic 《Journal of applied statistics》2012,39(9):1991-2003

Dyadic matrices are natural data representations in a wide range of domains. A dyadic matrix often involves two types of abstract objects and is based on observations of pairs of elements with one element from each object. Owing to the increasing needs from practical applications, dyadic data analysis has recently attracted increasing attention and many techniques have been developed. However, most existing approaches, such as co-clustering and relational reasoning, only handle a single dyadic table and lack flexibility to perform prediction using multiple dyadic matrices. In this article, we propose a general nonparametric Bayesian framework with a cascaded structure to model multiple dyadic matrices and then describe an efficient hybrid Gibbs sampling algorithm for posterior inference and analysis. Empirical evaluations using both synthetic data and real data show that the proposed model captures the hidden structure of data and generalizes the predictive inference in a unique way. 相似文献

14.

Efficient Estimation of Fixed and Time-varying Covariate Effects in Multiplicative Intensity Models 总被引：5，自引：0，他引：5

TORBEN MARTINUSSEN THOMAS H. SCHEIKE & IB M. SKOVGAARD 《Scandinavian Journal of Statistics》2002,29(1):57-74

The proportional hazards assumption of the Cox model does sometimes not hold in practise. An example is a treatment effect that decreases with time. We study a general multiplicative intensity model allowing the influence of each covariate to vary non-parametrically with time. An efficient estimation procedure for the cumulative parameter functions is developed. Its properties are studied using the martingale structure of the problem. Furthermore, we introduce a partly parametric version of the general non-parametric model in which the influence of some of the covariates varies with time while the effects of the remaining covariates are constant. This semiparametric model has not been studied in detail before. An efficient procedure for estimating the parametric as well as the non-parametric components of this model is developed. Again the martingale structure of the model allows us to describe the asymptotic properties of the suggested estimators. The approach is applied to two different data sets, and a Monte Carlo simulation is presented. 相似文献

15.

Nonparametric product partition models for multiple change-points analysis

Eunice Campirán García Eduardo Gutiérrez-Peña 《统计学通讯:模拟与计算》2013,42(7):1922-1947

ABSTRACT

We propose an extension of parametric product partition models. We name our proposal nonparametric product partition models because we associate a random measure instead of a parametric kernel to each set within a random partition. Our methodology does not impose any specific form on the marginal distribution of the observations, allowing us to detect shifts of behaviour even when dealing with heavy-tailed or skewed distributions. We propose a suitable loss function and find the partition of the data having minimum expected loss. We then apply our nonparametric procedure to multiple change-point analysis and compare it with PPMs and with other methodologies that have recently appeared in the literature. Also, in the context of missing data, we exploit the product partition structure in order to estimate the distribution function of each missing value, allowing us to detect change points using the loss function mentioned above. Finally, we present applications to financial as well as genetic data. 相似文献

16.

Estimation of the Force of Infection from Current Status Data Using Generalized Linear Mixed Models

Harriet Namata Ziv Shkedy Christel Faes Marc Aerts Geert Molenberghs Heide Theeten Pierre Van Damme Philippe Beutels 《Journal of applied statistics》2007,34(8):923-939

Based on sero-prevalence data of rubella, mumps in the UK and varicella in Belgium, we show how the force of infection, the age-specific rate at which susceptible individuals contract infection, can be estimated using generalized linear mixed models (McCulloch & Searle, 2001). Modelling the dependency of the force of infection on age by penalized splines, which involve fixed and random effects, allows us to use generalized linear mixed models techniques to estimate both the cumulative probability of being infected before a given age and the force of infection. Moreover, these models permit an automatic selection of the smoothing parameter. The smoothness of the estimated force of infection can be influenced by the number of knots and the degree of the penalized spline used. To determine these, a different number of knots and different degrees are used and the results are compared to establish this sensitivity. Simulations with a different number of knots and polynomial spline bases of different degrees suggest - for estimating the force of infection from serological data - the use of a quadratic penalized spline based on about 10 knots. 相似文献

17.

Genetic Algorithm in the Wavelet Domain for Large p Small n Regression

Eylem Deniz Howe Orietta Nicolis 《统计学通讯:模拟与计算》2015,44(5):1144-1157

Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the independent dataset is some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model by simply taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed like this could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use the genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. We allow our algorithm to develop the regression model for each response variable independently, so as to optimally model each variable. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm. For each response variable, a different subset model is selected, and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature. 相似文献

18.

Flexible estimation of serial correlation in nonlinear mixed models

Jan Serroyen Marc Aerts Ellen Vloeberghs Peter Paul De Deyn Geert Verbeke 《Journal of applied statistics》2010,37(5):833-846

In the conventional linear mixed-effects model, four structures can be distinguished: fixed effects, random effects, measurement error and serial correlation. The latter captures the phenomenon that the correlation structure within a subject depends on the time lag between two measurements. While the general linear mixed model is rather flexible, the need has arisen to further increase flexibility. In addition to work done in the area, we propose the use of spline-based modeling of the serial correlation function, so as to allow for additional flexibility. This approach is applied to data from a pre-clinical experiment in dementia which studied the eating and drinking behavior in mice. 相似文献

19.

Solving Noisy ICA Using Multivariate Wavelet Denoising with an Application to Noisy Latent Variables Regression

Vahid Nassiri Ali Mohammad-Djafari 《统计学通讯:理论与方法》2014,43(10-12):2297-2310

A novel approach to solve the independent component analysis (ICA) model in the presence of noise is proposed. We use wavelets as natural denoising tools to solve the noisy ICA model. To do this, we use a multivariate wavelet denoising algorithm allowing spatial and temporal dependency. We propose also using a statistical approach, named nested design of experiments, to select the parameters such as wavelet family and thresholding type. This technique helps us to select more convenient combination of the parameters. This approach could be extended to many other problems in which one needs to choose parameters between many choices. The performance of the proposed method is illustrated on the simulated data and promising results are obtained. Also, the suggested method applied in latent variables regression in the presence of noise on real data. The good results confirm the ability of multivariate wavelet denoising to solving noisy ICA. 相似文献

20.

Perturbed robust linear estimating equations for confidentiality protection in remote analysis

Christine M. O’Keefe Tim Ayre Sebastien Lucie Atikur R. Khan Soomin Song Soonmin Kwon 《Statistics and Computing》2017,27(3):775-787

National statistical agencies and other data custodians collect and hold a vast amount of survey and census data, containing information vital for research and policy analysis. However, the problem of allowing analysis of these data, while protecting respondent confidentiality, has proved challenging to address. In this paper we will focus on the remote analysis approach, under which a confidential dataset is held in a secure environment under the direct control of the data custodian agency. A computer system within the secure environment accepts a query from an analyst, runs it on the data, then returns the results to the analyst. In particular, the analyst does not have direct access to the data at all, and cannot view any microdata records. We further focus on the fitting of linear regression models to confidential data in the presence of outliers and influential points, such as are often present in business data. We propose a new method for protecting confidentiality in linear regression via a remote analysis system, that provides additional confidentiality protection for outliers and influential points in the data. The method we describe in this paper was designed for the prototype DataAnalyser system developed by the Australian Bureau of Statistics, however the method would be suitable for similar remote analysis systems. 相似文献