期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Bayesian disclosure risk assessment: predicting small frequencies in contingency tables

Jonathan J. Forster Emily L. Webb 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(5):551-570

Summary. We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set. 相似文献

2.

Single-index modelling of conditional probabilities in two-way contingency tables

Gery Geenens Léopold Simar 《Statistics》2013,47(5):451-478

When analysing a contingency table, it is often worth relating the probabilities that a given individual falls into different cells from a set of predictors. These conditional probabilities are usually estimated using appropriate regression techniques. In particular, in this paper, a semiparametric model is developed. Essentially, it is only assumed that the effect of the vector of covariates on the probabilities can entirely be captured by a single index, which is a linear combination of the initial covariates. The estimation is then twofold: the coefficients of the linear combination and the functions linking this index to the related conditional probabilities have to be estimated. Inspired by the estimation procedures already proposed in the literature for single-index regression models, four estimators of the index coefficients are proposed and compared, from a theoretical point-of-view, but also practically, with the aid of simulations. Estimation of the link functions is also addressed. 相似文献

3.

Flexible semi-parametric regression of state occupational probabilities in a multistate model with right-censored data

Chathura Siriwardhana K. B. Kulasekera Somnath Datta 《Lifetime data analysis》2018,24(3):464-491

Inference for the state occupation probabilities, given a set of baseline covariates, is an important problem in survival analysis and time to event multistate data. We introduce an inverse censoring probability re-weighted semi-parametric single index model based approach to estimate conditional state occupation probabilities of a given individual in a multistate model under right-censoring. Besides obtaining a temporal regression function, we also test the potential time varying effect of a baseline covariate on future state occupation. We show that the proposed technique has desirable finite sample performances and its performance is competitive when compared with three other existing approaches. We illustrate the proposed methodology using two different data sets. First, we re-examine a well-known data set dealing with leukemia patients undergoing bone marrow transplant with various state transitions. Our second illustration is based on data from a study involving functional status of a set of spinal cord injured patients undergoing a rehabilitation program. 相似文献

4.

Bayesian modelling of health insurance losses

Zeinab Amin Maram Salem 《Journal of applied statistics》2015,42(2):231-251

The purpose of this paper is to build a model for aggregate losses which constitutes a crucial step in evaluating premiums for health insurance systems. It aims at obtaining the predictive distribution of the aggregate loss within each age class of insured persons over the time horizon involved in planning employing the Bayesian methodology. The model proposed using the Bayesian approach is a generalization of the collective risk model, a commonly used model for analysing risk of an insurance system. Aggregate loss prediction is based on past information on size of loss, number of losses and size of population at risk. In modelling the frequency and severity of losses, the number of losses is assumed to follow a negative binomial distribution, individual loss sizes are independent and identically distributed exponential random variables, while the number of insured persons in a finite number of possible age groups is assumed to follow the multinomial distribution. Prediction of aggregate losses is based on the Gibbs sampling algorithm which incorporates the missing data approach. 相似文献

5.

A frequentist understanding of sets of measures

P.I. Fierens L.C. Rêgo T.L. Fine 《Journal of statistical planning and inference》2009

We present a mathematical theory of objective, frequentist chance phenomena that uses as a model a set of probability measures. In this work, sets of measures are not viewed as a statistical compound hypothesis or as a tool for modeling imprecise subjective behavior. Instead we use sets of measures to model stable (although not stationary in the traditional stochastic sense) physical sources of finite time series data that have highly irregular behavior. Such models give a coarse-grained picture of the phenomena, keeping track of the range of the possible probabilities of the events. We present methods to simulate finite data sequences coming from a source modeled by a set of probability measures, and to estimate the model from finite time series data. The estimation of the set of probability measures is based on the analysis of a set of relative frequencies of events taken along subsequences selected by a collection of rules. In particular, we provide a universal methodology for finding a family of subsequence selection rules that can estimate any set of probability measures with high probability. 相似文献

6.

Analyzing Quantitative Trait Loci for the Arabidopsis thaliana using Markov Chain Monte Carlo Model Composition with restricted and unrestricted model spaces

Edward L. Boone Susan J. Simmons Keying Ye Ann E. Stapleton 《Statistical Methodology》2006,3(1):69

Quantitative Trait Loci (QTL) mapping is a growing field in statistical genetics. However, dealing with this type of data from a statistical perspective is often perilous. In this paper we extend and apply a Markov Chain Monte Carlo Model Composition (MC³) technique to a data set of the Arabidopsis thaliana plant for locating the QTL mapping associated with cotyledon opening. The posterior model probabilities as well as the marginal posterior probabilities of each locus belonging to the model are presented. Furthermore, we show how the MC³ method can be used to deal with the situation where the sample size is less than the number of parameters in a model using a restricted model space approach. 相似文献

7.

Prior distributions for stratified capture-recapture models

J. A. Dupuis 《Journal of applied statistics》2002,29(1-4):225-237

We consider the Arnason-Schwarz model, usually used to estimate survival and movement probabilities from capture-recapture data. A missing data structure of this model is constructed which allows a clear separation of information relative to capture and relative to movement. Extensions of the Arnason-Schwarz model are considered. For example, we consider a model that takes into account both the individual migration history and the individual reproduction history. Biological assumptions of these extensions are summarized via a directed graph. Owing to missing data, the posterior distribution of parameters is numerically intractable. To overcome those computational difficulties we advocate a Gibbs sampling algorithm that takes advantage of the missing data structure inherent in capture-recapture models. Prior information on survival, capture and movement probabilities typically consists of a prior mean and of a prior 95% credible confidence interval. Dirichlet distributions are used to incorporate some prior information on capture, survival probabilities, and movement probabilities. Finally, the influence of the prior on the Bayesian estimates of movement probabilities is examined. 相似文献

8.

Prior distributions for stratified capture-recapture models

J. A. Dupuis 《Journal of applied statistics》2002,29(1):225-237

We consider the Arnason-Schwarz model, usually used to estimate survival and movement probabilities from capture-recapture data. A missing data structure of this model is constructed which allows a clear separation of information relative to capture and relative to movement. Extensions of the Arnason-Schwarz model are considered. For example, we consider a model that takes into account both the individual migration history and the individual reproduction history. Biological assumptions of these extensions are summarized via a directed graph. Owing to missing data, the posterior distribution of parameters is numerically intractable. To overcome those computational difficulties we advocate a Gibbs sampling algorithm that takes advantage of the missing data structure inherent in capture-recapture models. Prior information on survival, capture and movement probabilities typically consists of a prior mean and of a prior 95% credible confidence interval. Dirichlet distributions are used to incorporate some prior information on capture, survival probabilities, and movement probabilities. Finally, the influence of the prior on the Bayesian estimates of movement probabilities is examined. 相似文献

9.

Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis,linked by latent class membership: with application to AIDS clinical studies

Yangxin Huang Xiaosun Lu Jiaqing Chen Juan Liang Miriam Zangmeister 《Lifetime data analysis》2018,24(4):699-718

Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and Cox model are fitted separately. 相似文献

10.

A semiparametric Bayesian model for repeatedly repeated binary outcomes

Fernando A. Quintana Peter Müller Gary L. Rosner Mary V. Relling 《Journal of the Royal Statistical Society. Series C, Applied statistics》2008,57(4):419-431

Summary. We discuss the analysis of data from single-nucleotide polymorphism arrays comparing tumour and normal tissues. The data consist of sequences of indicators for loss of heterozygosity (LOH) and involve three nested levels of repetition: chromosomes for a given patient, regions within chromosomes and single-nucleotide polymorphisms nested within regions. We propose to analyse these data by using a semiparametric model for multilevel repeated binary data. At the top level of the hierarchy we assume a sampling model for the observed binary LOH sequences that arises from a partial exchangeability argument. This implies a mixture of Markov chains model. The mixture is defined with respect to the Markov transition probabilities. We assume a non-parametric prior for the random-mixing measure. The resulting model takes the form of a semiparametric random-effects model with the matrix of transition probabilities being the random effects. The model includes appropriate dependence assumptions for the two remaining levels of the hierarchy, i.e. for regions within chromosomes and for chromosomes within patient. We use the model to identify regions of increased LOH in a data set coming from a study of treatment-related leukaemia in children with an initial cancer diagnostic. The model successfully identifies the desired regions and performs well compared with other available alternatives. 相似文献

11.

An online Bayesian mixture labelling method by minimizing deviance of classification probabilities to reference labels

《Journal of Statistical Computation and Simulation》2012,82(2):310-323

Solving label switching is crucial for interpreting the results of fitting Bayesian mixture models. The label switching originates from the invariance of posterior distribution to permutation of component labels. As a result, the component labels in Markov chain simulation may switch to another equivalent permutation, and the marginal posterior distribution associated with all labels may be similar and useless for inferring quantities relating to each individual component. In this article, we propose a new simple labelling method by minimizing the deviance of the class probabilities to a fixed reference labels. The reference labels can be chosen before running Markov chain Monte Carlo (MCMC) using optimization methods, such as expectation-maximization algorithms, and therefore the new labelling method can be implemented by an online algorithm, which can reduce the storage requirements and save much computation time. Using the Acid data set and Galaxy data set, we demonstrate the success of the proposed labelling method for removing the labelling switching in the raw MCMC samples. 相似文献

12.

Joint analysis of nonlinear heterogeneous longitudinal data and binary outcome: an application to AIDS clinical studies

Xiaosun Lu Rong Zhou 《Journal of applied statistics》2016,43(15):2713-2728

Finite mixture models are currently used to analyze heterogeneous longitudinal data. By releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, finite mixture models not only can estimate model parameters but also cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, which might be associated with a clinically important binary outcome. This article develops a joint modeling of a finite mixture of NLME models for longitudinal data in the presence of covariate measurement errors and a logistic regression for a binary outcome, linked by individual latent class indicators, under a Bayesian framework. Simulation studies are conducted to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and logistic regression are fitted separately, followed by an application to a real data set from an AIDS clinical trial, in which the viral dynamics and dichotomized time to the first decline of CD4/CD8 ratio are analyzed jointly. 相似文献

13.

Robust EM Continual Reassessment Method in Oncology Dose Finding

Yuan Y Yin G 《Journal of the American Statistical Association》2011,106(495):818-831

The continual reassessment method (CRM) is a commonly used dose-finding design for phase I clinical trials. Practical applications of this method have been restricted by two limitations: (1) the requirement that the toxicity outcome needs to be observed shortly after the initiation of the treatment; and (2) the potential sensitivity to the prespecified toxicity probability at each dose. To overcome these limitations, we naturally treat the unobserved toxicity outcomes as missing data, and use the expectation-maximization (EM) algorithm to estimate the dose toxicity probabilities based on the incomplete data to direct dose assignment. To enhance the robustness of the design, we propose prespecifying multiple sets of toxicity probabilities, each set corresponding to an individual CRM model. We carry out these multiple CRMs in parallel, across which model selection and model averaging procedures are used to make more robust inference. We evaluate the operating characteristics of the proposed robust EM-CRM designs through simulation studies and show that the proposed methods satisfactorily resolve both limitations of the CRM. Besides improving the MTD selection percentage, the new designs dramatically shorten the duration of the trial, and are robust to the prespecification of the toxicity probabilities. 相似文献

14.

Non-parametric estimation of population size from capture–recapture data when the capture probability depends on a covariate

Richard Huggins Wen-Han Hwang 《Journal of the Royal Statistical Society. Series C, Applied statistics》2007,56(4):429-443

Summary. In capture–recapture experiments the capture probabilities may depend on individual covariates such as an individual's weight or age. Typically this dependence is modelled through simple parametric functions of the covariates. Here we first demonstrate that misspecification of the model can produce biased estimates and subsequently develop a non-parametric procedure to estimate the functional relationship between the probability of capture and a single covariate. This estimator is then incorporated in a Horvitz–Thompson estimator to estimate the size of the population. The resulting estimators are evaluated in a simulation study and applied to a data set on captures of the Mountain Pygmy Possum. 相似文献

15.

Kernel density estimation for heavy-tailed distributions using the champernowne transformation

Tine Buch-larsen Jens Perch Nielsen Catalina Bolancé 《Statistics》2013,47(6):503-516

When estimating loss distributions in insurance, large and small losses are usually split because it is difficult to find a simple parametric model that fits all claim sizes. This approach involves determining the threshold level between large and small losses. In this article, a unified approach to the estimation of loss distributions is presented. We propose an estimator obtained by transforming the data set with a modification of the Champernowne cdf and then estimating the density of the transformed data by use of the classical kernel density estimator. We investigate the asymptotic bias and variance of the proposed estimator. In a simulation study, the proposed method shows a good performance. We also present two applications dealing with claims costs in insurance. 相似文献

16.

Skew-probit measurement error models

《Statistical Methodology》2007,4(1):1-12

In this paper we extend the structural probit measurement error model by considering the unobserved covariate to follow a skew-normal distribution. The new model is termed the structural skew-normal probit model. As in the normal case, the likelihood function is obtained analytically, and can be maximized by using existing statistical software. A Bayesian approach using Markov chain Monte Carlo techniques for generating from the posterior distributions is also developed. A simulation study demonstrates the usefulness of the approach in avoiding attenuation which arises with the naive procedure. Moreover, a comparison of predicted and true success probabilities indicates that it seems to be more efficient to use the skew probit model when the distribution of the covariate (predictor) is skew. An application to a real data set is also provided. 相似文献

17.

Bias in transition-specific survival and movement probabilities estimated using capture-recapture data

Jay B. Hestbeck 《Journal of applied statistics》1995,22(5-6):737-750

Transition probabilities can be estimated when capture-recapture data are available from each stratum on every capture occasion using a conditional likelihood approach with the Arnason-Schwarz model. To decompose the fundamental transition probabilities into derived parameters, all movement probabilities must sum to 1 and all individuals in stratum r at time i must have the same probability of survival regardless of which stratum the individual is in at time i + 1. If movement occurs among strata at the end of a sampling interval, survival rates of individuals from the same stratum are likely to be equal. However, if movement occurs between sampling periods and survival rates of individuals from the same stratum are not the same, estimates of stratum survival can be confounded with estimates of movement causing both estimates to be biased. Monte Carlo simulations were made of a three-sample model for a population with two strata using SURVIV. When differences were created in transition-specific survival rates for survival rates from the same stratum, relative bias was <2% in estimates of stratum survival and capture rates but relative bias in movement rates was much higher and varied. The magnitude of the relative bias in the movement estimate depended on the relative difference between the transition-specific survival rates and the corresponding stratum survival rate. The direction of the bias in movement rate estimates was opposite to the direction of this difference. Increases in relative bias due to increasing heterogeneity in probabilities of survival, movement and capture were small except when survival and capture probabilities were positively correlated within individuals. 相似文献

18.

基于GB2分布的贝叶斯相依性准备金评估模型

李政宵孟生旺《统计研究》2018,35(1):91-103

非寿险精算的核心问题之一是对未决赔款准备金进行准确评估。非寿险未决赔款准备金评估通常使用增量赔款或累积赔款的流量三角形数据。在未决赔款准备金评估中,多条业务线的流量三角形数据之间通常存在一定的相依关系,这种相依关系对保险公司总准备金的评估结果具有重要影响。从本质上看,未决赔款准备金是一个随机变量,其损失分布存在一定的多样性。因此,在未决赔款准备金的评估中选择合适的分布至关重要。GB2分布是一种包含四个参数的连续型分布,具有灵活的密度函数,分布形状更加灵活,许多常见分布都是它的特例,适宜处理不同特点的未决赔款流量三角形数据。为了考虑不同业务线之间的相依关系对未决赔款准备金评估结果的影响,本文基于GB2分布建立了一种相依性准备金评估模型,该模型首先假设不同业务线的增量赔款服从GB2分布,并在分布的期望中引入事故年和进展年作为解释变量,引入日历年随机效应描述各条业务线之间的相依关系;然后借助贝叶斯HMC方法进行参数估计和未决赔款准备金预测,最后给出了总准备金的预测分布和评估结果。本文将该方法应用到两条业务线的流量三角形数据进行实证研究,并与现有其他方法进行了比较。实证研究结果表明,基于GB2分布的相依性准备金评估模型对未决赔款准备金的尾部风险和不确定性的考虑更加充分,更加适用于评估具有厚尾或者长尾特征的准备金数据。相似文献

19.

Selection among open population capture-recapture models when capture probabilities are heterogeneous

K. P. Burnham D. R. Anderson G. C. White 《Journal of applied statistics》1995,22(5-6):611-624

Selection of a parsimonious model as a basis for statistical inference from capture-recapture data is critical, especially when using open models in the analysis of multiple, interrelated data sets (e.g. males and females, with two to three age classes, over three to five areas and 10-15 years). The global (i.e. most general) model for such data sets might contain hundreds of survival and recapture parameters. Here, we focus on a series of nested models of the Cormack-Jolly-Seber type wherein the likelihood arises from products of multinomial distributions whose cell probabilities are reparameterized in terms of survival ( phi ) and mean capture ( p ) probabilities. This paper presents numerical results on two information-theoretic methods for model selection when the capture probabilities are heterogeneous over individual animals: Akaike's Information Criterion (AIC) and a dimension-consistent criterion (CAIC), derived from a Bayesian viewpoint. Quality of model selection was evaluated based on the relative Euclidian distance between standardized theta and theta (parameter theta is vector-valued and contains the survival ( phi ) and mean capture ( p ) probabilities); this quantity (RSS = sigma{(theta i - theta i )/ theta i } 2 ) is a sum of squared bias and variance. Thus, the quality of inference (RSS) was judged by comparing the performance of the two information criteria and the use of the true model (used to generate the data), in relation to the model that provided the smallest RSS. We found that heterogeneity in the capture probabilities had a negligible effect on model selection using AIC or CAIC. Model size increased as sample size increased with both AIC- and CAIC-selected models. 相似文献

20.

Analysis of repeated categorical responses from fully and partially cross-classified data

Michael Haber Catherine C. H. Chen C. David Williamson 《统计学通讯:理论与方法》2013,42(10):3293-3313

Many follow-up studies involve categorical data measured on the same individual at different times. Frequently, some of the individuals are missing one or more of the measurements. This results in a contingency table with both fully and partially cross-classified data. Two models can be used to analyze data of this type: (i) The multiple-sample model, in which all the study subjects with the same configuration of missing observations are considered a separate sample. (ii) The single-sample model, which assumes that the missing observations are the result of a mechanism causing subjects to lose the informtion from one or some of the measurements. In this work we compare the two approaches and show that under certain conditions, the two models yield the same maximum likelihood estimates of the cell probabilities in the underlying contingency table. 相似文献