
1.

Model selection information criteria in latent class models with missing data and contingency question

《Journal of Statistical Computation and Simulation》2012,82(1):159-170

Latent class analysis (LCA) has important applications in the social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting such data. In this study, the non-response mainly comprised ‘contingency questions’ and real ‘missing data’. The primary objective of this study was to evaluate the effects of some potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size and the number of items are also significant. Our simulation results suggest that the impact of missing data and contingency questions can be mitigated by increasing the sample size or the number of items.
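The model selection step described in this abstract can be illustrated with a minimal sketch. The abstract does not name its eight information criteria, so the four shown here (AIC, BIC, CAIC and the sample-size adjusted BIC) are an assumption, as are the function names and the log-likelihood values, which are hypothetical and not taken from the paper.

```python
import math


def n_free_params_lca(n_classes: int, n_items: int, n_categories: int = 2) -> int:
    """Free parameters of an unrestricted latent class model.

    (C - 1) class proportions plus C * J * (K - 1) conditional response
    probabilities, assuming every item has the same number of categories K.
    """
    return (n_classes - 1) + n_classes * n_items * (n_categories - 1)


def information_criteria(log_lik: float, n_params: int, n_obs: int) -> dict:
    """Four widely used criteria; smaller values indicate a preferred model."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
        # Sample-size adjusted BIC: n is replaced by (n + 2) / 24.
        "aBIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def select_n_classes(log_liks: dict, n_items: int, n_obs: int,
                     criterion: str = "BIC") -> int:
    """Return the class count whose fitted model minimises the chosen criterion."""
    scores = {
        c: information_criteria(ll, n_free_params_lca(c, n_items), n_obs)[criterion]
        for c, ll in log_liks.items()
    }
    return min(scores, key=scores.get)


# Hypothetical maximised log-likelihoods for 1-, 2- and 3-class models.
example_log_liks = {1: -1300.0, 2: -1200.0, 3: -1198.0}
best = select_n_classes(example_log_liks, n_items=5, n_obs=500)
```

In a simulation study of the kind described, the accuracy rate of a criterion would be the share of replications in which the selected class count equals the true one.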

2.

Nonparametric Shrinkage Estimation for Aalen's Additive Hazards Model

Abdulkadir A. Hussein Sévérien Nkurunziza Katrina Tomanelli 《Australian & New Zealand Journal of Statistics》2014,56(1):15-26

Aalen's nonparametric additive model, in which the regression coefficients are assumed to be unspecified functions of time, is a flexible alternative to Cox's proportional hazards model when the proportionality assumption is in doubt. In this paper, we incorporate a general linear hypothesis into the estimation of the time-varying regression coefficients. We combine unrestricted least squares estimators and estimators that are restricted by the linear hypothesis and produce James-Stein-type shrinkage estimators of the regression coefficients. We develop the asymptotic joint distribution of such restricted and unrestricted estimators and use this to study the relative performance of the proposed estimators via their integrated asymptotic distributional risks. We conduct Monte Carlo simulations to examine the relative performance of the estimators in terms of their integrated mean square errors. We also compare the performance of the proposed estimators with a recently devised LASSO estimator as well as with ridge-type estimators, both via simulations and on data on the survival of primary biliary cirrhosis patients.

3.

Simultaneous inference of a misclassified outcome and competing risks failure time data

Sheng Luo Xiao Su Min Yi Kelly K. Hunt 《Journal of applied statistics》2015,42(5):1080-1090

Ipsilateral breast tumor relapse (IBTR) often occurs in breast cancer patients after their breast conservation therapy. The classification of IBTR status (true local recurrence versus new ipsilateral primary tumor) is subject to error and there is no widely accepted gold standard. Time to IBTR is likely informative for IBTR classification because a new primary tumor tends to have a longer mean time to IBTR and is associated with improved survival as compared with a true local recurrence tumor. Moreover, some patients may die from breast cancer or other causes in a competing risk scenario during the follow-up period. Because the time to death can be correlated with the unobserved true IBTR status and time to IBTR (if relapse occurs), this terminal mechanism is non-ignorable. In this paper, we propose a unified framework that addresses these issues simultaneously by modeling the misclassified binary outcome without a gold standard and the correlated time to IBTR, subject to dependent competing terminal events. We evaluate the proposed framework by a simulation study and apply it to a real data set consisting of 4477 breast cancer patients. The adaptive Gaussian quadrature tools in SAS procedure NLMIXED can be conveniently used to fit the proposed model. We expect to see broad applications of our model in other studies with a similar data structure.

4.

Second order minimax estimation of the mean

Shaul K. Bar-Lev Daoud Bshouty Zinoviy Landsman 《Journal of statistical planning and inference》2010

In this study we consider the problem of improving the sample mean in the second order minimax estimation sense for a mean belonging to an unrestricted mean parameter space R⁺. We solve this problem for the class of natural exponential families (NEF's) whose variance functions (VF's) are regular at zero and at infinity. Such a class of VF's (or NEF's) is huge and contains (among others): polynomial VF's (e.g., quadratic VF's in the Morris class, cubic VF's in the Letac–Mora class and VF's in the Hinde–Demétrio class); VF's belonging to the Tweedie class with power VF's, VF's belonging to the Babel class and many others. Moreover, we show that if the canonical parameter space of the corresponding NEF is R (which is obviously the case if the support of the NEF is bounded), then the sample mean as an estimator of the mean cannot be further improved. This work presents an original constructive methodology and provides constructive tools for obtaining explicit forms of the second order minimax estimators as well as the forms of the related weight functions. Our work establishes a substantial generalization of the results obtained so far in the literature. Illustrations of the resulting methods are provided and a simulation-based analysis is presented for the negative binomial case.

5.

A matching prior for the product of normal means based on the modified profile likelihood

Yongku Kim Woo Dong Lee 《Communications in Statistics - Simulation and Computation》2019,48(5):1312-1329

In this paper, we develop a matching prior for the product of means in several normal distributions with unrestricted means and unknown variances. For this problem, properly assigning priors for the product of normal means has been an issue because of the presence of nuisance parameters. Matching priors, which are priors matching the posterior probabilities of certain regions with their frequentist coverage probabilities, are commonly used but difficult to derive in this problem. We developed the first order probability matching priors for this problem; however, the developed matching priors are improper. Thus, we apply an alternative method and derive a matching prior based on a modification of the profile likelihood. Simulation studies show that the derived matching prior performs better than the uniform prior and Jeffreys' prior in meeting the target coverage probabilities, and meets the target coverage probabilities well even for small sample sizes. In addition, to evaluate the validity of the proposed matching prior, the Bayesian credible interval for the product of normal means using the matching prior is compared with Bayesian credible intervals using the uniform prior and Jeffreys' prior, and with the confidence interval using the method of Yfantis and Flatman.

6.

A frequentist assessment of Bayesian inclusion probabilities for screening predictors

《Journal of Statistical Computation and Simulation》2012,82(9):1111-1119

Bayesian inclusion probabilities have become a popular tool for variable assessment. From a frequentist perspective, it is often difficult to evaluate these probabilities, as typically no Type I error rates are considered, nor is the power of the methods explored. This paper considers how a frequentist may evaluate Bayesian inclusion probabilities for screening predictors. This evaluation looks at both unrestricted and restricted model spaces and develops a framework in which a frequentist can utilize inclusion probabilities that preserve Type I error rates. Furthermore, this framework is applied to an analysis of Arabidopsis thaliana with respect to determining quantitative trait loci associated with cotyledon opening angle.

7.

Evidence for hedge fund predictability from a multivariate Student's t full-factor GARCH model

Ioannis Vrontos 《Journal of applied statistics》2012,39(6):1295-1321

Extending previous work on hedge fund return predictability, this paper introduces the idea of modelling the conditional distribution of hedge fund returns using Student's t full-factor multivariate GARCH models. This class of models takes into account the stylized facts of hedge fund return series, that is, heteroskedasticity, fat tails and deviations from normality. For the proposed class of multivariate predictive regression models, we derive analytic expressions for the score and the Hessian matrix, which can be used within classical and Bayesian inferential procedures to estimate the model parameters, as well as to compare different predictive regression models. We propose a Bayesian approach to model comparison which provides posterior probabilities for various predictive models that can be used for model averaging. Our empirical application indicates that accounting for fat tails and time-varying covariances/correlations provides a more appropriate approach to modelling the underlying dynamics of financial series and improves our ability to predict hedge fund returns.

8.

A fast Monte Carlo expectation–maximization algorithm for estimation in latent class model analysis with an application to assess diagnostic accuracy for cervical neoplasia in women with atypical glandular cells

Le Kang Kathleen Darcy James Kauderer Shu-Yuan Liao 《Journal of applied statistics》2013,40(12):2699-2719

In this article, we use a latent class model (LCM) with prevalence modeled as a function of covariates to assess diagnostic test accuracy in situations where the true disease status is not observed, but observations on three or more conditionally independent diagnostic tests are available. A fast Monte Carlo expectation–maximization (MCEM) algorithm with binary (disease) diagnostic data is implemented to estimate parameters of interest; namely, sensitivity, specificity, and prevalence of the disease as a function of covariates. To obtain standard errors for confidence interval construction of estimated parameters, the missing information principle is applied to adjust information matrix estimates. We compare the adjusted information matrix-based standard error estimates with the bootstrap standard error estimates, both obtained using the fast MCEM algorithm, through an extensive Monte Carlo study. Simulation demonstrates that the adjusted information matrix approach estimates the standard error similarly to the bootstrap methods under certain scenarios. The bootstrap percentile intervals have satisfactory coverage probabilities. We then apply the LCM analysis to a real data set of 122 subjects from a Gynecologic Oncology Group study of significant cervical lesion diagnosis in women with atypical glandular cells of undetermined significance to compare the diagnostic accuracy of a histology-based evaluation, a carbonic anhydrase-IX biomarker-based test and a human papillomavirus DNA test.

9.

A Construction of Lancaster Probabilities with Margins in the Multidimensional Meixner Class

A.E. Koudou & D. Pommeret 《Australian & New Zealand Journal of Statistics》2000,42(1):59-66

The well-known Meixner class (Meixner, 1934) of probabilities on R has been extended recently to R^d (Pommeret, 1996). This generalized Meixner class corresponds to the simple quadratic natural exponential families characterized by Casalis (1996). Following Lancaster (1975), the present paper offers a characterization of the joint probability of a random vector (X, Y) such that the two variables X and Y on R^d belong to the multidimensional Meixner class and fulfil a bi-orthogonality condition involving orthogonal polynomials. The joint probabilities, called Lancaster probabilities, are characterized by two sequences of orthogonal polynomials with respect to the margins and a sequence of expectations of products. Some multivariate probabilities are studied, namely the Poisson-Gaussian and the gamma-Gaussian.

10.

A Sieve model for extreme values

《Journal of Statistical Computation and Simulation》2012,82(8):1692-1710

From the class of extreme value distributions, we focus on the set of heavy-tailed distributions which produce low-frequency, high-cost events. The regular Pareto distribution is the basic model of choice, being the simplest heavy-tailed distribution. Real data suggest that modifications of the Pareto distribution may be a better fit; an alternative model is the truncated Pareto distribution (TPD). For further study, this paper proposes a TPD Sieve class of distributions. The properties of the Sieve class and estimation for it are also discussed. We fit the models to the largest Black Sea bass caught in Buzzard's Bay, MA, USA and the costliest Atlantic hurricanes from 1900 to 2005. Using measures of model adequacy, the TPD Sieve model is generally found to be the best-fitting model.
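As a point of reference for the truncated Pareto distribution mentioned in this abstract, here is a minimal sketch of its density, distribution function and inverse-transform sampler. It assumes the common upper-truncated parameterization with shape alpha, lower bound gamma and upper bound nu; the abstract does not define the TPD Sieve class, so that construction is not attempted here, and all function names are hypothetical.

```python
import math
import random


def tpd_pdf(x: float, alpha: float, gamma: float, nu: float) -> float:
    """Density alpha * gamma^alpha * x^(-alpha-1) / (1 - (gamma/nu)^alpha) on [gamma, nu]."""
    if x < gamma or x > nu:
        return 0.0
    norm = 1.0 - (gamma / nu) ** alpha
    return alpha * gamma ** alpha * x ** (-alpha - 1.0) / norm


def tpd_cdf(x: float, alpha: float, gamma: float, nu: float) -> float:
    """Distribution function: the Pareto CDF rescaled so that F(nu) = 1."""
    if x <= gamma:
        return 0.0
    if x >= nu:
        return 1.0
    return (1.0 - (gamma / x) ** alpha) / (1.0 - (gamma / nu) ** alpha)


def tpd_quantile(p: float, alpha: float, gamma: float, nu: float) -> float:
    """Inverse of tpd_cdf for p in [0, 1]."""
    norm = 1.0 - (gamma / nu) ** alpha
    return gamma * (1.0 - p * norm) ** (-1.0 / alpha)


def tpd_sample(n: int, alpha: float, gamma: float, nu: float, seed: int = 0) -> list:
    """Draw n values by inverse-transform sampling with a seeded generator."""
    rng = random.Random(seed)
    return [tpd_quantile(rng.random(), alpha, gamma, nu) for _ in range(n)]


# Illustrative parameter values only; they are not estimates from the paper.
draws = tpd_sample(1000, alpha=1.5, gamma=1.0, nu=100.0)
```

Unlike the regular Pareto, every draw is bounded above by nu, which is what makes the truncated form a candidate when the largest observations fall short of what an unbounded heavy tail would predict.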

11.

On the integrated maximum likelihood estimators for a closed population capture–recapture model with unequal capture probabilities

Luis Ernesto B. Salasar José Galvão Leite 《Statistics》2015,49(6):1204-1220

Nuisance parameter elimination is a central problem in capture–recapture modelling. In this paper, we consider a closed population capture–recapture model which assumes the capture probabilities vary only with the sampling occasions. In this model, the capture probabilities are regarded as nuisance parameters and the unknown number of individuals is the parameter of interest. In order to eliminate the nuisance parameters, the likelihood function is integrated with respect to a weight function (uniform and Jeffreys') of the nuisance parameters, resulting in an integrated likelihood function depending only on the population size. For these integrated likelihood functions, analytical expressions for the maximum likelihood estimates are obtained and it is proved that they are always finite and unique. Variance estimates of the proposed estimators are obtained via a parametric bootstrap resampling procedure. The proposed methods are illustrated on a real data set and their frequentist properties are assessed by means of a simulation study.

12.

A latent Markov model for detecting patterns of criminal activity

Francesco Bartolucci Fulvia Pennoni Brian Francis 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(1):115-132

Summary. The paper investigates the problem of determining patterns of criminal behaviour from official criminal histories, concentrating on the variety and type of offending convictions. The analysis is carried out on the basis of a multivariate latent Markov model which allows for discrete covariates affecting the initial and the transition probabilities of the latent process. We also show some simplifications which reduce the number of parameters substantially; we include a Rasch-like parameterization of the conditional distribution of the response variables given the latent process and a constraint of partial homogeneity of the latent Markov chain. For the maximum likelihood estimation of the model we outline an EM algorithm based on recursions known in the hidden Markov literature, which make the estimation feasible also when the number of time occasions is large. Through this model, we analyse the conviction histories of a cohort of offenders who were born in England and Wales in 1953. The final model identifies five latent classes and specifies common transition probabilities for males and females between 5-year age periods, but with different initial probabilities.

13.

Investigations of potential bias in the estimation of λ using Pradel's (1996) model for capture-recapture data

James E. Hines James D. Nichols 《Journal of applied statistics》2002,29(1-4):573-587

Pradel's (1996) temporal symmetry model permitting direct estimation and modelling of population growth rate, λ_i, provides a potentially useful tool for the study of population dynamics using marked animals. Because of its recent publication date, the approach has not seen much use, and there have been virtually no investigations directed at robustness of the resulting estimators. Here we consider several potential sources of bias, all motivated by specific uses of this estimation approach. We consider sampling situations in which the study area expands with time and present an analytic expression for the bias in λ_i. We next consider trap response in capture probabilities and heterogeneous capture probabilities and compute large-sample and simulation-based approximations of resulting bias in λ_i. These approximations indicate that trap response is an especially important assumption violation that can produce substantial bias. Finally, we consider losses on capture and emphasize the importance of selecting the estimator for λ_i that is appropriate to the question being addressed. For studies based on only sighting and resighting data, Pradel's (1996) λ_i' is the appropriate estimator.

14.

A simple two-step procedure using the Fellegi–Sunter model for frequency-based record linkage

Huiping Xu Xiaochun Li Shaun Grannis 《Journal of applied statistics》2022,49(11):2789

The widely used Fellegi–Sunter model for probabilistic record linkage does not leverage information contained in field values and consequently leads to identical classification of match status regardless of whether records agree on rare or common values. Since agreement on rare values is less likely to occur by chance than agreement on common values, records agreeing on rare values are more likely to be matches. Existing frequency-based methods typically rely on knowledge of error probabilities associated with field values and frequencies of agreed field values among matches, often derived using prior studies or training data. When such information is unavailable, applications of these methods are challenging. In this paper, we propose a simple two-step procedure for frequency-based matching using the Fellegi–Sunter framework to overcome these challenges. Matching weights are adjusted based on frequency distributions of the agreed field values among matches and non-matches, estimated by the Fellegi–Sunter model without relying on prior studies or training data. Through a real-world application and simulation, our method is found to produce comparable or better performance than the unadjusted method. Furthermore, frequency-based matching provides greater improvement in matching accuracy when using poorly discriminating fields, with diminished benefit as the discriminating power of matching fields increases.
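The intuition in this abstract, that agreement on a rare value is stronger evidence of a match than agreement on a common one, can be seen in the standard Fellegi–Sunter log-likelihood-ratio weights. The sketch below shows only those textbook weights under conditional independence of fields; it is not the paper's two-step procedure, and the m- and u-probabilities used are hypothetical values chosen for illustration.

```python
import math


def field_weight(agree: bool, m: float, u: float) -> float:
    """Log2 likelihood-ratio weight for one field.

    m: probability that the field agrees among true matches.
    u: probability that the field agrees among true non-matches.
    """
    if agree:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))


def match_score(agreements, m_probs, u_probs) -> float:
    """Sum of field weights, assuming fields are conditionally independent."""
    return sum(
        field_weight(a, m, u) for a, m, u in zip(agreements, m_probs, u_probs)
    )


# Unadjusted weight: one u-probability for the surname field, whatever the value.
unadjusted = field_weight(True, m=0.9, u=0.1)

# Value-specific weights (hypothetical numbers): chance agreement among
# non-matches is far less likely on a rare surname than on a common one.
common_surname = field_weight(True, m=0.9, u=0.05)
rare_surname = field_weight(True, m=0.9, u=0.001)

# Record pair agreeing on surname and birth year but not on postcode.
score = match_score([True, True, False], [0.9, 0.95, 0.8], [0.1, 0.02, 0.05])
```

With a single u-probability per field the two surname agreements would receive the same weight; making u depend on the agreed value is what lets the rare surname count for more.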

15.

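Analysis of regional innovation efficiency and its influencing factors based on a latent class stochastic frontier

赖永剑 (Lai Yongjian) 《统计与信息论坛》 (Statistics and Information Forum) 2014,(10):52-57

Using a latent class stochastic frontier model, which groups the units under study endogenously according to their latent attributes, and data on China's provinces from 1999 to 2012, this paper studies the innovation efficiency of each province and its influencing factors. The results show that, with the level of human capital and the state of infrastructure as conditioning variables, the provinces fall into two technology classes, each with its own technology frontier and functional form. Shanghai has the highest innovation efficiency in class A and Hebei Province the highest in class B. On average, innovation efficiency shows an upward trend in both classes; trade openness, industrial structure and financial development all have significant positive effects on innovation efficiency; and innovation efficiency exhibits club convergence within each class.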

16.

Female athletic participation and income: evidence from a latent class model

Steven B. Caudill James E. Long 《Journal of applied statistics》2012,39(3):477-488

This paper introduces and applies an EM algorithm for the maximum-likelihood estimation of a latent class version of the grouped-data regression model. This new model is applied to examine the effects of college athletic participation of females on incomes. No evidence for an “athlete” effect in the case of females has been found in the previous work by Long and Caudill [12], Henderson et al. [10], and Caudill and Long [5]. Our study is the first to find evidence of a lower wage for female athletes. This effect is present in a regime characterizing 42% of the sample. Further analysis indicates that female athletes in many otherwise low-paying jobs actually get paid less than non-athletes.

17.

Jointly modelling multiple transplant outcomes by a competing risk model via functional principal component analysis

Jianghu Dong Haolun Shi Liangliang Wang Ying Zhang Jiguo Cao 《Journal of applied statistics》2023,50(1):43

In many clinical studies, longitudinal biomarkers are often used to monitor the progression of a disease. For example, in a kidney transplant study, the glomerular filtration rate (GFR) is used as a longitudinal biomarker to monitor the progression of the kidney function and the patient's state of survival that is characterized by multiple time-to-event outcomes, such as kidney transplant failure and death. It is known that the joint modelling of longitudinal and survival data leads to a more accurate and comprehensive estimation of the covariates' effect. While most joint models use the longitudinal outcome as a covariate for predicting survival, very few models consider the further decomposition of the variation within the longitudinal trajectories and its effect on survival. We develop a joint model that uses functional principal component analysis (FPCA) to extract useful features from the longitudinal trajectories and adopt the competing risk model to handle multiple time-to-event outcomes. The longitudinal trajectories and the multiple time-to-event outcomes are linked via the shared functional features. The application of our model to a real kidney transplant data set reveals the significance of these functional features, and a simulation study is carried out to validate the accuracy of the estimation method.

18.

A multivariate finite mixture latent trajectory model with application to dementia studies

Dongbing Lai Huiping Xu Daniel Koller Tatiana Foroud 《Journal of applied statistics》2016,43(14):2503-2523

Dementia patients exhibit considerable heterogeneity in individual trajectories of cognitive decline, with some patients showing rapid decline following diagnosis while others exhibit slower decline or remain stable for several years. Dementia studies often collect longitudinal measures of multiple neuropsychological tests aimed at measuring patients’ decline across a number of cognitive domains. We propose a multivariate finite mixture latent trajectory model to identify distinct longitudinal patterns of cognitive decline simultaneously in multiple cognitive domains, each of which is measured by multiple neuropsychological tests. The EM algorithm is used for parameter estimation and posterior probabilities are used to predict latent class membership. We present results of a simulation study demonstrating adequate performance of our proposed approach and apply our model to the Uniform Data Set from the National Alzheimer's Coordinating Center to identify cognitive decline patterns among dementia patients.

19.

A comparative study between latent class binomial segmentation and mixed-effects logistic regression to explore between-respondent variability in visual preference for horticultural products

E. Schrevens H. Coppenolle K.M. Portier 《Journal of applied statistics》2005,32(6):589-605


20.

Latent class based multiple imputation approach for missing categorical data

Mulugeta Gebregziabher Stacia M. DeSantis 《Journal of statistical planning and inference》2010

In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the interleukin-6 genes is considered.