
1.

A note on model selection using information criteria for general linear models estimated using REML

Arunas Petras Verbyla 《Australian & New Zealand Journal of Statistics》2019,61(1):39-50

It is common practice to compare the fit of non‐nested models using the Akaike (AIC) or Bayesian (BIC) information criteria. The basis of these criteria is the log‐likelihood evaluated at the maximum likelihood estimates of the unknown parameters. For the general linear model (and the linear mixed model, which is a special case), estimation is usually carried out using residual or restricted maximum likelihood (REML). However, for models with different fixed effects, the residual likelihoods are not comparable and hence information criteria based on the residual likelihood cannot be used. For model selection, it is often suggested that the models are refitted using maximum likelihood to enable the criteria to be used. The first aim of this paper is to highlight that both the AIC and BIC can be used for the general linear model by using the full log‐likelihood evaluated at the REML estimates. The second aim is to provide a derivation of the criteria under REML estimation. This aim is achieved by noting that the full likelihood can be decomposed into a marginal (residual) and conditional likelihood and this decomposition then incorporates aspects of both the fixed effects and variance parameters. Using this decomposition, the appropriate information criteria for model selection of models which differ in their fixed effects specification can be derived. An example is presented to illustrate the results and code is available for analyses using the ASReml‐R package.

2.

Bias‐reduced marginal Akaike information criteria based on a Monte Carlo method for linear mixed‐effects models

Wataru Sakamoto 《Scandinavian Journal of Statistics》2019,46(1):87-115

In linear mixed‐effects (LME) models, if a fitted model has more random‐effect terms than the true model, a regularity condition required in the asymptotic theory may not hold. In such cases, the marginal Akaike information criterion (AIC) is positively biased for (−2) times the expected log‐likelihood. The asymptotic bias of the maximum log‐likelihood as an estimator of the expected log‐likelihood is evaluated for LME models with balanced design in the context of parameter‐constrained models. Moreover, bias‐reduced marginal AICs for LME models based on a Monte Carlo method are proposed. The performance of the proposed criteria is compared with existing criteria by using example data and by a simulation study. It was found that the bias of the proposed criteria was smaller than that of the existing marginal AIC when a larger model was fitted and that the probability of choosing a smaller model incorrectly was decreased.

3.

Markov Poisson regression models for discrete time series. Part 1: Methodology

Peiming Wang Martin L. Puterman 《Journal of applied statistics》1999,26(7):855-869

This paper proposes and investigates a class of Markov Poisson regression models in which Poisson rate functions of covariates are conditional on unobserved states which follow a finite-state Markov chain. Features of the proposed model, estimation, inference, bootstrap confidence intervals, model selection and other implementation issues are discussed. Monte Carlo studies suggest that the proposed estimation method is accurate and reliable for single- and multiple-subject time series data; the choice of starting probabilities for the Markov process has little effect on the parameter estimates; and penalized likelihood criteria are reliable for determining the number of states. Part 2 provides applications of the proposed model.

4.

Smoothing for discrete-valued time series

Zongwu Cai Qiwei Yao & Wenyang Zhang 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2001,63(2):357-375

We deal with smoothed estimators for conditional probability functions of discrete-valued time series { Y_t } under two different settings. When the conditional distribution of Y_t given its lagged values falls in a parametric family and depends on exogenous random variables, a smoothed maximum (partial) likelihood estimator for the unknown parameter is proposed. When there is no prior information on the distribution, various nonparametric estimation methods are compared, and the adjusted Nadaraya–Watson estimator stands out as it shares the advantages of both Nadaraya–Watson and local linear regression estimators. The asymptotic normality of the estimators proposed has been established in the manner of sparse asymptotics, which shows that the smoothed methods proposed outperform their conventional, unsmoothed, parametric counterparts under very mild conditions. Simulation results lend further support to this assertion. Finally, the new method is illustrated via a real data set concerning the relationship between the number of daily hospital admissions and the levels of pollutants in Hong Kong in 1994–1995. An ad hoc model selection procedure based on a local Akaike information criterion is proposed to select the significant pollutant indices.

5.

MODEL SELECTION CRITERIA FOR LOGLINEAR MODELS

Edward J. Bedrick Winston K. Crandall 《Australian & New Zealand Journal of Statistics》2010,52(4):439-449

Considerable work has been devoted to developing model selection criteria for normal theory regression models. Less attention has been paid to discrete data. We develop two loglinear model selection criteria for Poisson counts. These criteria are based on an estimated bias adjustment of the Akaike information criterion. We observe in a simulation study that the corrected statistics provide good model choices and relatively accurate estimates of the mean structure.

6.

Markov regression models for count time series with excess zeros: A partial likelihood approach

《Statistical Methodology》2013

Count data with excess zeros are common in many biomedical and public health applications. The zero-inflated Poisson (ZIP) regression model has been widely used in practice to analyze such data. In this paper, we extend the classical ZIP regression framework to model count time series with excess zeros. A Markov regression model is presented and developed, and the partial likelihood is employed for statistical inference. Partial likelihood inference has been successfully applied in modeling time series where the conditional distribution of the response lies within the exponential family. Extending this approach to ZIP time series poses methodological and theoretical challenges, since the ZIP distribution is a mixture and therefore lies outside the exponential family. In the partial likelihood framework, we develop an EM algorithm to compute the maximum partial likelihood estimator (MPLE). We establish the asymptotic theory of the MPLE under mild regularity conditions and investigate its finite sample behavior in a simulation study. The performances of different partial-likelihood based model selection criteria are compared in the presence of model misspecification. Finally, we present an epidemiological application to illustrate the proposed methodology.
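For readers unfamiliar with the mixture structure mentioned above, the following is a minimal sketch of the plain zero-inflated Poisson probability mass function. It is not the Markov regression model of the paper: the names `zip_pmf`, `lam` and `pi`, and the fixed parameter values, are assumptions chosen for illustration, with no covariates or time dependence.

```python
import math


def zip_pmf(y, lam, pi):
    """Probability of count y under a zero-inflated Poisson distribution.

    lam: Poisson rate of the count component (assumed fixed here).
    pi:  probability of a structural (extra) zero, between 0 and 1.
    """
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Hypothetical parameter values, for illustration only.
for y in range(4):
    print(y, zip_pmf(y, lam=2.0, pi=0.3))
```

Because the zero class mixes two sources, the distribution cannot be written in exponential-family form, which is the difficulty the abstract refers to. Under this parameterization the mean is (1 - pi) * lam.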

7.

Statistical Inference for the Multidimensional Mixed Rasch Model

Mohand L. Feddag 《Communications in Statistics - Simulation and Computation》2013,42(9):1732-1749

Inference in generalized linear mixed models with multivariate random effects is often made cumbersome by the high-dimensional intractable integrals involved in the marginal likelihood. This article presents an inferential methodology based on the GEE approach. This method involves approximations of the marginal likelihood and of the joint moments of the variables. Approximate Akaike and Bayesian information criteria are also proposed, based on the approximate marginal likelihood evaluated at the parameter estimates obtained by the GEE approach. The results are illustrated with a simulation study and with an analysis of real data on health-related quality of life.

8.

Local influence for generalized linear mixed models

Hong‐Tu Zhu Sik‐Yum Lee 《Revue canadienne de statistique》2003,31(3):293-309

The authors describe a method for assessing model inadequacy in maximum likelihood estimation of a generalized linear mixed model. They treat the latent random effects in the model as missing data and develop the influence analysis on the basis of a Q‐function which is associated with the conditional expectation of the complete‐data log‐likelihood function in the EM algorithm. They propose a procedure to detect influential observations in six model perturbation schemes. They also illustrate their methodology in a hypothetical situation and in two real cases.

9.

Model selection criteria for dual-inflated data

Ting Hsiang Lin Min-Hsiao Tsai 《Journal of Statistical Computation and Simulation》2016,86(13):2663-2672

Inflated data are prevalent in many situations and a variety of inflated models with extensions have been derived to fit data with excessive counts of some particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models including Poisson regression and zero-inflated Poisson (ZIP) regression were fitted to dual-inflated data and the performance of the IC was compared. The effects of sample size and of the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample size.
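As a reminder of how the three criteria compared above differ, here is a minimal sketch using their standard textbook definitions; it is not the authors' code. The function name `information_criteria` and the log-likelihood values, parameter counts and sample size below are hypothetical, chosen only to show the calculation.

```python
import math


def information_criteria(log_lik, k, n):
    """Return AIC, BIC and CAIC for a fitted model.

    log_lik: maximized log-likelihood of the model.
    k:       number of estimated parameters.
    n:       number of observations.
    """
    aic = -2.0 * log_lik + 2.0 * k
    bic = -2.0 * log_lik + k * math.log(n)
    caic = -2.0 * log_lik + k * (math.log(n) + 1.0)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}


# Hypothetical fits (log-likelihood, parameter count), for illustration only.
fits = {"POI": (-512.3, 2), "ZIP": (-498.7, 4), "ZKIP": (-497.9, 6)}
n_obs = 200

for name, (log_lik, k) in fits.items():
    print(name, information_criteria(log_lik, k, n_obs))
```

The per-parameter penalty is 2 for the AIC, log(n) for the BIC and log(n) + 1 for the CAIC, so for n of 8 or more the BIC and CAIC penalize extra parameters more heavily than the AIC. This is consistent with the tendencies reported above: the AIC leaning toward larger models and the BIC and CAIC toward smaller ones.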

10.

Efficient Robust Estimation for Linear Models with Missing Response at Random

《Scandinavian Journal of Statistics》2018,45(2):366-381

Coefficient estimation in linear regression models with missing data is routinely carried out in the mean regression framework. However, the mean regression theory breaks down if the error variance is infinite. In addition, correct specification of the likelihood function for existing imputation approaches is often challenging in practice, especially for skewed data. In this paper, we develop a novel composite quantile regression and a weighted quantile average estimation procedure for parameter estimation in linear regression models when some responses are missing at random. Instead of imputing the missing response by randomly drawing from its conditional distribution, we propose to impute both missing and observed responses by their estimated conditional quantiles given the observed data and to use the parametrically estimated propensity scores to weigh check functions that define a regression parameter. Both estimation procedures are resistant to heavy‐tailed errors or outliers in the response and can achieve good robustness and efficiency. Moreover, we propose adaptive penalization methods to simultaneously select significant variables and estimate unknown parameters. Asymptotic properties of the proposed estimators are carefully investigated. An efficient algorithm is developed for fast implementation of the proposed methodologies. We also discuss a model selection criterion, which is based on an IC_Q‐type statistic, to select the penalty parameters. The performance of the proposed methods is illustrated via simulated and real data sets.

11.

Conditional Akaike information under covariate shift with application to small area estimation

Yuki Kawakubo Shonosuke Sugasawa Tatsuya Kubokawa 《Revue canadienne de statistique》2018,46(2):316-335

In this study, we consider the problem of selecting explanatory variables of fixed effects in linear mixed models under covariate shift, which is when the values of covariates in the model for prediction differ from those in the model for observed data. We construct a variable selection criterion based on the conditional Akaike information introduced by Vaida & Blanchard (2005). We focus especially on covariate shift in small area estimation and demonstrate the usefulness of the proposed criterion. In addition, numerical performance is investigated through simulations, one of which is a design‐based simulation using a real dataset of land prices. The Canadian Journal of Statistics 46: 316–335; 2018 © 2018 Statistical Society of Canada

12.

Classical model selection via simulated annealing

S. P. Brooks N. Friel R. King 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2003,65(2):503-520

Summary. The classical approach to statistical analysis is usually based upon finding values for model parameters that maximize the likelihood function. Model choice in this context is often also based on the likelihood function, but with the addition of a penalty term for the number of parameters. Though models may be compared pairwise by using likelihood ratio tests for example, various criteria such as the Akaike information criterion have been proposed as alternatives when multiple models need to be compared. In practical terms, the classical approach to model selection usually involves maximizing the likelihood function associated with each competing model and then calculating the corresponding criteria value(s). However, when large numbers of models are possible, this quickly becomes infeasible unless a method that simultaneously maximizes over both parameter and model space is available. We propose an extension to the traditional simulated annealing algorithm that allows for moves that not only change parameter values but also move between competing models. This transdimensional simulated annealing algorithm can therefore be used to locate models and parameters that minimize criteria such as the Akaike information criterion, but within a single algorithm, removing the need for large numbers of simulations to be run. We discuss the implementation of the transdimensional simulated annealing algorithm and use simulation studies to examine its performance in realistically complex modelling situations. We illustrate our ideas with a pedagogic example based on the analysis of an autoregressive time series and two more detailed examples: one on variable selection for logistic regression and the other on model selection for the analysis of integrated recapture–recovery data.

13.

Likelihood Inference for Unions of Interacting Discs

JESPER MØLLER KATEŘINA HELISOVÁ 《Scandinavian Journal of Statistics》2010,37(3):365-381

Abstract. This is probably the first paper which discusses likelihood inference for a random set using a germ‐grain model, where the individual grains are unobservable, edge effects occur and other complications appear. We consider the case where the grains form a disc process modelled by a marked point process, where the germs are the centres and the marks are the associated radii of the discs. We propose to use a recent parametric class of interacting disc process models, where the minimal sufficient statistic depends on various geometric properties of the random set, and the density is specified with respect to a given marked Poisson model (i.e. a Boolean model). We show how edge effects and other complications can be handled by considering a certain conditional likelihood. Our methodology is illustrated by analysing Peter Diggle's heather data set, where we discuss the results of simulation‐based maximum likelihood inference and the effect of specifying different reference Poisson models.

14.

Empirical Uncertain Bayes Methods in Area‐level Models

Shonosuke Sugasawa Tatsuya Kubokawa Kota Ogasawara 《Scandinavian Journal of Statistics》2017,44(3):684-706

Random effects models can account for the lack of fit of a regression model and increase the precision of estimates of area‐level means. However, when the synthetic mean provides accurate estimates, the prior distribution may inflate the estimation error. Thus, it is desirable to consider an uncertain prior distribution, which is expressed as the mixture of a one‐point distribution and a proper prior distribution. In this paper, we develop an empirical Bayes approach for estimating area‐level means, using the uncertain prior distribution in the context of a natural exponential family, which we call the empirical uncertain Bayes (EUB) method. The regression models considered in this paper include the Poisson‐gamma, binomial‐beta, and normal‐normal (Fay–Herriot) models, which are typically used in small area estimation. We obtain the estimators of hyperparameters based on the marginal likelihood by using a well‐known expectation‐maximization algorithm and propose the EUB estimators of area means. For risk evaluation of the EUB estimator, we derive a second‐order unbiased estimator of a conditional mean squared error by using some techniques of numerical calculation. Through simulation studies and real data applications, we evaluate the performance of the EUB estimator and compare it with the usual empirical Bayes estimator.

15.

Functional approach of flexibly modelling generalized longitudinal data and survival time

Fang Yao 《Journal of statistical planning and inference》2008

We propose a flexible functional approach for modelling generalized longitudinal data and survival time using principal components. In the proposed model the longitudinal observations can be continuous or categorical data, such as Gaussian, binomial or Poisson outcomes. We generalize the traditional joint models that treat categorical data as continuous data by using some transformations, such as CD4 counts. The proposed model is data-adaptive: it does not require pre-specified functional forms for longitudinal trajectories and automatically detects characteristic patterns. The longitudinal trajectories observed with measurement error or random error are represented by flexible basis functions through a possibly nonlinear link function, combining dimension reduction techniques resulting from functional principal component (FPC) analysis. The relationship between the longitudinal process and event history is assessed using a Cox regression model. Although the proposed model inherits the flexibility of non-parametric methods, the estimation procedure based on the EM algorithm is still parametric in computation, and thus simple and easy to implement. The computation is simplified by dimension reduction for random coefficients or FPC scores. An iterative selection procedure based on the Akaike information criterion (AIC) is proposed to choose the tuning parameters, such as the knots of the spline basis and the number of FPCs, so that an appropriate degree of smoothness and fluctuation can be addressed. The effectiveness of the proposed approach is illustrated through a simulation study, followed by an application to longitudinal CD4 counts and survival data which were collected in a recent clinical trial to compare the efficiency and safety of two antiretroviral drugs.

16.

Focused information criteria for copulas

Vinnie Ko Nils Lid Hjort Ingrid Hobæk Haff 《Scandinavian Journal of Statistics》2019,46(4):1117-1140

In this paper, we extend the focused information criterion (FIC) to copula models. Copulas are often used for applications where the joint tail behavior of the variables is of particular interest, and selecting a copula that captures this well is then essential. Traditional model selection methods such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) aim at finding the overall best‐fitting model, which is not necessarily the one best suited for the application at hand. The FIC, on the other hand, evaluates and ranks candidate models based on the precision of their point estimates of a context‐given focus parameter. This could be any quantity of particular interest, for example, the mean, a correlation, conditional probabilities, or measures of tail dependence. We derive FIC formulae for the maximum likelihood estimator, the two‐stage maximum likelihood estimator, and the so‐called pseudo‐maximum‐likelihood (PML) estimator combined with parametric margins. Furthermore, we confirm the validity of the AIC formula for the PML estimator combined with parametric margins. To study the numerical behavior of FIC, we have carried out a simulation study, and we have also analyzed a multivariate data set pertaining to abalones. The results from the study show that the FIC successfully ranks candidate models in terms of their performance, defined as how well they estimate the focus parameter. In terms of estimation precision, FIC clearly outperforms AIC, especially when the focus parameter relates to only a specific part of the model, such as the conditional upper‐tail probability.

17.

A jackknife type approach to statistical model selection

Hyunsook Lee G. Jogesh Babu 《Journal of statistical planning and inference》2012,142(1):301-311

18.

Random effects regression models for count data with excess zeros in caries research

D. Todem Y. Zhang A. Ismail W. Sohn 《Journal of applied statistics》2010,37(10):1661-1679

We extend the family of Poisson and negative binomial models to derive the joint distribution of clustered count outcomes with extra zeros. Two random effects models are formulated. The first model assumes a shared random effects term between the conditional probability of perfect zeros and the conditional mean of the imperfect state. The second formulation relaxes the shared random effects assumption by relating the conditional probability of perfect zeros and the conditional mean of the imperfect state to two different but correlated random effects variables. Under the conditional independence and the missing data at random assumption, a direct optimization of the marginal likelihood and an EM algorithm are proposed to fit the proposed models. Our proposed models are fitted to dental caries counts of children under the age of six in the city of Detroit.

19.

Optimal design for quasi-likelihood estimation in Poisson regression with random coefficients

Mehrdad Niaparast Rainer Schwabe 《Journal of statistical planning and inference》2013

Count data may be described by a Poisson regression model. If random coefficients are involved, maximum likelihood is not feasible and alternative estimation methods have to be employed. For the approach based on quasi-likelihood estimation a characterization of design optimality is derived and optimal designs are determined numerically for an example with random slope parameters.

20.

Random effects regression mixtures for analyzing infant habituation

Derek S. Young David R. Hunter 《Journal of applied statistics》2015,42(7):1421-1441

Random effects regression mixture models are a way to classify longitudinal data (or trajectories) having possibly varying lengths. The mixture structure of the traditional random effects regression mixture model arises through the distribution of the random regression coefficients, which is assumed to be a mixture of multivariate normals. An extension of this standard model is presented that accounts for various levels of heterogeneity among the trajectories, depending on their assumed error structure. A standard likelihood ratio test is presented for testing this error structure assumption. Full details of an expectation-conditional maximization algorithm for maximum likelihood estimation are also presented. This model is used to analyze data from an infant habituation experiment, where it is desirable to assess whether infants comprise different populations in terms of their habituation time.