Similar Documents
1.
SUMMARY We compare properties of parameter estimators under Akaike information criterion (AIC) and 'consistent' AIC (CAIC) model selection in a nested sequence of open population capture-recapture models. These models consist of product multinomials, where the cell probabilities are parameterized in terms of survival (φ_i) and capture (p_i) probabilities for each time interval i. The sequence of models is derived from 'treatment' effects that might be (1) absent, model H_0; (2) only acute, model H_2p; or (3) acute and chronic, lasting several time intervals, model H_3. Using a 3^5 factorial design, 1000 repetitions were simulated for each of 243 cases. The true number of parameters ranged from 7 to 42, and the sample size ranged from approximately 470 to 55 000 per case. We focus on the quality of the inference about the model parameters and model structure that results from the two selection criteria. We use achieved confidence interval coverage as an integrating metric to judge what constitutes a 'properly parsimonious' model, and contrast the performance of these two model selection criteria for a wide range of models, sample sizes, parameter values and study interval lengths. AIC selection resulted in models in which the parameters were estimated with relatively little bias. However, these models exhibited asymptotic sampling variances that were somewhat too small, and achieved confidence interval coverage that was somewhat below the nominal level. In contrast, CAIC-selected models were too simple, the parameter estimators were often substantially biased, the asymptotic sampling variances were substantially too small and the achieved coverage was often substantially below the nominal level. An example case illustrates the pattern: with 20 capture occasions, 300 previously unmarked animals were released at each occasion, and the survival and capture probabilities in the control group on each occasion were 0.9 and 0.8, respectively, under model H_3. There was a strong acute treatment effect on the first survival (φ_1) and first capture probability (p_2), and smaller, chronic effects on the second and third survival probabilities (φ_2 and φ_3) as well as on the second capture probability (p_3); the sample size for each repetition was approximately 55 000. CAIC selection led to a model with exactly these effects in only nine of the 1000 repetitions, compared with 467 times under AIC selection. Under CAIC selection, even the two acute effects were detected only 555 times, compared with 998 for AIC selection. AIC selection exhibited a balance between underfitted and overfitted models (270 versus 263), while CAIC tended strongly to select underfitted models. CAIC-selected models were overly parsimonious and poor as a basis for statistical inferences about important model parameters or structure. We recommend the use of AIC rather than CAIC for analysis and inference from capture-recapture data sets.
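For orientation, the penalty structure behind this comparison can be sketched in a few lines. The sketch below uses the common definitions (AIC with a constant penalty, and Bozdogan-style CAIC with a log-sample-size penalty); the paper's exact variant may differ, so treat the numbers as illustrative only.

```python
import numpy as np

def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: fixed penalty of 2 per parameter."""
    return -2.0 * loglik + 2.0 * k

def caic(loglik: float, k: int, n: int) -> float:
    """'Consistent' AIC: the per-parameter penalty grows with log(n),
    which is what drives the underfitting reported above."""
    return -2.0 * loglik + k * (np.log(n) + 1.0)

# toy comparison at the paper's largest per-case sample size
print(aic(-1000.0, 12), caic(-1000.0, 12, 55_000))
```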

2.
ABSTRACT

Inflated data are prevalent in many situations, and a variety of inflated models with extensions have been derived to fit data with excessive counts of particular responses. The family of information criteria (IC) has been used to compare the fit of models for selection purposes. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC in inflated models. In this study, we examined the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional inflated models, including Poisson regression and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data and the performance of the IC was compared. The effects of sample size and of the proportions of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) in terms of model selection when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high levels of accuracy when the sample size and the proportion of zero observations were small. The AIC tended to over-fit the data for the POI, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. Therefore, it is desirable to study other model selection criteria for dual-inflated data with small sample sizes.
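A minimal sketch of the POI-versus-ZIP part of this comparison, using statsmodels on simulated data (statsmodels has no ZKIP model, so that component is omitted; the data-generating values are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)

lam = np.exp(0.3 + 0.5 * x)        # Poisson mean
y = rng.poisson(lam)
y[rng.random(n) < 0.3] = 0         # inject excess zeros

poi = sm.Poisson(y, X).fit(disp=0)
zip_res = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1))).fit(disp=0)

# lower IC = preferred model; ZIP should win on this zero-inflated data
for name, res in [("POI", poi), ("ZIP", zip_res)]:
    print(f"{name}: AIC = {res.aic:.1f}, BIC = {res.bic:.1f}")
```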

3.
In real-data analysis, choosing the best subset of variables in a regression model is an important problem. Akaike's information criterion (AIC) is often used to select variables in many fields. When the sample size is not large, the AIC has a non-negligible bias that detrimentally affects variable selection. The present paper considers a bias correction of the AIC for selecting variables in the generalized linear model (GLM). The GLM can express a number of statistical models by changing the distribution and the link function, including the normal linear regression model, the logistic regression model, and the probit model, all commonly used in applied fields. In the present study, we obtain a simple expression for a bias-corrected AIC (corrected AIC, or CAIC) in GLMs, and we provide R code based on our formula. A numerical study reveals that the CAIC performs better than the AIC for variable selection.
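The paper's GLM-specific correction (and its R code) is not reproduced here. As a point of reference, the sketch below shows the widely used Hurvich-Tsai small-sample correction, which is exact for Gaussian linear regression; the paper's bias-corrected AIC for general GLMs is derived differently.

```python
def aicc(loglik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC (Hurvich & Tsai, 1989):
    AICc = AIC + 2k(k+1)/(n-k-1); requires n > k + 1."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)
```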

4.
I review the use of auxiliary variables in capture-recapture models for estimation of demographic parameters (e.g. capture probability, population size, survival probability, and recruitment, emigration and immigration numbers). I focus on what has been done in current research and what still needs to be done. Typically in the literature, covariate modelling has made capture and survival probabilities functions of covariates, but there are good reasons to make other parameters functions of covariates as well. The types of covariates considered include environmental covariates that may vary by occasion but are constant over animals, and individual animal covariates that are usually assumed constant over time. I also discuss the difficulties of using time-dependent individual animal covariates and some possible solutions. Covariates are usually assumed to be measured without error, which may not be realistic. For closed populations, one approach to modelling heterogeneity in capture probabilities uses observable individual covariates and is thus related to the primary purpose of this paper. The now standard Huggins-Alho approach conditions on the captured animals and then uses a generalized Horvitz-Thompson estimator to estimate population size. This approach has the advantage of simplicity in that one does not have to specify a distribution for the covariates, and the disadvantage that it does not use the full likelihood to estimate population size. Alternatively, one could specify a distribution for the covariates and implement a full likelihood approach to inference to estimate the capture function, the covariate probability distribution, and the population size. The general Jolly-Seber open model enables one to estimate capture probability, population sizes, survival rates, and birth numbers. Much of the focus on modelling covariates in program MARK has been for survival and capture probability in the Cormack-Jolly-Seber model and its generalizations (including tag-return models). These models condition on the number of animals marked and released. A related, but distinct, topic is radio telemetry survival modelling, which typically uses a modified Kaplan-Meier method and Cox proportional hazards model for auxiliary variables. Recently there has been an emphasis on integration of recruitment in the likelihood, and research on how to implement covariate modelling for recruitment and perhaps population size is needed. The combined open and closed 'robust' design model can also benefit from covariate modelling, and some important options have already been implemented in MARK. Many models are usually fitted to one data set. This has necessitated development of model selection criteria based on the AIC (Akaike information criterion) and the alternative of averaging over reasonable models. The special problems of estimating over-dispersion when covariates are included in the model, and then adjusting for over-dispersion in model selection, could benefit from further research.
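One concrete piece of this review, the Huggins-Alho generalized Horvitz-Thompson estimator, fits in a few lines: condition on the animals caught at least once and weight each by the inverse of its estimated probability of ever being detected. A minimal sketch, with illustrative names (the fitted capture probabilities would come from, e.g., a logistic model on the covariates):

```python
import numpy as np

def huggins_alho_N(p_hat: np.ndarray) -> float:
    """Generalized Horvitz-Thompson estimate of closed-population size.

    p_hat: (n_caught, T) fitted capture probabilities for the animals
    seen at least once over T occasions. Each animal contributes
    1 / P(caught on at least one occasion)."""
    p_star = 1.0 - np.prod(1.0 - p_hat, axis=1)   # P(ever detected)
    return float(np.sum(1.0 / p_star))
```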

5.
Existing models for ring-recovery and recapture data analysis treat temporal variations in annual survival probability (S) as fixed effects. Often there is no explainable structure to the temporal variation in S_1, …, S_k; random effects can then be a useful model: S_i = E(S) + ε_i. Here, the temporal variation in survival probability is treated as random, with E(ε_i²) = σ². This random effects model can now be fitted in program MARK. Resultant inferences include point and interval estimation for the process variation σ², and estimation of E(S) and var(Ê(S)), where the latter includes a component for σ² as well as the traditional conditional sampling-variance component var(Ŝ | S). Furthermore, the random effects model leads to shrinkage estimates S̃_i as improved (in mean square error) estimators of S_i compared with the MLEs Ŝ_i from the unrestricted time-effects model. Appropriate confidence intervals based on the S̃_i are also provided. In addition, AIC has been generalized to random effects models. This paper presents results of a Monte Carlo evaluation of inference performance under the simple random effects model. Examined by simulation, under the simple one-group Cormack-Jolly-Seber (CJS) model, are issues such as bias of σ̂², confidence interval coverage on σ², coverage and mean square error comparisons for inference about S_i based on shrinkage versus maximum likelihood estimators, and performance of AIC model selection over three models: S_i = S (no effects), S_i = E(S) + ε_i (random effects), and S_1, …, S_k (fixed effects). For the cases simulated, the random effects methods performed well and were uniformly better than the fixed-effects MLEs of the S_i.
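A schematic version of the shrinkage step: each occasion-specific MLE is pulled toward a precision-weighted overall mean, with the amount of pooling governed by the ratio of process variance to total variance. This is a sketch of the empirical-Bayes idea only; MARK's implementation is more elaborate.

```python
import numpy as np

def shrinkage_estimates(S_hat, var_hat, sigma2):
    """Shrink occasion-specific survival MLEs toward their weighted mean.

    S_hat:   array of MLEs S_i from the time-effects model
    var_hat: their conditional sampling variances var(S_i_hat | S_i)
    sigma2:  estimated process variance of the random effects"""
    w = 1.0 / (sigma2 + var_hat)
    mu = np.sum(w * S_hat) / np.sum(w)        # estimate of E(S)
    factor = sigma2 / (sigma2 + var_hat)      # ~1 keeps the MLE, ~0 pools
    return mu + factor * (S_hat - mu)
```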

6.
A model for analyzing release-recapture data is presented that generalizes a previously existing individual covariate model to include multiple groups of animals. As in the previous model, the generalized version includes selection parameters that relate individual covariates to survival potential. Significance of the selection parameters was equivalent to significance of the individual covariates. Simulation studies were conducted to investigate three inferential properties with respect to the selection parameters: (1) sample size requirements, (2) validity of the likelihood ratio test (LRT) and (3) power of the LRT. When the survival and capture probabilities ranged from 0.5 to 1.0, a total sample size of 300 was necessary to achieve a power of 0.80 at a significance level of 0.1 when testing the significance of the selection parameters. However, only half that (a total of 150) was necessary for the distribution of the maximum likelihood estimators of the selection parameters to approximate their asymptotic distributions. In general, as the survival and capture probabilities decreased, the sample size requirements increased. The validity of the LRT for testing the significance of the selection parameters was confirmed because the LRT statistic was distributed as theoretically expected under the null hypothesis, i.e. like a χ² random variable. When the baseline survival model was fully parameterized with population and interval effects, the LRT was also valid in the presence of unaccounted-for random variation. The power of the LRT for testing the selection parameters was unaffected by over-parameterization of the baseline survival and capture models. The simulation studies showed that, for testing the significance of individual covariates to survival, the LRT was remarkably robust to assumption violations.
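The LRT machinery used throughout this study is standard and worth making explicit: twice the log-likelihood gap between the full and reduced models is referred to a χ² distribution with degrees of freedom equal to the number of selection parameters dropped. A minimal sketch:

```python
from scipy.stats import chi2

def lrt(loglik_full: float, loglik_reduced: float, df: int):
    """Likelihood ratio test: under H0 (selection parameters = 0) the
    statistic is asymptotically chi-squared with df degrees of freedom."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2.sf(stat, df)   # (test statistic, p-value)
```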

7.
We conducted an experiment to examine the effect of neckbands, controlling for differences in sex, species and year of study (1991-1997), on probabilities of capture, survival, reporting, and fidelity in non-breeding small Canada (Branta canadensis hutchinsi) and white-fronted (Anser albifrons frontalis) geese. In Canada's central arctic, we systematically double-marked about half of the individuals from each species with neckbands and legbands, and marked the other half only with legbands. We considered 48 a priori models that included combinations of sex, species, year, and neckband effects on the four population parameters produced by Burnham's (1993) model, using AIC for model selection. The four best approximating models each included a negative effect of neckbands on survival, and the effect size varied among years. True survival probability of neckbanded birds was annually 0.006 to 0.23 (Canada geese) and 0.039 to 0.22 (white-fronted geese) lower than for conspecifics without neckbands. Changes in estimates of survival probability of neckbanded birds appeared to attenuate in more recent years, particularly in Canada geese, a result that we suspect was related to lower retention rates of neckbands. We urge extreme caution in the use of neckbands for estimation of certain population parameters, and discourage their use for estimation of unbiased survival probability in these two species.

8.
Jolly-Seber models A, B, D and 2 were used to investigate capture-recapture data. The standard Jolly-Seber model A (time-dependent survival φ and capture probability p) fits capture-recapture data of migrating passerines. Captures from a long-term mist-netting study at a stop-over site (Mettnau Peninsula, SW Germany) were used to estimate stop-over length from the survival rate between days and the capture probability. For some data, model 2 could be used, indicating a temporary reduction in 'survival' rate. Application of models B and D gave poor results. The total number of birds stopping over, i.e. population size, was estimated from captures on 1-5 line transects of nets in the spatial trapping design. Behaviour, movements within the stop-over site, catchability, and ecophysiological covariables such as moult, fat deposition and climatic parameters are likely to have a strong influence on the estimation of capture parameters.
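The stop-over-length idea can be made concrete under one simple convention (an assumption for illustration, not necessarily the authors' exact estimator): if a bird 'survives' (i.e. stays) each day with constant probability φ, the length of stay is geometric.

```python
def expected_stopover(phi: float) -> float:
    """Expected total stay, in days, when a bird remains at the site
    each day with probability phi (geometric departure time)."""
    return 1.0 / (1.0 - phi)

print(expected_stopover(0.8))   # daily 'survival' 0.8 -> 5 days on average
```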

9.
The Fay–Herriot model, a popular approach in small area estimation, uses relevant covariates to improve inference for quantities of interest in small sub-populations. The conditional Akaike information (AI) (Vaida and Blanchard, 2005 [23]) in linear mixed-effect models with i.i.d. errors can be extended to the Fay–Herriot model for measuring prediction performance. In this paper, we derive the unbiased conditional AIC (cAIC) for three popular approaches to fitting the Fay–Herriot model. The three cAICs have closed forms and are convenient to implement. We conduct a simulation study to demonstrate their accuracy in estimating the conditional AI and their superior model selection performance relative to the classic AIC. We also apply the cAIC to estimating county-level prevalence rates of obesity for working-age Hispanic females in California.
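The paper's closed-form cAICs are not reproduced here. The sketch below only illustrates the general Vaida-Blanchard construction for a Gaussian mixed model — conditional log-likelihood penalized by the effective degrees of freedom ρ = trace(H), where H maps observations to conditional fitted values — and ignores the extra correction terms that arise when variance parameters are estimated.

```python
import numpy as np

def conditional_aic(y, y_hat, H, sigma2):
    """Sketch of a conditional AIC: -2 * conditional Gaussian log-likelihood
    plus 2 * rho, with rho = trace(H) and y_hat = H @ y. Illustrative only;
    the unbiased versions in the paper add further correction terms."""
    n = len(y)
    resid = y - y_hat
    cll = -0.5 * (n * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2)
    rho = np.trace(H)                 # effective degrees of freedom
    return -2.0 * cll + 2.0 * rho
```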

10.
Summary. We propose a mixture of binomial and beta-binomial distributions for estimating the size of closed populations. The new mixture model is applied to several real capture-recapture data sets and is shown to provide a convenient, objective framework for model selection. The new model is compared with three alternative models in a simulation study, and the results shed light on the general performance of models in this area. The new model provides a robust flexible analysis, which automatically deals with small capture probabilities.
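A minimal sketch of such a mixture likelihood, conditioned on detection as is usual in closed-population size estimation (parameter names are illustrative, and the paper's parameterization may differ):

```python
import numpy as np
from scipy.stats import binom, betabinom   # betabinom needs scipy >= 1.4

def mixture_pmf(k, T, w, p, a, b):
    """P(k captures out of T occasions): binomial component with weight w,
    beta-binomial component (capturing heterogeneity) with weight 1 - w."""
    return w * binom.pmf(k, T, p) + (1 - w) * betabinom.pmf(k, T, a, b)

def truncated_loglik(counts, T, w, p, a, b):
    """Log-likelihood of observed capture frequencies, conditional on
    being caught at least once (the zero class is unobservable)."""
    p0 = mixture_pmf(0, T, w, p, a, b)
    return np.sum(np.log(mixture_pmf(counts, T, w, p, a, b)) - np.log1p(-p0))
```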

11.
We consider the Arnason-Schwarz model, usually used to estimate survival and movement probabilities from capture-recapture data. A missing data structure of this model is constructed that allows a clear separation of information relative to capture and relative to movement. Extensions of the Arnason-Schwarz model are considered; for example, a model that takes into account both the individual migration history and the individual reproduction history. Biological assumptions of these extensions are summarized via a directed graph. Owing to missing data, the posterior distribution of the parameters is numerically intractable. To overcome these computational difficulties we advocate a Gibbs sampling algorithm that takes advantage of the missing data structure inherent in capture-recapture models. Prior information on survival, capture and movement probabilities typically consists of a prior mean and a prior 95% credible interval. Dirichlet distributions are used to incorporate prior information on capture, survival, and movement probabilities. Finally, the influence of the prior on the Bayesian estimates of movement probabilities is examined.
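The Dirichlet conjugacy that makes the Gibbs sampler convenient can be sketched in one step. This is a simplified illustration of a single conditional draw; the full sampler alternates such draws with imputation of the latent capture and movement histories.

```python
import numpy as np

rng = np.random.default_rng(42)

def update_movement_row(transition_counts, prior_alpha):
    """One Gibbs step: with a Dirichlet prior on a row of the movement
    matrix and (imputed) multinomial transition counts, the full
    conditional is again Dirichlet with updated concentration."""
    return rng.dirichlet(prior_alpha + transition_counts)

# e.g. 3 strata, weak uniform prior, counts imputed from latent histories
print(update_movement_row(np.array([12, 3, 5]), np.ones(3)))
```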

12.
Model choice is one of the most crucial aspects of any statistical data analysis. It is well known that most models are only approximations to the true data-generating process, but among such approximations it is our goal to select the 'best' one. Researchers typically consider a finite number of plausible models in statistical applications, and the related statistical inference depends on the chosen model. Hence, model comparison is required to identify the 'best' model among several candidate models. This article considers the problem of model selection for spatial data. The issue of model selection for spatial models has been addressed in the literature by traditional information criteria-based methods, even though such criteria were developed under the assumption of independent observations. We evaluate the performance of some popular model selection criteria via Monte Carlo simulation experiments using small to moderate samples. In particular, we compare the performance of some of the most popular information criteria, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the corrected AIC, in selecting the true model. The ability of these criteria to select the correct model is evaluated under several scenarios. This comparison is made using various spatial covariance models ranging from stationary isotropic to nonstationary models.
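As an illustration of the kind of candidate model being compared, the sketch below computes the Gaussian log-likelihood under one stationary isotropic covariance (exponential with a nugget); plugging this into AIC/BIC/AICc for each candidate covariance family reproduces the comparison studied here. Parameter names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def exp_cov_loglik(y, coords, sigma2, phi, tau2):
    """Zero-mean Gaussian log-likelihood under the exponential covariance
    C(d) = sigma2 * exp(-d / phi), plus a nugget tau2 on the diagonal."""
    d = cdist(coords, coords)
    C = sigma2 * np.exp(-d / phi) + tau2 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)
```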

13.
With the continuing growth of big data and the internet, web surveys have become increasingly widespread. Most web-survey samples are non-probability samples, for which inference is difficult under traditional design-based sampling theory; solving the inference problem for web-survey samples is therefore a pressing need for the development of web surveys in the big-data era. This paper is the first to propose a modelling perspective on this problem, along three lines. First, model the inclusion probabilities: build a propensity-score model, for instance based on machine learning and variable selection, to estimate each unit's probability of entering the sample and then infer population quantities. Second, model the target variable directly, fitting a parametric, non-parametric, or semi-parametric superpopulation model for estimation. Third, model both the inclusion probabilities and the target variable, combining the propensity-score model and the superpopulation model through weighted estimation and hybrid inference. Finally, inclusion-probability modelling based on a generalized boosted model is used as a worked example of the approach.
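A minimal sketch of the first approach, with scikit-learn's gradient boosting standing in for the generalized boosted model; the stacked-sample design and all names are assumptions for illustration, and in practice the reference sample's design weights would also enter the estimator.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def pseudo_weights(X, s):
    """Inverse-odds pseudo-weights for a non-probability web sample.

    X: covariates for the stacked data set -- web sample (s = 1) on top
    of a reference probability sample (s = 0)."""
    gbm = GradientBoostingClassifier().fit(X, s)
    p = gbm.predict_proba(X)[:, 1]       # propensity of being a web unit
    return ((1.0 - p) / p)[s == 1]       # weights for the web sample only
```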

14.
Stock & Watson (1999) consider the relative quality of different univariate forecasting techniques. This paper extends their study of forecasting practice, comparing the forecasting performance of two popular model selection procedures, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). This paper considers several topics: how AIC and BIC choose lags in autoregressive models on actual series, how models so selected forecast relative to an AR(4) model, the effect of using a maximum lag on model selection, and the forecasting performance of combining the AR(4), AIC, and BIC models with equal weights.
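A small sketch of the lag-selection exercise using statsmodels on a simulated AR(2) series (the paper works with actual series; the simulation here is purely illustrative):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

rng = np.random.default_rng(0)
y = np.zeros(400)
for t in range(2, 400):                       # simulate an AR(2) process
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

for ic in ("aic", "bic"):                     # BIC penalizes extra lags harder
    sel = ar_select_order(y, maxlag=12, ic=ic)
    print(ic.upper(), "selects lags:", sel.ar_lags)

bench = AutoReg(y, lags=4).fit()              # the fixed AR(4) benchmark
print("AR(4) AIC:", round(bench.aic, 2))
```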

15.
The continual reassessment method (CRM) is a commonly used dose-finding design for phase I clinical trials. Practical applications of this method have been restricted by two limitations: (1) the requirement that the toxicity outcome needs to be observed shortly after the initiation of the treatment; and (2) the potential sensitivity to the prespecified toxicity probability at each dose. To overcome these limitations, we naturally treat the unobserved toxicity outcomes as missing data, and use the expectation-maximization (EM) algorithm to estimate the dose toxicity probabilities based on the incomplete data to direct dose assignment. To enhance the robustness of the design, we propose prespecifying multiple sets of toxicity probabilities, each set corresponding to an individual CRM model. We carry out these multiple CRMs in parallel, across which model selection and model averaging procedures are used to make more robust inference. We evaluate the operating characteristics of the proposed robust EM-CRM designs through simulation studies and show that the proposed methods satisfactorily resolve both limitations of the CRM. Besides improving the MTD selection percentage, the new designs dramatically shorten the duration of the trial, and are robust to the prespecification of the toxicity probabilities.
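A stripped-down sketch of the EM device for pending toxicity outcomes, using a single power-model CRM with a hypothetical skeleton (the paper's design additionally runs several skeletons in parallel with model selection/averaging, which is not shown):

```python
import numpy as np
from scipy.optimize import minimize_scalar

skeleton = np.array([0.05, 0.12, 0.25, 0.40])   # hypothetical prior skeleton

def e_step(a, doses, observed, y):
    """Replace each pending (unobserved) outcome with its expected
    toxicity under the current parameter -- the missing-data device."""
    p = skeleton[doses] ** np.exp(a)
    return np.where(observed, y, p)

def m_step(doses, y):
    """Refit the power-model CRM, P(tox | dose d) = skeleton[d]**exp(a),
    allowing fractional outcomes produced by the E-step."""
    def negll(a):
        p = skeleton[doses] ** np.exp(a)
        return -np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p))
    return minimize_scalar(negll, bounds=(-3.0, 3.0), method="bounded").x
```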

16.
Transition probabilities can be estimated when capture-recapture data are available from each stratum on every capture occasion using a conditional likelihood approach with the Arnason-Schwarz model. To decompose the fundamental transition probabilities into derived parameters, all movement probabilities must sum to 1 and all individuals in stratum r at time i must have the same probability of survival regardless of which stratum the individual is in at time i + 1. If movement occurs among strata at the end of a sampling interval, survival rates of individuals from the same stratum are likely to be equal. However, if movement occurs between sampling periods and survival rates of individuals from the same stratum are not the same, estimates of stratum survival can be confounded with estimates of movement causing both estimates to be biased. Monte Carlo simulations were made of a three-sample model for a population with two strata using SURVIV. When differences were created in transition-specific survival rates for survival rates from the same stratum, relative bias was <2% in estimates of stratum survival and capture rates but relative bias in movement rates was much higher and varied. The magnitude of the relative bias in the movement estimate depended on the relative difference between the transition-specific survival rates and the corresponding stratum survival rate. The direction of the bias in movement rate estimates was opposite to the direction of this difference. Increases in relative bias due to increasing heterogeneity in probabilities of survival, movement and capture were small except when survival and capture probabilities were positively correlated within individuals.

17.
In this paper, we extend the focused information criterion (FIC) to copula models. Copulas are often used for applications where the joint tail behavior of the variables is of particular interest, and selecting a copula that captures this well is then essential. Traditional model selection methods such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) aim at finding the overall best-fitting model, which is not necessarily the one best suited for the application at hand. The FIC, on the other hand, evaluates and ranks candidate models based on the precision of their point estimates of a context-given focus parameter. This could be any quantity of particular interest, for example, the mean, a correlation, conditional probabilities, or measures of tail dependence. We derive FIC formulae for the maximum likelihood estimator, the two-stage maximum likelihood estimator, and the so-called pseudo-maximum-likelihood (PML) estimator combined with parametric margins. Furthermore, we confirm the validity of the AIC formula for the PML estimator combined with parametric margins. To study the numerical behavior of FIC, we have carried out a simulation study, and we have also analyzed a multivariate data set pertaining to abalones. The results from the study show that the FIC successfully ranks candidate models in terms of their performance, defined as how well they estimate the focus parameter. In terms of estimation precision, FIC clearly outperforms AIC, especially when the focus parameter relates to only a specific part of the model, such as the conditional upper-tail probability.
