Similar documents
 20 similar documents found (search time: 15 ms)
1.
As new technologies permit the generation of hitherto unprecedented volumes of data (e.g. genome-wide association study data), researchers struggle to keep up with the added complexity and time commitment required for its analysis. For this reason, model selection commonly relies on machine learning and data-reduction techniques, which tend to afford models with obscure interpretations. Even in cases with straightforward explanatory variables, the so-called ‘best’ model produced by a given model-selection technique may fail to capture information of vital importance to the domain-specific questions at hand. Herein we propose a new concept for model selection, feasibility, for use in identifying multiple models that are in some sense optimal and may unite to provide a wider range of information relevant to the topic of interest, including (but not limited to) interaction terms. We further provide an R package and associated Shiny Applications for use in identifying or validating feasible models, the performance of which we demonstrate on both simulated and real-life data.
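The idea of retaining several near-optimal models can be sketched with a toy "feasible set" built by exhaustive OLS subset search, keeping every model within a small AIC margin of the best. This is a simplified illustration, not the paper's feasibility criterion or its R package; `feasible_models` and the delta = 2 cutoff are assumptions here.

```python
import itertools
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit with intercept, assuming Gaussian errors."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    k = X1.shape[1] + 1  # regression coefficients + error variance
    return n * np.log(rss / n) + 2 * k

def feasible_models(X, y, delta=2.0):
    """Return every predictor subset whose AIC is within `delta` of the best."""
    p = X.shape[1]
    scored = []
    for r in range(1, p + 1):
        for subset in itertools.combinations(range(p), r):
            scored.append((ols_aic(X[:, subset], y), subset))
    best = min(a for a, _ in scored)
    return sorted((a, s) for a, s in scored if a - best <= delta)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)  # only x0, x1 matter
models = feasible_models(X, y)
```

Reporting the whole feasible set, rather than the single AIC winner, is what exposes alternative terms (e.g. interactions) that a one-model summary would hide.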

2.
Multiple-membership logit models with random effects are models for clustered binary data, where each statistical unit can belong to more than one group. The likelihood function of these models is analytically intractable. We propose two different approaches for parameter estimation: indirect inference and data cloning (DC). The former is a non-likelihood-based method which uses an auxiliary model to select reasonable estimates. We propose an auxiliary model with the same dimension of parameter space as the target model, which makes it particularly convenient for reaching good estimates quickly. The latter method computes maximum likelihood estimates through the posterior distribution of an adequate Bayesian model, fitted to cloned data. We implement a DC algorithm specifically for multiple-membership models. A Monte Carlo experiment compares the two methods on simulated data. For further comparison, we also report Bayesian posterior mean and Integrated Nested Laplace Approximation hybrid DC estimates. Simulations show a negligible loss of efficiency for the indirect inference estimator, compensated by a considerable computational gain. The approaches are then illustrated with two real examples on matched paired data.
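The data-cloning mechanism is easiest to see in a conjugate toy case, where the cloned posterior is available in closed form and no MCMC is needed. This Beta-Binomial sketch is an illustration of the general DC principle, not the multiple-membership implementation:

```python
# Data cloning in miniature: y successes in n Bernoulli trials with a flat
# Beta(1, 1) prior. Cloning the data K times raises the likelihood to the
# K-th power, giving posterior Beta(1 + K*y, 1 + K*(n - y)).
y, n = 7, 10
mle = y / n

def cloned_posterior_mean(K):
    a, b = 1 + K * y, 1 + K * (n - y)
    return a / (a + b)

# As K grows, the cloned posterior mean converges to the MLE, and the
# posterior variance shrinks at rate 1/K (K times the posterior variance
# recovers the asymptotic variance of the MLE).
means = {K: cloned_posterior_mean(K) for K in (1, 10, 100)}
```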

3.
In this study, an evaluation of Bayesian hierarchical models is made based on simulation scenarios to compare single-stage and multi-stage Bayesian estimations. Simulated datasets of lung cancer disease counts for men aged 65 and older across 44 wards in the London Health Authority were analysed using a range of spatially structured random effect components. The goals of this study are to determine which of these single-stage models perform best given a certain simulating model, how estimation methods (single- vs. multi-stage) compare in yielding posterior estimates of fixed effects in the presence of spatially structured random effects, and finally which of two spatial prior models – the Leroux or ICAR model – performs best in a multi-stage context under different assumptions concerning spatial correlation. Among the fitted single-stage models without covariates, we found that when there is a low amount of variability in the distribution of disease counts, the BYM model is relatively robust to misspecification in terms of DIC, while the Leroux model is the least robust to misspecification. When these models were fit to data generated from models with covariates, we found that when there was one set of covariates – either spatially correlated or non-spatially correlated – changing the values of the fixed coefficients affected the ability of either the Leroux or ICAR model to fit the data well in terms of DIC. When there were multiple sets of spatially correlated covariates in the simulating model, however, we could not distinguish the goodness of fit to the data between these single-stage models. We found that the multi-stage modelling process via the Leroux and ICAR models generally reduced the variance of the posterior estimated fixed effects for data generated from models with covariates and a UH term compared to analogous single-stage models. Finally, we found the multi-stage Leroux model compares favourably to the multi-stage ICAR model in terms of DIC.
We conclude that the multi-stage Leroux model should be seriously considered in applications of Bayesian disease mapping when an investigator desires to fit a model with both fixed effects and spatially structured random effects to Poisson count data.
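The two spatial priors compared above differ only in their precision matrices: for a map with adjacency matrix A and degree matrix D, the ICAR precision is D - A (improper, rows sum to zero) while the Leroux precision rho*(D - A) + (1 - rho)*I is proper for rho < 1. A minimal sketch on a hypothetical 5-area line graph:

```python
import numpy as np

# Hypothetical 5-area map on a line: area i neighbours i-1 and i+1.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1
D = np.diag(A.sum(axis=1))

Q_icar = D - A  # ICAR precision: singular, every row sums to 0
rho = 0.7
Q_leroux = rho * (D - A) + (1 - rho) * np.eye(5)  # Leroux precision: proper
```

The Leroux form interpolates between independence (rho = 0) and the ICAR prior (rho = 1), which is why it can adapt to different strengths of spatial correlation.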

4.
We propose several diagnostic methods for checking the adequacy of marginal regression models for analyzing correlated binary data. We use a parametric marginal model based on latent variables and derive the projection (hat) matrix, Cook's distance, various residuals and the Mahalanobis distance between the observed binary responses and the estimated probabilities for a cluster. Emphasized are several graphical methods including the simulated Q-Q plot, the half-normal probability plot with a simulated envelope, and the partial residual plot. The methods are illustrated with a real-life example.
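The projection (hat) matrix behind these diagnostics generalizes the linear-model H = X(X'X)^{-1}X' with IRLS weights. A sketch for an ordinary logistic GLM follows (the authors work with a latent-variable marginal model for clusters, which this does not reproduce):

```python
import numpy as np

def logistic_hat_matrix(X, beta):
    """Hat matrix H = W^{1/2} X (X'WX)^{-1} X' W^{1/2} for a logistic GLM,
    where W = diag(p_i (1 - p_i)) are the IRLS weights at beta."""
    p = 1 / (1 + np.exp(-(X @ beta)))
    w = p * (1 - p)
    XW = np.sqrt(w)[:, None] * X
    return XW @ np.linalg.solve(X.T @ (w[:, None] * X), XW.T)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
H = logistic_hat_matrix(X, np.array([0.2, 1.0, -0.5]))
lev = np.diag(H)  # leverages h_ii, the ingredients of Cook's distance
```

H is idempotent with trace equal to the number of parameters, so large h_ii flag observations that dominate their own fitted value.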

5.
In this paper the exponentiated-Weibull model is modified to model the possibility that long-term survivors are present in the data. The modification leads to an exponentiated-Weibull mixture model which encompasses as special cases the exponential and Weibull mixture models typically used to model such data. Inference for the model parameters is considered via maximum likelihood and also via Bayesian inference by using Markov chain Monte Carlo simulation. Model comparison is considered by using likelihood ratio statistics and also the pseudo Bayes factor, which can be computed by using the generated samples. An example of a data set is considered for which the exponentiated-Weibull mixture model presents a better fit than the Weibull mixture model. Results of simulation studies are also reported, which show that the likelihood ratio statistics seem to be somewhat deficient for small and moderate sample sizes.
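The mixture structure is the standard cure-rate form S_pop(t) = p + (1 - p) S0(t), with S0 the exponentiated-Weibull survival; a sketch (the parameter names lam, k, alpha are illustrative):

```python
import numpy as np

def exp_weibull_survival(t, lam, k, alpha):
    """Exponentiated-Weibull survival: S0(t) = 1 - (1 - exp(-(t/lam)^k))^alpha.
    alpha = 1 recovers the Weibull; alpha = k = 1 recovers the exponential."""
    return 1 - (1 - np.exp(-(t / lam) ** k)) ** alpha

def cure_survival(t, p, lam, k, alpha):
    """Mixture cure model: a cured fraction p never experiences the event,
    so the population survival plateaus at p instead of dropping to 0."""
    return p + (1 - p) * exp_weibull_survival(t, lam, k, alpha)
```

The plateau at p as t grows is exactly the "long-term survivors" feature the abstract describes.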

6.
While there has been considerable research on the analysis of extreme values and outliers by using heavy-tailed distributions, little is known about the semi-heavy-tailed behaviors of data when there are a few suspicious outliers. To address the situation where data are skewed with semi-heavy tails, we introduce two new skewed distribution families based on the hyperbolic secant, with attractive properties. We extend the semi-heavy-tailedness property of data to a linear regression model. In particular, we investigate the asymptotic properties of the ML estimators of the regression parameters when the error term has a semi-heavy-tailed distribution. We conduct simulation studies comparing the ML estimators of the regression parameters under various assumptions for the distribution of the error term. We also provide three real examples to show the advantage of semi-heavy-tailed error terms over heavy-tailed ones. Online supplementary materials for this article are available. All the new proposed models in this work are implemented in the shs R package, which can be found on the GitHub webpage.
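The symmetric hyperbolic secant baseline is available in SciPy as `stats.hypsecant` (density sech(x)/pi). Its tails decay exponentially, sitting between Gaussian and Cauchy tails, which is the sense of "semi-heavy" used above. The skewed families and the `shs` package are the paper's own; this only illustrates the symmetric baseline:

```python
import numpy as np
from scipy import stats

# Tail probabilities at x = 6: the semi-heavy hypsecant tail falls between
# the light Gaussian tail and the heavy (polynomial) Cauchy tail.
x = 6.0
tails = {
    "normal": stats.norm.sf(x),
    "hypsecant": stats.hypsecant.sf(x),  # density f(x) = sech(x) / pi
    "cauchy": stats.cauchy.sf(x),
}
```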

7.
Interval-censored survival data arise very frequently, when the event of interest is not observed exactly but is only known to occur within some time interval. In this paper, we propose a location-scale regression model based on the log-generalized gamma distribution for modelling interval-censored data. We shall be concerned only with parametric forms. The proposed model for interval-censored data represents a parametric family of models that has, as special submodels, other regression models which are broadly used in lifetime data analysis. Assuming interval-censored data, we consider a frequentist analysis, a jackknife estimator and a non-parametric bootstrap for the model parameters. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some techniques to perform global influence.

8.
In this article, Bayesian inference for the half-normal and half-t distributions using uninformative priors is considered. It is shown that exact Bayesian inference can be undertaken for the half-normal distribution without the need for Gibbs sampling. Simulation is then used to compare the sampling properties of Bayesian point and interval estimators with those of their maximum likelihood based counterparts. Inference for the half-t distribution based on the use of Gibbs sampling is outlined, and an approach to model comparison based on the use of Bayes factors is discussed. The fitting of the half-normal and half-t models is illustrated using real data on the body fat measurements of elite athletes.
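The "no Gibbs sampling needed" claim reflects conjugacy: with half-normal data and the uninformative prior pi(sigma^2) proportional to 1/sigma^2 (one common choice; the paper's exact prior may differ), the posterior of sigma^2 is inverse-gamma in closed form:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sigma_true = 2.0
x = np.abs(rng.normal(0, sigma_true, size=500))  # half-normal sample

n, S = len(x), np.sum(x ** 2)
# Likelihood ∝ (sigma^2)^(-n/2) exp(-S / (2 sigma^2)); with prior 1/sigma^2
# the posterior is sigma^2 | x ~ Inverse-Gamma(n/2, S/2) -- exact.
posterior = stats.invgamma(n / 2, scale=S / 2)
post_mean = posterior.mean()  # closed form: (S/2) / (n/2 - 1)
mle = S / n                   # maximum likelihood estimate of sigma^2
```

Point and interval estimates (e.g. `posterior.ppf([0.025, 0.975])`) come straight from the inverse-gamma quantiles, with no sampling at all.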

9.
Advances in computation mean that it is now possible to fit a wide range of complex models to data, but there remains the problem of selecting a model on which to base reported inferences. Following an early suggestion of Box & Tiao, it seems reasonable to seek 'inference robustness' in reported models, so that alternative assumptions that are reasonably well supported would not lead to substantially different conclusions. We propose a four-stage modelling strategy in which we iteratively assess and elaborate an initial model, measure the support for each of the resulting family of models, assess the influence of adopting alternative models on the conclusions of primary interest, and identify whether an approximate model can be reported. The influence-support plot is then introduced as a tool to aid model comparison. The strategy is semi-formal, in that it could be embedded in a decision-theoretic framework but requires substantive input for any specific application. The one restriction of the strategy is that the quantity of interest, or 'focus', must retain its interpretation across all candidate models. It is, therefore, applicable to analyses whose goal is prediction, or where a set of common model parameters are of interest and candidate models make alternative distributional assumptions. The ideas are illustrated by two examples. Technical issues include the calibration of the Kullback-Leibler divergence between marginal distributions, and the use of alternative measures of support for the range of models fitted.
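The calibration step leans on Kullback-Leibler divergences, which are available in closed form for common marginals; for two univariate normals, KL(N(mu1, s1^2) || N(mu2, s2^2)) = log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 s2^2) - 1/2. A sketch:

```python
import numpy as np

def kl_normal(mu1, s1, mu2, s2):
    """Closed-form KL( N(mu1, s1^2) || N(mu2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
```

Note the asymmetry (KL is not a metric), which is one reason calibration against an interpretable reference, such as a unit mean shift, is useful when comparing influence across candidate models.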

10.
A study is carried out of sampling from the half-normal and exponential distributions to develop a test of hypothesis on the mean. Although these distributions are similar, the corresponding uniformly most powerful test statistics are different. The exact distributions of these statistics may be written in terms of the incomplete gamma function. If the experimental data may be fitted by either distribution, it is advisable to carry out the test based on the half-normal distribution, as it is generally more powerful than the one based on the exponential distribution.
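The incomplete-gamma connection is concrete: under the null with n observations, the half-normal statistic sum(X_i^2) is Gamma(n/2, scale 2*sigma^2) (i.e. sigma^2 times chi-square with n degrees of freedom), while the exponential statistic sum(X_i) is Gamma(n, theta), so critical values are incomplete-gamma quantiles:

```python
from scipy import stats

# One-sided critical values for the two UMP test statistics under H0.
n, sigma, theta, alpha = 10, 1.0, 1.0, 0.05
# half-normal (scale sigma):  T = sum(X_i^2) ~ Gamma(n/2, scale 2*sigma^2)
crit_half_normal = stats.gamma.ppf(1 - alpha, n / 2, scale=2 * sigma**2)
# exponential (mean theta):   T = sum(X_i)   ~ Gamma(n, scale theta)
crit_exponential = stats.gamma.ppf(1 - alpha, n, scale=theta)
```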

11.
Longitudinal data are commonly modeled with normal mixed-effects models. Most modeling methods are based on traditional mean regression, which yields non-robust estimates in the presence of extreme values or outliers. Median regression is also not the best choice for estimation, especially for non-normal errors. Compared to conventional modeling methods, composite quantile regression can provide robust estimation results even for non-normal errors. In this paper, based on a so-called pseudo composite asymmetric Laplace distribution (PCALD), we develop a Bayesian treatment of composite quantile regression for mixed-effects models. Furthermore, with the location-scale mixture representation of the PCALD, we establish a Bayesian hierarchical model and achieve posterior inference for all unknown parameters and latent variables using Markov chain Monte Carlo (MCMC) methods. Finally, this newly developed procedure is illustrated by some Monte Carlo simulations and a case analysis of an HIV/AIDS clinical data set.
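Composite quantile regression averages the quantile check loss rho_tau(u) = u * (tau - 1{u < 0}) over a grid of tau values. A sketch of the loss, and of a robust location estimate obtained by minimising it (a grid-search toy, not the paper's PCALD-based Gibbs sampler):

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def composite_check_loss(u, taus):
    """Composite QR objective: average check loss over a grid of quantiles."""
    return np.mean([check_loss(u, t).mean() for t in taus])

# Minimising the composite loss over a location gives a robust centre estimate.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 500), [50.0, 60.0]])  # two gross outliers
grid = np.linspace(-2, 2, 401)
taus = np.linspace(0.1, 0.9, 9)
est = grid[np.argmin([composite_check_loss(x - m, taus) for m in grid])]
```

Because the loss grows only linearly in the tails, the two outliers barely move the estimate, which is the robustness property the abstract appeals to.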

12.
This study considers a goodness-of-fit test for location-scale time series models with heteroscedasticity, including a broad class of generalized autoregressive conditional heteroscedastic-type models. In financial time series analysis, the correct identification of model innovations is crucial for further inferences in diverse applications such as risk management analysis. To implement a goodness-of-fit test, we employ the residual-based entropy test generated from the residual empirical process. Since this test often shows size distortions and is affected by parameter estimation, its bootstrap version is considered. It is shown that the bootstrap entropy test is weakly consistent, and thereby its usage is justified. A simulation study and data analysis are conducted by way of an illustration.

13.
We propose a new simulation method, SimSel, for variable selection in linear and nonlinear modelling problems. SimSel works by disturbing the input data with pseudo-errors. We then study how this disturbance affects the quality of an approximative model fitted to the data. The main idea is that disturbing unimportant variables does not affect the quality of the model fit. The use of an approximative model has the advantage that the true underlying function does not need to be known and that the method becomes insensitive to model misspecifications. We demonstrate SimSel on simulated data from linear and nonlinear models and on two real data sets. The simulation studies suggest that SimSel works well in complicated situations, such as nonlinear errors-in-variable models.
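The core SimSel idea (perturb one input at a time with pseudo-errors and watch how the fit degrades) can be sketched for an OLS working model. The scoring function and constants below are illustrative, not the published algorithm:

```python
import numpy as np

def r2(X, y):
    """R^2 of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def simsel_scores(X, y, noise_sd=1.0, reps=20, seed=0):
    """SimSel-style scores: average drop in R^2 when pseudo-errors are added
    to one column at a time. A large drop flags an important variable."""
    rng = np.random.default_rng(seed)
    base = r2(X, y)
    drops = np.zeros(X.shape[1])
    for _ in range(reps):
        for j in range(X.shape[1]):
            Xp = X.copy()
            Xp[:, j] = Xp[:, j] + rng.normal(0, noise_sd, len(y))
            drops[j] += (base - r2(Xp, y)) / reps
    return drops

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3))
y = 3 * X[:, 0] + rng.normal(0, 0.5, 300)  # only the first variable matters
scores = simsel_scores(X, y)
```

Disturbing the irrelevant columns leaves the fit essentially unchanged, so their scores hover near zero, exactly the "main idea" stated above.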

14.
We formulate and study a four-parameter lifetime model called the beta extended half-normal distribution. This model includes as sub-models the exponential, extended half-normal and half-normal distributions. We derive expansions for the new density function which do not depend on complicated functions. We obtain explicit expressions for the moments and incomplete moments, generating function, mean deviations, Bonferroni and Lorenz curves and Rényi entropy. In addition, the model parameters are estimated by maximum likelihood. We provide the observed information matrix. The new model is modified to cope with possible long-term survivors in the data. The usefulness of the new distribution is shown by means of two real data sets.

15.
We consider a generalized leverage matrix useful for the identification of influential units and observations in linear mixed models and show how a decomposition of this matrix may be employed to identify high leverage points for both the marginal fitted values and the random effect component of the conditional fitted values. We illustrate the different uses of the two components of the decomposition with a simulated example as well as with a real data set.

16.
In this paper, we focus on models for recovery data from birds ringed as young. In some cases, it is important to be able to include in these models a degree of age variation in the reporting probability. For certain models this has been found, empirically, to result in completely flat likelihood surfaces, due to parameter redundancy. These models cannot then be fitted to the data, to produce unique parameter estimates. However, empirical evidence also exists that other models with such age variation can be fitted to data by maximum likelihood. Using the approach of Catchpole and Morgan (1996b), we can now identify which models in this area are parameter-redundant, and which are not. Models which are not parameter-redundant may still perform poorly in practice, and this is investigated through examples, involving both real and simulated data. The Akaike Information Criterion is found to select inappropriate models in a number of instances. The paper ends with guidelines for fitting models to data from birds ringed as young, when age dependence is expected in the reporting probability.

17.
This paper provides alternative methods for fitting symmetry and diagonal-parameters symmetry models to square tables having ordered categories. We demonstrate here the implementation of the class of models discussed in Goodman (1979c) using PROC GENMOD in SAS. We also provide procedures for testing hypotheses involving model parameters. The methodology provided here can readily be used to fit the class of models discussed in Lawal and Upton (1995). If desired, composite models can be fitted. Two data sets, the 4 × 4 unaided distance vision data of 4746 Japanese students (Tomizawa, 1985) and the 5 × 5 British social mobility data (Glass, 1954), are employed to demonstrate the fitting of these models. Results obtained are consistent with those from Goodman (1972, 1979c, 1986) and Tomizawa (1985, 1987).

18.
In this paper we propose a general cure rate aging model. Our approach accommodates different underlying activation mechanisms leading to the event of interest. The number of competing causes of the event of interest is assumed to follow a logarithmic distribution. The model is parameterized in terms of the cured fraction, which is then linked to covariates. We explore the use of Markov chain Monte Carlo methods to develop a Bayesian analysis for the proposed model. Moreover, some discussion of model selection for comparing the fitted models is given, and case-deletion influence diagnostics are developed for the joint posterior distribution based on the ψ-divergence, which has several divergence measures as particular cases, such as the Kullback–Leibler (K-L), J-distance, L1 norm, and χ2 divergence measures. Simulation studies are performed and experimental results are illustrated based on a real malignant melanoma data set.

19.

Cluster point processes comprise a class of models that have been used for a wide range of applications. While several models have been studied for the probability density function of the offspring displacements and the parent point process, there are few examples of non-Poisson distributed cluster sizes. In this paper, we introduce a generalization of the Thomas process, which allows for the cluster sizes to have a variance that is greater or less than the expected value. We refer to this as the cluster sizes being over- and under-dispersed, respectively. To fit the model, we introduce minimum contrast methods and a Bayesian MCMC algorithm. These are evaluated in a simulation study. It is found that using the Bayesian MCMC method, we are in most cases able to detect over- and under-dispersion in the cluster sizes. We use the MCMC method to fit the model to nerve fiber data, and contrast the results to those of a fitted Thomas process.
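A Thomas-type process can be simulated directly: Poisson parents, a cluster-size draw per parent, and Gaussian displacements. Swapping the Poisson size for a negative binomial with the same mean but larger variance gives one way to obtain over-dispersed cluster sizes; the paper's specific generalization is not spelled out in the abstract, so this stand-in is an assumption:

```python
import numpy as np

def thomas_like(kappa, mean_size, sigma, window=1.0, size_dist="poisson",
                nb_r=2.0, seed=0):
    """Simulate a Thomas-type cluster process on [0, window]^2.

    Parents: homogeneous Poisson with intensity kappa. Offspring counts:
    Poisson (classic Thomas) or negative binomial with the same mean
    (over-dispersed: variance mean_size + mean_size^2 / nb_r > mean_size).
    Offspring are displaced from their parent by N(0, sigma^2) per axis.
    """
    rng = np.random.default_rng(seed)
    n_parents = rng.poisson(kappa * window ** 2)
    parents = rng.uniform(0, window, size=(n_parents, 2))
    pts = []
    for p in parents:
        if size_dist == "poisson":
            m = rng.poisson(mean_size)
        else:  # negative binomial parameterised to keep the mean at mean_size
            m = rng.negative_binomial(nb_r, nb_r / (nb_r + mean_size))
        pts.append(p + rng.normal(0, sigma, size=(m, 2)))
    return np.vstack(pts) if pts else np.empty((0, 2))

pts = thomas_like(kappa=20, mean_size=5, sigma=0.02, seed=5)
pts_nb = thomas_like(kappa=20, mean_size=5, sigma=0.02,
                     size_dist="negbin", seed=5)
```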


20.
Since the pioneering work by Koenker and Bassett [27], quantile regression models and their applications have become increasingly popular and important for research in many areas. In this paper, a random effects ordinal quantile regression model is proposed for the analysis of longitudinal data with an ordinal outcome of interest. An efficient Gibbs sampling algorithm is derived for fitting the model to the data, based on a location-scale mixture representation of the skewed double-exponential distribution. The proposed approach is illustrated using simulated data and a real data example. This is the first work to discuss quantile regression for the analysis of longitudinal data with ordinal outcomes.
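The location-scale mixture behind such Gibbs samplers is the standard normal-exponential representation of the asymmetric Laplace (skewed double-exponential) distribution: X = theta*Z + tau*sqrt(Z)*W with Z ~ Exp(1), W ~ N(0, 1), theta = (1 - 2p)/(p(1 - p)) and tau^2 = 2/(p(1 - p)), which places exactly mass p below the location. A sketch (not the paper's ordinal sampler):

```python
import numpy as np

def ald_mixture_sample(p, n, rng):
    """Draw from the asymmetric Laplace at quantile level p via its
    normal-exponential location-scale mixture. Conditioning on Z makes
    the model conditionally Gaussian, which is what enables Gibbs steps."""
    theta = (1 - 2 * p) / (p * (1 - p))
    tau = np.sqrt(2 / (p * (1 - p)))
    z = rng.exponential(1.0, n)
    w = rng.normal(0.0, 1.0, n)
    return theta * z + tau * np.sqrt(z) * w

rng = np.random.default_rng(6)
x = ald_mixture_sample(p=0.25, n=200_000, rng=rng)
frac_below_zero = np.mean(x <= 0)  # should be close to p = 0.25
```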


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号