Similar Articles
20 similar articles found.
1.
This paper presents a methodology for model fitting and inference in the context of Bayesian models of the type f(Y | X, θ) f(X | θ) f(θ), where Y is the (set of) observed data, θ is a set of model parameters, and X is an unobserved (latent) stationary stochastic process induced by the first-order transition model f(X^(t+1) | X^(t), θ), where X^(t) denotes the state of the process at time (or generation) t. The crucial feature of this type of model is that, given θ, the transition model f(X^(t+1) | X^(t), θ) is known. However, the distribution of the stochastic process in equilibrium, f(X | θ), is intractable, and hence unknown, except in very special cases. Note also that the data Y are assumed to be observed when the underlying process is in equilibrium; in other words, the data are not collected dynamically over time. We refer to such a specification as a latent equilibrium process (LEP) model. It is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model. Given this hierarchical specification, it is natural to fit the LEP model within a Bayesian framework. Such models are usually fitted via Markov chain Monte Carlo (MCMC); however, we demonstrate that, for LEP models, implementing MCMC is far from straightforward. The main contribution of this paper is a methodology for implementing MCMC for LEP models. We demonstrate our approach on population genetics problems with both simulated and real data sets.
The resulting model fitting is computationally intensive; we therefore also discuss parallel implementation of the procedure in special cases.

2.
ABSTRACT

In many clinical studies, patients are followed over time and their responses are measured longitudinally. Using mixed model theory, one can characterize such data with a wide array of across-subject models. A state-space representation of the mixed effects model, combined with the Kalman filter, gives great flexibility in choosing the within-subject error correlation structure, even in the presence of missing or unequally spaced observations. Furthermore, the state-space approach avoids inverting large matrices, which makes computation efficient. The approach also allows detailed inference about the error correlation structure. We consider a bivariate setting in which the longitudinal responses are unequally spaced, and we assume that the within-subject errors follow a continuous-time first-order autoregressive (CAR(1)) structure. Because a large number of nonlinear parameters must be estimated, the modeling strategy and numerical techniques are critical. We developed both a Visual Fortran® and a SAS® program for modeling such data. A simulation study was conducted to investigate robustness with respect to the model assumptions. We also use data from a psychiatric study to demonstrate our model fitting procedure.
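
A minimal sketch of how a CAR(1) error structure handles unequally spaced times. It assumes the common parameterization corr(e_i, e_j) = φ^|t_i − t_j| with 0 < φ < 1; the abstract does not state the authors' exact parameterization. The second function shows the one-step state-space transition over an arbitrary gap, which is what lets a Kalman filter avoid inverting the full correlation matrix. Both function names and the visit times are hypothetical.

```python
# Sketch of a CAR(1) error structure; the parameterization is an assumption,
# not taken from the paper.

def car1_correlation(times, phi):
    """Correlation matrix of CAR(1) errors at (possibly unequally spaced) times.

    corr(e_i, e_j) = phi ** |t_i - t_j|, with 0 < phi < 1.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie in (0, 1)")
    return [[phi ** abs(ti - tj) for tj in times] for ti in times]


def car1_transition(delta, phi, sigma2):
    """State-space transition of a stationary CAR(1) error over a time gap `delta`.

    e(t + delta) = a * e(t) + eta, with eta ~ N(0, q). Choosing
    q = sigma2 * (1 - a**2) keeps the marginal variance at sigma2, so a Kalman
    filter can step through irregular gaps one observation at a time.
    """
    a = phi ** delta
    q = sigma2 * (1.0 - a * a)
    return a, q


times = [0.0, 0.5, 2.0, 2.25]  # hypothetical unequally spaced visit times
R = car1_correlation(times, phi=0.6)
a, q = car1_transition(delta=1.5, phi=0.6, sigma2=2.0)
```

The transition (a, q) reproduces the same lag-Δ correlation φ^Δ as the matrix entry for two visits Δ apart. This is why the state-space route and the direct covariance route describe the same model.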

3.
ABSTRACT

A long-standing puzzle in macroeconomic forecasting is that a wide variety of multivariate models have struggled to consistently out-predict univariate models. We seek an explanation for this puzzle in terms of population properties. We derive bounds for the predictive R² of the true, but unknown, multivariate model from univariate ARMA parameters alone. These bounds can be quite tight, implying little forecasting gain even if the true multivariate model were known. We illustrate using CPI inflation data. Supplementary materials for this article are available online.
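
As background for the univariate side of this comparison, the population one-step predictive R² of a stationary, invertible ARMA(1,1) follows directly from its parameters. The convention used is y_t = φ y_(t−1) + e_t + θ e_(t−1). This sketch covers only that univariate quantity; the article's bounds on the multivariate R² are not reproduced. The function name is hypothetical.

```python
def arma11_predictive_r2(phi, theta):
    """Population one-step-ahead predictive R^2 of a stationary, invertible ARMA(1,1).

    Model: y_t = phi * y_{t-1} + e_t + theta * e_{t-1}, Var(e_t) = sigma^2.
    R^2 = 1 - sigma^2 / Var(y_t), where
    Var(y_t) = sigma^2 * (1 + 2 * phi * theta + theta**2) / (1 - phi**2).
    sigma^2 cancels, so R^2 depends on (phi, theta) only.
    """
    if abs(phi) >= 1 or abs(theta) >= 1:
        raise ValueError("need |phi| < 1 (stationary) and |theta| < 1 (invertible)")
    var_ratio = (1 + 2 * phi * theta + theta ** 2) / (1 - phi ** 2)
    return 1 - 1 / var_ratio


r2_ar1 = arma11_predictive_r2(0.5, 0.0)   # pure AR(1): equals phi**2
r2_ma1 = arma11_predictive_r2(0.0, 0.4)   # pure MA(1): theta**2 / (1 + theta**2)
```

Setting θ = −φ makes the AR and MA parts cancel, so the series is white noise and R² is 0. This is a quick sanity check on the formula.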

4.
Popular rank-2 and rank-3 models for two-way tables have geometrical properties that can be used as diagnostic keys when screening for an appropriate model. The row and column levels of a two-way table are represented by points in two- or three-dimensional space; collinearity and coplanarity of the row and column points then provide diagnostic keys for informal model choice. The coordinates are obtained from a factorization of the two-way table Y as the matrix product UVᵀ: the rows of U contain the row-point coordinates and the rows of V the column-point coordinates. Published illustrations of diagnostic biplots have been restricted to data from chemistry and physics with little or no noise. In plant breeding, two-way tables containing substantial amounts of noise regularly arise in the form of genotype-by-environment tables. To investigate the usefulness of diagnostic biplots for model screening in genotype-by-environment tables, data tables were generated from a range of two-way models with various amounts of added noise. The chance of correctly diagnosing the generating model depended on the type of model. Diagnostic biplots on their own do not seem to provide a sufficient means of model selection for genotype-by-environment tables, but in combination with other methods they can provide extra insight into the structure of the data.
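
One common way to obtain the factorization Y ≈ UVᵀ is a truncated singular value decomposition. The sketch below assumes the singular values are split symmetrically between the two factors; the abstract does not say which scaling the authors use. The table is a hypothetical noise-free rank-2 example, and the function name is also hypothetical.

```python
import numpy as np


def biplot_coordinates(Y, rank=2):
    """Factor Y ~= U @ V.T with `rank` columns via a truncated SVD.

    Singular values are split symmetrically (square root to each side);
    rows of U are row-point coordinates, rows of V are column-point coordinates.
    """
    Y = np.asarray(Y, dtype=float)
    left, s, right_t = np.linalg.svd(Y, full_matrices=False)
    root_s = np.sqrt(s[:rank])
    U = left[:, :rank] * root_s
    V = right_t[:rank, :].T * root_s
    return U, V


rng = np.random.default_rng(0)
# Hypothetical noise-free 6 x 4 genotype-by-environment table of exact rank 2
A = rng.normal(size=(6, 2))
B = rng.normal(size=(4, 2))
Y = A @ B.T
U, V = biplot_coordinates(Y, rank=2)
```

Because this Y has exact rank 2, U @ V.T reproduces it. With added noise, the rank-2 product is only the best least-squares approximation, which is where the screening difficulty described above comes from.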

5.
In partly linear models, the dependence of the response y on (xᵀ, t) is modeled through the relationship y = xᵀβ + g(t) + ε, where ε is independent of (xᵀ, t). We are interested in developing an estimation procedure that combines the flexibility of partly linear models, studied by several authors, with the inclusion of some variables that belong to a non-Euclidean space. The motivating application of this paper is the explanation of atmospheric SO2 pollution incidents using these models when some of the predictive variables take values on a cylinder. We construct estimators of β and g when the explanatory variables t take values on a Riemannian manifold, and we obtain the asymptotic properties of the proposed estimators under suitable conditions. We illustrate the estimation approach using an environmental data set and explore the performance of the estimators through a simulation study.

6.
Abstract

In this article, we consider three shared frailty models with the generalized Pareto distribution as the baseline distribution. Frailty models are used in survival analysis to account for unobserved heterogeneity in individuals' risks of disease and death. The three models use gamma, inverse Gaussian, and positive stable frailty, respectively. We then introduce a Bayesian estimation procedure using the Markov chain Monte Carlo (MCMC) technique to estimate the parameters. We apply the three models to kidney infection data and identify the best-fitting model. A simulation study compares the true parameter values with the estimated values. Models are compared using a Bayesian model selection criterion, and a well-fitting model is suggested for the kidney infection data.
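
To show how a shared frailty induces dependence, here is the gamma case only. It assumes a generalized Pareto baseline with survival S0(t) = (1 + ξt/σ)^(−1/ξ), ξ > 0, and a gamma frailty with mean 1 and variance θ. The article's parameterization may differ, and the inverse Gaussian and positive stable cases are not shown. Function names are hypothetical.

```python
import math


def gpd_cum_hazard(t, xi, sigma):
    """Cumulative hazard H0(t) = log(1 + xi * t / sigma) / xi of a generalized
    Pareto baseline with shape xi > 0 and scale sigma > 0 (assumed parameterization)."""
    return math.log1p(xi * t / sigma) / xi


def gamma_shared_frailty_survival(t1, t2, theta, xi, sigma):
    """Joint survival of two members of a cluster sharing a gamma frailty Z
    (E[Z] = 1, Var[Z] = theta), with conditional survival exp(-Z * H0(t)).

    Members are independent given Z; integrating Z out with its Laplace
    transform (1 + theta * s) ** (-1 / theta) gives
    S(t1, t2) = (1 + theta * (H0(t1) + H0(t2))) ** (-1 / theta).
    """
    s = gpd_cum_hazard(t1, xi, sigma) + gpd_cum_hazard(t2, xi, sigma)
    return (1.0 + theta * s) ** (-1.0 / theta)
```

As θ → 0 the frailty variance vanishes, and the joint survival tends to the independence product exp(−H0(t1) − H0(t2)).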

7.
ABSTRACT

Asymmetric models have been discussed extensively in recent years for situations where the normality assumption is questionable because the data lack symmetry. Techniques for assessing goodness of fit and for diagnostic analysis are important for model validation. This paper studies the mean-shift method for detecting outliers in regression models under skew scale mixtures of normal distributions. Analytical expressions for the parameter estimators are obtained through the Expectation–Maximization (EM) algorithm. The observed information matrix, used to calculate standard errors, is obtained for each distribution. Simulation studies and an application to a real data set show the efficiency of the proposed method in detecting outliers.

8.
The linear mixed effects model (LMEM) is efficient for modeling repeated measures longitudinal data. However, little research has been done on developing goodness-of-fit measures for these models, particularly measures that can be interpreted in an absolute sense without reference to a null model. This paper proposes three coefficients of determination (R²) as goodness-of-fit measures for LMEMs with repeated measures longitudinal data. Theorems are presented describing the properties of these R² statistics and the relationships between them. A simulation study was conducted to evaluate and compare the proposed R² statistics with other criteria from the literature. Finally, we apply the proposed R² statistics to real virologic response data from an HIV patient cohort. We conclude that the proposed R² statistics have advantages over other goodness-of-fit measures in the literature in terms of robustness to sample size, intuitive interpretation, a well-defined range, and no need to specify a null model.

9.
Abstract

This paper is concerned with a model averaging procedure for varying-coefficient partially linear models. We propose a jackknife model averaging method that minimizes a leave-one-out cross-validation criterion, and we develop a computational shortcut for optimizing this criterion over the weights. The resulting model average estimator is shown to be asymptotically optimal in the sense of achieving the smallest possible squared error. Simulation studies provide evidence of the superiority of the proposed procedure. The approach is further applied to a real data set.
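
The core of jackknife model averaging can be illustrated with ordinary least squares candidates. This substitutes a simpler setting for the article's varying-coefficient partially linear models. Leave-one-out residuals come from the hat-matrix shortcut e_i / (1 − h_ii), so no model is refitted n times. Because the weights sum to one, the averaged fit's leave-one-out residual is the weighted sum of the candidates' residuals. For simplicity the sketch handles two candidates and searches the weight on a grid rather than solving the usual quadratic program. The data are simulated, and all names are hypothetical.

```python
import numpy as np


def loo_residuals(X, y):
    """Leave-one-out residuals of an OLS fit via the shortcut e_i / (1 - h_ii)."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T  # hat matrix
    return (y - H @ y) / (1.0 - np.diag(H))


def jma_weight_two_models(X1, X2, y, grid_size=1001):
    """Jackknife model averaging weight w on candidate 1 (1 - w on candidate 2).

    The leave-one-out residual of the averaged fit is w * e1 + (1 - w) * e2;
    w is chosen on a grid over [0, 1] to minimize its sum of squares.
    """
    e1, e2 = loo_residuals(X1, y), loo_residuals(X2, y)
    grid = np.linspace(0.0, 1.0, grid_size)
    cv = np.array([np.sum((w * e1 + (1.0 - w) * e2) ** 2) for w in grid])
    best = int(np.argmin(cv))
    return grid[best], cv[best]


rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)       # simulated data with a real slope
X_small = np.ones((n, 1))                    # candidate 1: intercept only
X_large = np.column_stack([np.ones(n), x])   # candidate 2: intercept + slope
w_small, cv_min = jma_weight_two_models(X_small, X_large, y)
```

With a strong slope in the simulated data, the criterion puts little weight on the intercept-only candidate.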

10.
ABSTRACT

A general class of models for discrete and/or continuous responses is proposed in which joint distributions are constructed via the conditional approach. It is assumed that the distribution of one response, and the conditional distribution of the other response given the first, both belong to the exponential family. Furthermore, the marginal means are related to the covariates through link functions, and a dependency structure between the responses is built into the model. Estimation methods, diagnostic analysis, and a simulation study are presented for a Bernoulli–exponential model, a particular case of the class. Finally, this model is applied to a real data set.

11.
ABSTRACT

This paper provides a Bayesian analysis of zero-inflated regression models based on the generalized power series distribution, using Markov chain Monte Carlo methods. Residual analysis is discussed. Case-deletion influence diagnostics are developed for the joint posterior distribution in zero-inflated generalized power series models, based on the ψ-divergence. The ψ-divergence includes several divergence measures, such as the Kullback–Leibler divergence, the J-distance, the L1 norm, and the χ² divergence. The methodology is illustrated with a data set collected by wildlife biologists in a state park in California.

12.
ABSTRACT

Inflated data are prevalent in many situations, and a variety of inflated models and extensions have been derived to fit data with excess counts of particular responses. Information criteria (IC) are commonly used to compare the fit of competing models for model selection. Yet despite their common use in statistical applications, few studies have evaluated the performance of IC for inflated models. In this study, we examine the performance of IC for dual-inflated data. The new zero- and K-inflated Poisson (ZKIP) regression model and conventional models, including Poisson regression and zero-inflated Poisson (ZIP) regression, were fitted to dual-inflated data, and the performance of the IC was compared. The effects of sample size and of the proportion of inflated observations on selection performance were also examined. The results suggest that the Bayesian information criterion (BIC) and the consistent Akaike information criterion (CAIC) are more accurate than the Akaike information criterion (AIC) when the true model is simple (i.e. Poisson regression (POI)). For more complex models, such as ZIP and ZKIP, the AIC was consistently better than the BIC and CAIC, although it did not reach high accuracy when the sample size and the proportion of zero observations were small. The AIC tended to over-fit the data when POI was the true model, whereas the BIC and CAIC tended to under-parameterize the data for ZIP and ZKIP. It is therefore desirable to study other model selection criteria for dual-inflated data with small sample sizes.
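
The three criteria compared above differ only in their penalty per parameter: 2 for AIC, ln n for BIC, and ln n + 1 for CAIC. That is why BIC and CAIC lean toward simpler models. The sketch computes all three from a maximized log-likelihood. The log-likelihoods and parameter counts in the usage example are hypothetical illustration values, not results from the study, and the function names are hypothetical.

```python
import math


def information_criteria(log_lik, k, n):
    """AIC, BIC and CAIC (smaller is better) for a maximized log-likelihood.

    log_lik: maximized log-likelihood; k: number of estimated parameters;
    n: sample size.
    """
    aic = -2.0 * log_lik + 2.0 * k
    bic = -2.0 * log_lik + k * math.log(n)
    caic = -2.0 * log_lik + k * (math.log(n) + 1.0)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}


def select(fits, n, criterion):
    """Name of the model minimizing `criterion` among {name: (log_lik, k)}."""
    return min(fits, key=lambda name: information_criteria(*fits[name], n)[criterion])


# Hypothetical fits on n = 100 observations: the richer model gains 4 units of
# log-likelihood for 2 extra parameters.
fits = {"POI": (-210.0, 2), "ZIP": (-206.0, 4)}
choice_aic = select(fits, 100, "AIC")
choice_bic = select(fits, 100, "BIC")
```

Here AIC prefers the zero-inflated fit, while BIC (and CAIC) prefer the plain Poisson fit. This mirrors the over-fitting versus under-parameterization tension described in the abstract.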

13.
ABSTRACT

Nonhomogeneous Poisson processes (NHPPs) provide many models for hardware and software reliability analysis. To identify an appropriate NHPP model, goodness-of-fit (GOF) tests must be carried out. Many GOF tests have been developed for the power-law process, but for other NHPP models only the conditional probability integral transformation (CPIT) test has been proposed. However, the CPIT test is less powerful and cannot be applied to some NHPP models. This article proposes a general GOF test based on the Laplace statistic for a large class of NHPP models with intensity functions of the form αλ(t, β). Simulation results show that this test is more powerful than the CPIT test.
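
For orientation, the sketch shows the classical Laplace statistic and the transformation that makes it applicable to intensities of the form αλ(t, β). Let Λ0(t; β) be the integral of λ. Given n events in (0, T], the event times of such an NHPP are i.i.d. with CDF Λ0(t; β)/Λ0(T; β), which does not involve α. Mapping the times through that CDF therefore reduces the problem to testing uniformity. The sketch assumes β is known. With β estimated, the null distribution changes, and the article's actual test and its handling of estimated β are not reproduced here. Function names are hypothetical.

```python
import math


def to_uniform_scale(times, T, cum_shape):
    """Map event times on (0, T] to u_i = Lambda0(t_i) / Lambda0(T).

    For an NHPP with cumulative intensity alpha * Lambda0(t), the u_i are
    i.i.d. Uniform(0, 1) given the number of events, whatever alpha is.
    """
    total = cum_shape(T)
    return [cum_shape(t) / total for t in times]


def laplace_statistic(u):
    """Laplace statistic for points u_i that are i.i.d. Uniform(0, 1) under H0.

    (sum(u) - n/2) / sqrt(n/12); approximately N(0, 1) under H0 for moderate n.
    """
    n = len(u)
    return (sum(u) - n / 2.0) / math.sqrt(n / 12.0)


# Homogeneous case, Lambda0(t) = t: reduces to the classical Laplace trend test.
u_hpp = to_uniform_scale([3.0, 3.5], 4.0, lambda t: t)
U_hpp = laplace_statistic(u_hpp)

# Power-law shape with an assumed known beta = 1.5: Lambda0(t) = t ** 1.5.
u_pl = to_uniform_scale([0.8, 1.9, 2.7, 3.6], 4.0, lambda t: t ** 1.5)
U_pl = laplace_statistic(u_pl)
```

In the homogeneous case this equals the familiar form (Σt_i − nT/2) / (T·sqrt(n/12)). Large positive values indicate events bunching later than the hypothesized shape allows.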

14.
Nonparametric regression models are often used to check or suggest a parametric model. Several methods have been proposed to test the hypothesis of a parametric regression function against an alternative smoothing spline model. Some of these tests were developed from the corresponding Bayesian models. They include the locally most powerful (LMP) test of Cox et al. (Cox, D., Koh, E., Wahba, G. and Yandell, B. (1988). Testing the (parametric) null model hypothesis in (semiparametric) partial and generalized spline models. Ann. Stat., 16, 113–119.). They also include the generalized maximum likelihood (GML) ratio test and the generalized cross validation (GCV) test of Wahba (Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM.). Their frequentist properties have not been studied. We conduct simulations to evaluate and compare their finite-sample performance. The results show that the performance of these tests depends on the shape of the true function: the LMP and GML tests are more powerful for low-frequency functions, while the GCV test is more powerful for high-frequency functions. For all test statistics, the distributions under the null hypothesis are complicated. Computationally intensive Monte Carlo methods can be used to calculate the null distributions; we also propose approximations to these null distributions and evaluate their performance by simulation.

15.
ABSTRACT

This article discusses two asymmetrization methods, Azzalini's representation and beta generation, for generating asymmetric bimodal models, including two novel beta-generated models. The practical utility of these models is assessed with nine data sets from different fields of applied science. Besides this tutorial assessment, the article makes some methodological contributions. A random number generator for the asymmetric Rathie–Swamee model is developed; generators for the other models are already known and are briefly described. A new likelihood ratio test of unimodality is also compared via simulation with other available tests. Several tools are used to quantify and test for bimodality and to assess goodness of fit, including the Bayesian information criterion, measures of agreement with the empirical distribution, and the Kolmogorov–Smirnov test. In the nine case studies, the results favoured models derived from Azzalini's asymmetrization, but no single model provided the best fit across all applications considered. In only two cases was the normal mixture selected as the best model. Parameters were estimated by likelihood maximization; the numerical optimization must be performed with care because local optima are often present. We conclude that the models considered are flexible enough to fit different bimodal shapes and that the tools studied should be used with care and attention to detail.

16.
ABSTRACT

Clinical trials are usually designed with the implicit assumption that data analysis will occur only after the trial is completed. It is a challenging problem if the sponsor wishes to evaluate drug efficacy in the middle of the study without breaking the randomization codes. In this article, a randomized response model and a mixture model are introduced to analyze data from a crossover design while keeping the randomization codes masked. Given the probability of the treatment sequence, the test based on the mixture model provides higher power than the test based on the randomized response model, which is inadequate in the example. If the randomization codes are broken, the paired t-test has higher power than both models. The sponsor may stop the trial early to claim effectiveness of the study drug if the mixture model yields a positive result.

17.
ABSTRACT

In some applications, the quality of a process or product is best characterized by a functional relationship between a response variable and one or more explanatory variables. Profile monitoring is used to understand and check the stability of this relationship, or curve, over time. Existing simple linear regression profile models often assume that the data follow a unimodal distribution and, consequently, that the noise of the functional relationship is normally distributed. In some applications, however, the data may follow a multimodal distribution, in which case it is more appropriate to assume a mixture profile. In this study, we focus on a mixture simple linear profile model and propose new control schemes for Phase II monitoring. A simulation study shows that the proposed methods perform well.

18.

This study investigates influence diagnostics. In particular, an approach based on the generalized linear mixed model is presented for modeling ordered categorical counts in stratified contingency tables. Deletion diagnostics and their first-order approximations are developed to assess the influence of each stratum on the parameter estimates. To illustrate the proposed diagnostic technique, the method is applied to two data sets: a clinical trial and a survey study. The two examples demonstrate that influential strata may substantially change the results of an ordinal contingency table analysis.

19.
ABSTRACT

For many years, cluster detection has been of great public health interest and has been widely studied. Several methods have been developed to detect clusters, and their performance has been evaluated in various contexts. Spatial scan statistics are widely used for geographical cluster detection and inference. Different types of discrete or continuous data can be analyzed using spatial scan statistics for Bernoulli, Poisson, ordinal, exponential, and normal models. In this paper, we propose a scan statistic for survival data based on a generalized life distribution model that includes three important life distributions, viz. the Weibull, exponential, and Rayleigh. The proposed method is applied to survival data of tuberculosis patients in Nainital district of Uttarakhand, India, for the year 2004–05. Monte Carlo simulation studies show that the proposed method performs well for different survival distributions.

20.
The development of randomized response models for personal interview surveys has attracted much attention since the pioneering work of Warner [1965. Randomized response: a survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 63–69]. Researchers have developed several randomized response models for collecting data on both qualitative and quantitative variables, but none of these models addresses matched-pair data. In this paper, we develop a new randomized response model and study its application to an important political question.
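
As background, the sketch below implements Warner's original (1965) design cited above, not the new matched-pair model this paper develops. Each respondent answers "I belong to group A" with probability p and its negation with probability 1 − p, so only the randomizing device knows which question was answered. The survey counts in the example are hypothetical, and the function name is hypothetical.

```python
def warner_estimate(yes_count, n, p):
    """Warner (1965) randomized response estimator.

    Each respondent answers "I belong to group A" with probability p and
    "I do not belong to group A" with probability 1 - p (p != 0.5), so
    P(yes) = lam = p * pi + (1 - p) * (1 - pi).
    Returns the estimate of pi and a plug-in estimate of its variance,
    lam_hat * (1 - lam_hat) / (n * (2p - 1)**2).
    """
    if not 0.0 <= p <= 1.0 or p == 0.5:
        raise ValueError("p must lie in [0, 1] and differ from 0.5")
    lam = yes_count / n
    pi_hat = (lam - (1.0 - p)) / (2.0 * p - 1.0)
    var_hat = lam * (1.0 - lam) / (n * (2.0 * p - 1.0) ** 2)
    return pi_hat, var_hat


# Hypothetical survey: 1000 respondents, p = 0.7, 380 "yes" answers
pi_hat, var_hat = warner_estimate(380, 1000, 0.7)
```

The factor (2p − 1)² in the variance shows the privacy/precision trade-off: as p approaches 0.5, respondents are better protected, but the estimate becomes much noisier.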
