首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence‐related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent‐dependent missingness without specifying a full joint model.  相似文献   

2.
Non‐random sampling is a source of bias in empirical research. It is common for the outcomes of interest (e.g. wage distribution) to be skewed in the source population. Sometimes, the outcomes are further subjected to sample selection, which is a type of missing data, resulting in partial observability. Thus, methods based on complete cases for skew data are inadequate for the analysis of such data and a general sample selection model is required. Heckman proposed a full maximum likelihood estimation method under the normality assumption for sample selection problems, and parametric and non‐parametric extensions have been proposed. We generalize Heckman selection model to allow for underlying skew‐normal distributions. Finite‐sample performance of the maximum likelihood estimator of the model is studied via simulation. Applications illustrate the strength of the model in capturing spurious skewness in bounded scores, and in modelling data where logarithm transformation could not mitigate the effect of inherent skewness in the outcome variable.  相似文献   

3.
Very often, in psychometric research, as in educational assessment, it is necessary to analyze item response from clustered respondents. The multiple group item response theory (IRT) model proposed by Bock and Zimowski [12] provides a useful framework for analyzing such type of data. In this model, the selected groups of respondents are of specific interest such that group-specific population distributions need to be defined. The usual assumption for parameter estimation in this model, which is that the latent traits are random variables following different symmetric normal distributions, has been questioned in many works found in the IRT literature. Furthermore, when this assumption does not hold, misleading inference can result. In this paper, we consider that the latent traits for each group follow different skew-normal distributions, under the centered parameterization. We named it skew multiple group IRT model. This modeling extends the works of Azevedo et al. [4], Bazán et al. [11] and Bock and Zimowski [12] (concerning the latent trait distribution). Our approach ensures that the model is identifiable. We propose and compare, concerning convergence issues, two Monte Carlo Markov Chain (MCMC) algorithms for parameter estimation. A simulation study was performed in order to evaluate parameter recovery for the proposed model and the selected algorithm concerning convergence issues. Results reveal that the proposed algorithm recovers properly all model parameters. Furthermore, we analyzed a real data set which presents asymmetry concerning the latent traits distribution. The results obtained by using our approach confirmed the presence of negative asymmetry for some latent trait distributions.  相似文献   

4.
We consider a set of data from 80 stations in the Venezuelan state of Guárico consisting of accumulated monthly rainfall in a time span of 16 years. The problem of modelling rainfall accumulated over fixed periods of time and recorded at meteorological stations at different sites is studied by using a model based on the assumption that the data follow a truncated and transformed multivariate normal distribution. The spatial correlation is modelled by using an exponentially decreasing correlation function and an interpolating surface for the means. Missing data and dry periods are handled within a Markov chain Monte Carlo framework using latent variables. We estimate the amount of rainfall as well as the probability of a dry period by using the predictive density of the data. We considered a model based on a full second-degree polynomial over the spatial co-ordinates as well as the first two Fourier harmonics to describe the variability during the year. Predictive inferences on the data show very realistic results, capturing the typical rainfall variability in time and space for that region. Important extensions of the model are also discussed.  相似文献   

5.
Emrah Altun 《Statistics》2019,53(2):364-386
In this paper, we introduce a new distribution, called generalized Gudermannian (GG) distribution, and its skew extension for GARCH models in modelling daily Value-at-Risk (VaR). Basic structural properties of the proposed distribution are obtained including probability density and cumulative distribution functions, moments, and stochastic representation. The maximum likelihood method is used to estimate unknown parameters of the proposed model and finite sample performance of maximum likelihood estimates are evaluated by means of Monte-Carlo simulation study. The real data application on Nikkei 225 index is given to demonstrate the performance of GARCH model specified under skew extension of GG innovation distribution against normal, Student's-t, skew normal and generalized error and skew generalized error distributions in terms of the accuracy of VaR forecasts. The empirical results show that the GARCH model with GG innovation distribution produces the most accurate VaR forecasts for all confidence levels.  相似文献   

6.
To model extreme spatial events, a general approach is to use the generalized extreme value (GEV) distribution with spatially varying parameters such as spatial GEV models and latent variable models. In the literature, this approach is mostly used to capture spatial dependence for only one type of event. This limits the applications to air pollutants data as different pollutants may chemically interact with each other. A recent advancement in spatial extremes modelling for multiple variables is the multivariate max-stable processes. Similarly to univariate max-stable processes, the multivariate version also assumes standard distributions such as unit-Fréchet as margins. Additional modelling is required for applications such as spatial prediction. In this paper, we extend the marginal methods such as spatial GEV models and latent variable models into a multivariate setting based on copulas so that it is capable of handling both the spatial dependence and the dependence among multiple pollutants. We apply our proposed model to analyse weekly maxima of nitrogen dioxide, sulphur dioxide, respirable suspended particles, fine suspended particles, and ozone collected in Pearl River Delta in China.  相似文献   

7.
In this paper, a joint model for analyzing multivariate mixed ordinal and continuous responses, where continuous outcomes may be skew, is presented. For modeling the discrete ordinal responses, a continuous latent variable approach is considered and for describing continuous responses, a skew-normal mixed effects model is used. A Bayesian approach using Markov Chain Monte Carlo (MCMC) is adopted for parameter estimation. Some simulation studies are performed for illustration of the proposed approach. The results of the simulation studies show that the use of the separate models or the normal distributional assumption for shared random effects and within-subject errors of continuous and ordinal variables, instead of the joint modeling under a skew-normal distribution, leads to biased parameter estimates. The approach is used for analyzing a part of the British Household Panel Survey (BHPS) data set. Annual income and life satisfaction are considered as the continuous and the ordinal longitudinal responses, respectively. The annual income variable is severely skewed, therefore, the use of the normality assumption for the continuous response does not yield acceptable results. The results of data analysis show that gender, marital status, educational levels and the amount of money spent on leisure have a significant effect on annual income, while marital status has the highest impact on life satisfaction.  相似文献   

8.
A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to be observed outcomes. The model accommodates multilevel data structures, general covariate effects and distinct link functions and error distributions for each underlying variable. Within the framework proposed, novel models are developed for clustered multiple binary, unordered categorical and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated by using data from a reproductive toxicity study.  相似文献   

9.
Skew‐symmetric models offer a very flexible class of distributions for modelling data. These distributions can also be viewed as selection models for the symmetric component of the specified skew‐symmetric distribution. The estimation of the location and scale parameters corresponding to the symmetric component is considered here, with the symmetric component known. Emphasis is placed on using the empirical characteristic function to estimate these parameters. This is made possible by an invariance property of the skew‐symmetric family of distributions, namely that even transformations of random variables that are skew‐symmetric have a distribution only depending on the symmetric density. A distance metric between the real components of the empirical and true characteristic functions is minimized to obtain the estimators. The method is semiparametric, in that the symmetric component is specified, but the skewing function is assumed unknown. Furthermore, the methodology is extended to hypothesis testing. Two tests for a null hypothesis of specific parameter values are considered, as well as a test for the hypothesis that the symmetric component has a specific parametric form. A resampling algorithm is described for practical implementation of these tests. The outcomes of various numerical experiments are presented.  相似文献   

10.
This paper is concerned with Bayesian estimation of a spatial regression model with skew non-Gaussian errors. The regression parameters are estimated by using a closed skew normal (CSN) distribution, which is closed under conditioning and linear combination. The proposed model captures skewness in the response variable. Sometimes, we may encounter missing observations in the response variable, accordingly we model and predict the missing observations by a Bayesian approach using Gibbs sampling methods. Next, a simulation study is performed to asses our model validity. Also, the proposed model in this work is applied to CO data from Tehran, the capital city of Iran. Then, the accuracy of the CSN and Gaussian models is compared by cross validation criterion.  相似文献   

11.
Latent variable models are widely used for jointly modeling of mixed data including nominal, ordinal, count and continuous data. In this paper, we consider a latent variable model for jointly modeling relationships between mixed binary, count and continuous variables with some observed covariates. We assume that, given a latent variable, mixed variables of interest are independent and count and continuous variables have Poisson distribution and normal distribution, respectively. As such data may be extracted from different subpopulations, consideration of an unobserved heterogeneity has to be taken into account. A mixture distribution is considered (for the distribution of the latent variable) which accounts the heterogeneity. The generalized EM algorithm which uses the Newton–Raphson algorithm inside the EM algorithm is used to compute the maximum likelihood estimates of parameters. The standard errors of the maximum likelihood estimates are computed by using the supplemented EM algorithm. Analysis of the primary biliary cirrhosis data is presented as an application of the proposed model.  相似文献   

12.
Nonlinear mixed‐effects models are being widely used for the analysis of longitudinal data, especially from pharmaceutical research. They use random effects which are latent and unobservable variables so the random‐effects distribution is subject to misspecification in practice. In this paper, we first study the consequences of misspecifying the random‐effects distribution in nonlinear mixed‐effects models. Our study is focused on Gauss‐Hermite quadrature, which is now the routine method for calculation of the marginal likelihood in mixed models. We then present a formal diagnostic test to check the appropriateness of the assumed random‐effects distribution in nonlinear mixed‐effects models, which is very useful for real data analysis. Our findings show that the estimates of fixed‐effects parameters in nonlinear mixed‐effects models are generally robust to deviations from normality of the random‐effects distribution, but the estimates of variance components are very sensitive to the distributional assumption of random effects. Furthermore, a misspecified random‐effects distribution will either overestimate or underestimate the predictions of random effects. We illustrate the results using a real data application from an intensive pharmacokinetic study.  相似文献   

13.
Arjun K. Gupta  J. Tang 《Statistics》2013,47(4):301-309
It is well known that many data, such as the financial or demographic data, exhibit asymmetric distributions. In recent years, researchers have concentrated their efforts to model this asymmetry. Skew normal model is one of such models that are skew and yet possess many properties of the normal model. In this paper, a new multivariate skew model is proposed, along with its statistical properties. It includes the multivariate normal distribution and multivariate skew normal distribution as special cases. The quadratic form of this random vector follows a χ2 distribution. The roles of the parameters in the model are investigated using contour plots of bivariate densities.  相似文献   

14.
A correct detection of areas with excess of pollution relies first on accurate predictions of pollutant concentrations, a task that is usually complicated by skewed histograms and the presence of censored data. The unified skew-Gaussian (SUG) random field proposed by Zareifard and Jafari Khaledi [19] offers a more flexible class of sampling spatial models to account for skewness. In this paper, we adopt a Bayesian framework to perform prediction for the SUG model in the presence of censored data. Owing to the presence of many latent variables with strongly dependent components in the model, we encounter convergence issues when using Monte Carlo Markov Chain algorithms. To overcome this obstacle, we use a computationally efficient inverse Bayes formulas sampling procedure to obtain approximately independent samples from the posterior distribution of latent variables. Then they are applied to update parameters in a Gibbs sampler scheme. This hybrid algorithm provides effective samples, resulting in some computational advantages and precise predictions. The proposed approach is illustrated with a simulation study and applied to a spatial data set which contains right censored data.  相似文献   

15.
Usually in latent class (LC) analysis, external predictors are taken to be cluster conditional probability predictors (LC models with external predictors), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class-specific distribution is of interest in the distal outcome model, when the distribution of the external variables is assumed to depend on LC membership. In this paper, we consider a more general formulation, that embeds both the LC regression and the distal outcome models, as is typically done in cluster-weighted modelling. This allows us to investigate (1) whether the distribution of the external variables differs across classes, (2) whether there are significant direct effects of the external variables on the indicators, by modelling jointly the relationship between the external and the latent variables. We show the advantages of the proposed modelling approach through a set of artificial examples, an extensive simulation study and an empirical application about psychological contracts among employees and employers in Belgium and the Netherlands.  相似文献   

16.
We study the correlation structure for a mixture of ordinal and continuous repeated measures using a Bayesian approach. We assume a multivariate probit model for the ordinal variables and a normal linear regression for the continuous variables, where latent normal variables underlying the ordinal data are correlated with continuous variables in the model. Due to the probit model assumption, we are required to sample a covariance matrix with some of the diagonal elements equal to one. The key computational idea is to use parameter-extended data augmentation, which involves applying the Metropolis-Hastings algorithm to get a sample from the posterior distribution of the covariance matrix incorporating the relevant restrictions. The methodology is illustrated through a simulated example and through an application to data from the UCLA Brain Injury Research Center.  相似文献   

17.
For observable indicators with ordered categories one can assume underlying latent variables following certain marginal distributions. Transforming the latent variables changes its marginal distributions but not the observable qualitative indicators. The joint distribution of the latent variables can be constructed from the marginal distributions. There is a broad class of multivariate distributions for which the observable indicators are equivalent. By choosing the multivariate normal distribution from this class we can analyse a linear relationship between the transformed latent variables. This leads to latent structural equation models. Estimation of these latter models is therefore more general than the distributional assumption might initially suggest. Robustness of the estimation procedure is also discussed for deviations from this distribution family. Using ordinal business survey data of the German Ifo-institute we test the efficiency of firms' price expectations implied by the rational expectation hypothesis.  相似文献   

18.

Structural change in any time series is practically unavoidable, and thus correctly detecting breakpoints plays a pivotal role in statistical modelling. This research considers segmented autoregressive models with exogenous variables and asymmetric GARCH errors, GJR-GARCH and exponential-GARCH specifications, which utilize the leverage phenomenon to demonstrate asymmetry in response to positive and negative shocks. The proposed models incorporate skew Student-t distribution and prove the advantages of the fat-tailed skew Student-t distribution versus other distributions when structural changes appear in financial time series. We employ Bayesian Markov Chain Monte Carlo methods in order to make inferences about the locations of structural change points and model parameters and utilize deviance information criterion to determine the optimal number of breakpoints via a sequential approach. Our models can accurately detect the number and locations of structural change points in simulation studies. For real data analysis, we examine the impacts of daily gold returns and VIX on S&P 500 returns during 2007–2019. The proposed methods are able to integrate structural changes through the model parameters and to capture the variability of a financial market more efficiently.

  相似文献   

19.
Abstract. In geophysical and environmental problems, it is common to have multiple variables of interest measured at the same location and time. These multiple variables typically have dependence over space (and/or time). As a consequence, there is a growing interest in developing models for multivariate spatial processes, in particular, the cross‐covariance models. On the other hand, many data sets these days cover a large portion of the Earth such as satellite data, which require valid covariance models on a globe. We present a class of parametric covariance models for multivariate processes on a globe. The covariance models are flexible in capturing non‐stationarity in the data yet computationally feasible and require moderate numbers of parameters. We apply our covariance model to surface temperature and precipitation data from an NCAR climate model output. We compare our model to the multivariate version of the Matérn cross‐covariance function and models based on coregionalization and demonstrate the superior performance of our model in terms of AIC (and/or maximum loglikelihood values) and predictive skill. We also present some challenges in modelling the cross‐covariance structure of the temperature and precipitation data. Based on the fitted results using full data, we give the estimated cross‐correlation structure between the two variables.  相似文献   

20.
Logistic models with a random intercept are prevalent in medical and social research where clustered and longitudinal data are often collected. Traditionally, the random intercept in these models is assumed to follow some parametric distribution such as the normal distribution. However, such an assumption inevitably raises concerns about model misspecification and misleading inference conclusions, especially when there is dependence between the random intercept and model covariates. To protect against such issues, we use a semiparametric approach to develop a computationally simple and consistent estimator where the random intercept is distribution‐free. The estimator is revealed to be optimal and achieve the efficiency bound without the need to postulate or estimate any latent variable distributions. We further characterize other general mixed models where such an optimal estimator exists.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号