Similar Articles
1.
Summary. We consider three sorts of diagnostics for random imputations: displays of the completed data, which are intended to reveal unusual patterns that might suggest problems with the imputations; comparisons of the distributions of observed and imputed data values; and checks of the fit of the observed data to the model used to create the imputations. We formulate these methods in terms of sequential regression multivariate imputation, an iterative procedure in which the missing values of each variable are randomly imputed conditionally on all the other variables in the completed data matrix. We also consider a recalibration procedure for sequential regression imputations. We apply these methods to the 2002 Environmental Sustainability Index, a linear aggregation of 64 environmental variables on 142 countries.
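A minimal sketch of the sequential regression idea, under simplifying assumptions that are not from the abstract: only two numeric columns, ordinary linear regressions with normal residual draws, and no draws of the regression parameters (so this is a simplified rather than fully proper imputation). All function names are hypothetical. The helper compare_observed_imputed illustrates the abstract's comparison of observed and imputed values.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd


def sequential_impute(cols, n_iter=10, seed=0):
    """Chained-regression imputation for two numeric columns.

    cols: two equal-length lists in which None marks a missing value.
    Each sweep regresses one column on the other (fitting only on rows
    where the target is observed, using the current completed predictor)
    and redraws the target's missing entries as prediction + N(0, sd^2).
    """
    rng = random.Random(seed)
    miss = [[v is None for v in c] for c in cols]
    # Initialise missing entries with the observed column mean.
    data = []
    for c in cols:
        m = statistics.fmean(v for v in c if v is not None)
        data.append([m if v is None else v for v in c])
    for _ in range(n_iter):
        for t in (0, 1):        # column being imputed
            p = 1 - t           # its single predictor
            obs = [i for i, is_miss in enumerate(miss[t]) if not is_miss]
            a, b, sd = fit_simple_ols([data[p][i] for i in obs],
                                      [data[t][i] for i in obs])
            for i, is_miss in enumerate(miss[t]):
                if is_miss:
                    data[t][i] = a + b * data[p][i] + rng.gauss(0.0, sd)
    return data, miss


def compare_observed_imputed(column, is_missing):
    """Diagnostic: mean of observed values vs. mean of imputed values."""
    obs = [v for v, m in zip(column, is_missing) if not m]
    imp = [v for v, m in zip(column, is_missing) if m]
    return statistics.fmean(obs), statistics.fmean(imp)
```

A large gap between the two means returned by the diagnostic is only a prompt for inspection: under missing-at-random mechanisms, observed and imputed distributions may legitimately differ.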

2.
Multiple imputation has emerged as a popular approach to handling data sets with missing values. For incomplete continuous variables, imputations are usually produced using multivariate normal models. However, this approach can be problematic for variables with a strongly non-normal shape, because it generates imputations that are inconsistent with the actual distributions and thus leads to incorrect inferences. For non-normal data, we consider a multivariate extension of Tukey's gh distribution/transformation [38] that accommodates skewness and/or kurtosis and captures the correlation among the variables. We propose an algorithm to fit the model to incomplete data and generate imputations. We apply the method to a national data set on hospital performance for several standard quality measures, which are highly skewed to the left and substantially correlated with each other. We use Monte Carlo studies to assess the performance of the proposed approach. We discuss possible generalizations and offer advice to practitioners on handling non-normal incomplete data.
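For reference, Tukey's g-and-h transformation of a standard normal value z is T(z) = ((exp(g z) - 1) / g) · exp(h z² / 2), with the limit z · exp(h z² / 2) when g = 0. The sketch below assumes, as one simple way to build a multivariate version, that the transformation is applied componentwise to correlated standard normals; the paper's exact multivariate construction and fitting algorithm may differ, and the helper names are hypothetical.

```python
import math
import random
import statistics


def tukey_gh(z, g, h):
    """Tukey g-and-h transform of a standard normal value z.

    g controls skewness (g < 0 gives left skew); h >= 0 controls tail weight.
    """
    core = z if g == 0 else (math.exp(g * z) - 1.0) / g
    return core * math.exp(h * z * z / 2.0)


def correlated_gh_pair(rho, g, h, rng):
    """Assumed construction: correlated standard normals, then gh componentwise."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return tukey_gh(z1, g, h), tukey_gh(z2, g, h)


def sample_skewness(xs):
    """Moment-based sample skewness."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)
```

With g = -0.5 the draws are left-skewed, the shape the abstract reports for the hospital quality measures.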

3.
A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to the observed outcomes. The model accommodates multilevel data structures, general covariate effects, and distinct link functions and error distributions for each underlying variable. Within the proposed framework, novel models are developed for clustered multiple binary, unordered categorical, and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated using data from a reproductive toxicity study.

4.
We consider a five-dimensional normal distribution and derive the exact joint distribution of one variable, linear combinations of order statistics from two other variables, and linear combinations of the corresponding concomitants of these order statistics. We show that this joint distribution is a mixture of trivariate unified skew-normal distributions. This mixture representation enables us to predict one variable from linear combinations of order statistics of two other variables and linear combinations of the corresponding concomitants. Finally, we illustrate the usefulness of these results using a real data set.

5.
For observable indicators with ordered categories, one can assume underlying latent variables that follow certain marginal distributions. Transforming the latent variables changes their marginal distributions but not the observable qualitative indicators. The joint distribution of the latent variables can be constructed from the marginal distributions, and there is a broad class of multivariate distributions for which the observable indicators are equivalent. Choosing the multivariate normal distribution from this class allows a linear relationship between the transformed latent variables to be analysed, which leads to latent structural equation models. Estimation of these models is therefore more general than the distributional assumption might initially suggest. The robustness of the estimation procedure to deviations from this distribution family is also discussed. Using ordinal business survey data from the German Ifo Institute, we test the efficiency of firms' price expectations as implied by the rational expectations hypothesis.
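The invariance claim (transforming the latent variable changes its margin but not the observed categories) can be checked directly: apply the same strictly increasing map to both the latent value and the category thresholds. This is a minimal sketch; the thresholds, the choice of exp as the transform, and the function names are hypothetical illustrations.

```python
import bisect
import math
import random


def categorize(latent, thresholds):
    """Map a latent value to an ordered category 0..len(thresholds)."""
    return bisect.bisect_right(thresholds, latent)


def indicators_match(n=1000, seed=0):
    """Check that a strictly increasing transform of the latent variable,
    applied to the thresholds as well, leaves every observed category unchanged."""
    rng = random.Random(seed)
    cuts = [-0.5, 0.7]                  # hypothetical thresholds (3 categories)
    cuts_t = [math.exp(c) for c in cuts]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)         # normal latent variable
        w = math.exp(z)                 # lognormal latent variable
        if categorize(z, cuts) != categorize(w, cuts_t):
            return False
    return True
```

Because the normal and lognormal latent models produce identical categories, the data alone cannot distinguish them; this is why the normal member of the class can be chosen for the structural analysis.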

6.
Sequential regression multiple imputation has emerged as a popular approach for handling incomplete data with complex features. In this approach, imputations for each variable with missing values are produced from a regression model that uses the other variables as predictors, cycling through the variables in turn. A normality assumption is frequently imposed on the error distributions of the conditional regression models for continuous variables, although it rarely holds in real scenarios. We use a simulation study to investigate the performance of several sequential regression imputation methods when the error distribution is flat or heavy-tailed. The methods evaluated include sequential normal imputation and several of its extensions that adjust for non-normal error terms. The results show that all methods perform well for estimating the marginal mean and proportion, as well as the regression coefficient, when the error distribution is flat or moderately heavy-tailed. When the error distribution is strongly heavy-tailed, all methods still perform well for the mean and the adjusted methods remain robust for the proportion, but all methods can perform poorly for the regression coefficient because they cannot accommodate extreme values well. We caution against the mechanical use of sequential regression imputation without model checking and diagnostics.

7.
Summary. We consider joint probability distributions generated recursively in terms of univariate conditional distributions satisfying conditional independence restrictions. The independences are captured by missing edges in a directed graph. A matrix form of such a graph, called the generating edge matrix, is triangular, so the distributions generated over such graphs are called triangular systems. We study the consequences of triangular systems when the variables are grouped or reordered for analysis as chain graph models, i.e. for alternative recursive factorizations of the given density using joint conditional distributions. For this we introduce families of linear triangular equations that do not require assumptions about distributional form. The strength of the associations that such linear families imply for chain graph models is derived. The edge matrices of the chain graphs implied by any triangular system are obtained by appropriately transforming the generating edge matrix. It is shown how induced independences and dependences can be studied by graphs, by edge matrix calculations and via the properties of densities. Some ways of using the results are illustrated.

8.
A new method for analyzing high-dimensional categorical data, Linear Latent Structure (LLS) analysis, is presented. LLS models belong to the family of latent structure models, which are mixture distribution models constrained to satisfy the local independence assumption. LLS analysis explicitly treats a family of mixed distributions as a linear space, and LLS models are obtained by imposing linear constraints on the mixing distribution. LLS models are identifiable under modest conditions and are consistently estimable. A notable feature of LLS analysis is the existence of a high-performance numerical algorithm that reduces parameter estimation to a sequence of linear algebra problems. Simulation experiments with a prototype of the algorithm showed that the model parameters are recovered with good accuracy.

9.
This paper presents a new method for reconciling data described by arbitrary continuous probability distributions, with a focus on nonlinear constraints. The main idea, already applied to linear constraints in a previous paper, is to restrict the joint prior probability distribution of the observed variables by the model constraints to obtain a joint posterior probability distribution. Because the posterior probability density function generally cannot be calculated analytically, it is shown that sampling from the posterior distribution with a Markov chain Monte Carlo (MCMC) method has decisive advantages. From the resulting sample of observed and unobserved variables, various characteristics of the posterior distribution can be estimated, such as the mean, the full covariance matrix, marginal posterior densities, and marginal moments, quantiles and HPD intervals. The procedure is illustrated by examples from material flow analysis and chemical engineering.

10.
Bivariate uniform distributions with dependent components are readily derived by applying distribution function transformations to the components of non-uniform, dependent, continuous bivariate random variables (X,Y). Contour plots of the joint density functions show the various, and varying, forms of dependence that can arise from different distributional forms for (X,Y), and aid the choice of bivariate uniform distributions as empirical models.
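A minimal sketch of the distribution function transformation, assuming (X,Y) is standard bivariate normal with correlation rho; the abstract covers other distributional forms as well, and the function names are hypothetical. Applying the normal distribution function to each component gives U and V that are each uniform on (0, 1) but dependent.

```python
import math
import random
from statistics import NormalDist, fmean


def bivariate_uniform_pair(rho, rng):
    """Dependent (U, V) with uniform margins, from a standard bivariate normal."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    phi = NormalDist().cdf            # standard normal distribution function
    return phi(z1), phi(z2)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For the normal case the correlation of (U, V) equals Spearman's rho of (X, Y), namely (6/π)·arcsin(rho/2), which is slightly smaller than rho itself.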

11.
The construction of a joint model for mixed discrete and continuous random variables that accounts for their associations is an important statistical problem in many practical applications. In this paper, we use copulas to construct a class of joint distributions of mixed discrete and continuous random variables. In particular, we employ the Gaussian copula to generate joint distributions for mixed variables. Examples include the robit-normal and probit-normal-exponential distributions, the first for modelling the distribution of mixed binary-continuous data and the second for a mixture of continuous, binary and trichotomous variables. The new class of joint distributions is general enough to include many mixed-data models currently available. We study properties of the distributions and outline likelihood estimation; a small simulation study is used to investigate the finite-sample properties of estimates obtained by full and pairwise likelihood methods. Finally, we present an application to discriminant analysis of multiple correlated binary and continuous data from a study involving advanced breast cancer patients.
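A minimal sketch of how a Gaussian copula couples a binary and a continuous margin. It is a probit analogue of the abstract's robit-normal case: the robit version would use a Student-t latent variable, which is not shown. The binary margin is obtained by thresholding one latent normal at its (1 - pi) quantile, which is the Bernoulli(pi) quantile function applied to Φ(Z1). The parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def probit_normal_draw(pi, mu, sigma, rho, rng):
    """One (binary, continuous) pair from a Gaussian copula.

    Latent (Z1, Z2) are standard bivariate normal with correlation rho;
    B = 1 when Z1 exceeds the (1 - pi) normal quantile, so P(B = 1) = pi,
    and Y = mu + sigma * Z2 is N(mu, sigma^2).
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    cut = NormalDist().inv_cdf(1.0 - pi)
    b = 1 if z1 > cut else 0
    return b, mu + sigma * z2
```

With rho > 0 the continuous outcome tends to be larger when B = 1, while both margins keep their specified forms.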

12.
This article considers a Bayesian hierarchical model for multiple comparisons in linear models where the population medians satisfy a simple order restriction. By representing the asymmetric Laplace distribution as a scale mixture of normals with an exponential mixing density, and by using a continuous prior restricted to the order constraints, a Gibbs sampling algorithm is proposed for parameter estimation and simultaneous comparison of treatment medians. Posterior probabilities of all possible hypotheses on the equality/inequality of treatment medians are estimated using Bayes factors computed via Savage-Dickey density ratios. The performance of the proposed median-based model is investigated on simulated and real datasets. The results show that the proposed method can outperform the commonly used method based on treatment means when the data come from non-normal distributions.
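One standard form of the mixture representation, taken here from the Bayesian quantile regression literature and assumed rather than confirmed to be the article's exact parametrization, writes a standard asymmetric Laplace variable with p-th quantile 0 as e = θw + τ√w·z, where w ~ Exp(1), z ~ N(0, 1), θ = (1 - 2p)/(p(1 - p)) and τ² = 2/(p(1 - p)). Conditional on w the error is normal, which is what makes Gibbs updates tractable. For the median model, p = 0.5 gives θ = 0 and a symmetric Laplace.

```python
import math
import random


def ald_draw(p, rng):
    """One draw from the standard asymmetric Laplace distribution whose
    p-th quantile is 0, via e = theta*w + tau*sqrt(w)*z,
    w ~ Exp(1), z ~ N(0, 1)."""
    theta = (1.0 - 2.0 * p) / (p * (1.0 - p))
    tau = math.sqrt(2.0 / (p * (1.0 - p)))
    w = rng.expovariate(1.0)            # exponential mixing variable
    z = rng.gauss(0.0, 1.0)             # normal component given w
    return theta * w + tau * math.sqrt(w) * z
```

A quick check is that about a fraction p of the draws fall at or below zero.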

13.
To model a hypothesis of double monotone dependence between two ordinal categorical variables A and B, a set of symmetric odds ratios defined on the joint probability function is usually subjected to linear inequality constraints. In this paper, by contrast, two sets of asymmetric odds ratios, defined respectively on the conditional distributions of A given B and on the conditional distributions of B given A, are subjected to linear inequality constraints. If the joint probabilities are parameterized by a saturated log-linear model, these constraints become nonlinear inequality constraints on the log-linear parameters. The problem considered here is non-standard, both because of the nonlinear inequality constraints and because the number of constraints exceeds the number of parameters of the saturated log-linear model. This work has been supported by the COFIN 2002 project, references 2002133957_002, 2002133957_004. Preliminary findings were presented at the SIS (Società Italiana di Statistica) Annual Meeting, Bari, 2004.

14.
Hea-Jung Kim. Statistics, 2013, 47(3): 325-341
This article derives and studies several types of conditional correlations. The correlations are obtained from a class of two-piece scale mixture skew-normal distributions, which is constructed by applying a set of nonlinear constraints to the bivariate scale mixture of normal distributions. The correlations of this class are invariant to the choice of the scale mixing function; however, they depend on the type of nonlinear truncation. Moreover, their upper and lower limits are no longer 1.00 and −1.00. They are useful for truncated data analysis, multivariate interdependence methods (such as principal component analysis and factor analysis) and random truncation modelling. Some distributional properties and the Bayesian computation of the correlations are considered in developing the necessary theory and in providing illustrative examples, respectively. Two applications are also given to demonstrate the usefulness of the conditional correlations in multivariate analysis.

15.
To approximate the joint distribution of the two-colony stepping-stone model, a finite mixture approach is proposed for constructing discrete multivariate distributions. This approach generalizes the classic method of linear combinations of independent variables. The stepping-stone model is approximated by matching known moments. Numerical examples from entomology are given, and comparisons are made with the work of Wehrly et al. (1993).

16.
The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations in which the parameters of scientific interest are the coefficients of a linear regression model and values in the predictor variables are missing at random. The performance of a newly proposed imputation method based on generalized additive models for location, scale, and shape (GAMLSS) is investigated. Although imputation methods based on predictive mean matching are virtually unbiased, they suffer from mild to moderate under-coverage, even in the experiment where all variables are jointly normally distributed. The GAMLSS method achieves better coverage than currently available methods.
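Predictive mean matching, one of the methods compared above, imputes each missing value with an observed value borrowed from a "donor" whose regression prediction is close. The sketch below is a simplified single-predictor version that assumes a fully observed x, picks uniformly among the k nearest donors, and omits the parameter draw used in proper multiple imputation. The function name and the default k = 5 are hypothetical choices.

```python
import random
import statistics


def pmm_impute(x, y, k=5, seed=0):
    """Predictive mean matching for missing y (None) given fully observed x."""
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    # Least-squares fit of y on x using the complete cases only.
    mx = statistics.fmean(x[i] for i in obs)
    my = statistics.fmean(y[i] for i in obs)
    b = (sum((x[i] - mx) * (y[i] - my) for i in obs)
         / sum((x[i] - mx) ** 2 for i in obs))
    a = my - b * mx
    pred = [a + b * xi for xi in x]
    out = list(y)
    for i, v in enumerate(y):
        if v is None:
            # k observed cases whose predicted means are closest to case i's
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            out[i] = y[rng.choice(donors)]
    return out
```

Because every imputed value is an observed value, the method cannot produce values outside the observed range, one reason it is often considered robust to misspecified error distributions.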

17.
Estimates of the largest wind gust that will occur at a given location over a specified period are required by civil engineers. Estimation is usually based on models which are derived from the limiting distributions of maxima of stationary time series and which are fitted to data on extreme gusts. In this paper we develop a model for maximum gusts which also incorporates data on hourly mean speeds through a distributional relationship between maxima and means. This joint model is closely linked to the physical processes which generate the most extreme values and thus provides a mechanism by which data on means can augment those on gusts. It is argued that this increases the credibility of extrapolation in estimates of long period return gusts. The model is shown to provide a good fit to data obtained at a location in northern England and is compared with a more traditional modelling approach, which also performs well for this site.
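The "more traditional modelling approach" typically fits a generalized extreme value (GEV) distribution to annual maxima and reads off return levels. As a sketch of that standard step only (the paper's joint gust-and-mean model is not reproduced), the T-year return level solves G(z) = 1 - 1/T, giving z = μ - (σ/ξ)(1 - y^(-ξ)) with y = -ln(1 - 1/T), or z = μ - σ ln y in the Gumbel case ξ = 0. Any parameter values used with these functions are hypothetical, not fitted to the northern England data.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function (Gumbel when xi == 0)."""
    s = (z - mu) / sigma
    if xi == 0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0:
        # Outside the support: below the lower end (xi > 0) or above the upper end (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def return_level(period, mu, sigma, xi):
    """Level exceeded by the annual maximum with probability 1/period."""
    y = -math.log(1.0 - 1.0 / period)
    if xi == 0:
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))
```

Extrapolating to long return periods is sensitive to the shape parameter ξ, which is where the abstract argues that extra information from hourly means adds credibility.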

18.
Models incorporating “latent” variables are commonplace in the financial, social, and behavioral sciences. The factor model, the most popular latent variable model, explains continuous observed variables as linear functions of a smaller set of latent variables (factors). However, complex data often simultaneously display asymmetric dependence, asymptotic dependence, and positive (negative) dependence between random variables, which linear models, Gaussian distributions and many other existing distributions cannot capture. This article proposes a nonlinear factor model that captures these dependence features while retaining a simple factor structure. The random variables, marginally distributed as unit Fréchet, are decomposed into max-linear functions of underlying Fréchet idiosyncratic risks, transformed from a Gaussian copula, and independent shared external Fréchet risks. Allowing the random variables to share underlying (latent) pervasive risks with random impact parameters creates various dependence structures. This offers a promising new technique for generating families of distributions with simple interpretations. We examine the multivariate extreme value properties of the proposed model and investigate maximum composite likelihood methods for estimating the impact parameters of the latent risks. The estimates are shown to be consistent. The estimation schemes are illustrated on several simulated data sets, on which their performance is compared. We use a bootstrap method to obtain standard errors in the real data analysis. An application to financial data reveals inherent dependencies that previous work has not disclosed and demonstrates the model's interpretability on real data. Supplementary materials for this article are available online.
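The key property behind a max-linear factor structure is that if Z_j are independent unit Fréchet and the weights a_j ≥ 0 sum to 1, then max_j a_j Z_j is again unit Fréchet. The simplified sketch below assumes independent idiosyncratic risks and a fixed impact weight a; the article's Gaussian-copula idiosyncratic risks and random impact parameters are not reproduced, and the function names are hypothetical. Two variables sharing one latent risk have P(X1 ≤ x, X2 ≤ x) = exp(-(2 - a)/x), so a larger a means stronger extremal dependence.

```python
import math
import random


def unit_frechet(rng):
    """Unit Frechet draw: if E ~ Exp(1), 1/E has P(X <= x) = exp(-1/x)."""
    return 1.0 / rng.expovariate(1.0)


def max_linear_pair(a, rng):
    """Two unit-Frechet variables sharing one latent risk Z0.

    X_k = max(a * Z0, (1 - a) * Z_k) with Z0, Z1, Z2 iid unit Frechet and
    0 <= a <= 1, so each X_k is again unit Frechet.
    """
    z0, z1, z2 = (unit_frechet(rng) for _ in range(3))
    return max(a * z0, (1 - a) * z1), max(a * z0, (1 - a) * z2)
```

The quantity 2 - a is the extremal coefficient of the pair: it equals 2 under independence (a = 0) and 1 under complete dependence (a = 1).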

19.
In this article, the effects of mixtures of two normal distributions on the fraction non-conforming are studied in the context of capability analysis. When the output from several processes is mixed, the quality characteristic of the resulting mix may follow a normal mixture distribution. This can happen, for example, when monitoring output from several suppliers, machines, or workers. This study considers both independent and autocorrelated processes for a mixture of two normal distributions, modelling the autocorrelation with a first-order autoregressive model, AR(1). It is shown that the true process fraction non-conforming (corresponding to specific values of a capability index) can differ greatly from what is expected when the data are independent normal random variables.
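For the independent case only (the AR(1) setting is not modelled here), the fraction non-conforming of a two-component normal mixture is the weighted sum of each component's tail probabilities outside the specification limits. The sketch compares it with a single normal that has the same mean and variance. The component parameters and limits in the usage are hypothetical.

```python
from statistics import NormalDist


def fraction_nonconforming(components, lsl, usl):
    """P(X < LSL) + P(X > USL) for a normal mixture given as (weight, mean, sd)."""
    total = 0.0
    for w, mu, sd in components:
        d = NormalDist(mu, sd)
        total += w * (d.cdf(lsl) + (1.0 - d.cdf(usl)))
    return total


def moment_matched_normal(components):
    """Single normal with the same mean and variance as the mixture."""
    mean = sum(w * mu for w, mu, _ in components)
    second = sum(w * (sd * sd + mu * mu) for w, mu, sd in components)
    return [(1.0, mean, (second - mean * mean) ** 0.5)]


# Usage with hypothetical values: two machines centred at 9.4 and 10.6,
# each with sd 0.3, and specification limits 9 and 11.
mix = [(0.5, 9.4, 0.3), (0.5, 10.6, 0.3)]
p_mix = fraction_nonconforming(mix, 9.0, 11.0)
p_norm = fraction_nonconforming(moment_matched_normal(mix), 9.0, 11.0)
```

In this example the moment-matched normal overstates the fraction non-conforming, so a capability index computed under a normality assumption would misstate process performance.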

20.
We consider different censoring models for the two-sample problem and find the joint distribution of the rank vector and the number of uncensored observations under each censoring model when the lifetime distributions and/or the distributions of the censoring variables satisfy the condition for Lehmann-type alternatives.

