Similar literature
Found 20 similar documents; search took 15 ms.
1.
Models that involve an outcome variable, covariates, and latent variables are frequently the target for estimation and inference. The presence of missing covariate or outcome data presents a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random and is a generalisation of missing at random. Several authors have previously proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution for the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence-related issues, and simulation results are presented in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. Imputing missing data for models with latent variables under latent-dependent missingness without specifying a full joint model.

2.
Sarjinder Singh 《Statistics》2013,47(5):499-511
In this paper, an alternative estimator of the population mean in the presence of non-response is suggested, which takes the form of Walsh's estimator. The estimator of the mean obtained from the proposed technique remains better than the estimators obtained from the ratio or mean methods of imputation. The mean-squared error (MSE) of the resultant estimator is less than that of the estimator obtained from the ratio method of imputation for the optimum choice of parameters. An estimator of a parameter involved in the new method of imputation is discussed. A ‘warm deck’ method of imputation is also suggested. The MSE expressions for the proposed estimators have been derived analytically and compared empirically. The work has been extended to the case of multi-auxiliary information to be used for imputation. Numerical illustrations are also provided.

3.
The multiple imputation technique has proven to be a useful tool in missing data analysis. We propose a Markov chain Monte Carlo method to conduct multiple imputation for incomplete correlated ordinal data using the multivariate probit model. We conduct a thorough simulation study to compare the performance of our proposed method with two available imputation methods, the multivariate normal-based and chained equation methods, for various missing data scenarios. For illustration, we present an application using the data from the smoking cessation treatment study for low-income community corrections smokers.

4.
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Contrary to the others, the proposed method can be easily used on data sets where the number of individuals is less than the number of variables and when the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the widths of the confidence intervals built for the quantities of interest are often smaller whilst ensuring a valid coverage.

5.
This paper deals with an important problem with large and complex Bayesian networks. Exact inference in these networks is simply not feasible owing to the huge storage requirements of exact methods. Markov chain Monte Carlo methods, however, are able to deal with these large networks but to do this they require an initial legal configuration to set off the sampler. So far nondeterministic methods such as forward sampling have often been used for this, even though the forward sampler may take an eternity to come up with a legal configuration. In this paper a novel algorithm will be presented that allows a legal configuration in a general Bayesian network to be found in polynomial time in almost all cases. The algorithm will not be proved deterministic but empirical results will demonstrate that this holds in most cases. Also, the algorithm will be justified by its simplicity and ease of implementation.

6.
The purpose of this study is to highlight dangerous motorways via estimating the intensity of accidents and study its pattern across the UK motorway network. Two methods have been developed to achieve this aim. First, the motorway-specific intensity is estimated by using a homogeneous Poisson process. The heterogeneity across motorways is incorporated using two-level hierarchical models. The data structure is multilevel since each motorway consists of junctions that are joined by grouped segments. In the second method, the segment-specific intensity is estimated. The homogeneous Poisson process is used to model accident data within grouped segments but heterogeneity across grouped segments is incorporated using three-level hierarchical models. A Bayesian method via Markov Chain Monte Carlo is used to estimate the unknown parameters in the models and the sensitivity to the choice of priors is assessed. The performance of the proposed models is evaluated by a simulation study and an application to traffic accidents in 2016 on the UK motorway network. The deviance information criterion (DIC) and the widely applicable information criterion (WAIC) are employed to choose between models.

7.
A Bayesian network (BN) is a probabilistic graphical model that represents a set of variables and their probabilistic dependencies. Formally, BNs are directed acyclic graphs whose nodes represent variables, and whose arcs encode the conditional dependencies among the variables. Nodes can represent any kind of variable, be it a measured parameter, a latent variable, or a hypothesis. They are not restricted to represent random variables, which form the “Bayesian” aspect of a BN. Efficient algorithms exist that perform inference and learning in BNs. BNs that model sequences of variables are called dynamic BNs. In this context, [A. Harel, R. Kenett, and F. Ruggeri, Modeling web usability diagnostics on the basis of usage statistics, in Statistical Methods in eCommerce Research, W. Jank and G. Shmueli, eds., Wiley, 2008] provide a comparison between Markov chains and BNs in the analysis of web usability from e-commerce data. A comparison of regression models, structural equation models, and BNs is presented in Anderson et al. [R.D. Anderson, R.D. Mackoy, V.B. Thompson, and G. Harrell, A Bayesian network estimation of the service–profit chain for transport service satisfaction, Decision Sciences 35(4), (2004), pp. 665–689]. In this article we apply BNs to the analysis of customer satisfaction surveys and demonstrate the potential of the approach. In particular, BNs offer advantages in implementing models of cause and effect over other statistical techniques designed primarily for testing hypotheses. Other advantages include the ability to conduct probabilistic inference for prediction and diagnostic purposes with an output that can be intuitively understood by managers.

8.
Kontkanen P., Myllymäki P., Silander T., Tirri H., Grünwald P. 《Statistics and Computing》2000,10(1):39-54
In this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the task is to compute the predictive distribution for a finite set of possible alternatives. This question is first addressed in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one possible approach for determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with Jeffreys' prior approaches the new stochastic complexity predictive distribution in the limit with increasing amount of sample data. To compare the alternative approaches in practice, each of the predictive distributions discussed is instantiated in the Bayesian network model family case. In particular, to determine Jeffreys' prior for this model family, we show how to compute the (expected) Fisher information matrix for a fixed but arbitrary Bayesian network structure. In the empirical part of the paper the predictive distributions are compared by using the simple tree-structured Naive Bayes model, which is used in the experiments for computational reasons. The experimentation with several public domain classification datasets suggests that the evidence approach produces the most accurate predictions in the log-score sense. The evidence-based methods are also quite robust in the sense that they predict surprisingly well even when only a small fraction of the full training set is used.

9.
Inference in hybrid Bayesian networks using dynamic discretization (cited 1 time in total: 0 self-citations, 1 citation by others)
We consider approximate inference in hybrid Bayesian Networks (BNs) and present a new iterative algorithm that efficiently combines dynamic discretization with robust propagation algorithms on junction trees. Our approach offers a significant extension to Bayesian Network theory and practice by offering a flexible way of modeling continuous nodes in BNs conditioned on complex configurations of evidence and intermixed with discrete nodes as both parents and children of continuous nodes. Our algorithm is implemented in a commercial Bayesian Network software package, AgenaRisk, which allows model construction and testing to be carried out easily. The results from the empirical trials clearly show how our software can deal effectively with different types of hybrid models containing elements of expert judgment as well as statistical inference. In particular, we show how the rapid convergence of the algorithm towards zones of high probability density makes robust inference analysis possible even in situations where, due to the lack of information in both prior and data, robust sampling becomes unfeasible.

10.
Mixtures of truncated exponentials (MTE) potentials are an alternative to discretization and Monte Carlo methods for solving hybrid Bayesian networks. Any probability density function (PDF) can be approximated by an MTE potential, which can always be marginalized in closed form. This allows propagation to be done exactly using the Shenoy-Shafer architecture for computing marginals, with no restrictions on the construction of a join tree. This paper presents MTE potentials that approximate standard PDFs and applications of these potentials for solving inference problems in hybrid Bayesian networks. These approximations will extend the types of inference problems that can be modelled with Bayesian networks, as demonstrated using three examples.

11.
Survival data obtained from prevalent cohort study designs are often subject to length-biased sampling. Frequentist methods including estimating equation approaches, as well as full likelihood methods, are available for assessing covariate effects on survival from such data. Bayesian methods allow a perspective of probability interpretation for the parameters of interest, and may easily provide the predictive distribution for future observations while incorporating weak prior knowledge on the baseline hazard function. There is a lack of Bayesian methods for analyzing length-biased data. In this paper, we propose Bayesian methods for analyzing length-biased data under a proportional hazards model. The prior distribution for the cumulative hazard function is specified semiparametrically using I-Splines. Bayesian conditional and full likelihood approaches are developed for analyzing simulated and real data.

12.
Sex-related homicides tend to arouse wide media coverage and thus raise the urgency to find the responsible offender. However, due to the low frequency of such crimes, domain knowledge lacks completeness. We have therefore accumulated a large data set and apply several structural learning algorithms to the data in order to combine their results into a single general graphical model. The graphical model broadly presents a distinction between an offender-driven and a situation-driven crime. A situation-driven crime may be characterised by, amongst others, an offender lacking preparation and typically attacking a known victim in familiar surroundings. On the other hand, offender-driven crimes may be identified by the high level of forensic awareness demonstrated by the offender and the sophisticated measures applied to control the victim. The prediction performance of the graphical model is evaluated via a model averaging approach on the outcome variable offender's age. The combined graph undercuts the error rate of the single algorithms and an appropriate threshold results in an error rate of less than 10%, which is a promising level for an actual implementation by the police.

13.
In this paper, a multivariate Bayesian variable sampling interval (VSI) control chart for the economic design and optimization of statistical parameters is designed. Based on the VSI sampling strategy of a multivariate Bayesian control chart with dual control limits, the optimal expected cost function is constructed. The proposed model allows the determination of the scheme parameters that minimize the expected cost per time of the process. The effectiveness of the Bayesian VSI chart is estimated through economic comparisons with the Bayesian fixed sampling interval chart and Hotelling's T² chart. This study is an in-depth study on a Bayesian multivariate control chart with variable parameters. Furthermore, it is shown that significant cost improvement may be realized through the new model.

14.
Using reinforced processes related to the beta-Stacy process and the generalized Pólya urn scheme jointly with a structure assumption about dependence, a Bayesian nonparametric prior and a predictive estimator for a multivariate survival function are provided. This estimator can be computed through an easy implementation of a Gibbs sampler algorithm. Moreover, the consistency of the estimator is studied.

15.
A Bayesian network (BN) is an efficient graphical method that uses directed acyclic graphs (DAG) to provide information about a set of data. BNs consist of nodes and arcs (or edges) where nodes represent variables and arcs represent relations and influences between nodes. Interest in organic food has been increasing in the world during the last decade. The same trend is also valid in Turkey. Although there are numerous studies that deal with customer perception of organic food and customer characteristics, none of them used BNs. Thus, this study, which shows a new application area of BNs, aims to reveal the perception and characteristics of organic food buyers. In this work, a survey is designed and applied in seven different organic bazaars in Turkey. Afterwards, BNs are constructed with the data gathered from 611 organic food consumers. The findings match the previous studies, as factors such as health, environmental factors, food availability, product price, consumers' income and trust in the organization are found to influence consumers effectively.

16.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a new proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more optimally.

17.
This article seeks to measure deprivation among Portuguese households, taking into account four well-being dimensions (housing, durable goods, economic strain and social relationships) with survey data from the European Community Household Panel. We propose a multi-stage approach to a cross-sectional analysis, side-stepping the sparse nature of the contingency tables caused by the large number of variables considered and bringing together partial and overall analyses of deprivation that are based on Bayesian latent class models via Markov Chain Monte Carlo methods. The outcomes demonstrate that there was a substantial improvement in overall household well-being between 1995 and 2001. The dimensions that most contributed to the risk of household deprivation were found to be economic strain and social relationships.

18.
In longitudinal studies, nonlinear mixed-effects models have been widely applied to describe the intra- and the inter-subject variations in data. The inter-subject variation usually receives great attention and it may be partially explained by time-dependent covariates. However, some covariates may be measured with substantial errors and may contain missing values. We propose a multiple imputation method, implemented by a Markov Chain Monte Carlo method with a Gibbs sampler, to address the covariate measurement errors and missing data in nonlinear mixed-effects models. The multiple imputation method is illustrated in a real data example. Simulation studies show that the multiple imputation method outperforms the commonly used naive methods.

19.
Multiple imputation (MI) is an increasingly popular method for analysing incomplete multivariate data sets. One of the most crucial assumptions of this method relates to the mechanism leading to missing data. Distinctness is typically assumed, which indicates a complete independence of mechanisms underlying missingness and data generation. In addition, missing at random or missing completely at random is assumed, which explicitly states under which conditions missingness is independent of observed data. Despite common use of MI under these assumptions, plausibility and sensitivity to these fundamental assumptions have not been well-investigated. In this work, we investigate the impact of non-distinctness and non-ignorability. In particular, non-ignorability is due to unobservable cluster-specific effects (e.g. random-effects). Through a comprehensive simulation study, we show that non-ignorability due to non-distinctness does not immediately imply dismal performance of MI inferences, while non-ignorability due to missing not at random leads to quite subpar performance.
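As background to the MI inferences this abstract evaluates, the following is a minimal sketch of how per-imputation results are conventionally pooled with Rubin's rules. It is standard textbook material rather than the authors' code; the function name `pool_rubin` and the example numbers are illustrative assumptions.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules.

    estimates: point estimate from each imputed data set (needs m >= 2)
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)       # average of the m estimates
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled, total


# Hypothetical results from m = 3 imputed data sets.
pooled, total_var = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The between-imputation term is what carries the uncertainty due to missingness, so a misspecified missingness mechanism affects both the pooled estimate and its reported variance.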

20.
There is an increasing amount of literature focused on Bayesian computational methods to address problems with intractable likelihood. One approach is a set of algorithms known as Approximate Bayesian Computation (ABC) methods. One of the problems with these algorithms is that their performance depends on the appropriate choice of summary statistics, distance measure and tolerance level. To circumvent this problem, an alternative method based on the empirical likelihood has been introduced. This method can be easily implemented when a set of constraints, related to the moments of the distribution, is specified. However, the choice of the constraints is sometimes challenging. To overcome this difficulty, we propose an alternative method based on a bootstrap likelihood approach. The method is easy to implement and in some cases is actually faster than the other approaches considered. We illustrate the performance of our algorithm with examples from population genetics, time series and stochastic differential equations. We also test the method on a real dataset.
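To make the three tuning choices named in this abstract concrete (summary statistic, distance measure, tolerance), here is a minimal sketch of plain ABC rejection sampling. It illustrates the baseline ABC idea only, not the bootstrap likelihood method the authors propose; the normal model, uniform prior, and all names below are assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulate, summary, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                # draw a candidate parameter from the prior
        s_sim = summary(simulate(theta, rng))     # simulate data and reduce it to a summary
        if abs(s_sim - s_obs) <= tolerance:       # distance measure: absolute difference
            accepted.append(theta)
    return accepted


rng = random.Random(0)
# Hypothetical observed data: 50 draws from N(2, 1); the mean is treated as unknown.
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_draws = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),                        # assumed flat prior
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],  # assumed model
    summary=statistics.fmean,                                            # summary: sample mean
    tolerance=0.2,
    n_draws=5000,
    rng=rng,
)
```

Loosening `tolerance` accepts more draws but blurs the approximation to the posterior, while a poorly chosen summary discards information about the parameter; this is the sensitivity the abstract refers to.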
