Similar Literature (20 results found)
1.
Among the goals of statistical matching, a very important one is the estimation of the joint distribution of variables that are not jointly observed in a single sample survey but are separately available from independent sample surveys. The absence of joint information on the variables of interest leads to uncertainty about the data generating model, since the available sample information cannot discriminate among a set of plausible joint distributions. The present paper gives a short review of the concept of uncertainty in statistical matching under logical constraints, together with ways to measure uncertainty for continuous variables. The notion of matching error is related to an appropriate measure of uncertainty, and a criterion for selecting the matching variables, namely choosing the variables that minimize this uncertainty measure, is introduced. Finally, a method for choosing a plausible joint distribution for the variables of interest via the iterative proportional fitting algorithm is described. The proposed methodology is then applied to household income and expenditure data when extra sample information on the average propensity to consume is available. This leads to a reconstructed complete dataset in which each record includes measures of both income and expenditure.
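Since the abstract leans on the iterative proportional fitting (IPF) algorithm, a minimal sketch of generic IPF on a discretized two-way table may help; it is not the authors' implementation, and the starting table and target marginals below are made-up placeholders.

```python
import numpy as np

def ipf(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: rescale a 2-D table so that its
    row and column sums match the given target marginals."""
    t = table.astype(float).copy()
    for _ in range(max_iter):
        # Match row margins.
        t *= (row_targets / t.sum(axis=1))[:, None]
        # Match column margins.
        t *= (col_targets / t.sum(axis=0))[None, :]
        if (np.abs(t.sum(axis=1) - row_targets).max() < tol and
                np.abs(t.sum(axis=0) - col_targets).max() < tol):
            break
    return t

# Toy example: start from independence, then impose hypothetical
# income (rows) and expenditure (columns) marginals.
start = np.outer([0.25, 0.5, 0.25], [0.3, 0.4, 0.3])
fitted = ipf(start,
             row_targets=np.array([0.2, 0.5, 0.3]),
             col_targets=np.array([0.35, 0.4, 0.25]))
print(fitted.round(4))
```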

2.
The Fisher distribution is frequently used as a model for the probability distribution of directional data, which may be specified either in terms of unit vectors or angular co-ordinates (co-latitude and azimuth). If, in practical situations, only the co-latitudes can be observed, the available data must be regarded as a sample from the corresponding marginal distribution. This paper discusses the estimation by Maximum Likelihood (ML) and the Method of Moments of the two parameters of this marginal Fisher distribution. The moment estimators are generally simpler to compute than the ML estimators, and have high asymptotic efficiency.

3.
An approach to the multiple-response robust parameter design problem, based on a methodology by Peterson (2000), is presented. The approach is Bayesian and consists of maximizing the posterior predictive probability that the process satisfies a set of constraints on the responses. To find a solution that is robust to variation in the noise variables, the predictive density is integrated not only with respect to the response variables but also with respect to the assumed distribution of the noise variables. The maximization problem involves repeated Monte Carlo integrations, and two different methods for solving it are evaluated. Matlab code was written that rapidly finds an optimal (robust) solution when one exists. Two examples taken from the literature are used to illustrate the proposed method.
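A minimal sketch of the kind of Monte Carlo computation the abstract describes: estimating, for a candidate control-factor setting, the posterior predictive probability that all responses meet their specifications while also integrating over the noise variables. The response model, noise distribution, and specification limits here are placeholders, not Peterson's or the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_draw(x_ctrl, z_noise):
    """Placeholder posterior predictive draw of two responses given
    control settings x_ctrl and noise settings z_noise."""
    mean = np.array([1.0 + 0.8 * x_ctrl[0] + 0.3 * z_noise[0],
                     2.0 - 0.5 * x_ctrl[1] + 0.2 * z_noise[1]])
    return mean + rng.normal(scale=0.1, size=2)

def prob_conforming(x_ctrl, n_mc=5000):
    """Estimate P(all responses within spec) by Monte Carlo, integrating
    over the noise variables as well as the predictive distribution."""
    hits = 0
    for _ in range(n_mc):
        z = rng.normal(size=2)                                  # assumed N(0, 1) noise variables
        y = predictive_draw(x_ctrl, z)
        hits += (0.8 <= y[0] <= 1.6) and (1.4 <= y[1] <= 2.2)   # placeholder specification limits
    return hits / n_mc

print(prob_conforming(np.array([0.5, -0.2])))
```

The outer optimization would then search over x_ctrl for the setting that maximizes this probability.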

4.
A key question for understanding the cross-section of expected equity returns is the following: which factors, from a given collection of factors, are risk factors or, equivalently, which factors are in the stochastic discount factor (SDF)? Though the SDF is unobserved, assumptions about which factors (from the available set) are in the SDF restrict the joint distribution of the factors in specific ways, as a consequence of the economic theory of asset pricing. A different starting collection of factors in the SDF leads to a different set of restrictions on the joint distribution of factors. The conditional distribution of equity returns has the same restricted form regardless of what is assumed about the factors in the SDF, as long as the factors are traded, and hence the distribution of asset returns is irrelevant for isolating the risk factors. The restricted factor models are distinct (nonnested) and do not arise by omitting or including a variable from a full model, thus precluding analysis by standard statistical variable selection methods, such as those based on the lasso and its variants. Instead, we develop what we call a Bayesian model scan strategy, in which each factor is allowed to enter or not enter the SDF and the resulting restricted models (of which there are 114,674 in our empirical study) are simultaneously confronted with the data. We use a Student-t distribution for the factors, model-specific independent Student-t distributions for the location parameters, a training sample to fix prior locations, and a creative way to arrive at the joint distribution of several other model-specific parameters from a single prior distribution. This makes our method essentially a scalable, tuned, black-box method that can be applied across the large model space with little to no user intervention. The model marginal likelihoods, and the implied posterior model probabilities, are compared with the prior probability of 1/114,674 for each model to find the best-supported model, and thus the factors most likely to be in the SDF. We provide detailed simulation evidence of the high finite-sample accuracy of the method. Our empirical study with 13 leading factors reveals that the highest marginal likelihood model is a Student-t distributed factor model with 5 degrees of freedom and 8 risk factors.
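Purely as a structural illustration of a model scan (not the paper's computation), the sketch below enumerates every subset of 13 candidate factors and converts placeholder marginal likelihoods into posterior model probabilities under a uniform model prior. Note that this gives 2^13 = 8192 subsets, not the paper's 114,674 models, whose model space includes additional features; the marginal likelihood function here is a dummy stand-in.

```python
import itertools
import numpy as np

def log_marginal_likelihood(subset, data):
    """Placeholder: log marginal likelihood of the restricted model in which
    only the factors in `subset` enter the SDF. The paper's actual
    computation is model-specific and is not reproduced here."""
    rng = np.random.default_rng(hash(subset) % (2 ** 32))
    return float(rng.normal())

factors = [f"f{i}" for i in range(1, 14)]            # 13 candidate factors
models = [s for r in range(len(factors) + 1)
          for s in itertools.combinations(factors, r)]

log_ml = np.array([log_marginal_likelihood(m, data=None) for m in models])
# Uniform prior over models: posterior probabilities are normalized marginal likelihoods.
post = np.exp(log_ml - log_ml.max())
post /= post.sum()
best = models[int(np.argmax(post))]
print(len(models), "models scanned; best-supported factor set:", best)
```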

5.
Consider a sequence of independent observations whose marginal distribution changes at most once somewhere in the sequence, and suppose one is not certain where the change has occurred. One would be interested in detecting the change and determining the two distributions that describe the sequence; on the other hand, if no change has occurred, one would want to know the common distribution of the observations. This study develops a Bayesian test for detecting a switch from one linear model to another. The test is based on the marginal posterior mass function of the switch point and the posterior probability of a stable model. This test and an informal sequential procedure of Smith are illustrated with data generated from an unstable linear regression model, in which the linear relationship between the dependent and independent variables changes.

6.
Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear constraints, for example, bounds on individual variables and inequalities for ratios or sums of variables. Often these constraints are designed to identify faulty values, which then are blanked and imputed. The data also may exhibit complex distributional features, including nonlinear relationships and highly nonnormal distributions. We present a fully Bayesian, joint model for modeling or imputing data with missing/blanked values under linear constraints that (i) automatically incorporates the constraints in inferences and imputations, and (ii) uses a flexible Dirichlet process mixture of multivariate normal distributions to reflect complex distributional features. Our strategy for estimation is to augment the observed data with draws from a hypothetical population in which the constraints are not present, thereby taking advantage of computationally expedient methods for fitting mixture models. Missing/blanked items are sampled from their posterior distribution using the Hit-and-Run sampler, which guarantees that all imputations satisfy the constraints. We illustrate the approach using manufacturing data from Colombia, examining the potential to preserve joint distributions and a regression from the plant productivity literature. Supplementary materials for this article are available online.
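For readers unfamiliar with the Hit-and-Run sampler mentioned above, here is a generic sketch of a single Hit-and-Run move inside a polytope defined by linear constraints Ax <= b; it draws uniformly along a random feasible direction and is not the authors' implementation (their sampler targets a posterior distribution rather than a uniform one). The constraint matrix below is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(1)

def hit_and_run_step(x, A, b):
    """One Hit-and-Run move inside the polytope {x : A @ x <= b}.
    A uniform target on the polytope is assumed for simplicity."""
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)
    # Feasible step sizes t must satisfy A @ (x + t * d) <= b.
    num = b - A @ x
    den = A @ d
    t_hi = np.min(num[den > 0] / den[den > 0]) if np.any(den > 0) else np.inf
    t_lo = np.max(num[den < 0] / den[den < 0]) if np.any(den < 0) else -np.inf
    return x + rng.uniform(t_lo, t_hi) * d

# Example: the unit box 0 <= x1, x2 <= 1 plus the constraint x1 + x2 <= 1.2.
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.], [1., 1.]])
b = np.array([1., 0., 1., 0., 1.2])
x = np.array([0.3, 0.3])
for _ in range(1000):
    x = hit_and_run_step(x, A, b)
print(x)
```

Every move stays inside the constraint region by construction, which is why imputations produced this way automatically satisfy the edits.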

7.
A method for inducing a desired rank correlation matrix on multivariate input vectors for simulation studies has recently been developed by Iman and Conover (1982). The primary intention of this procedure is to produce correlated input variables for use with computer models. Since the procedure is distribution free and leaves the exact marginal distributions intact, it can be used with any marginal distributions for which it is reasonable to think in terms of correlation. In this paper we present a series of rank correlation plots based on this procedure when the marginal distributions are normal, lognormal, uniform, and loguniform. These plots provide a convenient tool both for helping the modeler determine the degree of dependence among input variables (rather than guessing) and for communicating to the modeler the effect of different correlation assumptions. In addition, the procedure can be used with sample multivariate data by sampling directly from the respective marginal empirical distribution functions.
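A compact sketch of the Iman-Conover style rank-reordering idea may clarify how the marginals stay exactly intact: independent draws keep their own values, but each column is reordered to follow the rank pattern of a reference matrix built with the target correlation. This is an approximate variant (the published procedure also corrects for the sample correlation of the reference scores), and the marginals and target matrix below are placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def iman_conover(samples, target_corr):
    """Reorder the columns of `samples` (n x k, independent marginals) so their
    rank correlation is approximately `target_corr`, leaving each marginal
    distribution exactly intact."""
    n, k = samples.shape
    # Reference scores with (approximately) the desired correlation structure.
    scores = stats.norm.ppf(np.arange(1, n + 1) / (n + 1))
    ref = np.column_stack([rng.permutation(scores) for _ in range(k)])
    ref = ref @ np.linalg.cholesky(target_corr).T
    # Rank-match: give each column of `samples` the rank order of `ref`.
    out = np.empty_like(samples)
    for j in range(k):
        ranks = np.argsort(np.argsort(ref[:, j]))
        out[:, j] = np.sort(samples[:, j])[ranks]
    return out

x = np.column_stack([rng.lognormal(size=1000), rng.uniform(size=1000)])
target = np.array([[1.0, 0.7], [0.7, 1.0]])
y = iman_conover(x, target)
print(stats.spearmanr(y[:, 0], y[:, 1])[0])   # close to 0.7
```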

8.
In this article, we apply a Bayesian approach to linear mixed-effects models with autoregressive, AR(p), random errors under mixture priors, fitted with the Markov chain Monte Carlo (MCMC) method. The mixture structure of a point mass and a continuous distribution helps select the variables in the fixed and random effects from the posterior sample generated by the MCMC method. Bayesian prediction of future observations is also a major concern. To choose the best model, we consider the commonly used highest posterior probability model and the median posterior probability model; the simulation study suggests that both criteria are needed to identify the best model. In terms of predictive accuracy, a real example confirms that the proposed method provides accurate results.

9.
In the log-linear model for bivariate probability functions, the conditional and joint probabilities have a simple form. This property makes the log-linear parametrization useful when modeling these probabilities is the focus of the investigation. By contrast, in the log-linear representation the marginal probabilities have a complex form, so log-linear models are not useful when the marginal probabilities are of particular interest. In this paper these statements are discussed, and a model obtained from the log-linear one by imposing suitable constraints on the marginal probabilities is introduced. This work was supported by an M.U.R.S.T. grant.

10.
This article uses a comprehensive model of economic inequality to examine the impact of relative price changes on inequality in the marginal distributions of various income components in which the marginal distributions are derived from a multidimensional joint distribution. The multidimensional joint distribution function is assumed to be a member of the Pearson Type VI family; that is, it is assumed to be a beta distribution of the second kind. The multidimensional joint distribution is so called because it is a joint distribution of components of income and expenditures on various commodity groups. Gini measures of inequality are devised from the marginal distributions of the various income components. The inequality measures are shown to depend on the parameters of the multidimensional joint distribution. It is then shown that the parameters of the multidimensional joint distribution depend on the relative prices of various commodity groups and several other specified exogenous variables. Thus, knowledge of how changes in relative prices affect the parameters of the multidimensional joint distribution is deductively equivalent to knowledge of how changes in relative prices affect inequality in the marginal distributions of various components of income. It is found that relative price changes have a statistically significant impact on inequality in various components of income.

11.
A common Bayesian hierarchical model is where high-dimensional observed data depend on high-dimensional latent variables that, in turn, depend on relatively few hyperparameters. When the full conditional distribution over latent variables has a known form, general MCMC sampling need only be performed on the low-dimensional marginal posterior distribution over hyperparameters. This improves on popular Gibbs sampling that computes over the full space. Sampling the marginal posterior over hyperparameters exhibits good scaling of compute cost with data size, particularly when that distribution depends on a low-dimensional sufficient statistic.

12.
Partial specification of a prior distribution can be appealing to an analyst, but there is no conventional way to update a partial prior. In this paper, we show how a framework for Bayesian updating with data can be based on the Dirichlet(a) process. Within this framework, partial-information predictors generalize standard minimax predictors and have interesting multiple-point shrinkage properties. Approximations to partial-information estimators under squared error loss are defined straightforwardly, and the resulting estimate of the mean shrinks the sample mean. The proposed updating of the partial prior is a consequence of four natural requirements when the Dirichlet parameter a is continuous: the updated partial posterior should be calculable from knowledge of only the data and the partial prior, it should be faithful to the full posterior distribution, it should assign positive probability to every observed event {X_i}, and it should not assign probability to unobserved events not included in the partial prior specification.

13.
The generalized exponential distribution has been used quite effectively to model positively skewed lifetime data as an alternative to the well-known Weibull or gamma distributions. In this paper we introduce an absolutely continuous bivariate generalized exponential distribution by using a simple transformation from a well-known bivariate exchangeable distribution. The marginal distributions of the proposed bivariate generalized exponential distribution are generalized exponential distributions. The joint probability density function and the joint cumulative distribution function can be expressed in closed form. It is observed that the proposed bivariate distribution can be obtained using the Clayton copula with generalized exponential marginals. We derive several properties of this new distribution. It is a five-parameter distribution, and the maximum likelihood estimators of the unknown parameters cannot be obtained in closed form. We propose some alternative estimators, which can be obtained quite easily and can be used as initial guesses for computing the maximum likelihood estimates. One data set is analyzed for illustrative purposes. Finally, we propose some generalizations of the proposed model.
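Since the abstract notes that the distribution can be obtained from a Clayton copula with generalized exponential (GE) marginals, a small simulation sketch of that construction follows. The parameter values are arbitrary placeholders and the paper's specific five-parameter form is not reproduced; the copula draw uses the standard conditional-inversion formula for the Clayton copula.

```python
import numpy as np

rng = np.random.default_rng(3)

def ge_quantile(u, alpha, lam):
    """Quantile function of the generalized exponential distribution,
    F(x) = (1 - exp(-lam * x)) ** alpha for x > 0."""
    return -np.log(1.0 - u ** (1.0 / alpha)) / lam

def clayton_ge_sample(n, theta, alpha1, lam1, alpha2, lam2):
    """Draw n pairs whose dependence is a Clayton copula with parameter theta
    and whose marginals are generalized exponential distributions."""
    v1, v2 = rng.uniform(size=n), rng.uniform(size=n)
    u1 = v1
    # Conditional inversion for the Clayton copula.
    u2 = (u1 ** (-theta) * (v2 ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return ge_quantile(u1, alpha1, lam1), ge_quantile(u2, alpha2, lam2)

x, y = clayton_ge_sample(10000, theta=1.5, alpha1=2.0, lam1=1.0, alpha2=1.5, lam2=0.5)
print(np.corrcoef(x, y)[0, 1])
```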

14.
Consider a population of individuals who are free of a disease under study and who are exposed simultaneously, at random exposure levels, say X, Y, Z, …, to several risk factors suspected to cause the disease in the population. At any specified levels X=x, Y=y, Z=z, …, the incidence rate of the disease in the population at risk is given by the exposure–response relationship r(x,y,z,…) = P(disease|x,y,z,…). The present paper examines the relationship between the joint distribution of the exposure variables X, Y, Z, … in the population at risk and the joint distribution of the exposure variables U, V, W, … among cases under the linear and the exponential risk models. It is proven that under the exponential risk model these two joint distributions belong to the same family of multivariate probability distributions, possibly with different parameter values. For example, if the exposure variables in the population at risk jointly have a multivariate normal distribution, so do the exposure variables among cases; if the former variables jointly have a multinomial distribution, so do the latter. More generally, it is demonstrated that if the joint distribution of the exposure variables in the population at risk belongs to the exponential family of multivariate probability distributions, so does the joint distribution of the exposure variables among cases. If the epidemiologist can specify the difference among the mean exposure levels in the case and control groups that is considered clinically or etiologically important in the study, the results of the present paper may be used to determine sample sizes for the case–control study corresponding to specified protection levels, i.e., size α and power 1−β of a statistical test. The multivariate normal, the multinomial, the negative multinomial, and Fisher's multivariate logarithmic series exposure distributions are used to illustrate the results.
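A worked instance of the family-preservation result under the exponential risk model, using the multivariate normal case mentioned in the abstract (the form r(x) = exp(β0 + βᵀx) is the standard exponential risk model, not a claim about the paper's exact parameterization):

\[
f_{\text{cases}}(x) \;\propto\; r(x)\, f_{\text{at risk}}(x)
\;=\; e^{\beta_0+\beta^{\top}x}\,
      \exp\!\Big(-\tfrac12\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\Big)
\;\propto\; \exp\!\Big(-\tfrac12\,\big(x-(\mu+\Sigma\beta)\big)^{\top}\Sigma^{-1}\big(x-(\mu+\Sigma\beta)\big)\Big).
\]

So if the exposures in the population at risk are \(N_p(\mu,\Sigma)\), the exposures among cases are again multivariate normal, \(N_p(\mu+\Sigma\beta,\Sigma)\): the same family with a shifted mean, which is exactly the kind of preservation result the abstract states.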

15.
The paper proposes a new disclosure limitation procedure based on simulation. The key feature of the proposal is to protect actual microdata by drawing artificial units from a probability model that is estimated from the observed data. Such a model is designed to maintain selected characteristics of the empirical distribution, thus providing a partial representation of the latter. The characteristics we focus on are the expected values of a set of functions; these are constrained to equal their corresponding sample averages, so the simulated data reproduce the sample characteristics on average. If the set of constraints covers the parameters of interest to a user, information loss is controlled for, while, since the model does not preserve individual values, re-identification attempts are impaired: synthetic individuals correspond to actual respondents with very low probability. Disclosure is mainly discussed from the viewpoint of record re-identification. According to this definition, since the pledge of confidentiality only involves the actual respondents, release of synthetic units should in principle rule out confidentiality concerns. The simulation model is built on the Italian sample from the Community Innovation Survey (CIS). The approach can be applied more generally, and especially suits quantitative traits. The model has a semi-parametric component, based on the maximum entropy principle, and, here, a parametric component based on regression. The maximum entropy principle is exploited to match data traits; moreover, entropy measures the uncertainty of a distribution: its maximisation leads to a distribution which is consistent with the given information but maximally noncommittal with regard to missing information. Application results reveal that the fixed characteristics are sustained and that other features, such as marginal distributions, are well represented. Model specification is clearly a major point; related issues are the selection of characteristics, goodness of fit, and the strength of dependence relations.

16.
The structural approach to inference for the parameters of a simultaneous equation model with heteroscedastic error variance is investigated in this paper. The joint and marginal structural distributions for the coefficients of the exogenous variables and the scale parameters of the error variables, and the marginal likelihood function of the coefficients of the endogenous variables, are derived. The estimates are directly obtainable from the structural distribution and the marginal likelihood function of the parameters. The marginal distribution of a subset of the coefficients of the exogenous variables provides the basis for making inference about a particular subset of parameters of interest.

17.
It is well known that the Curse of Dimensionality causes the standard Kernel Density Estimator to break down quickly as the number of variables increases. In non-parametric regression, this effect is relieved in various ways, for example by assuming additivity or some other simplifying structure on the interaction between variables. This paper presents the Locally Gaussian Density Estimator (LGDE), which introduces a similar idea to the problem of density estimation. The LGDE is a new method for the non-parametric estimation of multivariate probability density functions. It is based on preliminary transformations of the marginal observation vectors towards standard normality, and a simplified local likelihood fit of the resulting distribution with standard normal marginals. The LGDE is introduced, and asymptotic theory is derived. In particular, it is shown that the LGDE converges at a speed that does not depend on the dimension. Examples using real and simulated data confirm that the new estimator performs very well on finite sample sizes.
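A brief sketch of the first LGDE step described above, the rank-based transformation of each margin towards standard normality; the subsequent local likelihood fit, which is the core of the estimator, is omitted, and the example marginals are placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def to_standard_normal_margins(x):
    """Push each marginal towards standard normality via its empirical CDF
    followed by the standard normal quantile function (a rank-based probit
    transform), as a preliminary step before a locally Gaussian fit."""
    n, k = x.shape
    z = np.empty_like(x, dtype=float)
    for j in range(k):
        ranks = stats.rankdata(x[:, j])             # ranks 1, ..., n
        z[:, j] = stats.norm.ppf(ranks / (n + 1))   # keeps values away from 0 and 1
    return z

x = np.column_stack([rng.exponential(size=500), rng.lognormal(size=500)])
z = to_standard_normal_margins(x)
print(z.mean(axis=0).round(3), z.std(axis=0).round(3))   # roughly 0 and 1 per margin
```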

18.
Bayesian dynamic linear models (DLMs) are useful in time series modelling because of the flexibility they offer for obtaining good forecasts. They are based on a decomposition of the relevant factors explaining the behaviour of the series through a set of state parameters. Nevertheless, the DLMs developed by West and Harrison depend on additional quantities, such as the variance of the system disturbances, which are unknown in practice. These are referred to here as the 'hyperparameters' of the model. In this paper, DLMs with autoregressive components are used to describe time series that show cyclic behaviour. The marginal posterior distribution of the state parameters can be obtained by weighting their conditional distribution by the marginal distribution of the hyperparameters. In most cases the joint distribution of the hyperparameters can be obtained analytically, but the marginal distributions of its components cannot, so numerical integration is required. We propose to obtain samples of the hyperparameters by a variant of the sampling importance resampling method. A few applications are shown with simulated and real data sets.
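A generic sketch of sampling importance resampling (SIR), the device the abstract proposes for drawing hyperparameters; the target below is a toy unnormalized density standing in for a hyperparameter posterior, not the authors' DLM, and the exponential proposal is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(6)

def sampling_importance_resampling(log_target, proposal_draw, log_proposal,
                                   n_draw=20000, n_keep=2000):
    """Generic SIR: draw from a proposal, weight by target/proposal,
    then resample in proportion to the normalized weights."""
    theta = proposal_draw(n_draw)
    log_w = log_target(theta) - log_proposal(theta)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(n_draw, size=n_keep, replace=True, p=w)
    return theta[idx]

# Toy illustration: the "posterior" of a positive hyperparameter is an
# unnormalized Gamma(3, 2) density; the proposal is Exp(1).
log_target = lambda t: 2.0 * np.log(t) - 2.0 * t      # log t^2 * exp(-2t)
proposal_draw = lambda n: rng.exponential(scale=1.0, size=n)
log_proposal = lambda t: -t                           # Exp(1) log-density
draws = sampling_importance_resampling(log_target, proposal_draw, log_proposal)
print(draws.mean())                                   # should be near 3/2
```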

19.
Bayesian marginal inference via candidate's formula
Computing marginal probabilities is an important and fundamental issue in Bayesian inference. We present a simple computational method which arises from a likelihood identity. The identity, called Candidate's formula, expresses the marginal probability as the ratio of the prior times the likelihood to the posterior density. Based on Markov chain Monte Carlo output simulated from the posterior distribution, a nonparametric kernel estimate is used for the posterior density appearing in that ratio. The resulting nonparametric Candidate's estimate requires only one evaluation of the posterior density estimate, at a single point. The optimal point for this evaluation can be chosen to minimize the expected mean square relative error. The results show that the best point is not necessarily the posterior mode, but rather a point that compromises between high density and low Hessian. For high-dimensional problems, we introduce a variance reduction approach to ease the tension caused by data sparseness. A simulation study is presented.
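A small illustration of the nonparametric Candidate's estimate in a conjugate toy model where the exact marginal likelihood is available for comparison; the evaluation point below (the posterior median) is a simple stand-in for the optimal point discussed in the abstract, and the kernel density estimate plays the role of the MCMC-based posterior density estimate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Conjugate toy model: y_i ~ N(theta, 1), theta ~ N(0, 10^2), so the posterior
# is normal and the exact marginal likelihood can be computed for comparison.
y = rng.normal(loc=1.0, scale=1.0, size=50)
prior = stats.norm(0.0, 10.0)
post_var = 1.0 / (1.0 / 10.0 ** 2 + len(y))
post_mean = post_var * y.sum()
posterior_draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)  # stand-in for MCMC output

def log_likelihood(theta):
    return stats.norm(theta, 1.0).logpdf(y).sum()

# Candidate's formula: m(y) = f(y | theta*) * pi(theta*) / pi(theta* | y),
# with the posterior density at theta* replaced by a kernel density estimate.
theta_star = np.median(posterior_draws)          # a convenient high-density point
kde = stats.gaussian_kde(posterior_draws)
log_marginal = (log_likelihood(theta_star) + prior.logpdf(theta_star)
                - np.log(kde(theta_star)[0]))

# Exact log marginal likelihood of y under the same model, for comparison.
exact = stats.multivariate_normal(mean=np.zeros(len(y)),
                                  cov=np.eye(len(y)) + 10.0 ** 2).logpdf(y)
print(float(log_marginal), float(exact))
```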

20.
A ΔCoVaR measurement method based on time-varying parameter copulas is adopted: a dynamic-parameter copula model describes the dependence structure between financial variables, GARCH-type models describe the marginal distribution of each financial variable, and ΔCoVaR is computed from the resulting joint distribution. The method is used to measure extreme risk spillover between the stock markets of mainland China, the United States, and Hong Kong. The empirical results show that the ΔCoVaR computed in this way reflects both time-varying volatility and time-varying dependence, and therefore measures extreme risk spillover during crises more sensitively and accurately.
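For reference, one common copula-based route to CoVaR and ΔCoVaR is shown below; conditioning on the event \(X_i \le \mathrm{VaR}\) is one convention (the paper may use a different conditioning event), with the time-varying copula \(C_t\) and the GARCH-based marginal \(F_{\text{sys},t}\) entering as indicated:

\[
\Pr\!\big(X_{\text{sys}} \le \mathrm{CoVaR}^{q}_{t} \,\big|\, X_{i} \le \mathrm{VaR}^{q}_{i,t}\big) = q
\quad\Longleftrightarrow\quad
\frac{C_{t}\big(q,\; F_{\text{sys},t}(\mathrm{CoVaR}^{q}_{t})\big)}{q} = q .
\]

In this convention one solves \(C_t(q, v) = q^2\) for \(v\) and sets \(\mathrm{CoVaR}^{q}_{t} = F_{\text{sys},t}^{-1}(v)\); ΔCoVaR is then the difference between this CoVaR and the one obtained with the conditioning event placed at the benchmark (median) state of \(X_i\).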
