Similar literature
Found 20 similar articles; search took 15 ms
1.
Abstract

Covariance estimation and selection for multivariate datasets in a high-dimensional regime is a fundamental problem in modern statistics. Gaussian graphical models are a popular class of models used for this purpose. Current Bayesian methods for inverse covariance matrix estimation under Gaussian graphical models require the underlying graph, and hence the ordering of variables, to be known. However, in practice, such information on the true underlying model is often unavailable. We therefore propose a novel permutation-based Bayesian approach to tackle the unknown variable ordering issue. In particular, we utilize multiple maximum a posteriori estimates under the DAG-Wishart prior, one for each permutation, and subsequently construct the final estimate of the inverse covariance matrix. The proposed estimator has smaller variability and is order-invariant. We establish posterior convergence rates under mild assumptions and illustrate via simulation studies that our method outperforms existing approaches in estimating inverse covariance matrices.

2.
Abstract

In this article, we study variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method for normal random errors and is empirically shown to be much more efficient and robust with respect to heavy-tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with missing covariates, we consider inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator using the estimated selection probability has a smaller asymptotic variance than the one using the true selection probability, and is thus more efficient. Therefore, the important Horvitz-Thompson property is verified for the penalized rank estimator with missing covariates in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.

3.

A goodness-of-fit technique for random samples from the exponential distribution based on the sample Lorenz curve is adapted for use in the exponential order statistic (EOS) model. In the EOS model, only those observations in a random sample from the exponential distribution of unknown size N that are less than some known stopping time T are observable. The model is known as the Jelinski-Moranda model in software reliability, where it is used to estimate the number of bugs in software during development. Distributional results are derived for the distance between the sample Lorenz curve and the population Lorenz curve so that it can be used as a goodness-of-fit test statistic. Simulations show that the test has good power against several alternative distributions. Simulations also indicate that in some cases, model misspecification leads to poor parameter estimation. A plotting procedure provides a means of graphical assessment of fit.
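To make the Lorenz-curve idea concrete, here is a minimal sketch for the simpler case of a complete (untruncated) sample. It is not the EOS procedure from the abstract. It computes the sample Lorenz curve and compares it with the population Lorenz curve of the exponential distribution, L(p) = p + (1 - p) ln(1 - p). The function names and the use of the largest absolute gap as the distance are assumptions made for illustration; the abstract does not say which distance the authors use.

```python
import math


def sample_lorenz(xs):
    """Return the points (i/n, L_n(i/n)) of the sample Lorenz curve."""
    ordered = sorted(xs)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(ordered, start=1):
        running += x
        # Share of the total held by the i smallest observations.
        points.append((i / n, running / total))
    return points


def exponential_lorenz(p):
    """Population Lorenz curve of any exponential distribution."""
    if p >= 1.0:
        return 1.0
    return p + (1.0 - p) * math.log(1.0 - p)


def max_lorenz_gap(xs):
    """Largest absolute gap between sample and exponential Lorenz curves.

    Hypothetical distance chosen for illustration only.
    """
    return max(abs(l - exponential_lorenz(p)) for p, l in sample_lorenz(xs))
```

A small gap suggests the sample is compatible with an exponential shape. Turning the gap into a formal test would require the distributional results the abstract refers to.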

4.
ABSTRACT

Online consumer product ratings data are increasing rapidly. While most of the current graphical displays mainly represent the average ratings, Ho and Quinn proposed an easily interpretable graphical display based on an ordinal item response theory (IRT) model, which successfully accounts for systematic interrater differences. Conventionally, the discrimination parameters in IRT models are constrained to be positive, particularly in the modeling of scored data from educational tests. In this article, we use real-world ratings data to demonstrate that such a constraint can have a great impact on the parameter estimation. This impact on estimation was explained through rater behavior. We also discuss correlation among raters and assess the prediction accuracy for both the constrained and the unconstrained models. The results show that the unconstrained model performs better when a larger fraction of rater pairs exhibit negative correlations in ratings.

5.
We introduce two types of graphical log‐linear models: label‐ and level‐invariant models for triangle‐free graphs. These models generalise symmetry concepts in graphical log‐linear models and provide a tool with which to model symmetry in the discrete case. A label‐invariant model is category‐invariant and is preserved after permuting some of the vertices according to transformations that maintain the graph, whereas a level‐invariant model equates expected frequencies according to a given set of permutations. These new models can both be seen as instances of a new type of graphical log‐linear model termed the restricted graphical log‐linear model, or RGLL, in which equality restrictions on subsets of main effects and first‐order interactions are imposed. Their likelihood equations and graphical representation can be obtained from those derived for the RGLL models.

6.
Combining statistical models is a useful approach in any research area where a global picture of the problem needs to be constructed by binding together evidence from different sources [M.S. Massa and S.L. Lauritzen, Combining Statistical Models, M. Viana and H. Wynn, eds., American Mathematical Society, Providence, RI, 2010, pp. 239–259]. In this paper, we investigate the effectiveness of combining a fixed number of Gaussian graphical models respecting some consistency assumptions in problems of model building. In particular, we use the meta-Markov combination of Gaussian graphical models as detailed in Massa and Lauritzen and compare model selection results obtained by combining selections over smaller sets of variables with selection results over all variables of interest. To do so, we carry out simulation studies in which different criteria are considered for the selection procedures. We conclude that the combination generally performs better than global estimation, is computationally simpler by virtue of having fewer and simpler models to work on, and has intuitive appeal in a wide variety of contexts.

7.
An identification procedure for multivariate autoregressive moving average (ARMA) echelon-form models is proposed. It is based on the study of the linear dependence between rows of the Hankel matrix of serial correlations. To that end, we define a statistical test for checking the linear dependence between vectors of serial correlations. It is shown that the test statistic $t_N$ considered is distributed asymptotically as a finite linear combination of independent chi-square random variables with one degree of freedom under the null hypothesis, whereas under the alternative hypothesis, $t_N/N$ converges in probability to a positive constant. These results allow us, in particular, to compute the asymptotic probability of making a specification error with the proposed procedure. Links to other methods based on the application of canonical analysis are discussed. A simulation experiment was done in order to study the performance of the procedure. It is seen that the graphical representation of $t_N$, as a function of $N$, can be very useful in identifying the dynamic structure of ARMA models. Furthermore, for the model considered, the proposed identification procedure performs very well for series of 100 observations or more and reasonably well with short series of 50 observations.

8.
Abstract

In this paper we introduce the continuous tree mixture model, a mixture of undirected graphical models with tree-structured graphs that can be regarded as a nonparametric approach to multivariate analysis. We estimate its parameters, the component edge sets and the mixture proportions through a regularized maximum likelihood procedure. Our new algorithm, which uses the expectation-maximization algorithm and a modified version of Kruskal's algorithm, simultaneously estimates and prunes the mixture component trees. Simulation studies indicate that this method performs better than the alternative Gaussian graphical mixture model. The proposed method is also applied to a water-level data set, and the results are compared with those of a Gaussian mixture model.

9.
Abstract

In this paper we are concerned with variable selection in finite mixtures of semiparametric regression models. This task consists of model selection for the nonparametric component and variable selection for the parametric part. Thus, we encounter a separate model selection problem for the nonparametric component of each submodel. To overcome this computational burden, we introduce a class of variable selection procedures for finite mixtures of semiparametric regression models based on a penalized approach. It is shown that the new method is consistent for variable selection. Simulations show that the proposed method performs well, improves on previous work in this area, and requires much less computing power than existing methods.

10.
Mixture experiments are commonly encountered in many fields, including the chemical, pharmaceutical and consumer product industries. Due to their wide applications, mixture experiments, a special branch of response surface methodology, have been given greater attention in both model building and determination of designs compared with other experimental studies. In this paper, some new approaches are suggested for model building and selection in the analysis of data from mixture experiments by using a special generalized linear model, the logistic regression model, proposed by Chen et al. [7]. Generally, the special mixture models, which do not have a constant term, are highly affected by collinearity in modeling mixture experiments. For this reason, in order to alleviate the undesired effects of collinearity in the analysis of mixture experiments with logistic regression, a new mixture model is defined with an alternative ratio variable. The deviance analysis table is given for standard mixture polynomial models defined by transformations and special mixture models used as linear predictors. The effects of components on the response in the restricted experimental region are given by using an alternative representation of Cox's direction approach. In addition, the odds ratio and the confidence intervals of the odds ratio are identified according to the chosen reference and control groups. To compare the suggested models, some model selection criteria, the graphical odds ratio and the confidence intervals of the odds ratio are used. The advantage of the suggested approaches is illustrated on a tumor incidence data set.

11.
Biplots are useful tools to explore the relationship among variables. In this paper, the specific regression relationship between a set of predictors X and a set of response variables Y by means of partial least-squares (PLS) regression is represented. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.

12.
ABSTRACT

This article discusses two asymmetrization methods, Azzalini's representation and beta generation, to generate asymmetric bimodal models including two novel beta-generated models. The practical utility of these models is assessed with nine data sets from different fields of applied sciences. Besides this tutorial assessment, some methodological contributions are made: a random number generator for the asymmetric Rathie–Swamee model is developed (generators for the other models are already known and briefly described) and a new likelihood ratio test of unimodality is compared via simulations with other available tests. Several tools have been used to quantify and test for bimodality and assess goodness of fit, including the Bayesian information criterion, measures of agreement with the empirical distribution and the Kolmogorov–Smirnov test. In the nine case studies, the results favoured models derived from Azzalini's asymmetrization, but no single model provided a best fit across the applications considered. In only two cases was the normal mixture selected as the best model. Parameter estimation has been done by likelihood maximization. Numerical optimization must be performed with care since local optima are often present. We concluded that the models considered are flexible enough to fit different bimodal shapes and that the tools studied should be used with care and attention to detail.

13.
Markov chain Monte Carlo techniques have revolutionized the field of Bayesian statistics. Their power is so great that they can even accommodate situations in which the structure of the statistical model itself is uncertain. However, the analysis of such trans-dimensional (TD) models is not easy and available software may lack the flexibility required for dealing with the complexities of real data, often because it does not allow the TD model to be simply part of some bigger model. In this paper we describe a class of widely applicable TD models that can be represented by a generic graphical model, which may be incorporated into arbitrary other graphical structures without significantly affecting the mechanism of inference. We also present a decomposition of the reversible jump algorithm into abstract and problem-specific components, which provides infrastructure for applying the method to all models in the class considered. These developments represent a first step towards a context-free method for implementing TD models that will facilitate their use by applied scientists for the practical exploration of model uncertainty. Our approach makes use of the popular WinBUGS framework as a sampling engine and we illustrate its use via two simple examples in which model uncertainty is a key feature.

14.
ABSTRACT

Inference for epidemic parameters can be challenging, in part because the data are intrinsically stochastic and tend to be observed by means of discrete-time sampling, which limits their completeness. The problem is particularly acute when the likelihood of the data is computationally intractable. Consequently, standard statistical techniques can become too complicated to implement effectively. In this work, we develop a powerful method for Bayesian inference in susceptible–infected–removed stochastic epidemic models via data-augmented Markov chain Monte Carlo. This technique samples all missing values as well as the model parameters, where the missing values and parameters are treated as random variables. These routines are based on the approximation of the discrete-time epidemic by a diffusion process. We illustrate our techniques using simulated epidemics and finally apply them to real data from the Eyam plague.

15.
In the context of the general linear model $Y = X\beta + \varepsilon$, the matrix $P_Z = Z(Z^T Z)^{-1} Z^T$, where $Z = (X : Y)$, plays an important role in determining least squares results. In this article we propose two graphical displays for the off-diagonal as well as the diagonal elements of $P_Z$. The two graphs are based on simple ideas and are useful in the detection of potentially influential subsets of observations in regression. Since $P_Z$ is invariant with respect to permutations of the columns of $Z$, an added advantage of these graphs is that they can be used to detect outliers in multivariate data where the rows of $Z$ are usually regarded as a random sample from a multivariate population. We also suggest two calibration points, one for the diagonal elements of $P_Z$ and the other for the off-diagonal elements. The advantage of these calibration points is that they take into consideration the variability of the off-diagonal as well as the diagonal elements of $P_Z$. They also do not suffer from masking.
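As a rough illustration of the matrix discussed in this abstract, the sketch below builds Z = (X : Y) and computes the projection matrix Z(ZᵀZ)⁻¹Zᵀ using only the Python standard library. The helper names and the toy data are hypothetical, and the graphical displays and calibration points proposed in the article are not reproduced here.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def projection_matrix(x, y):
    """Return P_Z = Z (Z^T Z)^{-1} Z^T for Z = (X : Y)."""
    z = [list(xr) + [yv] for xr, yv in zip(x, y)]
    zt = transpose(z)
    return matmul(matmul(z, inverse(matmul(zt, z))), zt)


# Toy data (made up): intercept plus one predictor, four observations.
x = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [0.1, 0.9, 2.2, 2.8]
p_z = projection_matrix(x, y)
diagonal = [p_z[i][i] for i in range(len(p_z))]
```

The diagonal entries sum to the number of columns of Z (here 3), and observations with unusually large diagonal or off-diagonal entries are the ones such displays are meant to highlight.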

16.
Recent research has demonstrated that information learned from building a graphical model on the predictor set of a regularized linear regression model can be leveraged to improve prediction of a continuous outcome. In this article, we present a new model that encourages sparsity at both the level of the regression coefficients and the level of individual contributions in a decomposed representation. This model provides parameter estimates with a finite sample error bound and exhibits robustness to errors in the input graph structure. Through a simulation study and the analysis of two real data sets, we demonstrate that our model provides a predictive benefit when compared to previously proposed models. Furthermore, it is a highly flexible model that provides a unified framework for the fitting of many commonly used regularized regression models. The Canadian Journal of Statistics 47: 729–747; 2019 © 2019 Statistical Society of Canada

17.
ABSTRACT

Log-linear models for the distribution on a contingency table are represented as the intersection of only two kinds of log-linear models: one assuming that a certain group of the variables, if conditioned on all other variables, has a jointly independent distribution, and another assuming that a certain group of the variables, if conditioned on all other variables, has no highest order interaction. The subsets entering into these models are uniquely determined by the original log-linear model. This canonical representation suggests considering joint conditional independence and conditional no highest order association as the elementary building blocks of log-linear models.

18.
ABSTRACT

Nonstandard mixtures are those that result from a mixture of a discrete and a continuous random variable. They arise in practice, for example, in medical studies of exposure. Here, a random variable that models exposure might have a discrete mass point at no exposure, but otherwise may be continuous. In this article we explore estimating the distribution function associated with such a random variable from a nonparametric viewpoint. We assume that the locations of the discrete mass points are known so that we will be able to apply a classical nonparametric smoothing approach to the problem. The proposed estimator is a mixture of an empirical distribution function and a kernel estimate of a distribution function. A simple theoretical argument reveals that existing bandwidth selection algorithms can be applied to the smooth component of this estimator as well. The proposed approach is applied to two example sets of data.
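One plausible reading of an estimator of this kind is sketched below. It should not be taken as the authors' exact construction. Observations that fall on a known mass point contribute a jump, as in an empirical distribution function, while the remaining observations contribute a smooth Gaussian-kernel term. The Gaussian kernel, the hand-picked bandwidth and the function names are all assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_cdf(data, mass_points, bandwidth):
    """Build an estimated CDF with jumps at known mass points.

    Step part: empirical CDF of observations equal to a mass point.
    Smooth part: Gaussian-kernel CDF estimate of all other observations.
    """
    mass = set(mass_points)
    discrete = [v for v in data if v in mass]
    continuous = [v for v in data if v not in mass]
    n = len(data)

    def cdf(x):
        step = sum(1 for v in discrete if v <= x)
        smooth = sum(normal_cdf((x - v) / bandwidth) for v in continuous)
        return (step + smooth) / n

    return cdf


# Made-up exposure data with a mass point at zero (no exposure).
exposure = [0.0, 0.0, 1.2, 2.5, 0.7, 0.0, 3.1, 1.9]
f_hat = mixture_cdf(exposure, mass_points=[0.0], bandwidth=0.4)
```

In this toy example the jump of `f_hat` at zero equals the observed share of unexposed subjects (3 of 8). In practice the bandwidth would come from a data-driven selector applied to the continuous observations, as the abstract suggests.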

19.
The graphical lasso has now become a useful tool to estimate high-dimensional Gaussian graphical models, but its practical applications suffer from the problem of choosing regularization parameters in a data-dependent way. In this article, we propose a model-averaged method for estimating sparse inverse covariance matrices for Gaussian graphical models. We consider the graphical lasso regularization path as the model space for Bayesian model averaging and use Markov chain Monte Carlo techniques for the regularization path point selection. Numerical performance of our method is investigated using both simulated and real datasets, in comparison with some state-of-the-art model selection procedures.

20.
Abstract

Augmented mixed beta regression models are suitable choices for modeling continuous response variables on the closed interval [0, 1]. The random effects in these models are typically assumed to be normally distributed, but this assumption is frequently violated in some applied studies. In this paper, an augmented mixed beta regression model with a skew-normal independent distribution for the random effects is used. Next, we adopt a Bayesian approach for parameter estimation using the MCMC algorithm. The methods are then evaluated using some intensive simulation studies. Finally, the proposed models are applied to analyze a dataset from an Iranian Labor Force Survey.
