20 similar documents found (search time: 18 ms).
Biplots are useful tools for exploring the relationships among variables. In this paper, we represent the regression relationship between a set of predictors X and a set of response variables Y obtained by partial least-squares (PLS) regression. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.
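
To show roughly where PLS biplot coordinates come from, here is a minimal pure-Python sketch of a single PLS component for one response, using the NIPALS update (weights w ∝ X'y, scores t = Xw, loadings obtained by regressing on t). It assumes one response, one component, column-centered data and a nonzero X'y; the paper's biplot works with a full response matrix Y and several components, and the function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column mean from a matrix stored as a list of rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[r[j] - means[j] for j in range(k)] for r in rows]


def pls1_one_component(X, y):
    """One NIPALS pass of PLS with a single response.

    Returns (w, t, p, q, b): weights, sample scores, x-loadings,
    y-loading and the coefficients implied by this one component."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    n, k = len(Xc), len(Xc[0])
    # Weights: direction in predictor space with maximal covariance with y (w ∝ X'y).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))  # assumes X'y is not the zero vector
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(k)) for i in range(n)]
    tt = sum(v * v for v in t)
    # Loadings: least-squares regression of each X column, and of y, on t.
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
    q = sum(yc[i] * t[i] for i in range(n)) / tt
    # Coefficients on the centered scale: y_hat = Xc b with b = w * q.
    b = [wj * q for wj in w]
    return w, t, p, q, b
```

In a biplot built this way, the scores t would place the samples, while the weights w (or loadings p) would place the predictors on the same axes; with several components these become the columns of score and weight matrices.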

Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered a robust alternative. Missing data are inevitable in many situations, and parameter estimates can be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) vector whose few nonzero components correspond to the most important covariates. This article extends the “global‐local” shrinkage idea to a scenario where one wishes to model multiple response variables simultaneously. Here, we develop a variable selection method for a K‐outcome model (multivariate regression) that identifies the most important covariates across all outcomes. The prior for all regression coefficients is a mean‐zero normal distribution with a coefficient‐specific variance term that consists of a predictor‐specific factor (shared local shrinkage parameter) and a model‐specific factor (global shrinkage term) that differs in each model. The performance of our modeling approach is evaluated through simulation studies and a data example.

Biplots of compositional data   Total citations: 6, self-citations: 0, citations by others: 6
Summary. The singular value decomposition and its interpretation as a linear biplot have proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to the special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is applied to a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
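
A relative variation biplot of this kind is usually built from the singular value decomposition of the log-transformed, double-centred data matrix. The sketch below computes only that matrix, assuming strictly positive entries; the SVD step (for example with `numpy.linalg.svd`) is left out so the example stays within the standard library, and the function name is hypothetical.

```python
import math


def double_centered_logs(rows):
    """Log-transform positive compositions and double-center the result.

    Entry (i, j) becomes log x_ij minus its row mean, minus its column mean,
    plus the grand mean, all computed on the log scale."""
    logs = [[math.log(v) for v in row] for row in rows]
    n, p = len(logs), len(logs[0])
    row_means = [sum(r) / p for r in logs]
    col_means = [sum(logs[i][j] for i in range(n)) / n for j in range(p)]
    grand = sum(row_means) / n
    return [[logs[i][j] - row_means[i] - col_means[j] + grand for j in range(p)]
            for i in range(n)]
```

Because each row is centred on the log scale, multiplying a composition by a constant (for example, reporting it in percent rather than as proportions) leaves the matrix unchanged, so only the ratios between parts reach the biplot.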

Both the least squares estimator and M-estimators of regression coefficients are susceptible to distortion when high leverage points occur among the predictor variables in a multiple linear regression model. In this article, a weighting scheme that makes it possible to bound the leverage values of a weighted matrix of predictor variables is proposed. Bounded-leverage weighting of the predictor variables followed by M-estimation of the regression coefficients is shown to be effective in protecting against distortion due to extreme predictor-variable values, extreme response values, or outlier-induced multicollinearities. Bounded-leverage estimators can also protect against distortion by small groups of high leverage points.
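
To make "bounding the leverage of a weighted predictor matrix" concrete, here is a minimal sketch for simple regression with an intercept. Row i of [1, x_i] is scaled by sqrt(w_i), and the weights of rows whose leverage exceeds a bound are shrunk multiplicatively until every leverage is within the bound. This iterative update and the function names are hypothetical illustrations, not the article's scheme, and the subsequent M-estimation step is omitted.

```python
def weighted_leverages(x, w):
    """Leverages of a weighted simple regression with intercept.

    With row i of [1, x_i] scaled by sqrt(w_i), the leverage is
    h_i = w_i * (1/W + (x_i - xbar_w)^2 / Sxx_w)."""
    W = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / W
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return [wi * (1.0 / W + (xi - xbar) ** 2 / sxx) for wi, xi in zip(w, x)]


def bounded_leverage_weights(x, bound, max_iter=1000, tol=1e-9):
    """Shrink weights of high-leverage rows until every weighted leverage
    is at most `bound` (hypothetical multiplicative update)."""
    n = len(x)
    if bound <= 2.0 / n:  # leverages sum to p = 2, so the bound must exceed p/n
        raise ValueError("bound must exceed 2/n")
    w = [1.0] * n
    for _ in range(max_iter):
        h = weighted_leverages(x, w)
        if max(h) <= bound + tol:
            break
        # Rows within the bound keep their weight; others are scaled down.
        w = [wi * min(1.0, bound / hi) for wi, hi in zip(w, h)]
    return w, weighted_leverages(x, w)
```

For example, with x = [1, 2, 3, 4, 5, 20] and the common rule-of-thumb bound 2p/n = 2/3, only the point at x = 20 is downweighted; the resulting weights would then be passed to an M-estimator of the coefficients.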

A robust biplot     
This paper introduces a robust biplot which is related to multivariate M-estimates. The n × p data matrix is first considered as a sample of size n from some p-variate population, and robust M-estimates of the population location vector and scatter matrix are calculated. In the construction of the biplot, each row of the data matrix is assigned a weight determined in the preliminary robust estimation. In a robust biplot, one can plot the variables in order to represent characteristics of the robust variance-covariance matrix: the length of the vector representing a variable is proportional to its robust standard deviation, while the cosine of the angle between two variables is approximately equal to their robust correlation. The proposed biplot also permits a meaningful representation of the variables in a robust principal-component analysis. The discrepancies between least-squares and robust biplots are illustrated in an example.

Panel studies are statistical studies in which two or more variables are observed for two or more subjects at two or more points in time. Cross-lagged panel studies consist of continuous variables that divide naturally into two sets, and often the primary statistical issue is to estimate and test the cross-effects, which indicate the degree to which each set is related to the other over time. By taking a regression approach to modeling the relationships, we apply multivariate regression methodology to make inferences about the regression coefficients in a cross-lagged panel model. In particular, we develop a test of the hypothesis that the regression coefficients indicating the cross-effects are equal, and we develop simultaneous confidence bounds for various linear combinations of these regression coefficients.

Nonparametric seemingly unrelated regression provides a powerful alternative to parametric seemingly unrelated regression by relaxing the linearity assumption. Existing methods are limited, particularly when there are sharp changes in the relationship between the predictor variables and the corresponding response variable. We propose a new nonparametric method for seemingly unrelated regression that adopts a tree-structured regression framework, has satisfactory prediction accuracy and interpretability, places no restriction on the inclusion of categorical variables, and is less vulnerable to the curse of dimensionality. Moreover, an important feature is that it constructs a unified tree-structured model for multivariate data, even when the predictor variables corresponding to each response variable are entirely different. This unified model can offer revealing insights, such as the underlying economic meaning. We propose the key components of tree-structured regression: an impurity function that detects complex nonlinear relationships between the predictor variables and the response variable, a split rule selection with negligible selection bias, and a tree size determination that addresses underfitting and overfitting. We demonstrate our proposed method using simulated data and illustrate it using data from the Korea Stock Exchange sector indices.


In some situations, for example in biology or psychology studies, we wish to determine whether the linear relationship between a response variable and predictor variables differs between two populations. Analysis of covariance (ANCOVA) or, equivalently, the partial F-test is the commonly used approach. In this study, the asymptotic distribution of the difference between two independent regression coefficients was established. The proposed method was used to derive an asymptotic confidence set for the difference between coefficients and a hypothesis test for the equality of the two regression models. A simulation study was then conducted to compare the proposed method with the partial F method. The performance of the new method was comparable with that of the partial F method.
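
The basic large-sample idea behind comparing a coefficient across two independent samples can be sketched as a Wald-type statistic: because the estimates are independent, their variances add, so z = (b1 − b2) / sqrt(se1² + se2²) is approximately standard normal under equality. This is the generic normal-approximation comparison, assumed here for illustration; it is not necessarily the exact asymptotic distribution derived in the study, and the function name is hypothetical.

```python
from statistics import NormalDist


def coef_difference_test(b1, se1, b2, se2, level=0.95):
    """Wald-type comparison of the same coefficient estimated in two
    independent samples, using a normal approximation.

    Returns (z statistic, two-sided p-value, confidence interval for b1 - b2)."""
    diff = b1 - b2
    se = (se1 ** 2 + se2 ** 2) ** 0.5  # independence: variances add
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level 0.95
    return z, p_value, (diff - crit * se, diff + crit * se)
```

For instance, slopes 1.5 (SE 0.3) and 1.0 (SE 0.4) give a combined SE of 0.5 and z = 1, far from significance.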

Multinomial logit (also termed multi-logit) models permit the analysis of the statistical relation between a categorical response variable and a set of explanatory variables (called covariates or regressors). Although multinomial logit is widely used in both the social and economic sciences, the interpretation of regression coefficients may be tricky, as the effect of covariates on the probability distribution of the response variable is nonconstant and difficult to quantify. The ternary plots illustrated in this article aim to facilitate the interpretation of regression coefficients and allow the effect of covariates (considered either singly or jointly) on the probability distribution of the dependent variable to be quantified. Ternary plots can be drawn for both ordered and unordered categorical dependent variables when the number of possible outcomes equals three (trinomial response variable); these plots make it possible not only to represent the covariate effects over the whole parameter space of the dependent variable but also to compare the covariate effects of any given individual profile. The method is illustrated and discussed through analysis of a dataset concerning the transition of master’s graduates of the University of Trento (Italy) from university to employment.
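
A point on such a ternary plot is simply the vector of three fitted outcome probabilities. The sketch below assumes the baseline-category (unordered) multinomial logit, with outcome 1 as the reference category, and maps the probabilities to 2-D triangle coordinates; the ordered-response case and the article's plotting details are not covered, and the function names and coefficient layout are hypothetical.

```python
import math


def trinomial_probabilities(x, coefs):
    """Baseline-category multinomial logit with three outcomes.

    coefs maps outcomes 2 and 3 to (intercept, [slopes]); outcome 1 is the
    reference category, whose linear predictor is fixed at 0."""
    etas = [0.0]
    for k in (2, 3):
        intercept, slopes = coefs[k]
        etas.append(intercept + sum(b * xj for b, xj in zip(slopes, x)))
    m = max(etas)  # subtract the max before exponentiating, for stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return tuple(e / total for e in exps)


def ternary_xy(probs):
    """Map (p1, p2, p3) into an equilateral triangle with outcome 1 at (0, 0),
    outcome 2 at (1, 0) and outcome 3 at (0.5, sqrt(3)/2)."""
    p1, p2, p3 = probs
    return (p2 + 0.5 * p3, p3 * math.sqrt(3) / 2)
```

Evaluating `trinomial_probabilities` over a grid of values of one covariate, with the others held at a chosen individual profile, and joining the resulting `ternary_xy` points traces that covariate's effect across the simplex.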

Fitting multiplicative models by robust alternating regressions   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, a robust approach for fitting multiplicative models is presented. The focus is on the factor analysis model, where we estimate factor loadings and scores by a robust alternating regression algorithm. The approach is highly robust and also works well when there are more variables than observations. The technique yields a robust biplot, depicting the interaction structure between individuals and variables. This biplot is not distorted by outliers, which can instead be identified from the residual plot. An accompanying robust R²-plot is also provided to determine the appropriate number of factors. The approach is illustrated by real and artificial examples and compared with factor analysis based on robust covariance matrix estimators. The same estimation technique can fit models with both additive and multiplicative effects (FANOVA models) to two-way tables, thereby extending the median polish technique.
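
The alternating idea can be sketched for a rank-one model x_ij ≈ a_i b_j fitted by plain L1 (least absolute deviation) regressions. With b fixed, Σ_j |x_ij − a b_j| = Σ_j |b_j| · |x_ij / b_j − a|, so each row score is a weighted median of the ratios x_ij / b_j with weights |b_j|, and symmetrically for the column loadings. This is a simplified stand-in that assumes unweighted L1 steps and a crude column-median start; the paper's algorithm includes further robustness weights and more factors, and the function names are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]


def l1_step(rows, fixed):
    """For each row, the scalar s minimising sum_j |x_j - s * fixed_j|.

    Assumes at least one entry of `fixed` is nonzero."""
    out = []
    for row in rows:
        vals = [x / f for x, f in zip(row, fixed) if f != 0]
        wts = [abs(f) for f in fixed if f != 0]
        out.append(weighted_median(vals, wts))
    return out


def rank_one_alternating_l1(X, n_iter=50):
    """Alternate L1 regressions for row scores a and column loadings b."""
    cols = [list(c) for c in zip(*X)]
    b = [sorted(c)[len(c) // 2] for c in cols]  # column medians (upper middle if even)
    for _ in range(n_iter):
        a = l1_step(X, b)     # update row scores with loadings fixed
        b = l1_step(cols, a)  # update column loadings with scores fixed
    return a, b
```

On an exact rank-one table with one grossly contaminated cell, the fitted values a_i b_j reproduce the clean cells, and the contaminated cell shows up as a single large residual, which is what a residual plot would reveal.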

Biplots are a widely used statistical tool for visualizing the loadings and scores that result from applying a dimension reduction technique to multivariate data. If the underlying data carry only relative information (i.e. compositional data expressed in proportions, mg/kg, etc.), they have to be pre-processed with a logratio transformation before the dimension reduction is carried out. In the context of principal component analysis, the resulting biplot is called a compositional biplot. We introduce an alternative, the ilr biplot, which is based on a special choice of orthonormal coordinates resulting from an isometric logratio (ilr) transformation. This also makes it possible to incorporate external non-compositional variables and to study their relations to the compositional variables. The methodology is demonstrated on real data sets.
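
One common choice of orthonormal ilr coordinates is the "pivot" system, z_i = sqrt((D − i)/(D − i + 1)) · ln(x_i / g(x_{i+1}, …, x_D)) for i = 1, …, D − 1, where g is the geometric mean. The sketch below assumes this particular basis for illustration (the abstract does not say which special choice the ilr biplot uses); the function name is hypothetical.

```python
import math


def pivot_ilr(x):
    """Pivot (isometric logratio) coordinates of a composition with D > 1 positive parts."""
    D = len(x)
    logs = [math.log(v) for v in x]
    z = []
    for i in range(D - 1):      # 0-based i corresponds to pivot i + 1
        rest = logs[i + 1:]
        k = len(rest)           # number of parts after the pivot
        # sqrt(k/(k+1)) * log of (pivot part / geometric mean of remaining parts)
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return z
```

Because the coordinates are orthonormal, Euclidean distances and angles computed from them match those of the centred logratio (clr) representation, and rescaling a composition does not change them.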

Motivated by problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model in which a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Because nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim to find a principal one‐dimensional association structure in which a response index is fully characterized by a single predictor index. The functional relation between the two single indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such a double single index association is meaningful and easy to interpret, and the rest of the multi‐dimensional dependence structure can be treated as a nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modelling and estimation procedure in a multi‐covariate multi‐response problem concerning concrete.

Contrary to the common belief that the logit model has no analytical solution, such a solution can be found in the case of categorical predictors. This paper shows that a binary logistic regression with categorical explanatory variables can be constructed in closed form. No special software and no iterative procedures of nonlinear estimation are needed to obtain a model with all its parameters and characteristics, including the regression coefficients, their standard errors and t-statistics, as well as the residual and null deviances. The derivation is performed for logistic models with one binary or categorical predictor, and with several binary or categorical predictors. The analytical formulae can be used for arithmetical calculation of all the parameters of the logit regression. The explicit expressions for the characteristics of logit regression are convenient for the analysis and interpretation of the results of logistic modeling.
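
For the simplest case, one binary predictor, the closed form follows from the 2×2 table: the intercept is the log-odds in the x = 0 group, the slope is the log odds ratio, the standard errors are square roots of sums of reciprocal cell counts, and the fitted probabilities equal the observed group proportions. The sketch below assumes all four counts are positive; the Wald ratios it returns are what the abstract calls t-statistics, and the function name and cell labels are hypothetical. It does not cover the paper's multi-predictor derivation.

```python
import math


def logit_binary_predictor(n11, n10, n01, n00):
    """Closed-form ML fit of logit P(y=1) = b0 + b1*x for a binary x.

    Cell counts: n11 (x=1, y=1), n10 (x=1, y=0), n01 (x=0, y=1), n00 (x=0, y=0);
    all must be positive."""
    b0 = math.log(n01 / n00)                   # log-odds in the x=0 group
    b1 = math.log((n11 * n00) / (n10 * n01))   # log odds ratio
    se_b0 = math.sqrt(1 / n01 + 1 / n00)
    se_b1 = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)

    def loglik(y1, y0, p):
        """Binomial log-likelihood of y1 successes and y0 failures at probability p."""
        return y1 * math.log(p) + y0 * math.log(1 - p)

    p1, p0 = n11 / (n11 + n10), n01 / (n01 + n00)  # fitted = observed proportions
    pbar = (n11 + n01) / (n11 + n10 + n01 + n00)   # intercept-only fit
    return {
        "b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
        "z_b0": b0 / se_b0, "z_b1": b1 / se_b1,
        "residual_deviance": -2 * (loglik(n11, n10, p1) + loglik(n01, n00, p0)),
        "null_deviance": -2 * loglik(n11 + n01, n10 + n00, pbar),
    }
```

For counts 30, 10, 20, 40, this gives b0 = ln 0.5 and b1 = ln 6, so the fitted probability for x = 1 is 1 / (1 + e^(−ln 3)) = 0.75, matching 30/40.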

Summary. When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian model averaging offers a cure for this robustness issue but at the expense of requiring very many predictors. Here we look at Bayes model averaging incorporating variable selection for prediction. This offers similar mean-square errors of prediction but with a vastly reduced predictor space. This can greatly aid the interpretation of the model. It also reduces the cost if measured variables have costs. The development here uses decision theory in the context of the multivariate general linear model. In passing, this reduced predictor space Bayes model averaging is contrasted with single-model approximations. A fast algorithm for updating regressions in the Markov chain Monte Carlo searches for posterior inference is developed, allowing many more variables than observations to be contemplated. We discuss the merits of absolute rather than proportionate shrinkage in regression, especially when there are more variables than observations. The methodology is illustrated on a set of spectroscopic data used for measuring the amounts of different sugars in an aqueous solution.

Nonlinear regression models arise when definite information is available about the form of the relationship between the response and predictor variables. Such information might involve direct knowledge of the actual form of the true model or might be represented by a set of differential equations that the model must satisfy. We develop M-procedures for estimating parameters and testing hypotheses of interest about these parameters in nonlinear regression models for repeated measurement data. Under regularity conditions, the asymptotic properties of the M-procedures are presented, including the uniform linearity, normality and consistency. The computation of the M-estimators of the model parameters is performed with iterative procedures, similar to Newton–Raphson and Fisher's scoring methods. The methodology is illustrated by using a multivariate logistic regression model with real data, along with a simulation study.

Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their extension to multivariate responses, however, is not straightforward, starting with the definition of multivariate quantiles. We propose here a flexible Bayesian quantile regression model for a multivariate response variable, in which we are able to define a structured additive framework for all predictor variables. We build on previous ideas that use a directional approach to define the quantiles of a response variable with multiple outputs, and we define noncrossing quantiles in every directional quantile model. We define a Markov chain Monte Carlo (MCMC) procedure for model estimation, where the noncrossing property is obtained by using a Gaussian process design to model the correlation between several quantile regression models. We illustrate the results of these models using two datasets: one on dimensions of inequality in the population, such as income and health; the second on scores of students in the Brazilian High School National Exam, considering three dimensions for the response variable.

In the literature, there are many results on the consequences of mis-specified models for linear models with error in the response only; see, e.g., Seber (1977). There are also discussions of estimation for models with errors in both the response and the predictor variables (called measurement error models; see, e.g., Fuller (1987)). In this paper, we consider the problem of model mis-specification for measurement error models. Only a few special cases have been tackled in the past (Edland, 1996; Carroll and Ruppert, 1996; Lakshminarayanan & Gunst, 1984); we deal with the situation here in some generality. The results are as follows: (a) When a model is under-fitted, the estimate of the variance of the measurement error will be asymptotically biased, as will the estimates of the regression coefficients; these asymptotic biases always exist for under-fitted models, and even orthogonality of the variables in the model will not make them vanish. (b) For over-fitting, the estimates of the variances of measurement errors and of the regression coefficients are asymptotically unbiased. However, the variance of the estimated regression coefficients will increase. Over-fitting causes larger changes in the variances of the estimated parameters in measurement error models than in models without measurement error.

When data sets are multilevel (group nesting or repeated measures), different sources of variation must be identified. In the framework of unsupervised analyses, multilevel simultaneous component analysis (MSCA) has recently been proposed as the most satisfactory option for analyzing multilevel data. MSCA estimates submodels for the different levels in the data and thereby separates the “within”-subject and “between”-subject variations in the variables. Following the principles of MSCA and the strategy of decomposing the available data matrix into orthogonal blocks, and taking into account the between and within data structures, we generalize, from a multilevel perspective, multivariate models in which a matrix of response variables can be used to guide the projections (formed by responses predicted by explanatory variables or by a limited number of their combinations/composites) into choices of meaningful directions. To this end, the current paper proposes multilevel versions of the multivariate regression model and of dimensionality-reduction methods (used to predict responses with fewer linear composites of explanatory variables). The principal findings of the study are that, under some constraints, minimizing the loss functions of multivariate regression, principal-component regression, reduced-rank regression, and canonical-correlation regression is equivalent to minimizing the sum of two separate loss functions, corresponding to the between and within structures, each of which can be minimized separately. The paper closes with a case study focusing on the relationships between mental health severity and the intensity of care in the Lombardy region mental health system.
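
The orthogonal between/within split that MSCA-type methods start from can be sketched directly: after removing the grand mean, each row is written as its group mean's deviation from the grand mean (between part) plus its own deviation from the group mean (within part). The two parts are orthogonal column by column, so their sums of squares add up to the total. This sketch shows only that decomposition, not the multilevel regression or reduced-rank fits proposed in the paper, and the function name is hypothetical.

```python
def between_within_split(rows, groups):
    """Split grand-mean-centered data into between-group and within-group parts.

    rows: list of observation vectors; groups: a group label for each row.
    Returns (B, W) with rows[i][j] - grand[j] == B[i][j] + W[i][j]."""
    n, p = len(rows), len(rows[0])
    grand = [sum(r[j] for r in rows) / n for j in range(p)]
    sums, counts = {}, {}
    for r, g in zip(rows, groups):
        acc = sums.setdefault(g, [0.0] * p)
        for j in range(p):
            acc[j] += r[j]
        counts[g] = counts.get(g, 0) + 1
    means = {g: [s / counts[g] for s in acc] for g, acc in sums.items()}
    # Between part: the group mean's offset from the grand mean, repeated per row.
    B = [[means[g][j] - grand[j] for j in range(p)] for g in groups]
    # Within part: each row's deviation from its own group mean.
    W = [[r[j] - means[g][j] for j in range(p)] for r, g in zip(rows, groups)]
    return B, W
```

Because B and W are orthogonal, a least-squares loss on the centered data splits into a between term and a within term, which is the kind of separation the findings above describe.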

Count data often contain many zeros. In parametric regression analysis of zero-inflated count data, the effect of a covariate of interest is typically modelled via a linear predictor. This approach imposes a restrictive, and potentially questionable, functional form on the relation between the independent and dependent variables. To address the noted restrictions, a flexible parametric procedure is employed to model the covariate effect as a linear combination of fixed-knot cubic basis splines or B-splines. The semiparametric zero-inflated Poisson regression model is fitted by maximizing the likelihood function through an expectation–maximization algorithm. The smooth estimate of the functional form of the covariate effect can enhance modelling flexibility. Within this modelling framework, a log-likelihood ratio test is used to assess the adequacy of the covariate function. Simulation results show that the proposed test has excellent power in detecting the lack of fit of a linear predictor. A real-life data set is used to illustrate the practicality of the methodology.
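
The EM mechanics for a zero-inflated Poisson model are easiest to see without covariates. In the E-step, each observed zero gets a posterior probability of being a structural zero; in the M-step, the mixing weight and the Poisson mean have closed-form updates. This intercept-only sketch is a simplification: the paper's model replaces the constant mean with a B-spline function of the covariate, which would make the M-step a weighted Poisson regression. The function names are hypothetical.

```python
import math


def zip_em(y, max_iter=10000, tol=1e-12):
    """EM for an intercept-only zero-inflated Poisson model.

    P(Y=0) = pi + (1-pi) exp(-lam);  P(Y=k) = (1-pi) Poisson(k; lam) for k > 0."""
    n = len(y)
    pi, lam = 0.5, sum(y) / n  # crude starting values
    for _ in range(max_iter):
        # E-step: posterior probability that each observed zero is a structural zero.
        p_zero = pi + (1 - pi) * math.exp(-lam)
        z = [pi / p_zero if yi == 0 else 0.0 for yi in y]
        # M-step: closed-form updates given the expected labels.
        new_pi = sum(z) / n
        new_lam = sum((1 - zi) * yi for zi, yi in zip(z, y)) / sum(1 - zi for zi in z)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


def zip_loglik(y, pi, lam):
    """Log-likelihood of the intercept-only zero-inflated Poisson model."""
    ll = 0.0
    for yi in y:
        if yi == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) - lam + yi * math.log(lam) - math.lgamma(yi + 1)
    return ll
```

At convergence the fitted zero probability matches the observed share of zeros and (1 − pi) · lam matches the sample mean; a likelihood ratio test like the one described above would compare `zip_loglik` values of nested fits.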
