Similar Documents
1.
Non-symmetric correspondence analysis is a useful technique for analyzing a two-way contingency table; in more complex cases, there is more than one predictor variable. In this paper, multiple non-symmetric correspondence analysis (MNSCA), together with the decomposition of the Gray–Williams tau index into main-effect and interaction terms, is used to analyze a contingency table with two categorical predictor variables and an ordinal response variable. The multiple tau index is a measure of association that contains both main effects and an interaction term. The main effects represent the change in the response variable due to changes in the levels (categories) of the predictor variables, assuming that their effects are additive, while the interaction effect represents the joint effect of the categorical predictor variables on the ordinal response variable. Moreover, for ordinal variables, we propose a further decomposition, based on Emerson's orthogonal polynomials, to check for the presence of power components.
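
Emerson's orthogonal polynomials for an ordinal variable are polynomials in the category scores that are orthonormal with respect to the marginal proportions of that variable. A minimal sketch, assuming natural scores 1 to J and hypothetical marginal counts; it builds the polynomials by weighted Gram–Schmidt on the powers 1, s, s^2, ..., which should give the same polynomials as Emerson's recurrence up to sign (the leading coefficient here is always positive).

```python
import math


def emerson_polynomials(scores, weights, degree):
    """Values of orthonormal polynomials of degree 0..degree at each score.

    Orthonormality is with respect to <f, g> = sum_j w_j f(s_j) g(s_j),
    where w_j are the weights rescaled to proportions.
    """
    if degree >= len(scores):
        raise ValueError("degree must be smaller than the number of categories")
    total = sum(weights)
    w = [x / total for x in weights]  # marginal proportions

    def inner(f, g):
        return sum(wj * fj * gj for wj, fj, gj in zip(w, f, g))

    basis = []
    for k in range(degree + 1):
        v = [s ** k for s in scores]  # monomial s^k evaluated at each score
        for b in basis:  # remove components along lower-degree polynomials
            c = inner(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(inner(v, v))
        basis.append([vi / norm for vi in v])
    return basis


scores = [1, 2, 3, 4]
weights = [10, 20, 30, 40]  # hypothetical response marginals
polys = emerson_polynomials(scores, weights, degree=2)
```

In ordinal correspondence analysis, the degree-1 and degree-2 polynomials are usually read as location and dispersion components.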

2.
Non-symmetric correspondence analysis (NSCA) is a useful technique for analysing a two-way contingency table. Frequently, there is more than one predictor variable; in this paper, we consider two categorical predictor variables and one response variable. Interaction represents the joint effect of the predictor variables on the response variable. When interaction is present, the interpretation of the main effects is incomplete or misleading. To separate the main effects from the interaction term, we introduce a method that, starting from the coordinates of multiple NSCA and using a two-way analysis of variance without interaction, allows a better interpretation of the impact of the predictor variables on the response variable. The proposed method has been applied to a well-known three-way contingency table from Bockenholt and Bockenholt, which cross-classifies subjects by attitude towards abortion, number of years of education and religion. We analyse the case in which education and religion influence a person's attitude towards abortion.
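
NSCA works on the centred row-profile matrix pi_ij = p_ij / p_i. - p_.j, whose weighted total inertia is the numerator of the Goodman–Kruskal tau. A minimal plain-Python sketch with hypothetical counts; treating every (A, B) combination of the two predictors as a single row category to obtain a multiple tau is our assumption about the set-up. The SVD step of NSCA and the two-way ANOVA on the coordinates are not shown.

```python
def nsca_profile_matrix(table):
    """Return (pi, row_masses, col_masses) with pi[i][j] = p_ij / p_i. - p_.j.

    Rows are predictor categories, columns are response categories;
    every row is assumed to have a positive total.
    """
    n = sum(sum(row) for row in table)
    p = [[count / n for count in row] for row in table]
    row_masses = [sum(row) for row in p]
    col_masses = [sum(row[j] for row in p) for j in range(len(p[0]))]
    pi = [[p[i][j] / row_masses[i] - col_masses[j] for j in range(len(col_masses))]
          for i in range(len(p))]
    return pi, row_masses, col_masses


def goodman_kruskal_tau(table):
    """Tau with rows as predictor: the NSCA total inertia
    sum_i p_i. * sum_j pi_ij^2, divided by 1 - sum_j p_.j^2."""
    pi, row_masses, col_masses = nsca_profile_matrix(table)
    inertia = sum(r * sum(v * v for v in row) for r, row in zip(row_masses, pi))
    return inertia / (1.0 - sum(c * c for c in col_masses))


# Hypothetical counts: predictors A (2 levels) x B (2 levels), response with 3 categories.
counts = [[[20, 10, 5], [8, 12, 15]],
          [[15, 15, 10], [5, 10, 25]]]
# Assumption: each (A, B) combination is treated as one predictor category.
combined_rows = [counts[a][b] for a in range(2) for b in range(2)]
tau_ab = goodman_kruskal_tau(combined_rows)
```

Tau equals 1 when each predictor category determines the response category and 0 under independence.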

3.
Biplots are useful tools for exploring the relationships among variables. In this paper, the regression relationship between a set of predictors X and a set of response variables Y, as estimated by partial least-squares (PLS) regression, is represented graphically. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.

4.
At the core of multivariate statistics is the investigation of relationships between different sets of variables: more precisely, inter-variable relationships and causal relationships. The latter is a regression problem, in which one set of variables is referred to as the response variables and the other as the predictor variables. In this situation, the effect of the predictors on the response variables is revealed through the regression coefficients. The results of the regression analysis can be displayed graphically in a biplot, which provides a single graphical representation of the samples together with the predictor and response variables. In addition, their effects, in terms of the regression coefficients, can be visualized in the same biplot, although sub-optimally. Keywords: biplot, regression analysis, multivariate regression, rank approximation

5.
In this article, we consider the problem of selecting functional variables using L1 regularization in a functional linear regression model with a scalar response and functional predictors, in the presence of outliers. Since the LASSO is a special case of penalized least-squares regression with an L1 penalty function, it is sensitive to heavy-tailed errors and/or outliers in the data. Recently, the least absolute deviation (LAD) and LASSO methods have been combined (the LAD-LASSO regression method) to carry out robust parameter estimation and variable selection simultaneously for a multiple linear regression model. However, LASSO-based selection fails for functional predictors, because each functional predictor is represented by multiple parameters. The group LASSO is therefore used for selecting functional predictors, since it selects groups of variables rather than individual variables. In this study, we propose a robust functional predictor selection method, the LAD-group LASSO, for a functional linear regression model with a scalar response and functional predictors. We illustrate the performance of the LAD-group LASSO on both simulated and real data.
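
The criterion behind an LAD-group LASSO combines an absolute-deviation loss with a group penalty that keeps or drops all coefficients of one functional predictor together. A minimal sketch that only evaluates a generic form of this objective, assuming unit group weights by default and that each functional predictor has already been expanded into a block of basis coefficients; the paper's exact weights and optimizer are not given here.

```python
import math


def lad_group_lasso_objective(y, X, beta, groups, lam, weights=None):
    """Evaluate sum_i |y_i - x_i'beta| + lam * sum_g w_g * ||beta_g||_2.

    groups: list of index lists, one per functional predictor (for example,
    the basis-expansion coefficients of that predictor).
    weights: optional per-group weights w_g (default 1.0, an assumption).
    """
    if weights is None:
        weights = [1.0] * len(groups)
    # LAD loss: robust to outliers in y compared with squared error
    loss = sum(abs(yi - sum(xij * bj for xij, bj in zip(xi, beta)))
               for yi, xi in zip(y, X))
    # Group penalty: Euclidean norm of each block, summed over blocks
    penalty = sum(w * math.sqrt(sum(beta[j] ** 2 for j in g))
                  for w, g in zip(weights, groups))
    return loss + lam * penalty


y = [1.0, 2.0, 3.0]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
beta = [1.0, 2.0, 2.0]
value = lad_group_lasso_objective(y, X, beta, groups=[[0], [1, 2]], lam=0.5)
```

With every group of size one, the penalty reduces to the ordinary L1 penalty, so the LAD-LASSO is the special case.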

6.
The Yule–Simpson paradox notes that an association between random variables can be reversed when averaged over a background variable. Cox and Wermuth introduced the concept of distribution dependence between two random variables X and Y and gave two dependence conditions, each of which guarantees that qualitatively similar conditional dependences cannot be reversed after marginalizing over the background variable. Ma, Xie and Geng studied the uniform collapsibility of distribution dependence over a background variable W under a stronger homogeneity condition. Collapsibility ensures that associations are the same in the conditional and marginal models. In this article, we use the notion of average collapsibility, which requires only that the conditional effects, averaged over the background variable, equal the corresponding marginal effect, and we investigate conditions for it to hold for distribution dependence and for quantile regression coefficients.
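
The reversal described by the Yule–Simpson paradox can be checked with exact arithmetic. The counts below are hypothetical and only illustrate the phenomenon: arm A has the higher success rate within each level of a binary background variable W, yet the lower rate after pooling over W.

```python
from fractions import Fraction

# Hypothetical (successes, trials) per arm within each level of W.
strata = {
    "W=0": {"A": (81, 87), "B": (234, 270)},
    "W=1": {"A": (192, 263), "B": (55, 80)},
}

# Conditional effect: difference in success rates within each stratum.
conditional_diff = {
    w: Fraction(*arms["A"]) - Fraction(*arms["B"]) for w, arms in strata.items()
}


def pooled_rate(arm):
    """Success rate after marginalizing (pooling) over W."""
    successes = sum(strata[w][arm][0] for w in strata)
    trials = sum(strata[w][arm][1] for w in strata)
    return Fraction(successes, trials)


marginal_diff = pooled_rate("A") - pooled_rate("B")
```

The reversal arises because W is unevenly distributed across the arms (most of A's trials are in W=1, most of B's in W=0), so the marginal difference is not an average of the conditional differences with common weights.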

7.
A primary focus of an increasing number of scientific studies is to determine whether two exposures interact in their effect on an outcome of interest. Interaction is commonly assessed by fitting regression models in which the linear predictor includes the product of the two exposures. When the main interest lies in the interaction, this approach is not entirely satisfactory, because it is prone to (possibly severe) bias when the main exposure effects or the association between the outcome and extraneous factors are misspecified. In this article, we therefore consider conditional mean models with identity or log link which postulate the statistical interaction in terms of a finite-dimensional parameter, but which are otherwise unspecified. We show that estimation of the interaction parameter is often not feasible in this model, because it would require nonparametric estimation of auxiliary conditional expectations given high-dimensional variables. We thus consider 'multiply robust estimation' under a union model that assumes at least one of several working submodels holds. Our approach is novel in that it uses information on the joint distribution of the exposures, conditional on the extraneous factors, in making inferences about the interaction parameter of interest. In the special case of a randomized trial or a family-based genetic study, in which the joint exposure distribution is known by design or by Mendelian inheritance, the resulting multiply robust procedure leads to asymptotically distribution-free tests of the null hypothesis of no interaction on an additive scale. We illustrate the methods via simulation and the analysis of a randomized follow-up study.
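
For two binary exposures, the interaction postulated by an identity-link model is the difference-in-differences of the conditional means, and for a log-link model it is the log of the ratio of ratios. A minimal sketch of these target contrasts only, assuming the conditional means at a fixed value of the extraneous factors are known; it is not the multiply robust estimator.

```python
import math


def interaction_contrasts(mu):
    """mu[a1][a2] = E[Y | A1=a1, A2=a2, L=l] for binary exposures at fixed L=l.

    Returns (additive, multiplicative): the identity-link contrast
    mu11 - mu10 - mu01 + mu00 and the log-link contrast
    log(mu11 * mu00 / (mu10 * mu01)).
    """
    additive = mu[1][1] - mu[1][0] - mu[0][1] + mu[0][0]
    multiplicative = math.log(mu[1][1] * mu[0][0] / (mu[1][0] * mu[0][1]))
    return additive, multiplicative


# Hypothetical conditional means.
mu = [[0.1, 0.2],
      [0.3, 0.6]]
additive, multiplicative = interaction_contrasts(mu)
```

Here the exposures interact on the additive scale (contrast 0.2) but not on the multiplicative scale (contrast 0), which is why a test of "no interaction" must name its scale.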

8.
Classification models can show apparent prediction accuracy even when there is no underlying relationship between the predictors and the response. Variable selection procedures can lead to false-positive variable selections and to overestimation of true model performance. A simulation study was conducted using logistic regression with forward stepwise, best-subset and LASSO variable selection methods, with varying total sample sizes (20, 50, 100, 200) and numbers of random noise predictor variables (3, 5, 10, 15, 20, 50). Using the critical values obtained from this study can help reduce needless follow-up on variables that have no true association with the outcome.

9.
It is frequently the case that a response is related both to a vector of finite length and to a function-valued random variable, both acting as predictor variables. In this paper, we propose new estimators for the parameters of a partial functional linear model, which describes the relationship between a scalar response variable and mixed-type predictors. Asymptotic properties of the proposed estimators are established, and their finite-sample behavior is studied through a small simulation experiment.
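
The abstract does not write the model out. The standard form of a partial functional linear model (assumed here) for a scalar response Y, a finite-dimensional covariate vector Z and a random function X(t) observed on a domain T is:

```latex
Y = \mathbf{Z}^{\top}\boldsymbol{\beta} + \int_{\mathcal{T}} X(t)\,\gamma(t)\,dt + \varepsilon,
\qquad E(\varepsilon \mid \mathbf{Z}, X) = 0,
```

where beta is the coefficient vector of the finite-dimensional part and gamma(t) is the functional slope; the estimators in the paper target these two parameters.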

10.
In some situations, for example in biology or psychology studies, we wish to determine whether the linear relationship between a response variable and predictor variables differs between two populations. Analysis of covariance (ANCOVA) or, equivalently, the partial F-test is the commonly used approach. In this study, the asymptotic distribution of the difference between two independent regression coefficients was established. The proposed method was used to derive an asymptotic confidence set for the difference between the coefficients and a hypothesis test for the equality of the two regression models. A simulation study was then conducted to compare the proposed method with the partial F method; the performance of the new method was comparable with that of the partial F method.
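
A textbook large-sample statistic of the kind the abstract describes, sketched under the assumptions of one predictor per model, independent samples and normal-theory standard errors; it is not necessarily the exact form derived in the paper, and the data are hypothetical.

```python
import math
from statistics import NormalDist


def slope_and_se(x, y):
    """OLS slope of y on x (with intercept) and its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(sse / (n - 2) / sxx)


def slope_difference_z(x1, y1, x2, y2):
    """Wald z for H0: beta1 = beta2 with two independent samples."""
    b1, se1 = slope_and_se(x1, y1)
    b2, se2 = slope_and_se(x2, y2)
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pop1 = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical sample, population 1
y_pop2 = [1.0, 1.6, 2.9, 3.2, 4.1]   # hypothetical sample, population 2
z = slope_difference_z(x, y_pop1, x, y_pop2)
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided, normal reference
```

With samples this small a t reference would be more appropriate; the normal reference is used only because the abstract concerns the asymptotic distribution.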

11.
Nonparametric seemingly unrelated regression provides a powerful alternative to parametric seemingly unrelated regression by relaxing the linearity assumption. Existing methods are limited, particularly when the relationship between the predictor variables and the corresponding response variable changes sharply. We propose a new nonparametric method for seemingly unrelated regression that adopts a tree-structured regression framework, has satisfactory prediction accuracy and interpretability, places no restriction on the inclusion of categorical variables and is less vulnerable to the curse of dimensionality. Moreover, an important feature is that it constructs a unified tree-structured model for multivariate data, even when the predictor variables corresponding to each response variable are entirely different. This unified model can offer revealing insights, such as the underlying economic meaning. We propose the key components of tree-structured regression: an impurity function that detects complex nonlinear relationships between the predictor variables and the response variable, a split-rule selection with negligible selection bias, and a tree-size determination that addresses underfitting and overfitting. We demonstrate the proposed method using simulated data and illustrate it using data on the Korea Stock Exchange sector indices.

12.
In this paper, we derive some simple formulae to express the association between two random variables in the case of a linear relationship. One of these representations, the cube of the correlation coefficient, is given as the ratio of the skewness of the response variable to that of the explanatory variable. This result, along with other expressions of the correlation coefficient presented in this paper, has implications for choosing the response variable in linear regression modelling.
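
The identity holds, for instance, when Y = a + bX + e with e independent of X and having zero third central moment (these conditions are our assumption; the abstract does not state them). Then mu3(Y) = b^3 mu3(X) and rho = b sd(X) / sd(Y), so skew(Y) / skew(X) = rho^3. The sketch checks this numerically on a small hypothetical discrete distribution by enumerating the joint support exactly.

```python
import itertools
import math


def mean(values, probs):
    return sum(v * p for v, p in zip(values, probs))


def central_moment(values, probs, k):
    m = mean(values, probs)
    return sum(p * (v - m) ** k for v, p in zip(values, probs))


def skewness(values, probs):
    return central_moment(values, probs, 3) / central_moment(values, probs, 2) ** 1.5


def correlation(xs, ys, probs):
    mx, my = mean(xs, probs), mean(ys, probs)
    cov = sum(p * (x - mx) * (y - my) for x, y, p in zip(xs, ys, probs))
    return cov / math.sqrt(central_moment(xs, probs, 2) * central_moment(ys, probs, 2))


# Hypothetical distribution: X is skewed; the error is symmetric and independent of X.
x_support, x_probs = [0.0, 1.0, 5.0], [1 / 3, 1 / 3, 1 / 3]
e_support, e_probs = [-1.0, 1.0], [0.5, 0.5]
a, b = 2.0, 0.5

xs, ys, probs = [], [], []
for (x, px), (e, pe) in itertools.product(zip(x_support, x_probs), zip(e_support, e_probs)):
    xs.append(x)
    ys.append(a + b * x + e)
    probs.append(px * pe)  # independence: joint probability is the product

rho = correlation(xs, ys, probs)
ratio = skewness(ys, probs) / skewness(xs, probs)
```

Because |rho| is at most 1, under these assumptions the response is never more skewed, in absolute value, than the explanatory variable, which is one way such a result can inform which variable to treat as the response.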

13.
A general modeling procedure for analyzing genetic data is reviewed: an ANOVA-type model that can handle both continuous and discrete genetic variables in one modeling framework. Unlike regression-type models, which typically treat the phenotype variable as the response, this ANOVA model treats the phenotype variable as an explanatory variable. By reversing the role of the phenotype variable, the usual high-dimensional problem is turned into a low-dimensional one. The ANOVA model always includes an interaction term between the genetic locations and the phenotype variable in order to detect potential associations between them. The interaction term is constrained to be of low rank, as a sum of bilinear (multiplicative) terms, so that the number of parameters remains manageable. We compare the performance of the reviewed ANOVA model with other popular methods on microarray and SNP data sets.
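
A generic form of a low-rank bilinear interaction, written for genetic location i = 1, ..., I and phenotype level j = 1, ..., J; this AMMI-style parameterization is our assumption, since the abstract does not give the exact model:

```latex
\theta_{ij} = \mu + \alpha_i + \beta_j + \sum_{k=1}^{r} \lambda_k\, u_{ik}\, v_{jk},
\qquad r \ll \min(I, J).
```

The interaction then needs on the order of r(I + J) parameters instead of the (I - 1)(J - 1) of an unrestricted interaction, which keeps the count manageable when the number of genetic locations I is large.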

14.
In some situations, for example in agriculture, biology, hydrology and psychology, researchers wish to determine whether the relationship between a response variable and predictor variables differs between two populations; in other words, to compare two regression models fitted to two independent datasets. In this work, we use parametric and nonparametric methods to construct hypothesis tests for the equality of two independent regression models. A simulation study is then carried out to investigate the performance of the proposed method.
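
One standard parametric test for the equality of two regression models is the Chow (partial F) test, which compares the residual sum of squares of one pooled line with that of two separate lines. A minimal sketch for simple linear regression with hypothetical data; it is a classical comparator, not necessarily one of the methods proposed in the paper.

```python
def simple_ols_sse(x, y):
    """Residual sum of squares of the least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return syy - sxy ** 2 / sxx


def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic for H0: both samples share one regression line.

    k is the number of coefficients per model (intercept and slope here);
    under H0 with normal errors, F follows F(k, n1 + n2 - 2k).
    """
    sse_separate = simple_ols_sse(x1, y1) + simple_ols_sse(x2, y2)
    sse_pooled = simple_ols_sse(x1 + x2, y1 + y2)  # lists are concatenated
    df_resid = len(x1) + len(x2) - 2 * k
    return ((sse_pooled - sse_separate) / k) / (sse_separate / df_resid)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical sample 1
y2 = [1.0, 1.6, 2.9, 3.2, 4.1]   # hypothetical sample 2
f_stat = chow_f(x, y1, x, y2)
```

The statistic is then compared with an F(k, n1 + n2 - 2k) critical value, for example from tables or `scipy.stats.f.sf`.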

15.
A number of articles have discussed how lower-order polynomial and interaction terms should be handled in linear regression models. Only if all lower-order terms are included in the model will the regression model be invariant with respect to coding transformations of the variables; if lower-order terms are omitted, the regression model will not be well formulated. In this paper, we extend this work to examine the implications of the ordering of variables in the linear mixed-effects model. We demonstrate how linear transformations of the variables affect the model and the tests of significance of its fixed effects. We show how the transformations modify the random effects in the model, their covariance matrix and the value of the restricted log-likelihood. We suggest a variable selection strategy for the linear mixed-effects model.
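
The invariance point can be seen in a plain fixed-effects fit (a simplified illustration, not the mixed-model analysis of the paper): with hypothetical data, the full quadratic y = b0 + b1 x + b2 x^2 fits equally well whether x is shifted or not, but dropping the linear term makes the fit depend on where x is centred.

```python
def ols_sse(columns, y):
    """Residual sum of squares of least squares of y on the given columns.

    Solves the normal equations (X'X) beta = X'y by Gaussian elimination
    with partial pivoting; include an intercept column explicitly.
    """
    p = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(p)]
    m = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    beta = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        beta[r] = (m[r][p] - sum(m[r][j] * beta[j] for j in range(r + 1, p))) / m[r][r]
    fitted = [sum(beta[j] * columns[j][i] for j in range(p)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def design(v, full):
    """Columns for y ~ 1 + v + v^2 (full) or y ~ 1 + v^2 (linear term omitted)."""
    ones = [1.0] * len(v)
    sq = [vi ** 2 for vi in v]
    return [ones, v, sq] if full else [ones, sq]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical response
x_shift = [xi - 2.0 for xi in x]  # a coding transformation: centre x at 2

full_orig, full_shift = ols_sse(design(x, True), y), ols_sse(design(x_shift, True), y)
red_orig, red_shift = ols_sse(design(x, False), y), ols_sse(design(x_shift, False), y)
```

Here both full fits are exact, while the reduced model's residual sum of squares changes from about 0.80 to 10 under the shift.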

16.
Rock bursts are sudden and violent failures of the surrounding rock masses in underground mines and excavations. In this paper, a database of 188 case histories was collected. Each case history contains some of the predictor variables 'overburden thickness, maximum tangential stress, uniaxial compressive strength of rock, tensile strength of rock, stress ratio, brittleness ratio and elastic energy index' and one of four defined classes of the dependent variable 'rock burst intensity'. A strategy comprising 'outlier detection and substitution, normality evaluation, deduction of distribution functions, estimation of means and mean variation ranges, evaluation of mean-equality and distribution-function-equality hypotheses, correlation analysis and factor analysis for the variables under review' was implemented. The strategy led to the conclusion that, with the available case histories, some predictor variables make no contribution to rock burst prediction. These inferences agreed with the results of regression techniques for qualitative dependent variables. In addition, many arrangements of predictor variables were incompatible with factor analysis, and for the compatible arrangements, the variation of the predictor variables could not be adequately reflected. Nonlinear principal component analysis using auto-associative neural networks also did not yield representative components. Therefore, only the significant predictor variables can be used to design new classifiers.

17.
Researchers in the medical, health and social sciences routinely encounter ordinal variables such as self-reports of health or happiness. When modelling ordinal outcome variables, it is common to have covariates (for example, attitudes, family income or retrospective variables) that are measured with error. As is well known, ignoring even random error in covariates can bias coefficients and hence distort the estimated effects. We propose an instrumental variable approach to estimating a probit model with an ordinal response and mismeasured predictor variables. We obtain likelihood-based and method-of-moments estimators that are consistent and asymptotically normally distributed under general conditions. In our simulation studies, these estimators are easy to compute, perform well and are robust to departures from the normality assumption for the measurement errors. The proposed method is applied to both simulated and real data. The Canadian Journal of Statistics 47: 653–667; 2019 © 2019 Statistical Society of Canada
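
For context, the ordinal probit model links a latent normal variable to the observed categories through increasing cutpoints, giving P(Y = k | x) = Phi(tau_k - x'b) - Phi(tau_{k-1} - x'b). A minimal sketch of these category probabilities, with hypothetical linear-predictor and cutpoint values; it does not attempt the instrumental-variable correction for mismeasured covariates.

```python
from statistics import NormalDist


def ordered_probit_probs(xb, cutpoints):
    """Category probabilities of an ordered probit model.

    xb is the linear predictor x'b; cutpoints tau_1 < ... < tau_{K-1}, with
    tau_0 = -inf and tau_K = +inf implied.
    """
    if any(hi <= lo for lo, hi in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    phi = NormalDist().cdf  # standard normal CDF
    cdf = [0.0] + [phi(t - xb) for t in cutpoints] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


probs = ordered_probit_probs(0.3, [-1.0, 0.0, 1.2])  # four ordered categories
```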

18.
Interaction is very common in practice but has received little attention in the logistic regression literature, especially higher-order interactions. In conventional logistic regression, interactions are typically ignored. We propose a model selection procedure based on association rules analysis. We do this by (1) exploring the combinations of input variables that have significant impacts on the response (via association rules analysis); (2) selecting the potential (low- and high-order) interactions; (3) converting these potential interactions into new dummy variables; and (4) performing variable selection among all the input variables and the newly created dummy variables (interactions) to build the optimal logistic regression model. Our model selection procedure establishes the optimal combination of main effects and potential interactions. Comparisons are made through thorough simulations, which show that the proposed method outperforms the existing methods in all cases. A real-life example is discussed in detail to demonstrate the proposed method.
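
Steps (1) and (3) can be sketched with the standard association-rule measures: the support and confidence of a rule such as {x1=1, x2=1} -> {y=1}, and the encoding of a selected combination as a new 0/1 dummy column. The rows, variable names and the dummy name `x1_x2` are hypothetical, and the thresholds used to call a rule significant are not given in the abstract.

```python
def rule_stats(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    rows: list of dicts mapping variable name -> 0/1;
    antecedent, consequent: dicts of required values.
    """
    def matches(row, cond):
        return all(row[k] == v for k, v in cond.items())

    n_ante = sum(matches(r, antecedent) for r in rows)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    support = n_both / len(rows)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


def add_interaction_dummy(rows, variables, name):
    """Step (3): encode an interaction of binary inputs as a new 0/1 column."""
    for r in rows:
        r[name] = int(all(r[v] == 1 for v in variables))
    return rows


rows = [  # hypothetical binary data
    {"x1": 1, "x2": 1, "y": 1},
    {"x1": 1, "x2": 1, "y": 1},
    {"x1": 1, "x2": 0, "y": 0},
    {"x1": 0, "x2": 1, "y": 0},
    {"x1": 0, "x2": 0, "y": 0},
    {"x1": 1, "x2": 1, "y": 0},
]
support, confidence = rule_stats(rows, {"x1": 1, "x2": 1}, {"y": 1})
add_interaction_dummy(rows, ["x1", "x2"], "x1_x2")
```

The new column can then enter step (4) alongside the main effects in an ordinary variable-selection routine.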

19.
Mixed effects models and Berkson measurement error models are widely used. They share features that the author uses to develop a unified estimation framework. He deals with models in which the random effects (or measurement errors) have a general parametric distribution, whereas the random regression coefficients (or unobserved predictor variables) and error terms have nonparametric distributions. He proposes a second-order least squares estimator and a simulation-based estimator based on the first two moments of the response variable conditional on the observed covariates. He shows that both estimators are consistent and asymptotically normally distributed under fairly general conditions. The author also reports Monte Carlo simulation studies showing that the proposed estimators perform satisfactorily for relatively small sample sizes. Compared with the likelihood approach, the proposed methods are computationally feasible and do not rely on a normality assumption for the random effects or other variables in the model.
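
In its generic form (stated here as an assumption, since the abstract gives no formula), a second-order least squares estimator matches both the first and the second conditional moments of the response given the observed covariates x_i:

```latex
\hat{\theta}_{\mathrm{SLS}} = \arg\min_{\theta} \sum_{i=1}^{n} \rho_i(\theta)^{\top} W_i\, \rho_i(\theta),
\qquad
\rho_i(\theta) = \begin{pmatrix} y_i - E_{\theta}(Y \mid x_i) \\ y_i^{2} - E_{\theta}(Y^{2} \mid x_i) \end{pmatrix},
```

where each W_i is a 2 x 2 positive semi-definite weight matrix. When the conditional moments have no closed form, a simulation-based version replaces them with Monte Carlo averages.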

20.
Multinomial logit (also termed multi-logit) models permit the analysis of the statistical relation between a categorical response variable and a set of explanatory variables (called covariates or regressors). Although multinomial logit is widely used in both the social and economic sciences, the interpretation of regression coefficients may be tricky, as the effect of covariates on the probability distribution of the response variable is nonconstant and difficult to quantify. The ternary plots illustrated in this article aim to facilitate the interpretation of regression coefficients and allow the effect of covariates (considered either singly or jointly) on the probability distribution of the dependent variable to be quantified. Ternary plots can be drawn for both ordered and unordered categorical dependent variables when the number of possible outcomes equals three (a trinomial response variable); these plots make it possible not only to represent the covariate effects over the whole parameter space of the dependent variable but also to compare the covariate effects for any given individual profile. The method is illustrated and discussed through the analysis of a dataset on the transition of master’s graduates of the University of Trento (Italy) from university to employment.
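
Because the three outcome probabilities of a trinomial logit sum to one, each covariate profile maps to one point of a triangle, which is what a ternary plot displays. A minimal sketch, assuming a baseline-category parameterization with category 0 as reference and hypothetical coefficients; the vertex placement is one common convention.

```python
import math


def trinomial_probs(x, betas):
    """Multinomial logit probabilities for a 3-category response.

    Category 0 is the baseline (linear predictor fixed at zero); betas[k - 1]
    holds (intercept, slopes...) for category k = 1, 2; x is the covariate vector.
    """
    etas = [0.0] + [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return tuple(e / total for e in exps)


def ternary_xy(probs):
    """Map (p0, p1, p2) to 2-D coordinates with category 0 at (0, 0),
    category 1 at (1, 0) and category 2 at (0.5, sqrt(3)/2)."""
    _, p1, p2 = probs
    return p1 + 0.5 * p2, (math.sqrt(3) / 2.0) * p2


betas = [(-0.5, 0.8), (0.2, -0.4)]  # hypothetical coefficients, one covariate
path = [ternary_xy(trinomial_probs([x], betas)) for x in (-2, -1, 0, 1, 2)]
```

Plotting `path` traces how the predicted distribution moves across the triangle as the covariate changes, which is the kind of covariate effect the article's plots are meant to make visible.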
