Found 20 similar articles (search time: 296 ms)

In this paper, under the assumption of a linear relationship between two variables, we provide an alternative, simple method of proving the existing result that connects the correlation coefficient with the skewness coefficients of the response and explanatory variables. Further, under the same linearity assumption, we give a relationship between the correlation coefficient and the coefficients of kurtosis of the response and explanatory variables. A simple alternative way of deriving the formula, which helps in determining direction dependence in linear regression, is discussed.
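The skewness relationship referred to above can be checked numerically. The sketch below assumes the commonly cited form of the result (not quoted from the abstract): if y = a + b·x + e with e symmetric and independent of x, then skew(y) = ρ³ · skew(x). The simulation settings (exponential x, normal error, slope 1, noise standard deviation 0.75, seed 0) and all function names are illustrative choices.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def std(values):
    # Population standard deviation.
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def skewness(values):
    # Standardized third central moment.
    m, s = mean(values), std(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def correlation(xs, ys):
    # Pearson correlation coefficient.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std(xs) * std(ys))

def simulate(n=100_000, slope=1.0, noise_sd=0.75, seed=0):
    # x is skewed (exponential); the error is symmetric (normal)
    # and independent of x, as the assumed result requires.
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

xs, ys = simulate()
rho = correlation(xs, ys)
skew_x = skewness(xs)
skew_y = skewness(ys)
# Under the assumed model, skew_y should be close to rho**3 * skew_x.
print(rho, skew_x, skew_y, rho ** 3 * skew_x)
```

Because |ρ| ≤ 1, the response is less skewed than the explanatory variable under this model, which is what makes the relationship usable for judging the direction of dependence.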


In this paper, we investigate the objective function and deflation process for sparse Partial Least Squares (PLS) regression with multiple components. While many have considered variations on the objective for sparse PLS, the deflation process for sparse PLS has not received as much attention. Our work highlights a flaw in the Statistically Inspired Modification of Partial Least Squares (SIMPLS) deflation method when applied in sparse PLS regression. We also consider the Nonlinear Iterative Partial Least Squares (NIPALS) deflation in sparse PLS regression. To remedy the flaw in the SIMPLS method, we propose a new sparse PLS method wherein the direction vectors are constrained to be sparse and lie in a chosen subspace. We give insight into this new PLS procedure and show through examples and simulation studies that the proposed technique can outperform alternative sparse PLS techniques in coefficient estimation. Moreover, our analysis reveals a simple renormalization step that can be used to improve the estimation of sparse PLS direction vectors generated using any convex relaxation method.


In some situations, for example in biology or psychology studies, we wish to determine whether the linear relationship between the response variable and the predictor variables differs between two populations. Analysis of covariance (ANCOVA) or, equivalently, the partial F-test is the commonly used approach. In this study, the asymptotic distribution of the difference between two independent regression coefficients was established. The proposed method was used to derive an asymptotic confidence set for the difference between the coefficients and a hypothesis test for the equality of the two regression models. A simulation study was then conducted to compare the proposed method with the partial F method. The performance of the new method was comparable with that of the partial F method.


The application of conventional statistical methods to directional data generally produces erroneous results. Various regression models for a circular response have been presented in the literature; however, these are unsatisfactory either in the limited relationships that can be modeled or in the limitations on the number or type of covariates admissible. One difficulty with circular regression is devising a meaningful regression function. This problem is exacerbated when trying to incorporate both linear and circular variables as covariates. Due to these complexities, circular regression is ripe for exploration via tree-based methods, in which a formal regression function is not needed, but where insight into the general structure and relationship between predictors and the response may be obtained. A basic framework for regression trees, predicting a circular response from a combination of circular and linear predictors, is presented.
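A small illustration of why conventional methods can fail on directional data: the arithmetic mean of two compass directions near north points south, whereas the standard circular mean (the direction of the summed unit vectors) does not. This is a generic textbook example, not taken from the paper, and the function names are illustrative.

```python
import math

def arithmetic_mean_deg(angles_deg):
    # Treats angles as ordinary numbers on a line.
    return sum(angles_deg) / len(angles_deg)

def circular_mean_deg(angles_deg):
    # Direction of the resultant of the unit vectors, mapped to [0, 360).
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

angles = [350.0, 10.0]          # two directions close to north
naive = arithmetic_mean_deg(angles)
circ = circular_mean_deg(angles)
print(naive, circ)
```

Here the arithmetic mean is 180 degrees (due south), while the circular mean lies at 0 degrees up to floating-point rounding (it may print as a value just below 360).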

At the core of multivariate statistics is the investigation of relationships between different sets of variables, more precisely, the inter-variable relationships and the causal relationships. The latter is a regression problem, where one set of variables is referred to as the response variables and the other set as the predictor variables. In this situation, the effect of the predictors on the response variables is revealed through the regression coefficients. Results from the regression analysis can be viewed graphically using the biplot. The resulting biplot provides a single graphical representation of the samples together with the predictor variables and response variables. In addition, their effect in terms of the regression coefficients can be visualized, although sub-optimally, in the said biplot. Keywords: biplot, regression analysis, multivariate regression, rank approximation

To reduce the dimensionality of regression problems, sliced inverse regression approaches make it possible to determine linear combinations of a set of explanatory variables X related to the response variable Y in a general semiparametric regression context. From a practical point of view, the determination of a suitable dimension (the number of linear combinations of X) is important. In the literature, statistical tests based on the nullity of some eigenvalues have been proposed. Another approach is to consider the quality of the estimation of the effective dimension reduction (EDR) space. The square trace correlation between the true EDR space and its estimate can be used as a measure of goodness of estimation. In this article, we focus on the SIRα method and propose a naïve bootstrap estimation of the square trace correlation criterion. Moreover, this criterion can also be used to select the α parameter in the SIRα method. We indicate how it can be used in practice. A simulation study is performed to illustrate the behavior of this approach.


M-estimation is a widely used technique for robust statistical inference. In this paper, we study a robust partially functional linear regression model in which a scalar response variable is explained by a function-valued variable and a finite number of real-valued variables. For the estimation of the regression parameters, which include the infinite-dimensional function as well as the slope parameters for the real-valued variables, we use polynomial splines to approximate the slope parameter. The estimation procedure is easy to implement, and it is resistant to heavy-tailed errors or outliers in the response. The asymptotic properties of the proposed estimators are established. Finally, we assess the finite sample performance of the proposed method by Monte Carlo simulation studies.

In partly linear models, the dependence of the response y on (x^T, t) is modeled through the relationship y = x^T β + g(t) + ε, where ε is independent of (x^T, t). We are interested in developing an estimation procedure that retains the flexibility of the partly linear models studied by several authors while including some variables that belong to a non-Euclidean space. The motivating application of this paper deals with the explanation of atmospheric SO2 pollution incidents using these models when some of the predictive variables belong to a cylinder. In this paper, the estimators of β and g are constructed when the explanatory variables t take values on a Riemannian manifold, and the asymptotic properties of the proposed estimators are obtained under suitable conditions. We illustrate the use of this estimation approach using an environmental data set and we explore the performance of the estimators through a simulation study.


Ridge penalized least-squares estimators have been suggested as an alternative to the minimum penalized sum of squares estimates in the presence of collinearity among the explanatory variables in semiparametric regression models (SPRMs). This paper studies the local influence of minor perturbations on the ridge estimates in the SPRM. The diagnostics under the perturbation of the ridge penalized sum of squares, response variable, explanatory variables and ridge parameter are considered. Some local influence diagnostics are given. A Monte Carlo simulation study and a real example are used to illustrate the proposed perturbations.


Time averaging has been the traditional approach to handling mixed sampling frequencies. However, it ignores information possibly embedded in high-frequency data. Mixed data sampling (MIDAS) regression models provide a concise way to utilize the additional information in high-frequency variables. In this paper, we propose a specification test to choose between time averaging and MIDAS models, based on a Durbin-Wu-Hausman test. In particular, a set of instrumental variables is proposed and theoretically validated when the frequency ratio is large. As a result, our method tends to be more powerful than existing methods, as confirmed through simulations.
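To make the contrast between time averaging and MIDAS concrete, the sketch below uses the exponential Almon lag polynomial, a weighting scheme widely used in the MIDAS literature; the abstract does not say which weighting the authors use, so this choice, the function names and the sample data are assumptions. With both shape parameters set to zero the weights are equal, so time averaging appears as a special case of the MIDAS aggregation.

```python
import math

def exp_almon_weights(n_lags, theta1, theta2):
    # Normalized exponential Almon weights over lags j = 1..n_lags.
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def aggregate(high_freq_block, weights):
    # Weighted aggregate of the high-frequency observations that fall
    # inside one low-frequency period (most recent observation first).
    return sum(w * x for w, x in zip(weights, high_freq_block))

monthly = [1.2, 0.8, 1.0]                        # hypothetical monthly values in one quarter
flat = exp_almon_weights(3, 0.0, 0.0)            # equal weights: time averaging
decaying = exp_almon_weights(3, -0.5, 0.0)       # more weight on recent months

time_average = aggregate(monthly, flat)
midas_value = aggregate(monthly, decaying)
print(flat, decaying, time_average, midas_value)
```

A specification test of the kind described asks, in effect, whether the data support restricting the weights to the flat case.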


In this paper we introduce the exponentiated Fréchet regression for modelling positive responses having a long-tailed distribution in a regression model, which are common in actuarial statistics. We propose two parameterizations, each of which links the regression parameters with the explanatory variables. We then discuss the maximum likelihood estimation of the parameters both theoretically and empirically. In order to meet the needs of an actuary, closed-form expressions for certain risk measures for the exponentiated Fréchet distribution are also derived. We apply the proposed model to a motorcycle claim size data set.


In some applications, the quality of a process or product is best characterized by a functional relationship between a response variable and one or more explanatory variables. Profile monitoring is used to understand and to check the stability of this relationship or curve over time. In the existing simple linear regression profile models, it is often assumed that the data follow a single-mode distribution and consequently that the noise of the functional relationship follows a normal distribution. However, in some applications, the data may follow a multiple-mode distribution. In this case, it is more appropriate to assume that the data follow a mixture profile. In this study, we focus on a mixture simple linear profile model and propose new control schemes for Phase II monitoring. The proposed methods are shown to have good performance in a simulation study.


In the fields of internet financial transactions and reliability engineering, data may contain an excess of both zero and one observations simultaneously. In this paper, because such data are beyond the range that conventional models can fit, a zero-and-one-inflated geometric distribution regression model is proposed. By introducing Pólya-Gamma latent variables in the Bayesian inference, posterior sampling with high-dimensional parameters is converted to latent variable sampling and posterior sampling with lower-dimensional parameters, respectively. This circumvents the need for Metropolis-Hastings sampling, and samples are obtained with higher sampling efficiency. A simulation study is conducted to assess the performance of the proposed estimation for various sample sizes. Finally, a doctoral dissertation data set is analyzed to illustrate the practicability of the proposed method. The results show that the zero-and-one-inflated geometric distribution regression model using Pólya-Gamma latent variables can achieve better fitting results.
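For orientation, a zero-and-one-inflated geometric distribution can be written as a three-part mixture: extra mass at 0, extra mass at 1, and an ordinary geometric component. The sketch below assumes the geometric component is supported on {0, 1, 2, ...} with P(K = k) = p(1 - p)^k; the abstract does not state the parameterization, and the names `phi0`, `phi1` and `zoig_pmf` are hypothetical.

```python
def zoig_pmf(k, p, phi0, phi1):
    # phi0, phi1: assumed extra probability mass placed at 0 and at 1.
    # The remaining mass (1 - phi0 - phi1) follows a geometric law on 0, 1, 2, ...
    if k < 0:
        return 0.0
    base = (1.0 - phi0 - phi1) * p * (1.0 - p) ** k
    if k == 0:
        return phi0 + base
    if k == 1:
        return phi1 + base
    return base

# Illustrative parameter values only.
p, phi0, phi1 = 0.3, 0.2, 0.1
probabilities = [zoig_pmf(k, p, phi0, phi1) for k in range(200)]
print(probabilities[:4], sum(probabilities))
```

In a regression version, p, phi0 and phi1 would each be linked to covariates; that step is omitted here.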

Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) uses a reformulation of the Cox likelihood to a Poisson type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches, as it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables like disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, obtaining a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised version outperform other Cox PLS methods.


The varying-coefficient single-index model (VCSIM) is a very general and flexible tool for exploring the relationship between a response variable and a set of predictors. Popular special cases include single-index models and varying-coefficient models. In order to estimate the index-coefficient and the nonparametric varying-coefficients in the VCSIM, we propose a two-stage composite quantile regression estimation procedure, which integrates the local linear smoothing method and the information of quantile regressions at a number of conditional quantiles of the response variable. We establish the asymptotic properties of the proposed estimators for the index-coefficient and varying-coefficients when the error is heterogeneous. When compared with the existing mean-regression-based estimation method, our simulation results indicate that our proposed method has comparable performance for normal errors and is more robust for errors with outliers or heavy tails. We illustrate our methodologies with a real example.
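The idea of pooling information across several quantiles can be sketched with the usual composite quantile regression objective: the check losses at K quantile levels are summed, with a separate intercept per level and a shared fit. This is the generic linear-model form of the objective, not the authors' two-stage local linear procedure; the equally spaced levels k/(K+1) are a common convention assumed here, and the function names are illustrative.

```python
def check_loss(residual, tau):
    # Quantile (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0.
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def quantile_levels(n_levels):
    # Equally spaced levels k / (K + 1), k = 1..K (assumed convention).
    return [k / (n_levels + 1) for k in range(1, n_levels + 1)]

def composite_quantile_loss(y, fitted, intercepts, taus):
    # Sum over quantile levels and observations; each level has its own
    # intercept, while the fitted values are shared across levels.
    total = 0.0
    for b, tau in zip(intercepts, taus):
        for yi, fi in zip(y, fitted):
            total += check_loss(yi - b - fi, tau)
    return total

taus = quantile_levels(3)
loss = composite_quantile_loss(
    y=[1.0, 2.0, 4.0],
    fitted=[1.0, 2.0, 3.0],
    intercepts=[-0.5, 0.0, 0.5],
    taus=taus,
)
print(taus, loss)
```

Because the check loss grows only linearly in the residual, large outliers influence this objective less than they influence a squared-error criterion.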

Partial least squares regression (PLS) is one method to estimate parameters in a linear model when predictor variables are nearly collinear. One way to characterize PLS is in terms of the scaling (shrinkage or expansion) along each eigenvector of the predictor correlation matrix. This characterization is useful in providing a link between PLS and other shrinkage estimators, such as principal components regression (PCR) and ridge regression (RR), thus facilitating a direct comparison of PLS with these methods. This paper gives a detailed analysis of the shrinkage structure of PLS, and several new results are presented regarding the nature and extent of shrinkage.
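As a point of reference for the comparison described above, the scaling factors of the two benchmark estimators have simple closed forms relative to ordinary least squares: along the eigenvector with eigenvalue λ, ridge regression with penalty k scales the OLS component by λ / (λ + k), and PCR scales it by 1 if the component is retained and 0 otherwise. The sketch below covers only these two baselines; the PLS factors analysed in the paper depend on the response and are not reproduced. The eigenvalues and function names are illustrative.

```python
def ridge_factors(eigenvalues, k):
    # Ridge scales the OLS component along each eigenvector by lam / (lam + k).
    return [lam / (lam + k) for lam in eigenvalues]

def pcr_factors(eigenvalues, n_components):
    # PCR keeps the leading components unchanged and drops the rest.
    # Assumes eigenvalues are sorted in decreasing order.
    return [1.0 if i < n_components else 0.0 for i in range(len(eigenvalues))]

# Hypothetical eigenvalues of a nearly collinear predictor correlation matrix.
eigenvalues = [2.5, 0.45, 0.05]
ridge = ridge_factors(eigenvalues, k=0.1)
pcr = pcr_factors(eigenvalues, n_components=2)
print(ridge, pcr)
```

Both baselines only ever shrink (factors between 0 and 1), which is why the possibility of expansion mentioned in the abstract is a distinguishing feature of PLS.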

Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations, and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

The partial least squares (PLS) approach first constructs new explanatory variables, known as factors (or components), which are linear combinations of available predictor variables. A small subset of these factors is then chosen and retained for prediction. We study the performance of PLS in estimating single-index models, especially when the predictor variables exhibit high collinearity. We show that PLS estimates are consistent up to a constant of proportionality. We present three simulation studies that compare the performance of PLS in estimating single-index models with that of sliced inverse regression (SIR). In the first two studies, we find that PLS performs better than SIR when collinearity exists. In the third study, we learn that PLS performs well even when there are multiple dependent variables, the link function is non-linear and the shape of the functional form is not known.


The conditional density offers the most informative summary of the relationship between explanatory and response variables. We need to estimate it in place of the simple conditional mean when its shape is not well-behaved. A motivation for estimating conditional densities, specific to the circular setting, lies in the fact that a natural alternative, such as quantile regression, could be considered problematic because circular quantiles are not rotationally equivariant. We treat conditional density estimation as a local polynomial fitting problem, as proposed by Fan et al. [Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika. 1996;83:189–206] in the Euclidean setting, and discuss a class of estimators in the cases when the conditioning variable is either circular or linear. Asymptotic properties for some members of the proposed class are derived. The effectiveness of the methods for finite sample sizes is illustrated by simulation experiments and an example using real data.
