Similar Literature
1.
ABSTRACT

The application of conventional statistical methods to directional data generally produces erroneous results. Various regression models for a circular response have been presented in the literature; however, these are unsatisfactory either in the limited relationships that can be modeled or in the limitations on the number or type of admissible covariates. One difficulty with circular regression is devising a meaningful regression function, and this problem is exacerbated when both linear and circular variables are included as covariates. Because of these complexities, circular regression is ripe for exploration via tree-based methods, which do not require a formal regression function but can still give insight into the general structure of the relationship between the predictors and the response. A basic framework for regression trees that predict a circular response from a combination of circular and linear predictors is presented.
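The abstract does not specify a split criterion. As a minimal sketch, assuming the circular variance 1 − R̄ (one minus the mean resultant length) as the node impurity and a single linear predictor, the best split for a circular response could be searched as follows. The function names are hypothetical, and this is not the authors' framework.

```python
import math


def mean_resultant_length(angles):
    """Length of the mean resultant vector of angles given in radians (0 to 1)."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.hypot(c, s)


def circular_impurity(angles):
    """Circular variance 1 - R-bar: 0 when all angles coincide, near 1 when spread out."""
    return 1.0 - mean_resultant_length(angles)


def best_linear_split(x, theta):
    """Scan thresholds on one linear predictor x and return (cost, threshold) for the
    split minimizing the size-weighted circular variance of the two child nodes."""
    pairs = sorted(zip(x, theta))
    n = len(pairs)
    best_cost, best_threshold = math.inf, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied x values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = (len(left) * circular_impurity(left)
                + len(right) * circular_impurity(right)) / n
        if cost < best_cost:
            best_cost = cost
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cost, best_threshold


# 6.2 rad lies just below 2*pi, i.e. next to 0.1 and 0.05 on the circle.
cost, threshold = best_linear_split([1, 2, 3, 4, 5, 6], [0.1, 6.2, 0.05, 3.1, 3.2, 3.0])
print(threshold)  # 3.5
```

An arithmetic mean of 0.1 and 6.2 rad would be about 3.15 rad, on the opposite side of the circle from both angles; the resultant-vector impurity treats them as neighbors, which is exactly the kind of error conventional methods make on directional data. Splitting on a circular predictor would need a different rule (for example, arcs), which this sketch does not cover.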

2.
Nonparametric seemingly unrelated regression provides a powerful alternative to parametric seemingly unrelated regression for relaxing the linearity assumption. Existing methods are limited, particularly when the relationship between the predictor variables and the corresponding response variable changes sharply. We propose a new nonparametric method for seemingly unrelated regression that adopts a tree-structured regression framework, offers satisfactory prediction accuracy and interpretability, places no restriction on the inclusion of categorical variables, and is less vulnerable to the curse of dimensionality. Moreover, an important feature is that it constructs a unified tree-structured model for multivariate data, even when the predictor variables corresponding to each response variable are entirely different. This unified model can offer revealing insights, such as the underlying economic meaning. We propose the key components of tree-structured regression: an impurity function that detects complex nonlinear relationships between the predictor variables and the response variable, a split-rule selection with negligible selection bias, and a tree-size determination that addresses both underfitting and overfitting. We demonstrate the proposed method on simulated data and illustrate it with data from the Korea Stock Exchange sector indices.

3.
In regression analysis, it is assumed that the response (dependent variable) is normally distributed and that the errors are homoscedastic and uncorrelated. In practice, however, a real data set rarely satisfies these assumptions. A log-transformation is generally suggested to stabilize a heteroscedastic response variance; the transformed response then moves closer to a normal distribution, and the model fit improves. In practice, however, a transformation that seems suitable may not stabilize the variance, and the transformed response may still not be normally distributed. The present article assumes that the response distribution is log-normal with compound autocorrelated errors. Under this setting, estimation and hypothesis-testing procedures for the regression parameters are derived. From a set of reduced data, we derive the best linear unbiased estimators of all the regression coefficients except the intercept, which is often unimportant in practice. The unknown correlation parameters are also estimated. In this connection, we derive a test rule for any set of linear hypotheses on the unknown regression coefficients. In addition, we develop confidence ellipsoids for a set of estimable functions of the regression coefficients. For the fitted regression equation, an index of fit is proposed. A simulation study illustrates the results derived in this report.
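As a short illustration of why a log-transformation is suggested for heteroscedastic responses, assume log(Y) is normal with mean mu and variance sigma2. Then the variance of Y grows with its mean, while the coefficient of variation stays fixed at sqrt(exp(sigma2) − 1), and exp(mu) underestimates the mean of Y. The compound autocorrelated error structure and the estimators derived in the article are not reproduced here; `lognormal_moments` is a hypothetical helper.

```python
import math


def lognormal_moments(mu, sigma2):
    """Mean and variance of Y when log(Y) ~ Normal(mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2.0)
    var = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, var


# Same error variance on the log scale, two different log-scale means.
for mu in (1.0, 3.0):
    m, v = lognormal_moments(mu, 0.25)
    # The variance grows with the mean, but the coefficient of variation does not.
    print(f"mu={mu}: mean={m:.3f}, var={v:.3f}, cv={math.sqrt(v) / m:.4f}")
```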

4.
The suitability of a normal linear regression model may require transformation of the original response, and transformation diagnostics are designed to detect the need for such a transformation. A common approach to transformation diagnostics is to construct an artificial explanatory variable, which is then tested in the augmented linear regression model for the original response. This paper describes corresponding diagnostics based directly on score statistics, with accurate approximations for their standard errors. Several transformation models are covered, and some numerical illustrations are given.
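For the common approach mentioned in this abstract, here is a minimal sketch assuming the normalized Box–Cox family (only one of the transformation models the paper covers): the artificial regressor for testing λ = 1 (no transformation) is the derivative of the normalized transform at λ = 1, up to a constant that the intercept absorbs. The helper names are hypothetical, and the paper's score-statistic diagnostics are not shown.

```python
import math


def geometric_mean(y):
    return math.exp(sum(math.log(v) for v in y) / len(y))


def normalized_box_cox(y, lam):
    """Box-Cox transform scaled by the geometric mean g so that residual sums of
    squares are comparable across lambda: (y**lam - 1) / (lam * g**(lam - 1))."""
    g = geometric_mean(y)
    if lam == 0:
        return [g * math.log(v) for v in y]
    return [(v ** lam - 1.0) / (lam * g ** (lam - 1.0)) for v in y]


def constructed_variable(y):
    """Artificial regressor for testing lambda = 1: w_i = y_i * (log(y_i / g) - 1).
    It equals the derivative of the normalized transform at lambda = 1 minus the
    constant 1 + log(g), which the intercept of the augmented model absorbs."""
    g = geometric_mean(y)
    return [v * (math.log(v / g) - 1.0) for v in y]


y = [1.5, 2.0, 3.7, 5.1, 8.2]
print(constructed_variable(y))
```

In the augmented-regression approach, w would be added as an extra column to the original linear model and its t-statistic inspected; that fitting step is omitted here.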

5.
In this paper, we investigate a mixture problem with two responses that are functions of the mixing proportions and are correlated with a known dispersion matrix. We obtain D- and A-optimal designs for estimating the parameters of the response functions when none or some of the regression coefficients of the two functions are the same. It is shown that when no prior knowledge about the regression coefficients is available, the D-optimal design is independent of the dispersion matrix, whereas the A-optimal design depends on it, provided the response functions are of different degrees. On the other hand, when some of the regression coefficients are known to be the same for both functions, the D-optimal design depends on the dispersion matrix when the two response functions are not of the same degree.

6.
Generalized linear models with random effects and/or serial dependence are commonly used to analyze longitudinal data. However, marginal covariate effects can be difficult to compute and interpret. This led Heagerty (1999, 2002) to propose models for longitudinal binary data in which a logistic regression is first used to explain the average marginal response. The model is then completed by a conditional regression that allows for longitudinal, within-subject dependence, either via random effects or by regressing on previous responses. In this paper, the authors extend Heagerty's work to multivariate longitudinal binary response data, using a triple of regression models that directly model the marginal mean response while taking into account dependence across time and across responses. Markov chain Monte Carlo methods are used for inference. Data from the Iowa Youth and Families Project are used to illustrate the methods.

7.
One advantage of quantile regression relative to ordinary least-squares (OLS) regression is that quantile regression estimates are more robust against outliers and non-normal errors in the response measurements. However, the relative efficiency of the quantile regression estimator with respect to the OLS estimator can be arbitrarily small. To overcome this problem, composite quantile regression methods have been proposed in the literature; they are resistant to heavy-tailed errors or outliers in the response and are at the same time more efficient than the traditional single-quantile regression method. This paper studies composite quantile regression from a Bayesian perspective. The advantage of the Bayesian hierarchical framework is that the weight of each component in the composite model can be treated as a free parameter and estimated automatically through a Markov chain Monte Carlo sampling procedure. Moreover, lasso regularization can be naturally incorporated into the model to perform variable selection. The advantage of the proposed method over the single quantile-based method is demonstrated via extensive simulations and real data analysis.
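The composite loss behind this approach can be sketched as follows, assuming one covariate, fixed quantile levels, and fixed component weights. In the Bayesian formulation described above, the weights are instead estimated by MCMC, which this sketch does not attempt; the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cqr_objective(x, y, slope, intercepts, taus, weights=None):
    """Composite quantile regression loss for one covariate:
    sum_k w_k * sum_i rho_{tau_k}(y_i - b_k - slope * x_i).
    All quantile levels share the slope; each level k has its own intercept b_k.
    Equal weights give the usual unweighted CQR loss."""
    if weights is None:
        weights = [1.0] * len(taus)
    total = 0.0
    for b_k, tau, w in zip(intercepts, taus, weights):
        total += w * sum(check_loss(yi - b_k - slope * xi, tau) for xi, yi in zip(x, y))
    return total


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
taus = [0.25, 0.5, 0.75]
print(cqr_objective(x, y, 2.0, [1.0, 1.0, 1.0], taus))  # 0.0: the line y = 1 + 2x fits exactly
print(cqr_objective(x, y, 2.5, [1.0, 1.0, 1.0], taus))
```

Sharing one slope across several quantile levels is what lets the composite estimator borrow strength from the whole conditional distribution, rather than from a single quantile.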

8.
A number of procedures have been used to analyze non-monotonic binary data to predict the probability of response. Classical procedures include the Up-and-Down strategy, the Robbins–Monro procedure, and other sequential optimization designs. Recently, nonparametric procedures such as kernel regression and local linear regression have been applied to this type of data. It is well known that kernel regression has problems fitting the data near the boundaries, and a drawback of local linear regression is that it may be “too linear” when fitting data from a curvilinear function. The procedure introduced in this paper is called local logistic regression (llogr); it fits a logistic regression function at each of the data points. An example using United States Army projectile data supports the use of local logistic regression when analyzing non-monotonic binary data for certain response curves. Properties of local logistic regression are presented, along with simulation results that indicate some of the strengths of the procedure.
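A minimal sketch of the local logistic idea, assuming a Gaussian kernel with bandwidth h and a local linear logit a + b(x − x0) fitted by Newton-Raphson at an arbitrary target point x0. The abstract does not give the paper's kernel, bandwidth choice, or fitting details, and the data below are toy values invented for illustration.

```python
import math


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def local_logistic(x0, x, y, h, max_iter=50):
    """Estimate P(Y = 1 | X = x0) by maximizing the kernel-weighted log-likelihood
    sum_i K((x_i - x0) / h) * [y_i * eta_i - log(1 + exp(eta_i))],
    with eta_i = a + b * (x_i - x0). Returns expit(a), the fitted probability at x0."""
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]  # Gaussian kernel weights
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, xi, yi in zip(w, x, y):
            d = xi - x0
            p = expit(a + b * d)
            r = wi * (yi - p)        # gradient contributions
            v = wi * p * (1.0 - p)   # negative-Hessian contributions
            g0 += r
            g1 += r * d
            h00 += v
            h01 += v * d
            h11 += v * d * d
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        # Newton-Raphson step: solve the 2x2 system H * step = g
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return expit(a)


# Toy non-monotonic binary data: responses are mostly 1 in the middle of the range.
x = [float(i) for i in range(21)]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
for x0 in (1.0, 10.0, 19.0):
    print(x0, round(local_logistic(x0, x, y, h=2.0), 3))
```

Because the logit is refitted at every target point, the estimated curve can rise and then fall, which a single global logistic fit cannot do.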

9.
In this article, we propose a semiparametric mixture of additive regression models, in which the regression functions are additive and nonparametric while the mixing proportions and variances are constant. Compared with mixtures of linear regression models, the proposed methodology is more flexible in modeling nonlinear relationships between the response and the covariates. A two-step procedure based on the spline-backfitted kernel method is derived for computation. Moreover, we establish the asymptotic normality of the resulting estimators and illustrate their good performance through a numerical example.

10.
This paper considers linear and nonlinear regression with a response variable that is allowed to be “missing at random”. The only structural assumptions on the distribution of the variables are that the errors have mean zero and are independent of the covariates. The independence assumption is important: it enables us to construct an estimator for the response density that uses all the observed data, in contrast to the usual local smoothing techniques, and that therefore permits a faster rate of convergence. The idea is to write the response density as a convolution integral, which can be estimated by an empirical version with a weighted residual-based kernel estimator plugged in for the error density. For an appropriate class of regression functions and a suitably chosen bandwidth, this estimator is consistent and converges at the optimal parametric rate $n^{1/2}$. Moreover, the estimator is proved to be efficient (in the sense of Hájek and Le Cam) if an efficient estimator is used for the regression parameter.
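The convolution idea can be sketched for a simple linear regression, assuming a Gaussian kernel, a fixed bandwidth, and an unweighted residual density (the paper uses a weighted residual-based kernel estimator and a suitably chosen bandwidth). Missing responses are coded as `None`; all names and data are hypothetical.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ols_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def convolution_density(y0, x, y, bandwidth):
    """Estimate f_Y(y0) = E[f_eps(y0 - r(X))] when some responses are missing (None).

    The regression line and the residual kernel density use complete cases only,
    but the outer average runs over every covariate value, observed response or not."""
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = ols_line([c[0] for c in complete], [c[1] for c in complete])
    resid = [yi - (a + b * xi) for xi, yi in complete]

    def error_density(e):
        # unweighted Gaussian kernel density estimate of the residuals
        return sum(gaussian_kernel((e - r) / bandwidth) for r in resid) / (len(resid) * bandwidth)

    return sum(error_density(y0 - (a + b * xj)) for xj in x) / len(x)


x = [float(i) for i in range(10)]
y = [1.3, 2.8, 5.1, 6.6, 9.2, None, 13.1, None, 16.8, 19.4]
print(convolution_density(10.0, x, y, 0.5))
```

Because every covariate value enters the outer average, observations with missing responses still contribute, which is the feature the abstract credits for the faster rate.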

11.
For loss equal to the squared error of prediction, Kempthorne (1984) proved that all variable-selection procedures are admissible for choosing among least-squares fits of a normal linear regression model. We extend this result to the case of a normal linear regression model in which the form of the expected response vector is misspecified.

12.
Methods for linear regression with multivariate response variables are well described in the statistical literature. In this study, we conduct a theoretical evaluation of the expected squared prediction error in bivariate linear regression where one of the response variables contains missing data. We assume a known covariance structure for the error terms. On this basis, we evaluate three well-known estimators: standard ordinary least squares, generalized least squares, and a James–Stein-inspired estimator. Theoretical risk functions are derived for all three estimators to determine under which circumstances it is advantageous to take the error covariance structure into account.

13.
This paper investigates the estimation of regression parameters and the response mean in nonlinear regression models when responses are missing with probabilities that depend on the covariates. We propose four empirical likelihood (EL)-based estimators for the regression parameters and the response mean. The resulting estimators are shown to be consistent and asymptotically normal under some general assumptions. To construct confidence regions for the regression parameters as well as the response mean, we develop four EL ratio statistics, which are proven to have an asymptotic $\chi^2$ distribution. Simulation studies and an artificial data set are used to illustrate the proposed methodologies. Empirical results show that the EL method performs better than the normal approximation method and that the coverage probabilities and average lengths depend on the selection probability function.
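EL ratio statistics of this kind are built from the basic empirical likelihood ratio. A minimal sketch of Owen's ratio for a scalar mean with complete data (not the paper's four statistics for nonlinear regression with missing responses) solves for the Lagrange multiplier by bisection; `el_log_ratio` is a hypothetical name and the data are invented.

```python
import math


def el_log_ratio(x, mu, tol=1e-12):
    """-2 log empirical likelihood ratio for the hypothesis E[X] = mu.

    Weights are p_i = 1 / (n * (1 + lam * z_i)) with z_i = x_i - mu, where lam
    solves sum_i z_i / (1 + lam * z_i) = 0; then -2 log R = 2 * sum log(1 + lam * z_i)."""
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # mu outside the convex hull of the data: R(mu) = 0
    # every weight must stay positive, i.e. 1 + lam * z_i > 0 for all i
    lo, hi = -1.0 / max(z), -1.0 / min(z)

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # score is strictly decreasing on (lo, hi), from +inf to -inf: bisect for its root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9]
stat = el_log_ratio(data, 3.0)
# Compare with 3.841, the 95% quantile of chi-squared with 1 degree of freedom.
print(stat, stat > 3.841)
```

Collecting all mu values whose statistic stays below the chi-squared quantile gives an EL confidence interval, the one-dimensional analogue of the confidence regions described above.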

14.
The variable selection problem is one of the most important tasks in regression analysis, especially in a high-dimensional setting. In this paper, we study this problem in the context of the scalar-response functional regression model, which is a linear model with a scalar response and functional regressors. The functional model can be represented as a multiple linear regression model via basis expansions of the functional variables. Based on this representation and the random subspace method of Mielniczuk and Teisseyre (Comput Stat Data Anal 71:725–742, 2014), two simple variable selection procedures for the scalar-response functional regression model are proposed. The final functional model is selected using generalized information criteria. Monte Carlo simulation studies and a real data example show very satisfactory finite-sample performance of the new variable selection methods. Moreover, they suggest that the considered procedures outperform solutions found in the literature in terms of correct model selection, false discovery rate control, and prediction error.

15.
Many areas of statistical modeling are plagued by the “curse of dimensionality,” in which there are more variables than observations. This is especially true when developing functional regression models where the predictor data are some type of spectral decomposition, such as data from near-infrared spectroscopy. While we could develop a very complex model simply by taking enough samples (such that n > p), this could prove impossible or prohibitively expensive. In addition, a regression model developed this way could turn out to be highly inefficient, as spectral data usually exhibit high multicollinearity. In this article, we propose a two-part algorithm for selecting an effective and efficient functional regression model. Our algorithm begins by evaluating a subset of discrete wavelet transformations, allowing for variation in both wavelet and filter number. Next, we perform an intermediate processing step to remove variables with low correlation to the response data. Finally, we use a genetic algorithm to perform a stochastic search through the subset regression model space, driven by an information-theoretic objective function. Our algorithm develops the regression model for each response variable independently, so as to model each variable optimally. We demonstrate our method on the familiar biscuit dough dataset, which has been used in a similar context by several researchers. Our results demonstrate both the flexibility and the power of our algorithm: for each response variable, a different subset model is selected and different wavelet transformations are used. The models developed by our algorithm show an improvement, as measured by lower mean error, over results in the published literature.
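The first two stages of such a pipeline might look like this, assuming the orthonormal Haar wavelet only (the paper evaluates several wavelets and filter numbers) and a fixed correlation threshold. The genetic-algorithm search is not shown; the helper names and the toy spectra are hypothetical.

```python
import math


def haar_dwt(signal):
    """Multilevel orthonormal Haar DWT of a signal whose length is a power of two.
    Returns [overall approximation] followed by detail coefficients, coarsest first."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    s = 1.0 / math.sqrt(2.0)
    approx, details = list(signal), []
    while len(approx) > 1:
        idx = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) * s for i in idx]
        approx = [(approx[i] + approx[i + 1]) * s for i in idx]
        details = detail + details  # prepend: coarser levels end up first
    return approx + details


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return None  # correlation undefined for a constant column
    return suv / math.sqrt(suu * svv)


def screen_by_correlation(rows, response, threshold):
    """Indices of coefficient columns whose |corr| with the response is >= threshold."""
    keep = []
    for j in range(len(rows[0])):
        r = pearson([row[j] for row in rows], response)
        if r is not None and abs(r) >= threshold:
            keep.append(j)
    return keep


spectra = [[0.2, 0.4, 0.9, 1.1], [0.3, 0.5, 1.4, 1.2], [0.1, 0.6, 0.8, 1.5], [0.4, 0.4, 1.6, 1.0]]
response = [1.0, 1.8, 0.9, 2.2]
coefs = [haar_dwt(s) for s in spectra]
print(screen_by_correlation(coefs, response, threshold=0.8))
```

The surviving coefficient indices would then define the candidate variables for the subset search, separately for each response variable.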

16.
In the common linear model with quantitative predictors we consider the problem of designing experiments for estimating the slope of the expected response in a regression. We discuss locally optimal designs, where the experimenter is only interested in the slope at a particular point, and standardized minimax optimal designs, which could be used if precise estimation of the slope over a given region is required. General results on the number of support points of locally optimal designs are derived if the regression functions form a Chebyshev system. For polynomial regression and Fourier regression models of arbitrary degree the optimal designs for estimating the slope of the regression are determined explicitly for many cases of practical interest.

17.
Beta Regression for Modelling Rates and Proportions (total citations: 9; self-citations: 0; citations by others: 9)
This paper proposes a regression model where the response is beta distributed using a parameterization of the beta law that is indexed by mean and dispersion parameters. The proposed model is useful for situations where the variable of interest is continuous and restricted to the interval (0, 1) and is related to other variables through a regression structure. The regression parameters of the beta regression model are interpretable in terms of the mean of the response and, when the logit link is used, of an odds ratio, unlike the parameters of a linear regression that employs a transformed response. Estimation is performed by maximum likelihood. We provide closed-form expressions for the score function, for Fisher's information matrix and its inverse. Hypothesis testing is performed using approximations obtained from the asymptotic normality of the maximum likelihood estimator. Some diagnostic measures are introduced. Finally, practical applications that employ real data are presented and discussed.
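A sketch of the log-likelihood under the mean-precision parameterization with a logit link, using only the standard library (`math.lgamma`). Maximizing it numerically would give the maximum likelihood estimates, but no optimizer, score function, or information matrix is shown here; the function names and data are hypothetical.

```python
import math


def mean_response(beta, xi):
    """mu_i = inverse logit of x_i' beta: the mean of the beta-distributed response."""
    eta = sum(b * v for b, v in zip(beta, xi))
    return 1.0 / (1.0 + math.exp(-eta))


def beta_loglik(beta, phi, X, y):
    """Beta regression log-likelihood with a logit link and precision phi.

    Each y_i in (0, 1) follows Beta(mu_i * phi, (1 - mu_i) * phi), so E[y_i] = mu_i
    and Var[y_i] = mu_i * (1 - mu_i) / (1 + phi)."""
    total = 0.0
    for xi, yi in zip(X, y):
        mu = mean_response(beta, xi)
        p, q = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
                  + (p - 1.0) * math.log(yi) + (q - 1.0) * math.log1p(-yi))
    return total


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0.21, 0.38, 0.64]
print(beta_loglik([-1.2, 0.9], 15.0, X, y))
```

With the logit link, exp(β_j) is the factor by which the odds μ/(1 − μ) of the mean response change when x_j increases by one unit, matching the odds-ratio interpretation in the abstract.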

18.
In this article, we propose a new empirical likelihood method for linear regression analysis with a right-censored response variable. The method is based on the synthetic data approach to censored linear regression. A log-empirical likelihood ratio test statistic for the entire vector of regression coefficients is developed, and we show that it converges to a standard chi-squared distribution. The proposed method can also be used to make inferences about linear combinations of the regression coefficients. Moreover, the proposed empirical likelihood ratio provides a way to combine different normal equations derived from various synthetic response variables. Maximizing this empirical likelihood ratio yields a maximum empirical likelihood estimator that is asymptotically equivalent to the solution of the estimating equation formed by the optimal linear combination of the original normal equations, which improves estimation efficiency. The method is illustrated by Monte Carlo simulation studies as well as a real example.

19.
Results from classical linear regression regarding the effects of covariate adjustment, with respect to the issues of confounding, the precision with which an exposure effect can be estimated, and the efficiency of hypothesis tests for no treatment effect in randomized experiments, are often assumed to apply more generally to other types of regression models. In this paper results pertaining to several generalized linear models involving a dichotomous response variable are given, demonstrating that with respect to the issues of confounding and precision, for models having a linear or log link function the results of classical linear regression do generally apply, whereas for other models, including those having a logit, probit, log-log, complementary log-log, or generalized logistic link function, the results of classical linear regression do not always apply. It is also shown, however, that for any link function, covariate adjustment results in improved efficiency of hypothesis tests for no treatment effect in randomized experiments, and hence that the classical linear regression results regarding efficiency do apply for all models having a dichotomous response variable.
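The contrast between link functions can be seen numerically in the simplest randomized setting, assuming a binary treatment T and a binary covariate Z independent of T. Under a logit link, averaging over Z changes the treatment odds ratio (noncollapsibility), whereas under an identity link the marginal and conditional risk differences coincide. This illustrates only the confounding and precision point, not the paper's efficiency results; the function names and parameter values are hypothetical.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit_link_odds_ratios(a, b, c, pz=0.5):
    """Conditional vs marginal treatment odds ratios under the logit model
    logit P(Y=1 | T, Z) = a + b*T + c*Z, with binary Z ~ Bernoulli(pz) independent of T."""
    def marginal_risk(t):
        # average the conditional risk over the distribution of Z
        return (1.0 - pz) * expit(a + b * t) + pz * expit(a + b * t + c)
    p1, p0 = marginal_risk(1), marginal_risk(0)
    marginal_or = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
    return math.exp(b), marginal_or


def identity_link_risk_differences(alpha, beta, gamma, pz=0.5):
    """Conditional vs marginal risk difference under P(Y=1 | T, Z) = alpha + beta*T + gamma*Z."""
    def marginal_risk(t):
        return (1.0 - pz) * (alpha + beta * t) + pz * (alpha + beta * t + gamma)
    return beta, marginal_risk(1) - marginal_risk(0)


print(logit_link_odds_ratios(-1.0, 1.0, 2.0))         # marginal OR is smaller than e^1
print(identity_link_risk_differences(0.2, 0.1, 0.3))  # both equal 0.1 (up to rounding)
```

The logit-link gap appears even though Z is independent of T, so it reflects a change of estimand rather than confounding.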

20.
ABSTRACT

M-estimation is a widely used technique for robust statistical inference. In this paper, we study a robust partially functional linear regression model in which a scalar response variable is explained by a function-valued variable and a finite number of real-valued variables. The regression parameters include the infinite-dimensional slope function as well as the slope parameters for the real-valued variables; to estimate them, we use polynomial splines to approximate the slope function. The estimation procedure is easy to implement and is resistant to heavy-tailed errors or outliers in the response. The asymptotic properties of the proposed estimators are established. Finally, we assess the finite-sample performance of the proposed method through Monte Carlo simulation studies.
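The estimator in the paper combines M-estimation with polynomial splines for the functional part. Only the M-estimation ingredient is sketched here, assuming Huber's loss with the commonly used tuning constant k = 1.345, a normalized-MAD scale estimate, and a single real-valued covariate, fitted by iteratively reweighted least squares. The helper names are hypothetical and the toy data are invented.

```python
import statistics


def huber_weight(r, k):
    """IRLS weight psi(r) / r for Huber's loss: 1 inside [-k, k], k / |r| outside."""
    return 1.0 if abs(r) <= k else k / abs(r)


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(x, y, k=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of a straight line by iteratively reweighted least squares,
    re-estimating the residual scale by the normalized MAD at each step."""
    a, b = weighted_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        center = statistics.median(resid)
        scale = statistics.median([abs(r - center) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        w = [huber_weight(r / scale, k) for r in resid]
        a_new, b_new = weighted_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.2, -0.1, 0.05, -0.15, 0.0]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[9] = 60.0  # one gross outlier in the response
print(weighted_line(x, y, [1.0] * 10))  # OLS slope pulled well above 2
print(huber_line(x, y))                 # slope close to 2
```

The outlier's weight shrinks in proportion to 1/|residual|, so its influence on the fit stays bounded; this is the resistance to heavy-tailed errors that the abstract refers to.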

