1.

Detecting discontinuities in nonparametric regression curves and surfaces

A. W. Bowman A. Pope B. Ismail 《Statistics and Computing》2006,16(4):377-390

The existence of a discontinuity in a regression function can be inferred by comparing regression estimates based on the data lying on different sides of a point of interest. This idea has been used in earlier research by Hall and Titterington (1992), Müller (1992) and later authors. The use of nonparametric regression allows this to be done without assuming linear or other parametric forms for the continuous part of the underlying regression function. The focus of the present paper is on assessing the evidence for the presence of a discontinuity within a regression function through examination of the standardised differences of ‘left’ and ‘right’ estimators at a variety of covariate values. The calculations for the test are carried out through distributional results on quadratic forms. A graphical method in the form of a reference band to highlight the sources of the evidence for discontinuities is proposed. The methods are also developed for the two covariate case where there are additional issues associated with the presence of a jump location curve. Methods for estimating this curve are also developed. All the techniques, for the one and two covariate situations, are illustrated through applications.
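The idea of comparing 'left' and 'right' estimates can be illustrated with a minimal sketch. It is not the paper's method: it uses plain one-sided window means instead of nonparametric regression estimators, treats the noise standard deviation `sigma` as known, and omits the quadratic-form test and reference band. The function name, bandwidth `h` and synthetic data are all assumptions made for illustration.

```python
import math
import random

def standardised_difference(x, y, x0, h, sigma):
    """Standardised gap between the mean response just right and just left of x0."""
    left = [yi for xi, yi in zip(x, y) if x0 - h <= xi < x0]
    right = [yi for xi, yi in zip(x, y) if x0 <= xi <= x0 + h]
    if not left or not right:
        return None  # no data on one side of x0
    diff = sum(right) / len(right) - sum(left) / len(left)
    # Standard error of a difference of two independent means, noise sd assumed known.
    se = sigma * math.sqrt(1.0 / len(left) + 1.0 / len(right))
    return diff / se

# Synthetic data (illustrative assumption): linear trend plus a unit jump at x = 0.5.
random.seed(0)
sigma = 0.1
x = [(i + 0.5) / 200 for i in range(200)]
y = [xi + (1.0 if xi >= 0.5 else 0.0) + random.gauss(0.0, sigma) for xi in x]

# Scan a grid of candidate locations and keep those with data on both sides.
scores = {}
for k in range(5, 96):
    x0 = k / 100
    z = standardised_difference(x, y, x0, h=0.05, sigma=sigma)
    if z is not None:
        scores[x0] = z

best_x0 = max(scores, key=lambda x0: abs(scores[x0]))
print("largest standardised difference at x0 =", best_x0)
```

Window means are biased wherever the underlying curve has a slope, which is one reason one-sided local regression estimators are preferable in practice.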

2.

Local linear regression with adaptive orthogonal fitting for the wind power application

Pierre Pinson Henrik Aa. Nielsen Henrik Madsen Torben S. Nielsen 《Statistics and Computing》2008,18(1):59-71

Short-term forecasting of wind generation requires a model of the function for the conversion of meteorological variables (mainly wind speed) to power production. Such a power curve is nonlinear and bounded, in addition to being nonstationary. Local linear regression is an appealing nonparametric approach for power curve estimation, for which the model coefficients can be tracked with recursive Least Squares (LS) methods. This may lead to an inaccurate estimate of the true power curve, owing to the assumption that a noise component is present on the response variable axis only. Therefore, this assumption is relaxed here, by describing a local linear regression with orthogonal fit. Local linear coefficients are defined as those which minimize a weighted Total Least Squares (TLS) criterion. An adaptive estimation method is introduced in order to accommodate nonstationarity. This has the additional benefit of lowering the computational costs of updating local coefficients every time new observations become available. The estimation method is based on tracking the left-most eigenvector of the augmented covariance matrix. A robustification of the estimation method is also proposed. Simulations on semi-artificial datasets (for which the true power curve is available) underline the properties of the proposed regression and related estimation methods. An important result is the significantly higher ability of local polynomial regression with orthogonal fit to accurately approximate the target regression, even though it may hardly be visible when calculating error criteria against corrupted data.
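The difference between an ordinary LS fit and an orthogonal (TLS) fit can be sketched for a single straight line. This is a global, non-adaptive illustration only, not the local, weighted, recursive estimator described in the abstract. It uses the closed-form slope of the two-dimensional orthogonal regression line, which corresponds to the direction of smallest variance of the sample covariance matrix. The function names and simulated data are assumptions.

```python
import math
import random

def covariance_terms(x, y):
    """Sample variances and covariance of x and y (divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    return sxx, syy, sxy

def ols_slope(x, y):
    """Least squares slope: errors assumed on the response axis only."""
    sxx, _, sxy = covariance_terms(x, y)
    return sxy / sxx

def tls_slope(x, y):
    """Slope minimising squared orthogonal distances (requires sxy != 0)."""
    sxx, syy, sxy = covariance_terms(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)

# Simulated data (illustrative assumption): true slope 2, equal noise on both axes.
random.seed(1)
true_x = [10.0 * i / 499 for i in range(500)]
x = [t + random.gauss(0.0, 1.0) for t in true_x]
y = [2.0 * t + random.gauss(0.0, 1.0) for t in true_x]

slope_ols = ols_slope(x, y)
slope_tls = tls_slope(x, y)
print("OLS slope:", round(slope_ols, 3), "TLS slope:", round(slope_tls, 3))
```

With noise on the explanatory variable, the LS slope is pulled towards zero, while the orthogonal fit stays close to the true slope when the two error variances are equal.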

3.

Generalized linear models with functional predictors

Gareth M. James 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2002,64(3):411-432

Summary. We present a technique for extending generalized linear models to the situation where some of the predictor variables are observations from a curve or function. The technique is particularly useful when only fragments of each curve have been observed. We demonstrate, on both simulated and real data sets, how this approach can be used to perform linear, logistic and censored regression with functional predictors. In addition, we show how functional principal components can be used to gain insight into the relationship between the response and functional predictors. Finally, we extend the methodology to apply generalized linear models and principal components to standard missing data problems.

4.

PERFORMANCE OF THE LOCATION LINEAR DISCRIMINANT FUNCTION UNDER ACROSS-LOCATION HETEROSCEDASTICITY

《Communications in Statistics - Theory and Methods》2013,42(6):1031-1044

ABSTRACT

Classification of data consisting of both categorical and continuous variables between two groups is often handled by the sample location linear discriminant function confined to each of the locations specified by the observed values of the categorical variables. Homoscedasticity of across-location conditional dispersion matrices of the continuous variables is often assumed. Quite often, interactions between continuous and categorical variables cause across-location heteroscedasticity. In this article, we examine the effect of heterogeneous across-location conditional dispersion matrices on the overall expected and actual error rates associated with the sample location linear discriminant function. Performance of the sample location linear discriminant function is evaluated against the results for the restrictive classifier adjusted for across-location heteroscedasticity. Conclusions based on a Monte Carlo study are reported.

5.

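A study of threshold effects in the output mechanism of China's science and technology innovation system

Xie Lanyun (谢兰云) Wang Weiguo (王维国) 《Statistical Research (统计研究)》2016,33(2):51-60

Achieving science and technology innovation goals and adopting targeted development policies presupposes a thorough understanding of the characteristics of China's science and technology innovation system. Using a threshold regression model, with R&D intensity and the share of enterprise funding in R&D expenditure as threshold variables, this paper studies the relationship between China's science and technology innovation inputs and outputs over 1985-2013. The results show that the output mechanism of China's innovation system changed during the sample period and that there are two thresholds: the threshold values of R&D intensity are 0.661% and 1.325%, and those of the enterprise share of R&D expenditure are 29% and 67%. The regressions for the two threshold variables give consistent results: when the control variable for the level of economic development is not considered, the growth of China's patent output moved from being driven mainly by growth in R&D personnel to being driven mainly by growth in R&D capital. China's science and technology system reform and the strategy of invigorating the country through science and education played an important role in this transition. On this basis, the paper builds a nonlinear model to describe the relationship, and its results agree with those of the threshold regression model.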

6.

Total least squares solution for compositional data using linear models

Eva Fišerová 《Journal of applied statistics》2010,37(7):1137-1152

The restrictive properties of compositional data, that is multivariate data with positive parts that carry only relative information in their components, call for special care to be taken while performing standard statistical methods, for example, regression analysis. Among the special methods suitable for handling this problem is the total least squares procedure (TLS, orthogonal regression, regression with errors in variables, calibration problem), performed after an appropriate log-ratio transformation. The difficulty or even impossibility of deeper statistical analysis (confidence regions, hypotheses testing) using the standard TLS techniques can be overcome by calibration solution based on linear regression. This approach can be combined with standard statistical inference, for example, confidence and prediction regions and bounds, hypotheses testing, etc., suitable for interpretation of results. Here, we deal with the simplest TLS problem where we assume a linear relationship between two errorless measurements of the same object (substance, quantity). We propose an iterative algorithm for estimating the calibration line and also give confidence ellipses for the location of unknown errorless results of measurement. Moreover, illustrative examples from the fields of geology, geochemistry and medicine are included. It is shown that the iterative algorithm converges to the same values as those obtained using the standard TLS techniques. Fitted lines and confidence regions are presented for both original and transformed compositional data. The paper contains basic principles of linear models and addresses many related problems.

7.

A new semiparametric Weibull cure rate model: fitting different behaviors within GAMLSS

Thiago G. Ramires Luiz R. Nakamura Ana J. Righetto Rodrigo R. Pescim Josmar Mazucheli Gauss M. Cordeiro 《Journal of applied statistics》2019,46(15):2744-2760

ABSTRACT

We propose a new semiparametric Weibull cure rate model for fitting nonlinear effects of explanatory variables on the mean, scale and cure rate parameters. The regression model is based on the generalized additive models for location, scale and shape, for which any or all distribution parameters can be modeled as parametric linear and/or nonparametric smooth functions of explanatory variables. We present methods to select additive terms, model estimation and validation, where all computational codes are presented in a simple way such that any R user can fit the new model. Biases of the parameter estimates caused by models specified erroneously are investigated through Monte Carlo simulations. We illustrate the usefulness of the new model by means of two applications to real data. We provide computational codes to fit the new regression model in the R software.

8.

A flexible semiparametric regression model for bimodal, asymmetric and censored data

Thiago G. Ramires Niel Hens Gauss M. Cordeiro Gilberto A. Paula 《Journal of applied statistics》2018,45(7):1303-1324

In this paper, we propose a new semiparametric heteroscedastic regression model allowing for positive and negative skewness and bimodal shapes using the B-spline basis for nonlinear effects. The proposed distribution is based on the generalized additive models for location, scale and shape framework in order to model any or all parameters of the distribution using parametric linear and/or nonparametric smooth functions of explanatory variables. We motivate the new model by means of Monte Carlo simulations, which show that ignoring the skewness and bimodality of the random errors in semiparametric regression models may introduce biases in the parameter estimates and/or in the estimation of the associated variability measures. An iterative estimation process and some diagnostic methods are investigated. Applications to two real data sets are presented and the method is compared to the usual regression methods.

9.

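An analysis of two common errors in specifying linear regression models

Liu Ming (刘明) 《Statistics and Information Forum (统计与信息论坛)》2011,26(8):11-14

Deleting the intercept term and omitting explanatory variables are two common errors in estimating linear regression models. The intercept is usually deleted because it is found to be insignificant during testing and is therefore dropped, which distorts both the parameter estimates and the hypothesis tests. Explanatory variables are omitted because people mistakenly believe that regression analysis can be carried out whenever the variables are correlated and causally related, and so do not consider other important explanatory variables. A model built in this way cannot be used for economic structural analysis or policy evaluation; at most it can be used for prediction.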
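The effect of deleting the intercept can be seen in a small simulation. The sketch below is an illustration under assumed data (true intercept 5, true slope 2) and is not taken from the article. It deliberately uses a large true intercept, whereas the article discusses the case where the intercept tests as insignificant. Both function names are hypothetical.

```python
import random

def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def fit_through_origin(x, y):
    """Least squares for y = b*x with the intercept forced to zero; returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Simulated data (illustrative assumption): y = 5 + 2x + noise.
random.seed(2)
x = [10.0 * i / 199 for i in range(200)]
y = [5.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

intercept, slope_full = fit_with_intercept(x, y)
slope_origin = fit_through_origin(x, y)
print("slope with intercept:", round(slope_full, 3))
print("slope without intercept:", round(slope_origin, 3))
```

Forcing the line through the origin makes the slope absorb part of the missing intercept, so the slope estimate moves away from the value used to generate the data.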

10.

SEGMENTED DOSE-RESPONSE MODELS FOR REPEATED MEASURES DATA

《Communications in Statistics - Theory and Methods》2013,42(10):2045-2056

In a dose-response analysis, logit-transformed responses are modelled as a function of log-transformed doses. A linear trend is commonly observed, and treatment groups can be compared on the basis of that trend. The example in this paper comes from a study to estimate the effect of aminophylline on the dose-response curve of atracurium. Unlike the usual dose-response curve, this example has repeated measures and appears to have two slopes, which the usual dose-response model does not fit adequately. We propose segmented regression models that allow two different slopes. The proposed model is an extension of the segmented regression model with a univariate response per subject. We illustrate that the proposed model fits the data better than the usual dose-response model.

11.

A Basic Demonstration of the [-1, 1] Range for the Correlation Coefficient

Gary G. Koch 《The American statistician》2013,67(3):201-202

The structure of the variance of linear functions of two variables is used to show that the correlation coefficient lies in the range [-1, 1]. It also allows the role of the correlation coefficient in linear regression to be described.
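One standard form of such an argument, which may differ in detail from the article's own presentation, standardises the two variables and uses the fact that a variance cannot be negative. Here $\sigma_X$ and $\sigma_Y$ denote the (positive) standard deviations and $\rho$ the correlation coefficient of $X$ and $Y$:

```latex
\operatorname{Var}\!\left(\frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y}\right)
  = \frac{\operatorname{Var}(X)}{\sigma_X^{2}}
  + \frac{\operatorname{Var}(Y)}{\sigma_Y^{2}}
  \pm \frac{2\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}
  = 2\,(1 \pm \rho) \;\ge\; 0
\quad\Longrightarrow\quad -1 \le \rho \le 1 .

% Role in linear regression: with slope \beta = \rho\,\sigma_Y/\sigma_X,
\operatorname{Var}(Y - \beta X)
  = \sigma_Y^{2} - 2\beta\,\rho\,\sigma_X\sigma_Y + \beta^{2}\sigma_X^{2}
  = \sigma_Y^{2}\,(1 - \rho^{2}) .
```

The second identity shows that $\rho^{2}$ is the proportion of the variance of $Y$ removed by the best linear predictor based on $X$.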

12.

Classification of dichotomous and continuous variables with incomplete samples

Chi-Ying Leung 《Communications in Statistics - Theory and Methods》2013,42(6):1581-1592

This article considers the problem of classifying an observation consisting of both binary and continuous variables based on two general incomplete training samples, one from each of the two given populations. The location linear model adopted by Krzanowski (1975) forms the basis of our investigation. For a given location, when the common dispersion matrix as well as the corresponding cell probabilities for the underlying populations are known, the exact distribution of the conditional maximum likelihood classification rule is derived. The overall error rate can be obtained and is based on linear combinations of independent non-central chi-square distributions. A large-sample result for the case where the cell probabilities are unknown is also available.

13.

TWO-STAGE SUPPORT ESTIMATION

Samuel Müller 《Australian & New Zealand Journal of Statistics》2005,47(4):463-472

This paper presents a two-stage procedure for estimating the conditional support curve of a random variable Y, given the information of a random vector X. Quantile estimation is followed by an extremal analysis on the residuals for problems which can be written as regression models. The technique is applied to data from the National Bureau of Economic Research and US Census Bureau's Center for Economic Studies which contain all four-digit manufacturing industries. Simulation results show that in linear regression models the proposed estimation procedure is more efficient than the extreme linear regression quantile.

14.

Nonlinearity testing and modeling for threshold moving average models

Rubing Liang Cuizhen Niu Zhiqiang Zhang 《Journal of applied statistics》2015,42(12):2614-2630

In this paper, we suggest a simple test and an easily applicable modeling procedure for threshold moving average (TMA) models. First, based on the residuals fitted by the maximum likelihood estimate (MLE) for MA models, we construct a simple statistic, which is obtained by linear arranged regression and approximately follows an F-distribution, to test for threshold nonlinearity and specify the threshold variables. We then use scatterplots to identify the number and locations of the potential thresholds. Finally, using the statistic and the Akaike information criterion, we propose a procedure to build TMA models. Simulation experiments and an application to a real example demonstrate that the test statistic has good power and that the modeling procedure is convenient.

15.

Pairwise directions estimation for multivariate response regression data

Heng-Hui Lue 《Journal of Statistical Computation and Simulation》2019,89(5):776-794

This article concerns the analysis of multivariate response data with multi-dimensional covariates. Based on local linear smoothing techniques, we propose an iteratively adaptive estimation method to reduce the dimensions of response variables and covariates. Two weighted estimation strategies are incorporated in our approach to provide initial estimates. Our proposal is also extended to curve response data for a data-adaptive basis function searching. Instead of focusing on goodness of fit, we shift the problem to reveal the data structure and basis patterns. Simulation studies with multivariate response and curve data are conducted for our pairwise directions estimation (PDE) approach in comparison with sliced inverse regression of Li et al. [Dimension reduction for multivariate response data. J Amer Statist Assoc. 2003;98:99–109]. The results demonstrate that the proposed PDE method is useful for data with responses approximating linear or bending structures. Illustrative applications to two real datasets are also presented.

16.

On discrimination procedure with mixtures of continuous and categorical variables

Gafar Matanmi Oyeyemi George Chinanu Mbaeyi Saheed Ishola Salawu Bernard Olagboyega Muse 《Journal of applied statistics》2016,43(10):1864-1873

A discrimination procedure based on the location model is described and suggested for use in situations where the discriminating variables are mixtures of continuous and binary variables. Some procedures that have previously been employed in similar situations, such as Fisher's linear discriminant function and logistic regression, were compared with this method using the error rate (ER). Optimal ERs for these procedures are reported using real and simulated data for varying sample sizes and numbers of continuous and binary variables, and were used as a measure for assessing the performance of the various procedures. The suggested procedure performed considerably better in the cases considered and never produced a result that was poor when compared with the other procedures. Hence, the suggested procedure might be considered for such situations.

17.

On Directional Dependence in a Regression Line

《Communications in Statistics - Theory and Methods》2013,42(10):2053-2057

Abstract

In this paper, under the assumption of a linear relationship between two variables, we provide an alternative simple method of proving the existing result connecting the correlation coefficient with the coefficients of skewness of the response and explanatory variables. Further, we give a relationship between the correlation coefficient and the coefficients of kurtosis of the response and explanatory variables, again assuming a linear relationship between the two variables. A simple alternative way of deriving the formula, which helps in finding the direction of dependence in linear regression, is discussed.
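The abstract does not state the formulas. A commonly cited pair of relations of this kind is sketched below, as an assumption about what is meant rather than a quotation from the paper. Suppose $Y = \alpha + \beta X + \varepsilon$ with $\varepsilon$ independent of $X$, and write $\gamma$ for the coefficient of skewness, $\kappa$ for the excess kurtosis and $\rho = \beta\,\sigma_X/\sigma_Y$ for the correlation coefficient:

```latex
% Skewness: assume the error has zero third central moment.
\mu_3(Y) = \beta^{3}\mu_3(X)
\;\Longrightarrow\;
\gamma_Y = \frac{\mu_3(Y)}{\sigma_Y^{3}}
         = \left(\frac{\beta\,\sigma_X}{\sigma_Y}\right)^{3}\gamma_X
         = \rho^{3}\,\gamma_X .

% Kurtosis: assume the error has zero fourth cumulant (e.g. normal errors).
\operatorname{cum}_4(Y) = \beta^{4}\operatorname{cum}_4(X)
\;\Longrightarrow\;
\kappa_Y = \frac{\operatorname{cum}_4(Y)}{\sigma_Y^{4}} = \rho^{4}\,\kappa_X .
```

Because $|\rho| \le 1$, both relations imply that the response is no more skewed or heavy-tailed than the explanatory variable, which is the asymmetry that can be used to judge the direction of dependence.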

18.

Some Limits in the Theory of Multicollinearity

M.V. Rama Sastry 《The American statistician》2013,67(1):39-40

The name “multicollinearity” was first introduced by Ragnar Frisch [2]. In his original formulation the economic variables are supposed to be composed of two parts, a systematic or “true” and an “error” component. There are at least two other cases when the same type of indeterminacy of the estimates arises due to different reasons. Considerable attention was given to this problem which arises when some or all the variables in a regression equation are highly inter-correlated and it becomes almost impossible to separate their influences and obtain the corresponding estimates of the regression coefficients. Consider a linear regression model

19.

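A least squares method for fuzzy linear regression models

Lu Pei (卢佩) Lu Qiujun (陆秋君) 《Statistics and Information Forum (统计与信息论坛)》2016,(2):14-20

This paper addresses regression analysis in data systems where both the independent and dependent variables are fuzzy. To avoid the problems caused by the increase in estimation error that may arise when the independent variables degenerate into crisp numerical variables, it proposes a general structure for a regression model that introduces a fuzzy adjustment term, and studies the analytical expression of the model using least squares based on a complete distance between fuzzy numbers. Using the concept of level sets, the fuzzy multiple regression model is converted into two conventional regression models, and the parameter estimates are obtained by least squares according to the distance between fuzzy numbers. A numerical example on employee job performance evaluation illustrates the effectiveness of the method, and the Bootstrap method is applied to study the random uncertainty and dynamic variation of the regression parameters.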

20.

A model for markers and latent health status

Mei-Ling Ting Lee Victor DeGruttola & David Schoenfeld 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2000,62(4):747-762

We extend the bivariate Wiener process considered by Whitmore and co-workers and model the joint process of a marker and health status. The health status process is assumed to be latent or unobservable. The time to reach the primary end point or failure (death, onset of disease, etc.) is the time when the latent health status process first crosses a failure threshold level. Inferences for the model are based on two kinds of data: censored survival data and marker measurements. Covariates, such as treatment variables, risk factors and base-line conditions, are related to the model parameters through generalized linear regression functions. The model offers a much richer potential for the study of treatment efficacy than do conventional models. Treatment effects can be assessed in terms of their influence on both the failure threshold and the health status process parameters. We derive an explicit formula for the prediction of residual failure times given the current marker level. Also we discuss model validation. This model does not require the proportional hazards assumption and hence can be widely used. To demonstrate the usefulness of the model, we apply the methods in analysing data from the protocol 116a of the AIDS Clinical Trials Group.