Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Our goal is to find a regression technique that can be used in a small-sample situation with possible model misspecification. The development of a new bandwidth selector allows nonparametric regression (in conjunction with least squares) to be used in this small-sample problem, where nonparametric procedures have previously proven to be inadequate. Considered here are two new semiparametric (model-robust) regression techniques that combine parametric and nonparametric techniques when there is partial information present about the underlying model. A general overview is given of how typical concerns for bandwidth selection in nonparametric regression extend to the model-robust procedures. A new penalized PRESS criterion (with a graphical selection strategy for applications) is developed that overcomes these concerns and is able to maintain the beneficial mean squared error properties of the new model-robust methods. It is shown that this new selector outperforms standard and recently improved bandwidth selectors. Comparisons of the selectors are made via numerous generated data examples and a small simulation study.

2.
Goodness-of-fit evaluation of a parametric regression model is often done through hypothesis testing, where the fit of the model of interest is compared statistically to that obtained under a broader class of models. Nonparametric regression models are frequently used as the latter type of model, because of their flexibility and wide applicability. To date, tests of this type have generally been performed globally, by comparing the parametric and nonparametric fits over the whole range of the data. However, in some instances it might be of interest to test for deviations from the parametric model that are localized to a subset of the data. In this case, a global test will have low power and hence can miss important local deviations. Alternatively, a naive testing approach that discards all observations outside the local interval will suffer from reduced sample size and potential overfitting. We therefore propose a new local goodness-of-fit test for parametric regression models that can be applied to a subset of the data but relies on global model fits, and propose a bootstrap-based approach for obtaining the distribution of the test statistic. We compare the new approach with the global and the naive tests, both theoretically and through simulations, and illustrate its practical behavior in an application. We find that the local test has a better ability to detect local deviations than the other two tests.

3.
A semiparametric approach to modeling skewed/heteroscedastic regression data is discussed. We work with a semiparametric transform-both-sides regression model, which contains a parametric regression function and a nonparametric transformation. This model is adequate when the relationship between the median response and the explanatory variable has been specified by a theoretical result or a previous empirical study. The transform-both-sides model with a parametric transformation has been studied extensively and applied successfully to a number of data sets. Allowing a nonparametric transformation function increases the flexibility of the model. In this article, we estimate the nonparametric transformation function by the conditional kernel density approach developed by Wang and Ruppert (1995), and then use a pseudo-maximum likelihood estimator to estimate the regression parameters. This estimate of the regression parameters has not been studied previously. In this article, the asymptotic distribution of this pseudo-MLE is derived. We also show that when σ, the standard deviation of the error, goes to zero (small σ asymptotics), this estimator is adaptive. Adaptive means that the regression parameters are estimated as precisely as when the transformation is known exactly. A similar result holds in the parametric approaches of Carroll and Ruppert (1984) and Ruppert and Aldershof (1989). Simulated and real examples are provided to illustrate the performance of the proposed estimator for finite sample size.

4.
A test is proposed for assessing the lack of fit of heteroscedastic nonlinear regression models that is based on comparison of nonparametric kernel and parametric fits. A data-driven method is proposed for bandwidth selection using the asymptotically optimal bandwidth of the parametric null model, which leads to a test that has a limiting normal distribution under the null hypothesis and is consistent against any fixed alternative. The resulting test is applied to the problem of testing the lack of fit of a generalized linear model.

5.
Jing Yang  Fang Lu  Hu Yang 《Statistics》2017,51(6):1179-1199
In this paper, we develop a new estimation procedure based on quantile regression for semiparametric partially linear varying-coefficient models. The proposed estimation approach is empirically shown to be much more efficient than the popular least squares estimation method for non-normal error distributions, and to lose almost no efficiency for normal errors. Asymptotic normality of the proposed estimators for both the parametric and nonparametric parts is established. To achieve sparsity when there exist irrelevant variables in the model, two variable selection procedures based on adaptive penalty are developed to select important parametric covariates as well as significant nonparametric functions. Moreover, both variable selection procedures are demonstrated to enjoy the oracle property under some regularity conditions. Some Monte Carlo simulations are conducted to assess the finite sample performance of the proposed estimators, and a real-data example is used to illustrate the application of the proposed methods.

6.
In this article, we propose a parametric model for the distribution of time to first event when events are overdispersed and can be properly fitted by a Negative Binomial distribution. This is a very common situation in medical statistics, when the occurrence of events is summarized as a count for each patient and the simple Poisson model is not adequate to account for overdispersion of data. In this situation, studying the time of occurrence of the first event can be of interest. From the Negative Binomial distribution of counts, we derive a new parametric model for time to first event and apply it to fit the distribution of time to first relapse in multiple sclerosis (MS). We develop the regression model with methods for covariate estimation. We show that, as the Negative Binomial model properly fits relapse count data, this new model matches almost perfectly the distribution of time to first relapse, as tested in two large datasets of MS patients. Finally, we compare its performance, when fitting time to first relapse in MS, with other models widely used in survival analysis (the semiparametric Cox model and the parametric exponential, Weibull, log-logistic and log-normal models).

7.
Standard goodness-of-fit tests for a parametric regression model against a series of nonparametric alternatives are based on residuals arising from a fitted model. When a parametric regression model is compared with a nonparametric model, goodness-of-fit testing can be naturally approached by evaluating the likelihood of the parametric model within a nonparametric framework. We employ the empirical likelihood for an α-mixing process to formulate a test statistic that measures the goodness of fit of a parametric regression model. The technique is based on a comparison with kernel smoothing estimators. The empirical likelihood formulation of the test has two attractive features. One is its automatic consideration of the variation that is associated with the nonparametric fit due to empirical likelihood's ability to Studentize internally. The other is that the asymptotic distribution of the test statistic is free of unknown parameters, avoiding plug-in estimation. We apply the test to a discretized diffusion model which has recently been considered in financial market analysis.

8.
In many applications, it is of interest to simultaneously model the mean and variance of a response when no replication exists. Modeling the mean and variance simultaneously is commonly referred to as dual modeling. Parametric approaches to dual modeling are popular when the underlying mean and variance functions can be expressed explicitly. Quite often, however, nonparametric approaches are more appropriate due to the presence of unusual curvature in the underlying functions. In sparse data situations, nonparametric methods often fit the data too closely while parametric estimates exhibit problems with bias. We propose a semi-parametric dual modeling approach [Dual Model Robust Regression (DMRR)] for non-replicated data. DMRR combines parametric and nonparametric fits resulting in improved mean and variance estimation. The methodology is illustrated with a data set from the literature as well as via a simulation study.

9.
This article is concerned with one discrete nonparametric kernel approach and two parametric regression approaches for providing the evolution law of pavement deterioration. The first parametric approach is a survival data analysis method, and the second is a nonlinear mixed-effects model. The nonparametric approach consists of a regression estimator using discrete associated kernels. Some asymptotic properties of the discrete nonparametric kernel estimator are established, in particular its almost sure consistency. Moreover, two data-driven bandwidth selection methods are given, with a new explicit theoretical expression of the optimal bandwidth provided for this nonparametric estimator. A comparative simulation study is carried out, applying bootstrap methods to a measure of statistical accuracy.

10.
The generalized linear model (GLM) is a class of regression models where the means of the response variables and the linear predictors are joined through a link function. The standard GLM assumes the link function is fixed, and one can form a more flexible GLM by either estimating the link function from a parametric family of link functions or estimating it nonparametrically. In this paper, we propose a new algorithm that uses P-splines to estimate the link function nonparametrically while guaranteeing that it is monotone. This is equivalent to fitting the generalized single index model with a monotonicity constraint. We also conduct extensive simulation studies to compare our nonparametric approach for estimating the link function with various parametric approaches, including the traditional logit, probit and robit link functions, and two recently developed link functions, the generalized extreme value link and the symmetric power logit link. The simulation study shows that the link function estimated nonparametrically by our proposed algorithm performs well under a wide range of different true link functions and outperforms parametric approaches when they are misspecified. A real data example is used to illustrate the results.

11.
Data on the Likert scale are ubiquitous in medical research, including randomized trials. Statistical analysis of such data may be conducted using the means of raw scores or the rank information of the scores. In the context of parallel-group randomized trials, we quantify treatment effects by the probability that a subject in the treatment group has a better score than (or a win over) a subject in the control group. Asymptotic parametric and nonparametric confidence intervals for this win probability and associated sample size formulas are derived for studies with only follow-up scores, and those with both baseline and follow-up measurements. We assessed the performance of both the parametric and nonparametric approaches using simulation studies based on real studies with Likert item and Likert scale data. The simulation results demonstrate that even without baseline adjustment, the parametric methods did not perform well, in terms of bias, interval coverage percentage, balance of tail error, and assurance of achieving a pre-specified precision. In contrast, the nonparametric approach performed very well for both the unadjusted and adjusted win probability. We illustrate the methods with two examples: one using Likert item data and the other using Likert scale data. We conclude that nonparametric methods are preferable for two-group randomized trials with Likert data. Illustrative SAS code for the nonparametric approach using existing procedures is provided.
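
The win probability described in this abstract can be illustrated with a minimal sketch. The function name `win_probability`, the `higher_is_better` flag, and the convention of counting a tie as half a win are assumptions made here for illustration; the abstract does not state how ties are handled, and the confidence intervals and SAS code it mentions are not reproduced.

```python
def win_probability(treatment, control, higher_is_better=True):
    """Empirical probability that a treated subject beats a control subject.

    Every treatment score is compared with every control score.
    Ties count as half a win (an assumption, not taken from the abstract).
    """
    if not treatment or not control:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for t in treatment:
        for c in control:
            if t == c:
                wins += 0.5          # tie: split the credit
            elif (t > c) == higher_is_better:
                wins += 1.0          # treated subject has the better score
    return wins / (len(treatment) * len(control))
```

A value of 0.5 corresponds to no difference between groups under this convention; values above 0.5 favor the treatment group.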

12.
Mixed models are powerful tools for the analysis of clustered data and many extensions of the classical linear mixed model with normally distributed response have been established. As with all parametric (P) models, correctness of the assumed model is critical for the validity of the ensuing inference. An incorrectly specified P means model may be improved by using a local, or nonparametric (NP), model. Two local models are proposed by a pointwise weighting of the marginal and conditional variance–covariance matrices. However, NP models tend to fit to irregularities in the data and may provide fits with high variance. Model robust regression techniques estimate mean response as a convex combination of a P and a NP model fit to the data. It is a semiparametric method by which incomplete or incorrectly specified P models can be improved by adding an appropriate amount of the NP fit. We compare the approximate integrated mean square error of the P, NP, and mixed model robust methods via a simulation study and apply these methods to two real data sets: the monthly wind speed data from countries in Ireland and the engine speed data.
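
The convex combination of a P fit and an NP fit mentioned in this abstract can be sketched as follows. The function name `mix_fits`, the symbol `lam` for the mixing weight, and the choice to place the weight on the NP fit are assumptions for illustration; the fitted values are taken as given, and nothing here reproduces the mixed-model estimators or the weight selection studied in the paper.

```python
def mix_fits(parametric_fit, nonparametric_fit, lam):
    """Pointwise convex combination of two sets of fitted values.

    lam = 0 returns the parametric fit unchanged, lam = 1 returns the
    nonparametric fit; values in between blend the two.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    if len(parametric_fit) != len(nonparametric_fit):
        raise ValueError("both fits must be evaluated at the same points")
    return [(1.0 - lam) * p + lam * n
            for p, n in zip(parametric_fit, nonparametric_fit)]
```

In practice the weight would be chosen from the data rather than fixed in advance.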

13.
Bayesian semiparametric inference is considered for a loglinear model. This model consists of a parametric component for the regression coefficients and a nonparametric component for the unknown error distribution. Bayesian analysis is studied for the case of a parametric prior on the regression coefficients and a mixture-of-Dirichlet-processes prior on the unknown error distribution. A Markov-chain Monte Carlo (MCMC) method is developed to compute the features of the posterior distribution. A model selection method for obtaining a more parsimonious set of predictors is studied. The method adds indicator variables to the regression equation. The set of indicator variables represents all the possible subsets to be considered. An MCMC method is developed to search stochastically for the best subset. These procedures are applied to two examples, one with censored data.

14.
Herein, we propose a data-driven test that assesses the lack of fit of nonlinear regression models. The comparison of local linear kernel and parametric fits is the basis of this test, and specific boundary-corrected kernels are not needed at the boundary when local linear fitting is used. Under the parametric null model, the asymptotically optimal bandwidth can be used for bandwidth selection. This selection method leads to the data-driven test that has a limiting normal distribution under the null hypothesis and is consistent against any fixed alternative. The finite-sample property of the proposed data-driven test is illustrated, and the power of the test is compared with that of some existing tests via simulation studies. We illustrate the practicality of the proposed test by using two data sets.

15.
Inverse regression estimation for censored data   Total citations: 1, self-citations: 0, citations by others: 1
An inverse regression methodology for assessing predictor performance in the censored data setup is developed along with inference procedures and a computational algorithm. The technique developed here allows for conditioning on the unobserved failure time along with a weighting mechanism that accounts for the censoring. The implementation is nonparametric and computationally fast. This provides an efficient methodological tool that can be used especially in cases where the usual modeling assumptions are not applicable to the data under consideration. It can also be a good diagnostic tool that can be used in the model selection process. We have provided theoretical justification of consistency and asymptotic normality of the methodology. Simulation studies and two data analyses are provided to illustrate the practical utility of the procedure.

16.
Since the mid-1980s, many statisticians have studied methods for combining parametric and nonparametric models to improve the quality of fits in a regression problem. Notably, Einsporn (1987) proposed the Model Robust Regression 1 estimate (MRR1), in which the parametric function, f, and the nonparametric function, g, were combined in a straightforward fashion via the use of a mixing parameter, λ. This technique was studied extensively at small samples and was shown to be quite effective at modeling various unusual functions. In this paper we give asymptotic results for the MRR1 estimate in the cases where λ is theoretically optimal, is asymptotically optimal and data driven, and is chosen with the PRESS statistic (Allen, 1971). We demonstrate that the MRR1 estimate with λ chosen by the PRESS statistic is slightly inferior asymptotically to the other two estimates but nevertheless possesses positive asymptotic qualities.

17.
Additive models provide an attractive setup to estimate regression functions in a nonparametric context. They provide a flexible and interpretable model, where each regression function depends only on a single explanatory variable and can be estimated at an optimal univariate rate. Most estimation procedures for these models are highly sensitive to the presence of even a small proportion of outliers in the data. In this paper, we show that a relatively simple robust version of the backfitting algorithm (consisting of using robust local polynomial smoothers) corresponds to the solution of a well-defined optimisation problem. This formulation allows us to find mild conditions to show Fisher consistency and to study the convergence of the algorithm. Our numerical experiments show that the resulting estimators have good robustness and efficiency properties. We illustrate the use of these estimators on a real data set where the robust fit reveals the presence of influential outliers.

18.
Partially linear models are extensions of linear models that include a nonparametric function of some covariate, allowing an adequate and more flexible handling of explanatory variables than in linear models. Difference-based estimation in partially linear models is an approach designed to estimate the parametric component by using the ordinary least squares estimator after removing the nonparametric component from the model by differencing. However, it is known that least squares estimates do not provide useful information for the majority of data when the error distribution is not normal, particularly when the errors are heavy-tailed and when outliers are present in the dataset. This paper aims to find an outlier-resistant fit that represents the information in the majority of the data by robustly estimating the parametric and the nonparametric components of the partially linear model. Simulations and a real data example are used to illustrate the feasibility of the proposed methodology and to compare it with the classical difference-based estimator when outliers exist.
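
The classical difference-based idea described in this abstract can be sketched for the simplest case. The sketch assumes a model of the form y = βx + f(t) + error with a single parametric covariate x and a smooth f, uses first-order differences, and fits the slope by ordinary least squares without an intercept. The function name `difference_based_slope` and this exact setup are assumptions for illustration; the robust estimator proposed in the paper is not shown.

```python
def difference_based_slope(t, x, y):
    """First-order difference-based OLS estimate of beta.

    After sorting by t, neighbouring values of the smooth function f(t)
    are nearly equal, so differencing consecutive observations removes
    most of f and leaves approximately dy = beta * dx + noise.
    """
    if not (len(t) == len(x) == len(y)) or len(t) < 2:
        raise ValueError("need equally long inputs with at least 2 points")
    order = sorted(range(len(t)), key=lambda i: t[i])
    dx = [x[order[k + 1]] - x[order[k]] for k in range(len(order) - 1)]
    dy = [y[order[k + 1]] - y[order[k]] for k in range(len(order) - 1)]
    sxx = sum(d * d for d in dx)
    if sxx == 0:
        raise ValueError("x does not vary between neighbouring points")
    # OLS slope through the origin on the differenced data
    return sum(a * b for a, b in zip(dx, dy)) / sxx
```

Because the final step is ordinary least squares, a single large outlier in y can shift this estimate substantially, which is the weakness the abstract points to.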

19.
In this article, a new composite quantile regression estimation approach is proposed for estimating the parametric part of the single-index model (SIM). We use local linear composite quantile regression (CQR) for estimating the nonparametric part of the SIM when the error distribution is symmetric. A weighted local linear CQR is proposed for estimating the nonparametric part of the SIM when the error distribution is asymmetric. Moreover, a new variable selection procedure is proposed for the SIM. Under some regularity conditions, we establish the large sample properties of the proposed estimators. Simulation studies and a real data analysis are presented to illustrate the behavior of the proposed estimators.

20.
We propose a new semiparametric Weibull cure rate model for fitting nonlinear effects of explanatory variables on the mean, scale and cure rate parameters. The regression model is based on the generalized additive models for location, scale and shape, for which any or all distribution parameters can be modeled as parametric linear and/or nonparametric smooth functions of explanatory variables. We present methods for selecting additive terms and for model estimation and validation, with all computational code presented in a simple way so that any R user can fit the new model. Biases of the parameter estimates caused by erroneously specified models are investigated through Monte Carlo simulations. We illustrate the usefulness of the new model by means of two applications to real data. We provide computational code to fit the new regression model in the R software.
