首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Abstract

In this article, we study the variable selection and estimation for linear regression models with missing covariates. The proposed estimation method is almost as efficient as the popular least-squares-based estimation method for normal random errors and empirically shown to be much more efficient and robust with respect to heavy tailed errors or outliers in the responses and covariates. To achieve sparsity, a variable selection procedure based on SCAD is proposed to conduct estimation and variable selection simultaneously. The procedure is shown to possess the oracle property. To deal with the covariates missing, we consider the inverse probability weighted estimators for the linear model when the selection probability is known or unknown. It is shown that the estimator by using estimated selection probability has a smaller asymptotic variance than that with true selection probability, thus is more efficient. Therefore, the important Horvitz-Thompson property is verified for penalized rank estimator with the covariates missing in the linear model. Some numerical examples are provided to demonstrate the performance of the estimators.  相似文献   

2.
Consider a regression model where the regression function is the sum of a linear and a nonparametric component. Assuming that the errors of the model follow a stationary strong mixing process with mean zero, the problem of bandwidth selection for a kernel estimator of the nonparametric component is addressed here. We obtain an asymptotic expression for an optimal band-width and we propose to use a plug-in methodology in order to estimate this bandwidth through preliminary estimates of the unknown quantities. Asymptotic optimality for the plug-in bandwidth is established.  相似文献   

3.
In this paper, we introduce a partially linear single-index additive hazards model with current status data. Both the unknown link function of the single-index term and the cumulative baseline hazard function are approximated by B-splines under a monotonicity constraint on the latter. The sieve method is applied to estimate the nonparametric and parametric components simultaneously. We show that, when the nonparametric link function is an exact B-spline, the resultant estimator of regression parameter vector is asymptotically normal and achieves the semiparametric information bound and the rate of convergence of the estimator for the cumulative baseline hazard function is optimal. Simulation studies are presented to examine the finite sample performance of the proposed estimation method. For illustration, we apply the method to a clinical dataset with current status outcome.  相似文献   

4.
We consider semiparametric additive regression models with a linear parametric part and a nonparametric part, both involving multivariate covariates. For the nonparametric part we assume two models. In the first, the regression function is unspecified and smooth; in the second, the regression function is additive with smooth components. Depending on the model, the regression curve is estimated by suitable least squares methods. The resulting residual-based empirical distribution function is shown to differ from the error-based empirical distribution function by an additive expression, up to a uniformly negligible remainder term. This result implies a functional central limit theorem for the residual-based empirical distribution function. It is used to test for normal errors.  相似文献   

5.
This article is concerned with the estimation problem in the semiparametric isotonic regression model when the covariates are measured with additive errors and the response is missing at random. An inverse marginal probability weighted imputation approach is developed to estimate the regression parameters and a least-square approach under monotone constraint is employed to estimate the functional component. We show that the proposed estimator of the regression parameter is root-n consistent and asymptotically normal and the isotonic estimator of the functional component, at a fixed point, is cubic root-n consistent. A simulation study is conducted to examine the finite-sample properties of the proposed estimators. A data set is used to demonstrate the proposed approach.  相似文献   

6.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied and a Monte Carlo EM (MCEM) algorithm with penalized-spline (P-spline) technique is developed to estimate the regression coefficients and nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for our considered models with incomplete data, some new model selection criterions for GPLMs with missing covariates are proposed under two different missingness mechanism, say, missing at random (MAR) and missing not at random (MNAR). The most attractive point of our method is that it is rather general and can be extended to various situations with missing observations based on EM algorithm, especially when no missing data involved, our new model selection criterions are reduced to classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also can compare missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criterions are investigated. A simulation study and a real example are used to illustrate the proposed methodology.  相似文献   

7.
Abstract. Similar to variable selection in the linear model, selecting significant components in the additive model is of great interest. However, such components are unknown, unobservable functions of independent variables. Some approximation is needed. We suggest a combination of penalized regression spline approximation and group variable selection, called the group‐bridge‐type spline method (GBSM), to handle this component selection problem with a diverging number of correlated variables in each group. The proposed method can select significant components and estimate non‐parametric additive function components simultaneously. To make the GBSM stable in computation and adaptive to the level of smoothness of the component functions, weighted power spline bases and projected weighted power spline bases are proposed. Their performance is examined by simulation studies. The proposed method is extended to a partial linear regression model analysis with real data, and gives reliable results.  相似文献   

8.
We consider the nonparametric estimation of the regression functions for dependent data. Suppose that the covariates are observed with additive errors in the data and we employ nonparametric deconvolution kernel techniques to estimate the regression functions in this paper. We investigate how the strength of time dependence affects the asymptotic properties of the local constant and linear estimators. We treat both short-range dependent and long-range dependent linear processes in a unified way and demonstrate that the long-range dependence (LRD) of the covariates affects the asymptotic properties of the nonparametric estimators as well as the LRD of regression errors does.  相似文献   

9.
A growth curve analysis is often applied to estimate patterns of changes in a given characteristic of different individuals. It is also used to find out if the variations in the growth rates among individuals are due to effects of certain covariates. In this paper, a random coefficient linear regression model, as a special case of the growth curve analysis, is generalized to accommodate the situation where the set of influential covariates is not known a priori. Two different approaches for seleaing influential covariates (a weighted stepwise selection procedure and a modified version of Rao and Wu’s selection criterion) for the random slope coefficient of a linear regression model with unbalanced data are proposed. Performances of these methods are evaluated by means of Monte-Carlo simulation. In addition, several methods (Maximum Likelihood, Restricted Maximum Likelihood, Pseudo Maximum Likelihood and Method of Moments) for estimating the parameters of the selected model are compared Proposed variable selection schemes and estimators are appliedtotheactualindustrial problem which motivated this investigation.  相似文献   

10.
This paper proposes Bayesian nonparametric mixing for some well-known and popular models. The distribution of the observations is assumed to contain an unknown mixed effects term which includes a fixed effects term, a function of the observed covariates, and an additive or multiplicative random effects term. Typically these random effects are assumed to be independent of the observed covariates and independent and identically distributed from a distribution from some known parametric family. This assumption may be suspect if either there is interaction between observed covariates and unobserved covariates or the fixed effects predictor of observed covariates is misspecified. Another cause for concern might be simply that the covariates affect more than just the location of the mixed effects distribution. As a consequence the distribution of the random effects could be highly irregular in modality and skewness leaving parametric families unable to model the distribution adequately. This paper therefore proposes a Bayesian nonparametric prior for the random effects to capture possible deviances in modality and skewness and to explore the observed covariates' effect on the distribution of the mixed effects.  相似文献   

11.
The authors define a class of “partially linear single‐index” survival models that are more flexible than the classical proportional hazards regression models in their treatment of covariates. The latter enter the proposed model either via a parametric linear form or a nonparametric single‐index form. It is then possible to model both linear and functional effects of covariates on the logarithm of the hazard function and if necessary, to reduce the dimensionality of multiple covariates via the single‐index component. The partially linear hazards model and the single‐index hazards model are special cases of the proposed model. The authors develop a likelihood‐based inference to estimate the model components via an iterative algorithm. They establish an asymptotic distribution theory for the proposed estimators, examine their finite‐sample behaviour through simulation, and use a set of real data to illustrate their approach.  相似文献   

12.
Semiparametric regression models with multiple covariates are commonly encountered. When there are covariates not associated with response variable, variable selection may lead to sparser models, more lucid interpretations and more accurate estimation. In this study, we adopt a sieve approach for the estimation of nonparametric covariate effects in semiparametric regression models. We adopt a two-step iterated penalization approach for variable selection. In the first step, a mixture of the Lasso and group Lasso penalties are employed to conduct the first-round variable selection and obtain the initial estimate. In the second step, a mixture of the weighted Lasso and weighted group Lasso penalties, with weights constructed using the initial estimate, are employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when number of unknown parameters diverges with sample size. Numerical studies, including simulation and analysis of a diabetes dataset, show satisfactory performance of the proposed approach.  相似文献   

13.
In this article, we study a nonparametric approach regarding a general nonlinear reduced form equation to achieve a better approximation of the optimal instrument. Accordingly, we propose the nonparametric additive instrumental variable estimator (NAIVE) with the adaptive group Lasso. We theoretically demonstrate that the proposed estimator is root-n consistent and asymptotically normal. The adaptive group Lasso helps us select the valid instruments while the dimensionality of potential instrumental variables is allowed to be greater than the sample size. In practice, the degree and knots of B-spline series are selected by minimizing the BIC or EBIC criteria for each nonparametric additive component in the reduced form equation. In Monte Carlo simulations, we show that the NAIVE has the same performance as the linear instrumental variable (IV) estimator for the truly linear reduced form equation. On the other hand, the NAIVE performs much better in terms of bias and mean squared errors compared to other alternative estimators under the high-dimensional nonlinear reduced form equation. We further illustrate our method in an empirical study of international trade and growth. Our findings provide a stronger evidence that international trade has a significant positive effect on economic growth.  相似文献   

14.
For linear regression models with non normally distributed errors, the least squares estimate (LSE) will lose some efficiency compared to the maximum likelihood estimate (MLE). In this article, we propose a kernel density-based regression estimate (KDRE) that is adaptive to the unknown error distribution. The key idea is to approximate the likelihood function by using a nonparametric kernel density estimate of the error density based on some initial parameter estimate. The proposed estimate is shown to be asymptotically as efficient as the oracle MLE which assumes the error density were known. In addition, we propose an EM type algorithm to maximize the estimated likelihood function and show that the KDRE can be considered as an iterated weighted least squares estimate, which provides us some insights on the adaptiveness of KDRE to the unknown error distribution. Our Monte Carlo simulation studies show that, while comparable to the traditional LSE for normal errors, the proposed estimation procedure can have substantial efficiency gain for non normal errors. Moreover, the efficiency gain can be achieved even for a small sample size.  相似文献   

15.
If the capture probabilities in a capture‐recapture experiment depend on covariates, parametric models may be fitted and the population size may then be estimated. Here a semiparametric model for the capture probabilities that allows both continuous and categorical covariates is developed. Kernel smoothing and profile estimating equations are used to estimate the nonparametric and parametric components. Analytic forms of the standard errors are derived, which allows an empirical bias bandwidth selection procedure to be used to estimate the bandwidth. The method is evaluated in simulations and is applied to a real data set concerning captures of Prinia flaviventris, which is a common bird species in Southeast Asia.  相似文献   

16.
There are several procedures for fitting generalized additive models, i.e. regression models for an exponential family response where the influence of each single covariates is assumed to have unknown, potentially non-linear shape. Simulated data are used to compare a smoothing parameter optimization approach for selection of smoothness and of covariates, a stepwise approach, a mixed model approach, and a procedure based on boosting techniques. In particular it is investigated how the performance of procedures is linked to amount of information, type of response, total number of covariates, number of influential covariates, and extent of non-linearity. Measures for comparison are prediction performance, identification of influential covariates, and smoothness of fitted functions. One result is that the mixed model approach returns sparse fits with frequently over-smoothed functions, while the functions are less smooth for the boosting approach and variable selection is less strict. The other approaches are in between with respect to these measures. The boosting procedure is seen to perform very well when little information is available and/or when a large number of covariates is to be investigated. It is somewhat surprising that in scenarios with low information the fitting of a linear model, even with stepwise variable selection, has not much advantage over the fitting of an additive model when the true underlying structure is linear. In cases with more information the prediction performance of all procedures is very similar. So, in difficult data situations the boosting approach can be recommended, in others the procedures can be chosen conditional on the aim of the analysis.  相似文献   

17.
Generalized additive mixed models are proposed for overdispersed and correlated data, which arise frequently in studies involving clustered, hierarchical and spatial designs. This class of models allows flexible functional dependence of an outcome variable on covariates by using nonparametric regression, while accounting for correlation between observations by using random effects. We estimate nonparametric functions by using smoothing splines and jointly estimate smoothing parameters and variance components by using marginal quasi-likelihood. Because numerical integration is often required by maximizing the objective functions, double penalized quasi-likelihood is proposed to make approximate inference. Frequentist and Bayesian inferences are compared. A key feature of the method proposed is that it allows us to make systematic inference on all model components within a unified parametric mixed model framework and can be easily implemented by fitting a working generalized linear mixed model by using existing statistical software. A bias correction procedure is also proposed to improve the performance of double penalized quasi-likelihood for sparse data. We illustrate the method with an application to infectious disease data and we evaluate its performance through simulation.  相似文献   

18.
This paper proposes nonparametric estimation methods for functional linear semiparametric quantile regression, where the conditional quantile of the scalar responses is modelled by both scalar and functional covariates and an additional unknown nonparametric function term. The slope function is estimated using the functional principal component basis and the nonparametric function is approximated by a piecewise polynomial function. The asymptotic distribution of the estimators of slope parameters is derived and the global convergence rate of the quantile estimator of unknown slope function is established under suitable norm. The asymptotic distribution of the estimator of the unknown nonparametric function is also established. Simulation studies are conducted to investigate the finite-sample performance of the proposed estimators. The proposed methodology is demonstrated by analysing a real data from ADHD-200 sample.  相似文献   

19.
We consider kernel methods to construct nonparametric estimators of a regression function based on incomplete data. To tackle the presence of incomplete covariates, we employ Horvitz–Thompson-type inverse weighting techniques, where the weights are the selection probabilities. The unknown selection probabilities are themselves estimated using (1) kernel regression, when the functional form of these probabilities are completely unknown, and (2) the least-squares method, when the selection probabilities belong to a known class of candidate functions. To assess the overall performance of the proposed estimators, we establish exponential upper bounds on the \(L_p\) norms, \(1\le p<\infty \), of our estimators; these bounds immediately yield various strong convergence results. We also apply our results to deal with the important problem of statistical classification with partially observed covariates.  相似文献   

20.
We propose a Bayesian nonparametric instrumental variable approach under additive separability that allows us to correct for endogeneity bias in regression models where the covariate effects enter with unknown functional form. Bias correction relies on a simultaneous equations specification with flexible modeling of the joint error distribution implemented via a Dirichlet process mixture prior. Both the structural and instrumental variable equation are specified in terms of additive predictors comprising penalized splines for nonlinear effects of continuous covariates. Inference is fully Bayesian, employing efficient Markov chain Monte Carlo simulation techniques. The resulting posterior samples do not only provide us with point estimates, but allow us to construct simultaneous credible bands for the nonparametric effects, including data-driven smoothing parameter selection. In addition, improved robustness properties are achieved due to the flexible error distribution specification. Both these features are challenging in the classical framework, making the Bayesian one advantageous. In simulations, we investigate small sample properties and an investigation of the effect of class size on student performance in Israel provides an illustration of the proposed approach which is implemented in an R package bayesIV. Supplementary materials for this article are available online.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号