Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Jing Yang  Fang Lu  Hu Yang 《Statistics》2013,47(6):1193-1211
The outer product of gradients (OPG) estimation procedure based on the least squares (LS) approach was presented by Xia et al. [An adaptive estimation of dimension reduction space. J Roy Statist Soc Ser B. 2002;64:363–410] to estimate the single-index parameter in partially linear single-index models (PLSIM). However, its asymptotic properties have not yet been established, and the efficiency of LS-based methods can be significantly affected by outliers and heavy-tailed distributions. In this paper, we first derive the asymptotic properties of the OPG estimator developed by Xia et al. [An adaptive estimation of dimension reduction space. J Roy Statist Soc Ser B. 2002;64:363–410], and then develop a novel robust estimation procedure for PLSIM that combines the ideas of OPG and local rank (LR) inference, along with its theoretical properties. We then derive the asymptotic relative efficiency (ARE) of the proposed LR-based procedure with respect to the LS-based method, which is shown to have an expression closely related to that of the signed-rank Wilcoxon test in comparison with the t-test. Moreover, we demonstrate that the newly proposed estimator achieves a large efficiency gain across a wide spectrum of non-normal error distributions and loses almost no efficiency for normal errors. Even in the worst-case scenarios, the ARE relative to the LS-based estimators has a lower bound of 0.864 for estimating the single-index parameter and a lower bound of 0.8896 for estimating the nonparametric function. Finally, Monte Carlo simulations and a real data analysis are conducted to illustrate the finite-sample performance of the estimators.
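For orientation, the classical benchmark that the abstract alludes to is the ARE of the Wilcoxon signed-rank test relative to the t-test. The formula below is that textbook result, not the paper's own expression, which the abstract describes only as closely related. The symbols $f$ (error density) and $\sigma^2$ (error variance) are notation assumed here for illustration.

```latex
\mathrm{ARE}(W, t) \;=\; 12\,\sigma^{2}\left(\int f^{2}(x)\,dx\right)^{2},
\qquad
\inf_{f}\,\mathrm{ARE}(W, t) \;=\; \frac{108}{125} \;=\; 0.864 ,
\qquad
\mathrm{ARE}(W, t)\big|_{f=\text{normal}} \;=\; \frac{3}{\pi} \;\approx\; 0.955 .
```

The infimum matches the 0.864 bound quoted for the single-index parameter, and the normal-error value of about 0.955 is consistent with the statement that little efficiency is lost under normality. The 0.8896 bound for the nonparametric function does not follow from this formula and is taken from the abstract as stated.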

2.
In the last few years, applications of support vector machines (SVM) to classification and regression problems have been increasing, owing to their high performance and their ability to transform non-linear relationships among variables into linear form by means of kernel functions. In this work, we develop a semi-parametric approach for fitting single-index models in high-dimensional problems. To achieve this goal, we use support vector regression (SVR) to estimate the unknown nonparametric link function, while the single index is determined by the semi-parametric least squares method (Ichimura 1993). This development enhances the ability of SVR to solve high-dimensional problems. We design three simulation examples with high-dimensional problems (linear and nonlinear). The simulations demonstrate the superior performance of the proposed method over the standard SVR method. This is further illustrated by an application to real data.

3.
Biao Zhang 《Statistics》2016,50(5):1173-1194
Missing covariate data occurs often in regression analysis. We study methods for estimating the regression coefficients in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866] on regression analyses with missing covariates, in which they pioneered the use of two working models, the working propensity score model and the working conditional score model. A recent approach to missing covariate data analysis is the empirical likelihood method of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503], which effectively combines unbiased estimating equations. In this paper, we consider an alternative likelihood approach based on the full likelihood of the observed data. This full likelihood-based method enables us to generate estimators for the vector of the regression coefficients that are (a) asymptotically equivalent to those of Qin et al. [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the working propensity score model is correctly specified, and (b) doubly robust, like the augmented inverse probability weighting (AIPW) estimators of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Thus, the proposed full likelihood-based estimators improve on the efficiency of the AIPW estimators when the working propensity score model is correct but the working conditional score model is possibly incorrect, and also improve on the empirical likelihood estimators of Qin, Zhang and Leung [Empirical likelihood in missing data problems. J Amer Statist Assoc. 2009;104:1492–1503] when the reverse is true, that is, the working conditional score model is correct but the working propensity score model is possibly incorrect. In addition, we consider a regression method for estimation of the regression coefficients when the working conditional score model is correctly specified; the asymptotic variance of the resulting estimator is no greater than the semiparametric variance bound characterized by the theory of Robins et al. [Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866]. Finally, we compare the finite-sample performance of various estimators in a simulation study.

4.
A generalized self-consistency approach to maximum likelihood estimation (MLE) and model building was developed in Tsodikov [2003. Semiparametric models: a generalized self-consistency approach. J. Roy. Statist. Soc. Ser. B Statist. Methodology 65(3), 759–774] and applied to a survival analysis problem. We extend the framework to obtain second-order results such as the information matrix and properties of the variance. The multinomial model motivates the paper and is used throughout as an example. Computational challenges with the multinomial likelihood motivated Baker [1994. The Multinomial–Poisson transformation. The Statist. 43, 495–504] to develop the Multinomial–Poisson (MP) transformation for a large variety of regression models with a multinomial likelihood kernel. Multinomial regression is transformed into a Poisson regression at the cost of augmenting model parameters and restricting the problem to discrete covariates. Imposing normalization restrictions by means of Lagrange multipliers [Lang, J., 1996. On the comparison of multinomial and Poisson log-linear models. J. Roy. Statist. Soc. Ser. B Statist. Methodology 58, 253–266] justifies the approach. Using the self-consistency framework, we develop an alternative solution to multinomial model fitting that does not require augmenting parameters while allowing for a Poisson likelihood and arbitrary covariate structures. Normalization restrictions are imposed by averaging over artificial “missing data” (fake mixture). The lack of a probabilistic interpretation at the “complete-data” level makes the use of the generalized self-consistency machinery essential.

5.
Semiparametric regression models have been proposed in the econometric literature as a trade-off between parametric models, which are simple and easy to implement and interpret, and smoothing techniques, which are flexible but structure-free. Some semiparametric models for binary response with possible application to scoring data are reviewed: single-index models, generalized partially linear models, generalized partially linear single-index models, and multiple-index models. All these models are extensions of the classical logistic regression.

6.
Variable selection in multiple linear regression models is considered. It is shown that for the special case of orthogonal predictor variables, an adaptive pre-test-type procedure proposed by Venter and Steel [Simultaneous selection and estimation for the some zeros family of normal models, J. Statist. Comput. Simul. 45 (1993), pp. 129–146] is almost equivalent to least angle regression, proposed by Efron et al. [Least angle regression, Ann. Stat. 32 (2004), pp. 407–499]. A new adaptive pre-test-type procedure is proposed, which extends the procedure of Venter and Steel to the general non-orthogonal case in a multiple linear regression analysis. This new procedure is based on a likelihood ratio test where the critical value is determined data-dependently. A practical illustration and results from a simulation study are presented.

7.
In this paper, semiparametric methods are applied to estimate multivariate volatility functions, using a residual approach as in [J. Fan and Q. Yao, Efficient estimation of conditional variance functions in stochastic regression, Biometrika 85 (1998), pp. 645–660; F.A. Ziegelmann, Nonparametric estimation of volatility functions: The local exponential estimator, Econometric Theory 18 (2002), pp. 985–991; F.A. Ziegelmann, A local linear least-absolute-deviations estimator of volatility, Comm. Statist. Simulation Comput. 37 (2008), pp. 1543–1564], among others. Our main goal here is two-fold: (1) describe and implement a number of semiparametric models, such as additive, single-index and (adaptive) functional-coefficient, in volatility estimation, all motivated as alternatives to deal with the curse of dimensionality present in fully nonparametric models; and (2) propose the use of a variation of the traditional cross-validation method to deal with model choice in the class of adaptive functional-coefficient models, choosing simultaneously the bandwidth, the number of covariates in the model and also the single-index smoothing variable. The modified cross-validation algorithm is able to tackle the computational burden caused by the model complexity, providing an important tool in semiparametric volatility estimation. We briefly discuss model identifiability when estimating volatility as well as nonnegativity of the resulting estimators. Furthermore, Monte Carlo simulations for several underlying generating models are implemented and applications to real data are provided.

8.
Daniel Hohmann 《Statistics》2013,47(2):348-362
We consider a two-component location mixture model with symmetric components, one of which is assumed to be known, the other is unknown. We show identifiability under assumptions on the tails of the characteristic function for the true underlying mixture, and also construct asymptotically normal estimates. The model is an extension of the contamination model in Bordes et al. [Semiparametric estimation of a two-component mixture model when a component is known, Scand. J. Statist. 33 (2006), pp. 733–752], and also related to a location mixture of one symmetric density as in Bordes et al. [Semiparametric estimation of a two component mixture model, Ann. Statist. 34 (2006), pp. 1204–1232]. We show by simulation that estimating the additional location parameter leads to a slight loss of efficiency as compared with the contamination model.

9.
A reasonable approach to robust regression estimation is minimizing a robust scale estimator of the pairwise differences of residuals. We introduce a large class of estimators based on this strategy extending ideas of Yohai and Zamar (Am. Statist. (1993) 1824–1842) and Croux et al. (J. Am. Statist. Assoc. (1994) 1271–1281). The asymptotic robustness properties of the estimators in this class are addressed using the maxbias curve. We provide a general principle to compute this curve and present explicit formulae for several particular cases including generalized versions of S-, R- and τ-estimators. Finally, the most stable estimator in the class, that is, the estimator with the minimum maxbias curve, is shown to be the set of coefficients that minimizes an appropriate quantile of the distribution of the absolute pairwise differences of residuals.
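The objective described in this abstract, a quantile of the absolute pairwise differences of residuals, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names and the quantile level `q` are hypothetical, since the abstract says only that "an appropriate quantile" is used.

```python
import numpy as np


def pairwise_quantile_scale(residuals, q=0.25):
    """Quantile of |r_i - r_j| over all pairs i < j (q is a hypothetical choice)."""
    r = np.asarray(residuals, dtype=float)
    i, j = np.triu_indices(len(r), k=1)  # index pairs with i < j
    return np.quantile(np.abs(r[i] - r[j]), q)


def objective(beta, X, y, q=0.25):
    """Scale of pairwise residual differences for slope vector beta.

    Differencing cancels any intercept, so only slopes enter the objective.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return pairwise_quantile_scale(y - X @ np.asarray(beta, dtype=float), q)
```

A regression estimate in this spirit would minimize `objective` over `beta` with a general-purpose optimizer. Because the objective is non-smooth and non-convex, the choice of optimizer and starting values matters and is left open here.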

10.
The demand for reliable statistics in subpopulations, when only reduced sample sizes are available, has promoted the development of small area estimation methods. In particular, an approach that is now widely used is based on the seminal work by Battese et al. [An error-components model for prediction of county crop areas using survey and satellite data, J. Am. Statist. Assoc. 83 (1988), pp. 28–36] that uses linear mixed models (MM). We investigate alternatives when a linear MM does not hold because, on the one hand, linearity may not be assumed and/or, on the other, normality of the random effects may not be assumed. In particular, Opsomer et al. [Nonparametric small area estimation using penalized spline regression, J. R. Statist. Soc. Ser. B 70 (2008), pp. 265–283] propose an estimator that extends the linear MM approach to the case in which a linear relationship may not be assumed using penalized splines regression. From a very different perspective, Chambers and Tzavidis [M-quantile models for small area estimation, Biometrika 93 (2006), pp. 255–268] have recently proposed an approach for small-area estimation that is based on M-quantile (MQ) regression. This allows for models robust to outliers and to distributional assumptions on the errors and the area effects. However, when the functional form of the relationship between the qth MQ and the covariates is not linear, it can lead to biased estimates of the small area parameters. Pratesi et al. [Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US, Environmetrics 19(7) (2008), pp. 687–701] apply an extended version of this approach for the estimation of the small area distribution function using a non-parametric specification of the conditional MQ of the response variable given the covariates [M. Pratesi, M.G. Ranalli, and N. Salvati, Nonparametric m-quantile regression using penalized splines, J. Nonparametric Stat. 21 (2009), pp. 287–304]. We will derive the small area estimator of the mean under this model, together with its mean-squared error estimator, and compare its performance to the other estimators via simulations on both real and simulated data.

11.
Reduced-rank regression models proposed by Anderson [1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Ann. Math. Statist. 22, 327–351] have been used in various applications in social and natural sciences. In this paper we combine the features of these models with another popular, seemingly unrelated regression model proposed by Zellner [1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57, 348–368]. In addition to estimation and inference aspects of the new model, we also discuss an application in the area of marketing.

12.
Doubly robust (DR) estimators of the mean with missing data are compared. An estimator is DR if either the regression of the missing variable on the observed variables or the missing data mechanism is correctly specified. One method is to include the inverse of the propensity score as a linear term in the imputation model [D. Firth and K.E. Bennett, Robust models in probability sampling, J. R. Statist. Soc. Ser. B. 60 (1998), pp. 3–21; D.O. Scharfstein, A. Rotnitzky, and J.M. Robins, Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion), J. Am. Statist. Assoc. 94 (1999), pp. 1096–1146; H. Bang and J.M. Robins, Doubly robust estimation in missing data and causal inference models, Biometrics 61 (2005), pp. 962–972]. Another method is to calibrate the predictions from a parametric model by adding a mean of the weighted residuals [J.M. Robins, A. Rotnitzky, and L.P. Zhao, Estimation of regression coefficients when some regressors are not always observed, J. Am. Statist. Assoc. 89 (1994), pp. 846–866; D.O. Scharfstein, A. Rotnitzky, and J.M. Robins, Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion), J. Am. Statist. Assoc. 94 (1999), pp. 1096–1146]. The penalized spline propensity prediction (PSPP) model incorporates the propensity score into the model non-parametrically [R.J.A. Little and H. An, Robust likelihood-based analysis of multivariate data with missing values, Statist. Sin. 14 (2004), pp. 949–968; G. Zhang and R.J. Little, Extensions of the penalized spline propensity prediction method of imputation, Biometrics, 65(3) (2008), pp. 911–918]. All these methods have consistency properties under misspecification of regression models, but their comparative efficiency and confidence coverage in finite samples have received little attention. In this paper, we compare the root mean square error (RMSE), width of confidence interval and non-coverage rate of these methods under various mean and response propensity functions. We study the effects of sample size and robustness to model misspecification. The PSPP method yields estimates with smaller RMSE and width of confidence interval compared with other methods under most situations. It also yields estimates with confidence coverage close to the 95% nominal level, provided the sample size is not too small.
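The second method described in this abstract, calibrating model predictions by adding a mean of weighted residuals, can be sketched as below. This is a minimal illustration under stated assumptions: the function name and arguments are hypothetical, and the fitted propensities and predictions are taken as given inputs rather than estimated here.

```python
import numpy as np


def calibrated_dr_mean(y, observed, propensity, prediction):
    """Mean of predictions plus the mean of inverse-propensity-weighted residuals.

    y          : outcomes; entries for unobserved cases may be NaN
    observed   : 1 if y is observed, 0 if missing
    propensity : fitted probability of being observed (assumed given)
    prediction : fitted regression prediction of y (assumed given)
    """
    r = np.asarray(observed, dtype=float)
    pi = np.asarray(propensity, dtype=float)
    m = np.asarray(prediction, dtype=float)
    # Replace missing outcomes by 0 so they drop out when multiplied by r = 0.
    y_filled = np.where(r == 1, np.asarray(y, dtype=float), 0.0)
    return np.mean(m + r * (y_filled - m) / pi)
```

If the regression model is correct, the weighted residuals average to roughly zero and the prediction mean dominates. If instead the propensity model is correct, the weighted residual term corrects the bias of the predictions. That is the double robustness property the abstract refers to.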

13.
In this paper we review existing work on robust estimation for simultaneous equations models. Then we sketch three strategies for obtaining estimators with a high breakdown point and a controllable efficiency: (a) robustifying three-stage least squares, (b) robustifying the full information maximum likelihood method by minimizing the determinant of a robust covariance matrix of residuals, and (c) generalizing multivariate tau-estimators (Lopuhaä, 1992, Can. J. Statist., 19, 307–321) to these models. They have the same order of computational complexity as high breakdown point multivariate estimators. The latter seems the most promising approach.

14.
The purpose of this paper is to develop diagnostic analysis for nonlinear regression models (NLMs) under scale mixtures of skew-normal (SMSN) distributions introduced by Garay et al. [Nonlinear regression models based on SMSN distributions. J. Korean Statist. Soc. 2011;40:115–124]. This novel class of models provides a useful generalization of the symmetrical NLM [Vanegas LH, Cysneiros FJA. Assessment of diagnostic procedures in symmetrical nonlinear regression models. Comput. Statist. Data Anal. 2010;54:1002–1016], since the distributions of the random terms cover symmetric as well as asymmetric and heavy-tailed distributions, such as the skew-t, skew-slash and skew-contaminated normal distributions, among others. Motivated by the results given in Garay et al. [Nonlinear regression models based on SMSN distributions. J. Korean Statist. Soc. 2011;40:115–124], we present a score test for the homogeneity of the scale parameter and investigate its properties through Monte Carlo simulation studies. Furthermore, local influence measures and the one-step approximations of the estimates in the case-deletion model are obtained. The newly developed procedures are illustrated using a real data set.

15.
Composite quantile regression models have been shown to be effective techniques for improving prediction accuracy [H. Zou and M. Yuan, Composite quantile regression and the oracle model selection theory, Ann. Statist. 36 (2008), pp. 1108–1126; J. Bradic, J. Fan, and W. Wang, Penalized composite quasi-likelihood for ultrahigh dimensional variable selection, J. R. Stat. Soc. Ser. B 73 (2011), pp. 325–349; Z. Zhao and Z. Xiao, Efficient regressions via optimally combining quantile information, Econometric Theory 30(06) (2014), pp. 1272–1314]. This paper studies composite Tobit quantile regression (TQReg) from a Bayesian perspective. A simple and efficient MCMC-based computation method is derived for posterior inference, using the representation of the skewed Laplace distribution as a mixture of an exponential and a scaled normal distribution. The approach is illustrated via simulation studies and a real data set. Results show that combining information across different quantiles can provide a useful method for efficient statistical estimation. This is the first work to discuss composite TQReg from a Bayesian perspective.

16.
In this paper, we describe an overall strategy for robust estimation of multivariate location and shape, and the consequent identification of outliers and leverage points. Parts of this strategy have been described in a series of previous papers (Rocke, Ann. Statist., in press; Rocke and Woodruff, Statist. Neerlandica 47 (1993), 27–42, J. Amer. Statist. Assoc., in press; Woodruff and Rocke, J. Comput. Graphical Statist. 2 (1993), 69–95; J. Amer. Statist. Assoc. 89 (1994), 888–896) but the overall structure is presented here for the first time. After describing the first-level architecture of a class of algorithms for this problem, we review available information about possible tactics for each major step in the process. The major steps that we have found to be necessary are as follows: (1) partition the data into groups of perhaps five times the dimension; (2) for each group, search for the best available solution to a combinatorial estimator such as the Minimum Covariance Determinant (MCD), which yields the preliminary estimates; (3) for each preliminary estimate, iterate to the solution of a smooth estimator chosen for robustness and outlier resistance; and (4) choose among the final iterates based on a robust criterion, such as minimum volume. Use of this algorithm architecture can enable reliable, fast, robust estimation of heavily contaminated multivariate data in high (> 20) dimension even with large quantities of data. A computer program implementing the algorithm is available from the authors.

17.
This is an interesting article that considers the question of inference on unknown linear index coefficients in a general class of models where reduced form parameters are invertible functions of one or more linear indices. Interpretable sufficient conditions for the invertibility condition, such as monotonicity and/or smoothness, are provided. The results generalize some work in the previous literature by allowing the number of reduced form parameters to exceed the number of indices. The identification and estimation expand on the approach taken in previous work by the authors. Examples include Ahn, Powell, and Ichimura (2004) [“Simple Estimators for Monotone Index Models,” UC Berkeley Working Paper] for monotone single-index regression models to a multi-index setting and extended by Blundell and Powell (2004) [“Endogeneity in Semiparametric Binary Response Models,” The Review of Economic Studies, 71, 655–679] and Powell and Ruud (2008) [“Simple Estimators for Semiparametric Multinomial Choice Models,” UC Berkeley Working Paper] to models with endogenous regressors and multinomial response, respectively. A key property of the inference approach taken is that the estimator of the unknown index coefficients (up to scale) is computationally simple to obtain (relative to other estimators in the literature) in that it is closed form. Specifically, unifying an approach for all models considered in this article, the authors propose an estimator, which is the eigenvector of a matrix (defined in terms of a preliminary estimator of the reduced form parameters) corresponding to its smallest eigenvalue. Under suitable conditions, the proposed estimator is shown to be root-n-consistent and asymptotically normal.
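The closed-form step described in this abstract, taking the eigenvector that belongs to the smallest eigenvalue of a matrix, can be sketched as follows. The abstract does not say how that matrix is built from the preliminary reduced-form estimates, so the input here is an assumed, already computed symmetric matrix. The function name and the sign and scale normalization are likewise illustrative choices.

```python
import numpy as np


def smallest_eigenvector(matrix):
    """Unit-norm eigenvector for the smallest eigenvalue of a symmetric matrix.

    The index coefficients are identified only up to scale, so the vector is
    normalized to unit length with its first nonzero entry positive.
    """
    a = np.asarray(matrix, dtype=float)
    sym = (a + a.T) / 2.0  # guard against small numerical asymmetry
    eigenvalues, eigenvectors = np.linalg.eigh(sym)  # eigenvalues ascending
    v = eigenvectors[:, 0]
    pivot = v[np.flatnonzero(np.abs(v) > 1e-12)[0]]
    return np.sign(pivot) * v / np.linalg.norm(v)
```

Because `numpy.linalg.eigh` returns eigenvalues in ascending order, the first column of the eigenvector matrix is the one required, and no iterative optimization is needed.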

18.
Statistical bioequivalence has recently attracted a lot of attention. This is perhaps due to the importance of setting a reasonable criterion on the part of a regulatory agency such as the FDA in the US in regulating the manufacturing of drugs (especially generic drugs). Pharmaceutical companies are obviously interested in the criterion since a huge profit is involved. Various criteria and various types of bioequivalence have been proposed. At present, the FDA recommends testing for average bioequivalence. The FDA, however, is considering replacing average bioequivalence by individual bioequivalence. We focus on the criterion of individual bioequivalence proposed earlier by Anderson and Hauck (J. Pharmacokinetics and Biopharmaceutics 18 (1990) 259) and Wellek (Medizinische Informatik und Statistik, vol. 71, Springer, Berlin, 1989, pp. 95–99; Biometrical J. 35 (1993) 47). For their criterion, they proposed TIER (test of individual equivalence ratios). Other tests were also proposed by Phillips (J. Biopharmaceutical Statist. 3 (1993) 185), and Liu and Chow (J. Biopharmaceutical Statist. 7 (1997) 49). In this paper, we propose an alternative test, called the nearly unbiased test, which is shown numerically to have substantially larger power than existing tests. We also show that our test works for various models including 2×3 and 2×4 crossover designs.

19.
In this article, we propose some new generalizations of M-estimation procedures for single-index regression models in the presence of randomly right-censored responses. We derive consistency and asymptotic normality of our estimates. The results are proved in order to be adapted to a wide range of techniques used in a censored regression framework (e.g. synthetic data or weighted least squares). As in the uncensored case, the estimator of the single-index parameter is seen to have the same asymptotic behavior as in a fully parametric scheme. We compare these new estimators with those based on the average derivative technique of Lu and Burke [2005. Censored multiple regression by the method of average derivatives. J. Multivariate Anal. 95, 182–205] through a simulation study.

20.
Expectile regression [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819–847] is a useful tool for estimating the conditional expectiles of a response variable given a set of covariates. Expectile regression at the 50% level is the classical conditional mean regression. In many real applications, having multiple expectiles at different levels provides a more complete picture of the conditional distribution of the response variable. The multiple linear expectile regression model has been well studied [Newey W, Powell J. Asymmetric least squares estimation and testing, Econometrica. 1987;55:819–847; Efron B. Regression percentiles using asymmetric squared error loss, Stat Sin. 1991;1:93–125], but it can be too restrictive for many real applications. In this paper, we derive a regression tree-based gradient boosting estimator for nonparametric multiple expectile regression. The new estimator, referred to as ER-Boost, is implemented in an R package erboost publicly available at http://cran.r-project.org/web/packages/erboost/index.html. We use two homoscedastic/heteroscedastic random-function-generator models in simulation to show the high predictive accuracy of ER-Boost. As an application, we apply ER-Boost to analyse North Carolina County crime data. From the nonparametric expectile regression analysis of this dataset, we draw several interesting conclusions that are consistent with the previous study using the economic model of crime. This real data example also provides a good demonstration of some useful features of ER-Boost, such as its ability to handle different types of covariates and its model interpretation tools.
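To make the asymmetric least squares idea behind expectiles concrete, the sketch below computes the expectile of a plain sample with no covariates. It is not ER-Boost and does not use the erboost package. The function name and the fixed-point iteration are illustrative choices, based on the first-order condition of the asymmetric squared error loss, with weight `tau` for points above the estimate and `1 - tau` for the rest.

```python
import numpy as np


def sample_expectile(y, tau, tol=1e-10, max_iter=1000):
    """tau-expectile of a sample via iteratively reweighted averaging."""
    y = np.asarray(y, dtype=float)
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = y.mean()
    for _ in range(max_iter):
        # Points above the current estimate get weight tau, the others 1 - tau.
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

At `tau = 0.5` all weights are equal and the result is the ordinary sample mean, which mirrors the statement that expectile regression at the 50% level is classical conditional mean regression.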
