Similar Articles (20 results)
1.
Searching for an effective dimension reduction space is an important problem in regression, especially for high dimensional data. We propose an adaptive approach based on semiparametric models, which we call the (conditional) minimum average variance estimation (MAVE) method, within quite a general setting. The MAVE method has the following advantages. Most existing methods must undersmooth the nonparametric link function estimator to achieve a faster rate of consistency for the estimator of the parameters (than for that of the nonparametric function). In contrast, a faster consistency rate can be achieved by the MAVE method even without undersmoothing the nonparametric link function estimator. The MAVE method is applicable to a wide range of models, with fewer restrictions on the distribution of the covariates, to the extent that even time series can be included. Because of the faster rate of consistency for the parameter estimators, it is possible for us to estimate the dimension of the space consistently. The relationship of the MAVE method with other methods is also investigated. In particular, a simple outer product gradient estimator is proposed as an initial estimator. In addition to theoretical results, we demonstrate the efficacy of the MAVE method for high dimensional data sets through simulation. Two real data sets are analysed by using the MAVE approach.
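The outer product gradient (OPG) initial estimator mentioned above is simple enough to sketch. The snippet below (a minimal illustration, not the full iterative MAVE procedure) estimates the gradient of the regression function at each sample point by a kernel-weighted local linear fit and takes the leading eigenvectors of the averaged outer products as the dimension reduction directions; the Gaussian kernel, bandwidth rule, and simulated single-index example are our own illustrative choices.

```python
import numpy as np

def opg_directions(X, y, d, h=None):
    """OPG sketch: average outer products of local-linear gradient
    estimates, then take the top-d eigenvectors."""
    n, p = X.shape
    if h is None:
        h = np.std(X) * n ** (-1.0 / (p + 4))            # crude rule-of-thumb bandwidth
    grads = np.zeros((n, p))
    for j in range(n):
        Z = X - X[j]                                     # centre at the j-th point
        w = np.exp(-0.5 * np.sum((Z / h) ** 2, axis=1))  # Gaussian kernel weights
        D = np.hstack([np.ones((n, 1)), Z])              # local linear design matrix
        beta = np.linalg.lstsq(D.T @ (w[:, None] * D), D.T @ (w * y), rcond=None)[0]
        grads[j] = beta[1:]                              # local slope = gradient estimate
    M = grads.T @ grads / n                              # averaged outer product of gradients
    vec = np.linalg.eigh(M)[1]                           # eigenvectors, ascending eigenvalues
    return vec[:, ::-1][:, :d]                           # top-d span the estimated EDR space

# single-index example: y depends on X only through one direction b
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
b = np.array([1.0, 2.0, 0.0, 0.0, 0.0]) / np.sqrt(5)
y = np.sin(X @ b) + 0.1 * rng.normal(size=400)
B = opg_directions(X, y, d=1)
print(np.abs(B[:, 0] @ b))                               # close to 1 if the direction is found
```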

2.
The dimension reduction in regression is an efficient method of overcoming the curse of dimensionality in non-parametric regression. Motivated by recent developments for dimension reduction in time series, an empirical extension of the central mean subspace in time series to a single-input transfer function model is performed in this paper. Here, we use the central mean subspace as a tool of dimension reduction for bivariate time series in the case when the dimension and lag are known, and estimate the central mean subspace through the Nadaraya–Watson kernel smoother. Furthermore, we develop a data-dependent approach based on a modified Schwarz Bayesian criterion to estimate the unknown dimension and lag. Finally, we show that the approach works well for bivariate time series using an expository demonstration, two simulations, and a real data analysis of the El Niño and fish population data.
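Since the estimation step above is built on the Nadaraya–Watson kernel smoother, a minimal sketch of that building block may be useful. The Gaussian kernel, bandwidth, and simulated data below are illustrative assumptions; the paper applies the smoother inside the central mean subspace estimation for bivariate time series, not to a scalar predictor as here.

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel,
    evaluated at each point of the grid x0."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)  # kernel weights
    return (w @ y) / w.sum(axis=1)                            # weighted average of y

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300)
y = np.sin(2 * x) + 0.2 * rng.normal(size=300)
grid = np.linspace(-2, 2, 9)
print(np.round(nadaraya_watson(grid, x, y, h=0.2), 2))        # tracks sin(2 * grid)
```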

3.
In this article we introduce a general approach to dynamic path analysis. This is an extension of classical path analysis to the situation where variables may be time-dependent and where the outcome of main interest is a stochastic process. In particular we focus on the survival and event history analysis setting where the main outcome is a counting process. Our approach is especially fruitful for analyzing event history data with internal time-dependent covariates, where an ordinary regression analysis may fail. The approach enables us to describe how the effect of a fixed covariate works partly directly and partly indirectly through internal time-dependent covariates. For the sequence of event times, we define a corresponding sequence of path analysis models. At each event time, ordinary linear regression is used to estimate the relations among the covariates, while the additive hazard model is used for the regression of the counting process on the covariates. The methodology is illustrated using data from a randomized trial on survival for patients with liver cirrhosis.

4.
Sliced regression is an effective dimension reduction method that replaces the original high-dimensional predictors with an appropriate low-dimensional projection. It is free from any probabilistic assumption and can exhaustively estimate the central subspace. In this article, we propose to incorporate shrinkage estimation into sliced regression so that variable selection can be achieved simultaneously with dimension reduction. The new method can improve the estimation accuracy and achieve better interpretability for the reduced variables. The efficacy of the proposed method is shown through both simulation and real data analysis.

5.
Li Xiangjie et al. 《统计研究》 (Statistical Research), 2018, 35(7): 115–124
Classical sufficient dimension reduction methods perform poorly when the predictors contain outliers or follow heavy-tailed distributions. Building on an analysis of weighting and of cumulative slicing in sufficient dimension reduction theory, this paper proposes a robust dimension reduction method that combines the two: cumulative weighted sliced inverse regression (CWSIR). The method is robust to outliers in the predictors and in small-sample settings, and it effectively avoids having to choose the number of slices. Numerical simulations show that CWSIR outperforms traditional sliced inverse regression (SIR), cumulative slicing estimation (CUME), contour-projection-based sliced inverse regression (CPSIR), weighted canonical correlation analysis (WCANCOR), sliced inverse median estimation (SIME), weighted inverse regression estimation (WIRE), and related methods. Finally, an analysis of real data from a video website also confirms the effectiveness of CWSIR.
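To make the cumulative-slicing half of CWSIR concrete, here is a minimal sketch of plain cumulative slicing estimation (CUME): it replaces SIR's per-slice means with the cumulative means E[X 1(Y ≤ y)], so no number of slices has to be chosen. This is the unweighted building block only; the robust weighting that defines CWSIR itself is not reproduced here, and the simulated example is our own.

```python
import numpy as np

def cume_directions(X, y, d):
    """Plain CUME sketch: eigen-decompose the cumulative slicing kernel
    matrix built from m(y_j) = mean of standardized X over {y_i <= y_j}."""
    n, p = X.shape
    L = np.linalg.cholesky(np.cov(X.T))
    Z = (X - X.mean(0)) @ np.linalg.inv(L).T         # standardized predictors
    ind = (y[None, :] <= y[:, None]).astype(float)   # ind[j, i] = 1(y_i <= y_j)
    m = ind @ Z / n                                  # cumulative inverse-regression means
    M = m.T @ m / n                                  # cumulative slicing kernel matrix
    eta = np.linalg.eigh(M)[1][:, ::-1][:, :d]       # top-d eigenvectors
    B = np.linalg.inv(L).T @ eta                     # back to the original X scale
    return B / np.linalg.norm(B, axis=0)

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))
b = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = np.exp(X @ b) + 0.2 * rng.normal(size=500)
print(np.abs(cume_directions(X, y, d=1)[:, 0] @ b))  # close to 1 if b is recovered
```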

6.
We consider nonparametric estimation problems in the presence of dependent data, notably nonparametric regression with random design and nonparametric density estimation. The proposed estimation procedure is based on a dimension reduction. The minimax optimal rate of convergence of the estimator is derived assuming a sufficiently weak dependence characterised by fast decreasing mixing coefficients. We illustrate these results by considering classical smoothness assumptions. However, the proposed estimator requires an optimal choice of a dimension parameter depending on certain characteristics of the function of interest, which are not known in practice. The main issue addressed in our work is an adaptive choice of this dimension parameter combining model selection and Lepski's method. It is inspired by the recent work of Goldenshluger and Lepski [(2011), 'Bandwidth Selection in Kernel Density Estimation: Oracle Inequalities and Adaptive Minimax Optimality', The Annals of Statistics, 39, 1608–1632]. We show that this data-driven estimator can attain the lower risk bound up to a constant provided a fast decay of the mixing coefficients.

7.
Residual plots are a standard tool for assessing model fit. When some outcome data are censored, standard residual plots become less appropriate. Here, we develop a new procedure for producing residual plots for linear regression models where some or all of the outcome data are censored. We implement two approaches for incorporating parameter uncertainty. We illustrate our methodology by examining the model fit for an analysis of bacterial load data from a trial for chronic obstructive pulmonary disease. Simulated datasets show that the method can be used when the outcome data consist of a variety of types of censoring.

8.
In this article, we propose to use sparse sufficient dimension reduction as a novel method for Markov blanket discovery of a target variable, where we do not make any distributional assumption on the variables. By assuming sparsity of the basis of the central subspace, we develop a penalized loss function estimate based on the high-dimensional covariance matrix. A coordinate descent algorithm based on an inverse regression is used to obtain the sparse basis of the central subspace. The finite sample behavior of the proposed method is explored by a simulation study and real data examples.

9.
This study considers the nonparametric estimation of a regression function when the response variable is the waiting time between two consecutive events of a stationary renewal process, and where this variable is not completely observed. In these circumstances, our data are the recurrence times from the occurrence of the last event up to a pre-established time, along with the corresponding values of a certain set of covariates. Estimation of the error density function and some of its characteristics are also considered. For the proposed estimators, we first analyze their asymptotic behavior and, thereafter, carry out a simulation study to highlight their behavior in finite samples. Finally, we apply this methodology to an illustrative example with biomedical data.

10.
In recent research, a quasi-likelihood estimation methodology was developed to estimate the regression effects in the Generalized BINMA(1) (GBINMA(1)) process. The method provides consistent parameter estimates but, in the intermediate computations, moment estimating equations were used to estimate the serial- and cross-correlation parameters. This procedure may not yield optimal parameter estimates, in particular for the regression effects. This paper provides an alternative, simpler GBINMA(1) process based on multivariate thinning properties, where the main effects are estimated via a robust generalized quasi-likelihood (GQL) estimation approach. The two techniques are compared through some simulation experiments, and a real-life data application is studied.

11.
In high-dimensional data analysis, feature selection becomes one means for dimension reduction, which proceeds with parameter estimation. Concerning accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that the constrained L0-likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation. It permits up to exponentially many candidate features. Computationally, we develop difference convex methods to implement the computational surrogate through primal and dual subproblems. These results establish a central role of L0-constrained and regularized likelihoods in feature selection and parameter estimation involving selection. As applications of the general method and theory, we perform feature selection in linear regression and logistic regression, and estimate a precision matrix in Gaussian graphical models. In these situations, we gain a new theoretical insight and obtain favorable numerical results. Finally, we discuss an application to predict the metastasis status of breast cancer patients with their gene expression profiles.

12.
Functional logistic regression is becoming more popular as there are many situations where we are interested in the relation between functional covariates (as input) and a binary response (as output). Several approaches have been advocated, and this paper goes into detail about three of them: dimension reduction via functional principal component analysis, penalized functional regression, and wavelet expansions in combination with Least Absolute Shrinkage and Selection Operator (LASSO) penalization. We discuss the performance of the three methods on simulated data and also apply the methods to data regarding lameness detection for horses. Emphasis is on classification performance, but we also discuss estimation of the unknown parameter function.
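As a sketch of the first of the three approaches (dimension reduction via functional principal component analysis followed by logistic regression), the snippet below treats curves observed on a common grid, so ordinary PCA on the discretized curves plays the role of functional PCA. The simulated curves, component counts, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100)                   # common observation grid
n = 200
scores = rng.normal(size=(n, 2))                 # true functional PC scores
curves = (scores[:, [0]] * np.sin(2 * np.pi * t)
          + scores[:, [1]] * np.cos(2 * np.pi * t)
          + 0.1 * rng.normal(size=(n, 100)))     # noisy functional covariates
labels = (scores[:, 0] > 0).astype(int)          # binary response via the 1st score

# PCA on the discretized curves stands in for functional PCA here
clf = make_pipeline(PCA(n_components=4), LogisticRegression())
clf.fit(curves, labels)
print(clf.score(curves, labels))                 # in-sample classification accuracy
```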

13.
The non-central chi-squared distribution plays a vital role in statistical testing procedures. Estimation of the non-centrality parameter provides valuable information for the power calculation of the associated test. We are interested in the statistical inference properties of the non-centrality parameter estimate based on one observation (usually a summary statistic) from a truncated chi-squared distribution. This work is motivated by the application of the flexible two-stage design in case–control studies, where the sample size needed for the second stage of a two-stage study can be determined adaptively from the results of the first stage. We first study the moment estimate for the truncated distribution and prove its existence and uniqueness, and establish its inadmissibility and convergence properties. We then define a new class of estimates that includes the moment estimate as a special case. Among this class of estimates, we recommend one member that outperforms the moment estimate in a wide range of scenarios. We also present two methods for constructing confidence intervals. Simulation studies are conducted to evaluate the performance of the proposed point and interval estimates.
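A small sketch of the moment estimate discussed above: given a single observation x from a non-central chi-squared distribution truncated below at c, solve E_λ[X | X > c] = x for the non-centrality λ. The degrees of freedom, threshold, and search interval below are illustrative assumptions, and the tiny lower bound on λ sidesteps numerical edge cases at exactly λ = 0.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.optimize import brentq

def truncated_mean(lam, df, c):
    """E[X | X > c] for a non-central chi-squared(df, lam) variable."""
    num, _ = quad(lambda t: t * stats.ncx2.pdf(t, df, lam), c, np.inf)
    return num / stats.ncx2.sf(c, df, lam)

def moment_estimate(x, df, c, lam_max=200.0):
    """Moment estimate of the non-centrality from one truncated
    observation x: root of E_lam[X | X > c] - x in lam."""
    g = lambda lam: truncated_mean(lam, df, c) - x
    if g(1e-8) >= 0:
        return 0.0                    # boundary case: no interior root
    return brentq(g, 1e-8, lam_max)   # the truncated mean increases in lam

# example: a first-stage chi-squared statistic that passed the threshold c
print(moment_estimate(x=12.0, df=1, c=3.84))
```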

14.
Parameter reduction can enable otherwise infeasible design and uncertainty studies with modern computational science models that contain several input parameters. In statistical regression, techniques for sufficient dimension reduction (SDR) use data to reduce the predictor dimension of a regression problem. A computational scientist hoping to use SDR for parameter reduction encounters a problem: a computer prediction is best represented by a deterministic function of the inputs, so data consisting of computer simulation queries fail to satisfy the SDR assumptions. To address this problem, we interpret the SDR methods sliced inverse regression (SIR) and sliced average variance estimation (SAVE) as estimating the directions of a ridge function, which is a composition of a low-dimensional linear transformation with a nonlinear function. Within this interpretation, SIR and SAVE estimate matrices of integrals whose column spaces are contained in the span of the ridge directions; we analyze and numerically verify convergence of these column spaces as the number of computer model queries increases. Moreover, we show example functions that are not ridge functions but whose inverse conditional moment matrices are low-rank. Consequently, the computational scientist should beware when using SIR and SAVE for parameter reduction, since SIR and SAVE may mistakenly suggest that truly important directions are unimportant.
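The caveat above can be reproduced numerically. In the sketch below (our own illustrative slicing scheme and test function), SIR and SAVE matrices are computed for a deterministic, even ridge function y = (a'x)^2: the SIR matrix is nearly zero, so SIR would flag the ridge direction as unimportant, while SAVE recovers it.

```python
import numpy as np

def sir_save_matrices(X, y, H=10):
    """Sample SIR and SAVE inverse conditional moment matrices,
    computed on standardized predictors."""
    n, p = X.shape
    L = np.linalg.cholesky(np.cov(X.T))
    Z = (X - X.mean(0)) @ np.linalg.inv(L).T       # standardize the inputs
    M_sir = np.zeros((p, p))
    M_save = np.zeros((p, p))
    for s in np.array_split(np.argsort(y), H):     # slice on the output
        f = len(s) / n
        mu = Z[s].mean(0)
        M_sir += f * np.outer(mu, mu)              # outer products of slice means
        D = np.eye(p) - np.cov(Z[s].T)
        M_save += f * D @ D                        # squared covariance deviations
    return M_sir, M_save, L

# a deterministic "computer model": an even ridge function of one direction a
rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 4))
a = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ a) ** 2                                   # no observation noise
M_sir, M_save, L = sir_save_matrices(X, y)
print(np.linalg.eigvalsh(M_sir).max())             # ~0: SIR misses the ridge direction
eta = np.linalg.eigh(M_save)[1][:, -1]
bhat = np.linalg.inv(L).T @ eta                    # back to the original X scale
print(abs(bhat @ a) / np.linalg.norm(bhat))        # ~1: SAVE recovers it
```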

15.
When combining estimates of a common parameter (of dimension d ≥ 1) from independent data sets—as in stratified analyses and meta-analyses—a weighted average, with weights 'proportional' to inverse variance matrices, is shown to have a minimal variance matrix (a standard fact when d = 1)—minimal in the sense that all convex combinations of the coordinates of the combined estimate have minimal variances. Minimum variance for the estimation of a single coordinate of the parameter can therefore be achieved by joint estimation of all coordinates using matrix weights. Moreover, if each estimate is asymptotically efficient within its own data set, then this optimally weighted average, with consistently estimated weights, is shown to be asymptotically efficient in the combined data set, avoiding the need to merge the data sets and estimate the parameter in question afresh. This is so whatever additional non-common nuisance parameters may be in the models for the various data sets. A special case of this appeared in Fisher [1925. Theory of statistical estimation. Proc. Cambridge Philos. Soc. 22, 700–725]: optimal weights are 'proportional' to information matrices, and he argued that sample information should be used as weights rather than expected information, to maintain second-order efficiency of maximum likelihood. A number of special cases have appeared in the literature; we review several of them and give additional special cases, including stratified regression analysis (proportional-hazards, logistic or linear), combination of independent ROC curves, and meta-analysis. A test for homogeneity of the parameter across the data sets is also given.
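The combination rule just described, a weighted average with matrix weights 'proportional' to inverse variance matrices, θ̂ = (Σᵢ Vᵢ⁻¹)⁻¹ Σᵢ Vᵢ⁻¹ θ̂ᵢ, is short enough to state in code; the two-study numbers below are purely illustrative.

```python
import numpy as np

def combine_estimates(thetas, covs):
    """Matrix-weighted average of independent estimates of a common
    d-dimensional parameter: weights are the inverse variance matrices."""
    prec = [np.linalg.inv(V) for V in covs]          # information matrices
    total = sum(prec)
    theta = np.linalg.solve(total, sum(P @ t for P, t in zip(prec, thetas)))
    return theta, np.linalg.inv(total)               # combined estimate and variance

# two independent studies estimating the same 2-dimensional parameter
t1, V1 = np.array([1.0, 2.1]), np.array([[0.20, 0.05], [0.05, 0.30]])
t2, V2 = np.array([1.2, 1.9]), np.array([[0.40, 0.00], [0.00, 0.10]])
theta, V = combine_estimates([t1, t2], [V1, V2])
print(theta, np.diag(V))   # each coordinate's variance is below both inputs'
```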

16.
Sliced Inverse Regression (SIR; Li, 1991) is a dimension reduction method for reducing the dimension of the predictors without losing regression information. The implementation of SIR requires inverting the covariance matrix of the predictors, which has hindered its use for analyzing high-dimensional data where the number of predictors exceeds the sample size. We propose random sliced inverse regression (rSIR), which applies SIR to many bootstrap samples, each using a subset of randomly selected candidate predictors. The final rSIR estimate is obtained by aggregating these estimates. A simple variable selection procedure is also proposed using these bootstrap estimates. The performance of the proposed estimates is studied via extensive simulation. An application to a dataset concerning myocardial perfusion diagnosis from cardiac Single Photon Emission Computed Tomography (SPECT) images is presented.
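A sketch of rSIR as we read it from the abstract: run basic SIR on bootstrap samples restricted to random predictor subsets (each small enough that the subset covariance matrix is invertible) and aggregate through an average projection matrix. The aggregation rule, subset size, and example below are our assumptions, not necessarily the authors' exact choices.

```python
import numpy as np

def sir_directions(X, y, d, H=10):
    """Basic SIR: top eigenvectors of the between-slice covariance of
    the standardized predictors, mapped back to the original scale."""
    n, p = X.shape
    L = np.linalg.cholesky(np.cov(X.T) + 1e-8 * np.eye(p))
    Z = (X - X.mean(0)) @ np.linalg.inv(L).T
    M = np.zeros((p, p))
    for s in np.array_split(np.argsort(y), H):
        mu = Z[s].mean(0)
        M += (len(s) / n) * np.outer(mu, mu)
    eta = np.linalg.eigh(M)[1][:, -d:]
    return np.linalg.inv(L).T @ eta

def random_sir(X, y, d, n_boot=200, m=None, rng=None):
    """rSIR sketch: SIR on bootstrap samples over random predictor
    subsets, aggregated via an average projection matrix."""
    rng = rng or np.random.default_rng()
    n, p = X.shape
    m = m or max(d + 1, int(np.sqrt(p)))             # subset size (assumption)
    agg = np.zeros((p, p))
    for _ in range(n_boot):
        rows = rng.integers(0, n, n)                 # bootstrap sample of rows
        cols = rng.choice(p, m, replace=False)       # random predictor subset
        B = sir_directions(X[np.ix_(rows, cols)], y[rows], d)
        Q, _ = np.linalg.qr(B)                       # orthonormalize the directions
        P = np.zeros((p, p))
        P[np.ix_(cols, cols)] = Q @ Q.T              # embed the projection in R^p
        agg += P / n_boot
    return np.linalg.eigh(agg)[1][:, -d:]            # top-d aggregated directions

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 200))                      # more predictors than samples
b = np.zeros(200); b[:2] = [0.6, 0.8]
y = X @ b + 0.5 * rng.normal(size=150)
B = random_sir(X, y, d=1, rng=rng)
print(np.argsort(np.abs(B[:, 0]))[-2:])              # the two relevant predictors dominate
```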

17.
Sufficient dimension reduction (SDR) is a popular supervised machine learning technique that reduces the predictor dimension and facilitates subsequent data analysis in practice. In this article, we propose principal weighted logistic regression (PWLR), an efficient SDR method in binary classification where inverse-regression-based SDR methods often suffer. We first develop linear PWLR for linear SDR and study its asymptotic properties. We then extend it to nonlinear SDR and propose the kernel PWLR. Evaluations with both simulated and real data show the promising performance of the PWLR for SDR in binary classification.

18.
Multi-type insurance claim processes have attracted considerable research interest in the literature. The existing statistical inference for such processes, however, may encounter the "curse of dimensionality" due to high-dimensional covariates. In this article, a technique of sufficient dimension reduction is applied to multiple-type insurance claim data, which uses a copula to model the dependence between different types of claim processes and incorporates a one-dimensional frailty to fit the dependence of claims within the same claim process. A two-step procedure is proposed to estimate the model parameters. The first step develops nonparametric estimators of the baseline, the basis of the central subspace and its dimension, and the regression function. The second step then estimates the copula parameter. Simulations are performed to evaluate and confirm the theoretical results.

19.
In this article, we construct the uniform confidence band (UCB) of a nonparametric trend in a partially linear model with locally stationary regressors. A two-stage semiparametric regression is employed to estimate the trend function. Based on this estimate, we develop an invariance principle to construct the UCB of the trend function. The proposed methodology is used to estimate the Non-Accelerating Inflation Rate of Unemployment (NAIRU) in the Phillips Curve and to perform inference on the parameter based on its UCB. The empirical results strongly suggest that the U.S. NAIRU is time-varying.

20.
Quantile regression (QR) is becoming increasingly popular due to its relevance in many scientific investigations. There is a large body of work on linear and nonlinear QR models. Specifically, nonparametric estimation of the conditional quantiles has received particular attention, due to its model flexibility. However, nonparametric QR techniques are limited in the number of covariates they can handle. Dimension reduction offers a solution to this problem by considering low-dimensional smoothing without specifying any parametric or nonparametric regression relation. The existing dimension reduction techniques focus on the entire conditional distribution. We, on the other hand, turn our attention to dimension reduction techniques for conditional quantiles and introduce a new method for reducing the dimension of the predictor X. The novelty of this paper is threefold. We start by considering a single-index quantile regression model, which assumes that the conditional quantile depends on X through a single linear combination of the predictors, then extend to a multi-index quantile regression model, and finally generalize the proposed methodology to any statistical functional of the conditional distribution. The performance of the methodology is demonstrated through simulation examples and real data applications. Our results suggest that the method has good finite sample performance and often outperforms existing methods.
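To illustrate the single-index starting point, the sketch below estimates the index direction for a two-dimensional predictor by scanning candidate directions and keeping the one whose kernel-weighted conditional-quantile fit minimizes the average check loss. The grid-over-angles search, kernel, bandwidth, and simulated example are our own illustrative assumptions; the paper's estimator is more general and is not computed this way.

```python
import numpy as np

def check_loss(u, tau):
    """Average quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return np.mean(u * (tau - (u < 0)))

def local_quantile_fit(index, y, tau, h):
    """Kernel-weighted sample quantile of y at each observed index value
    (a local-constant conditional quantile estimate)."""
    order = np.argsort(y)
    ys = y[order]
    w = np.exp(-0.5 * ((index[:, None] - index[None, :]) / h) ** 2)[:, order]
    cw = np.cumsum(w, axis=1)
    cw /= cw[:, -1:]                                 # normalized cumulative weights
    idx = np.minimum((cw < tau).sum(axis=1), len(y) - 1)
    return ys[idx]

def single_index_qr(X, y, tau=0.5, h=0.3, n_angles=180):
    """Scan directions on the half-circle and keep the one whose local
    quantile fit has the smallest check loss (2-d predictors only)."""
    best_loss, best_b = np.inf, None
    for a in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        b = np.array([np.cos(a), np.sin(a)])
        loss = check_loss(y - local_quantile_fit(X @ b, y, tau, h), tau)
        if loss < best_loss:
            best_loss, best_b = loss, b
    return best_b

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 2))
b0 = np.array([3.0, 4.0]) / 5.0
y = (X @ b0) ** 2 + 0.3 * rng.normal(size=400)
print(single_index_qr(X, y), b0)                     # recovered direction, up to sign
```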
