Similar Literature
Found 20 similar documents (search time: 31 ms)
1.
Recent literature provides many computational and modeling approaches for covariance matrix estimation in penalized Gaussian graphical models, but relatively little study has been carried out on the choice of the tuning parameter. This paper tries to fill this gap by focusing on the problem of shrinkage parameter selection when estimating sparse precision matrices using the penalized likelihood approach. Previous approaches typically used K-fold cross-validation in this regard. In this paper, we first derive the generalized approximate cross-validation for tuning parameter selection, which is not only a more computationally efficient alternative but also achieves a smaller error rate for model fitting compared to leave-one-out cross-validation. For consistency in the selection of nonzero entries in the precision matrix, we employ a Bayesian information criterion which provably can identify the nonzero conditional correlations in the Gaussian model. Our simulations demonstrate the general superiority of the two proposed selectors in comparison with leave-one-out cross-validation, 10-fold cross-validation and the Akaike information criterion.
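The tuning-parameter problem described above can be made concrete with a small sketch: a BIC-style criterion used to pick the graphical-lasso penalty. This is an illustration only — it uses scikit-learn's GraphicalLasso and a generic Gaussian BIC, not the paper's GACV or its exact BIC formula, and all data-generating choices are hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Sparse tridiagonal precision matrix -> its inverse is the true covariance
p, n = 5, 200
prec = np.eye(p) + np.diag([0.4] * (p - 1), 1) + np.diag([0.4] * (p - 1), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=n)

def bic_score(model):
    # Generic Gaussian BIC: -2 log-likelihood + (number of free parameters) * log n
    S = np.cov(X, rowvar=False)
    Theta = model.precision_
    _, logdet = np.linalg.slogdet(Theta)
    loglik = n / 2 * (logdet - np.trace(S @ Theta))
    k = np.count_nonzero(np.triu(Theta, 1)) + p  # off-diagonal nonzeros + diagonal
    return -2 * loglik + k * np.log(n)

alphas = [0.01, 0.05, 0.1, 0.2, 0.4]
best = min(alphas, key=lambda a: bic_score(GraphicalLasso(alpha=a).fit(X)))
print("selected alpha:", best)
```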

2.
Abstract

This paper is concerned with a model averaging procedure for varying-coefficient partially linear models. We propose a jackknife model averaging method that involves minimizing a leave-one-out cross-validation criterion, and develop a computational shortcut to optimize the cross-validation criterion for weight choice. The resulting model average estimator is shown to be asymptotically optimal in terms of achieving the smallest possible squared error. Simulation studies provide evidence of the superiority of the proposed procedures. Our approach is further applied to a real data set.
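For linear candidate models, the leave-one-out shortcut mentioned above has a well-known form: LOO residuals are e_i / (1 − h_ii), so the cross-validation criterion for the weights is a quadratic form that can be minimized without any refitting. The sketch below (nested OLS candidates, SciPy's SLSQP on the weight simplex) is illustrative only and is not the paper's varying-coefficient setting.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 4))
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

def loo_residuals(Xm, y):
    # Hat-matrix shortcut: leave-one-out residual = e_i / (1 - h_ii)
    H = Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T)
    e = y - H @ y
    return e / (1 - np.diag(H))

# Candidate models: first k columns, k = 1..4
E = np.column_stack([loo_residuals(X[:, :k], y) for k in range(1, 5)])
A = E.T @ E  # CV(w) = w' A w

res = minimize(lambda w: w @ A @ w, np.full(4, 0.25),
               bounds=[(0, 1)] * 4,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
w = res.x
print("model averaging weights:", np.round(w, 3))
```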

3.
The envelope method produces efficient estimation in multivariate linear regression, and is widely applied in biology, psychology, and economics. This paper estimates parameters through a model averaging methodology and improves the predictive ability of envelope models. We propose a frequentist model averaging method by minimizing a cross-validation criterion. When all the candidate models are misspecified, the proposed model averaging estimator is proved to be asymptotically optimal. When correct candidate models exist, the coefficient estimator is proved to be consistent, and the sum of the weights assigned to the correct models converges in probability to one. Simulations and an empirical application demonstrate the effectiveness of the proposed method.

4.
This paper considers the problem of variance estimation for sparse ultra-high dimensional varying coefficient models. We first use B-splines to approximate the coefficient functions, and discuss the asymptotic behavior of a naive two-stage estimator of the error variance. We also reveal that this naive estimator may significantly underestimate the error variance due to spurious correlations, which are even higher for nonparametric models than for linear models. This prompts us to propose an accurate estimator of the error variance by effectively integrating the sure independence screening and refitted cross-validation techniques. The consistency and asymptotic normality of the resulting estimator are established under some regularity conditions. Simulation studies are carried out to assess the finite sample performance of the proposed methods.
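The screening-plus-refitting idea can be sketched in the linear-model special case: split the sample in two, screen variables by marginal correlations (SIS-style) on one half, refit by OLS on the other half, and average the two variance estimates. The names and data-generating setup below are illustrative assumptions; the paper itself works with B-spline varying-coefficient models.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 2.0                      # three strong signal variables
y = X @ beta + rng.normal(size=n)   # true error variance = 1

def refit_var(screen_idx, refit_idx, k=5):
    # Screen on one half: keep the k largest marginal correlations
    corr = np.abs(X[screen_idx].T @ y[screen_idx])
    S = np.argsort(corr)[-k:]
    # Refit OLS on the *other* half using only the screened variables
    Xt, yt = X[refit_idx][:, S], y[refit_idx]
    bhat = np.linalg.lstsq(Xt, yt, rcond=None)[0]
    resid = yt - Xt @ bhat
    return resid @ resid / (len(refit_idx) - k)

idx = rng.permutation(n)
i1, i2 = idx[:n // 2], idx[n // 2:]
sigma2_hat = 0.5 * (refit_var(i1, i2) + refit_var(i2, i1))
print("RCV variance estimate:", round(sigma2_hat, 3))
```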

5.
ON SPLINE SMOOTHING WITH AUTOCORRELATED ERRORS
The generalised cross-validation criterion for choosing the degree of smoothing in spline regression is extended to accommodate an autocorrelated error sequence. It is demonstrated via simulation that the minimum generalised cross-validation smoothing spline is an inconsistent estimator in the presence of autocorrelated errors and that ignoring even moderate autocorrelation structure can seriously affect the performance of the cross-validated smoothing spline. The method of penalised maximum likelihood is used to develop an efficient algorithm for the case in which the autocorrelation decays exponentially. An application of the method to a published data-set is described. The method does not require the data to be equally spaced in time.
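For reference, the generalised cross-validation criterion for a linear smoother ŷ = H(λ)y is GCV(λ) = n · RSS(λ) / (n − tr H(λ))². Below is a minimal sketch using a ridge-penalised polynomial smoother with independent errors; note that the abstract's warning concerns autocorrelated errors, which this toy example deliberately does not include.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
B = np.vander(x, 8, increasing=True)  # polynomial basis, degree 7

def gcv(lam):
    # Smoother matrix H(lam) for ridge-penalised least squares
    H = B @ np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T)
    rss = np.sum((y - H @ y) ** 2)
    return n * rss / (n - np.trace(H)) ** 2

lams = 10.0 ** np.arange(-8, 2)
best = min(lams, key=gcv)
print("GCV-selected lambda:", best)
```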

6.
A criterion for choosing an estimator in a family of semi-parametric estimators from incomplete data is proposed. This criterion is the expected observed log-likelihood (ELL). Adapted versions of this criterion in the case of censored data and in the presence of explanatory variables are exhibited. We show that likelihood cross-validation (LCV) is an estimator of ELL and we exhibit three bootstrap estimators. A simulation study considering both families of kernel and penalized likelihood estimators of the hazard function (indexed by a smoothing parameter) demonstrates the good performance of LCV and of a bootstrap estimator called ELLbboot. We apply the ELLbboot criterion to compare the kernel and penalized likelihood estimators for estimating the risk of developing dementia for women, using data from a large cohort study.

7.
Several estimators of squared prediction error have been suggested for use in model and bandwidth selection problems. Among these are cross-validation, generalized cross-validation and a number of related techniques based on the residual sum of squares. For many situations with squared error loss, e.g. nonparametric smoothing, these estimators have been shown to be asymptotically optimal in the sense that in large samples the estimator minimizing the selection criterion also minimizes squared error loss. However, cross-validation is known not to be asymptotically optimal for some 'easy' location problems. We consider selection criteria based on estimators of squared prediction risk for choosing between location estimators. We show that criteria based on adjusted residual sum of squares are not asymptotically optimal for choosing between asymptotically normal location estimators that converge at rate n^{1/2}, but are when the rate of convergence is slower. We also show that leave-one-out cross-validation is not asymptotically optimal for choosing between √n-differentiable statistics, but leave-d-out cross-validation is optimal when d → ∞ at the appropriate rate.
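A leave-d-out criterion of the kind studied above can be sketched by Monte Carlo: repeatedly hold out d observations, fit each location estimator on the rest, and average the squared prediction error over the held-out sets. The example compares the sample mean and median; the data, d, and number of splits are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, size=60)
d = 20  # size of each held-out validation set

def cv_risk(estimator, n_splits=200):
    # Monte Carlo leave-d-out: random validation sets of size d
    risks = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        test, train = idx[:d], idx[d:]
        theta = estimator(x[train])
        risks.append(np.mean((x[test] - theta) ** 2))
    return float(np.mean(risks))

print("mean  :", round(cv_risk(np.mean), 3))
print("median:", round(cv_risk(np.median), 3))
```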

8.
Summary.  Smoothing splines via the penalized least squares method provide versatile and effective nonparametric models for regression with Gaussian responses. The computation of smoothing splines is generally of the order O(n^3), n being the sample size, which severely limits its practical applicability. We study more scalable computation of smoothing spline regression via certain low dimensional approximations that are asymptotically as efficient. A simple algorithm is presented and the Bayes model that is associated with the approximations is derived, with the latter guiding the porting of Bayesian confidence intervals. The practical choice of the dimension of the approximating space is determined through simulation studies, and empirical comparisons of the approximations with the exact solution are presented. Also evaluated is a simple modification of the generalized cross-validation method for smoothing parameter selection, which to a large extent fixes the occasional undersmoothing problem that is suffered by generalized cross-validation.

9.
The k nearest neighbors (k-NN) classifier is one of the most popular methods for statistical pattern recognition and machine learning. In practice, the size k, the number of neighbors used for classification, is usually set arbitrarily to one or some other small number, or chosen by a cross-validation procedure. In this study, we propose a novel alternative approach to choosing k. Based on a k-NN-based multivariate multi-sample test, we assign each k a permutation-test-based Z-score, and set the number of neighbors to the k with the highest Z-score. This approach is computationally efficient since we have derived formulas for the mean and variance of the test statistic under the permutation distribution for multiple sample groups. Several simulated and real-world data sets are analyzed to investigate the performance of our approach. The usefulness of our approach is demonstrated through the evaluation of prediction accuracies using the Z-score as a criterion to select k. We also compare our approach to widely used cross-validation approaches. The results show that the k selected by our approach yields high prediction accuracies when informative features are used for classification, whereas the cross-validation approach may fail in some cases.
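For contrast, here is the standard baseline the paper compares against: choosing k for k-NN by V-fold cross-validated accuracy. The permutation Z-score method itself is not reproduced here; the sketch uses scikit-learn and synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with a few informative features
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

# 5-fold CV accuracy for each candidate k; pick the best
ks = range(1, 16, 2)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean() for k in ks}
best_k = max(scores, key=scores.get)
print("CV-selected k:", best_k)
```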

10.
We treat a nonparametric estimator for a joint probability mass function, based on multivariate discrete associated kernels, which are appropriate for multivariate count data of small and moderate sample sizes. Bayesian adaptive estimation of the vector of bandwidths using the quadratic and entropy loss functions is considered. Exact formulas for the posterior distribution and the vector of bandwidths are obtained. Numerical studies indicate that our approach performs better than other bandwidth selection techniques using the integrated squared error as criterion. Some applications are made on real data sets.

11.
This paper demonstrates that cross-validation (CV) and Bayesian adaptive bandwidth selection can be applied in the estimation of associated kernel discrete functions. This idea was originally proposed by Brewer [A Bayesian model for local smoothing in kernel density estimation, Stat. Comput. 10 (2000), pp. 299–309] to derive variable bandwidths in adaptive kernel density estimation. Our approach considers the adaptive binomial kernel estimator and treats the variable bandwidths as parameters with a beta prior distribution. The best variable bandwidth selector is estimated by the posterior mean in the Bayesian sense under squared error loss. Monte Carlo simulations are conducted to compare the performance of the proposed Bayesian adaptive approach with that of the asymptotic mean integrated squared error estimator and the CV technique for selecting a global (fixed) bandwidth proposed in Kokonendji and Senga Kiessé [Discrete associated kernels method and extensions, Stat. Methodol. 8 (2011), pp. 497–516]. The Bayesian adaptive bandwidth estimator performs better than the global bandwidth, in particular for small and moderate sample sizes.

12.
Abstract.  We consider models based on multivariate counting processes, including multi-state models. These models are specified semi-parametrically by a set of functions and real parameters. We consider inference for these models based on coarsened observations, focusing on families of smooth estimators such as those produced by penalized likelihood. An important issue is the choice of model structure, for instance, the choice between a Markov and some non-Markov models. We define in a general context the expected Kullback–Leibler criterion and we show that likelihood-based cross-validation (LCV) is a nearly unbiased estimator of it. We give a general form of an approximation of the leave-one-out LCV. The approach is studied by simulations, and it is illustrated by estimating a Markov and two semi-Markov illness–death models, with an application to dementia using data from a large cohort study.

13.
Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold cross-validation, Monte Carlo cross-validation, and bootstrap procedures. For estimator selection, finite sample risk bounds are derived and applied to establish the asymptotic optimality of cross-validation, in the sense that a selector based on a cross-validated risk estimator performs asymptotically as well as an optimal oracle selector based on the risk under the true, unknown data generating distribution. The asymptotic results are derived under the assumption that the size of the validation sets converges to infinity and hence do not cover leave-one-out cross-validation. For performance assessment, cross-validated risk estimators are shown to be consistent and asymptotically linear for the risk under the true data generating distribution and confidence intervals are derived for this unknown risk. Unlike previously published results, the theorems derived in this and our related articles apply to general data generating distributions, loss functions (i.e., parameters), estimators, and cross-validation procedures.
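The V-fold cross-validated risk estimator at the heart of this framework is simple to write down for squared-error loss. The sketch below uses a sample-mean "estimator" for concreteness; the framework itself covers arbitrary estimators and loss functions.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, size=90)
V = 5

# Random partition into V validation folds
folds = np.array_split(rng.permutation(len(x)), V)

risks = []
for v in range(V):
    val = folds[v]
    train = np.concatenate([folds[u] for u in range(V) if u != v])
    theta = x[train].mean()                       # fit on training folds
    risks.append(np.mean((x[val] - theta) ** 2))  # risk on validation fold
cv_risk = float(np.mean(risks))
print("V-fold CV risk estimate:", round(cv_risk, 3))
```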

14.
This work deals with a semiparametric kernel estimator of probability mass functions which are assumed to be modified Poisson distributions. This semiparametric approach is based on the discrete associated kernel method, appropriate for modelling count data; in particular, the well-known discrete symmetric triangular kernels are used. Two data-driven bandwidth selection procedures are investigated and an explicit expression for the optimal bandwidth, not available until now, is provided. Moreover, some asymptotic properties of the cross-validation criterion adapted for discrete semiparametric kernel estimation are studied. Finally, to measure the performance of the semiparametric estimator according to each type of bandwidth parameter, some applications are carried out on three real count data sets from sociology and biology.

15.
We propose a new modified (biased) cross-validation method for adaptively determining the bandwidth in a nonparametric density estimation setup. It is shown that the method provides consistent minimizers. Simulation results are reported which compare the small-sample behavior of the new and the classical cross-validation selectors.
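The classical least-squares cross-validation selector that the modified (biased) method refines has a closed form for a Gaussian kernel; below is a minimal implementation of that baseline (the biased-CV correction itself is not shown, and the grid and sample are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=80)
n = len(x)

def lscv(h):
    # LSCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(x_i)
    # For a Gaussian kernel, integral of fhat^2 uses N(0, 2h^2) convolutions.
    d = x[:, None] - x[None, :]
    term1 = np.exp(-d**2 / (4 * h**2)).sum() / (n**2 * 2 * h * np.sqrt(np.pi))
    K = np.exp(-d**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))
    loo = (K.sum(axis=1) - K.diagonal()) / (n - 1)  # leave-one-out fhat at x_i
    return term1 - 2 * loo.mean()

hs = np.linspace(0.1, 1.5, 29)
h_cv = hs[np.argmin([lscv(h) for h in hs])]
print("LSCV bandwidth:", round(float(h_cv), 3))
```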

16.
We describe a Monte Carlo investigation of a number of variants of cross-validation for the assessment of performance of predictive models, including different values of k in leave-k-out cross-validation, and implementation either in a one-deep or a two-deep fashion. We assume an underlying linear model that is being fitted using either ridge regression or partial least squares, and vary a number of design factors such as sample size n relative to number of variables p, and error variance. The investigation encompasses both the non-singular (i.e. n > p) and the singular (i.e. n ≤ p) cases. The latter is now common in areas such as chemometrics but has as yet received little rigorous investigation. Results of the experiments enable us to reach some definite conclusions and to make some practical recommendations.

17.
The focused information criterion for model selection is constructed to select the model that best estimates a particular quantity of interest, the focus, in terms of mean squared error. We extend this focused selection process to the high-dimensional regression setting with potentially a larger number of parameters than the size of the sample. We distinguish two cases: (i) the case where the considered submodel is of low dimension and (ii) the case where it is of high dimension. In the former case, we obtain an alternative expression of the low-dimensional focused information criterion that can directly be applied. In the latter case, we use a desparsified estimator that allows us to derive the mean squared error of the focus estimator. We illustrate the performance of the high-dimensional focused information criterion with a numerical study and a real dataset.

18.
The Maximum Likelihood (ML) and Best Linear Unbiased (BLU) estimators of the location and scale parameters of an extreme value distribution (Lawless [1982]) are compared under conditions of small sample sizes and Type I censorship. The comparisons were made in terms of the mean square error criterion. According to this criterion, the ML estimator of σ in the case of very small sample sizes (n < 10) and heavy censorship (low censoring time) proved to be more efficient than the corresponding BLU estimator. However, the BLU estimator of σ attains parity with the corresponding ML estimator when the censoring time increases, even for sample sizes as low as 10, and attains equivalence with the ML estimator when the sample size increases above 10, particularly when the censoring time is also increased. The situation is reversed when it comes to estimating the location parameter μ, as the BLU estimator was found to be consistently more efficient than the ML estimator despite the improved performance of the ML estimator when the sample size increases. However, computational ease and convenience favor the ML estimators.

19.
Cross-validation has been widely used in the context of statistical linear models and multivariate data analysis. Recently, technological advancements have made it possible to collect new types of data that are in the form of curves. Statistical procedures for analysing these data, which are of infinite dimension, are provided by functional data analysis. In functional linear regression, using statistical smoothing, estimation of the slope and intercept parameters is generally based on functional principal components analysis (FPCA), which allows for a finite-dimensional analysis of the problem. The estimators of the slope and intercept parameters in this context, proposed by Hall and Hosseini-Nasab [On properties of functional principal components analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol. 68 (2006), pp. 109–126], are based on FPCA and depend on a smoothing parameter that can be chosen by cross-validation. The cross-validation criterion given there is time-consuming and hard to compute. In this work, we approximate this cross-validation criterion by another criterion that, in a sense, reduces to a multivariate data analysis tool, and we evaluate its performance numerically. We also treat a real data set consisting of two variables, temperature and the amount of precipitation, and estimate the regression coefficients for the former variable in a model predicting the latter.

20.
Summary. We obtain the residual information criterion RIC, a selection criterion based on the residual log-likelihood, for regression models including classical regression models, Box–Cox transformation models, weighted regression models and regression models with autoregressive moving average errors. We show that RIC is a consistent criterion, and simulation studies for each of the four models indicate that RIC provides better model order choices than the Akaike information criterion, the corrected Akaike information criterion, the final prediction error, C_p and adjusted R², except when the sample size is small and the signal-to-noise ratio is weak. In this case, none of the criteria performs well. Monte Carlo results also show that RIC is superior to the consistent Bayesian information criterion BIC when the signal-to-noise ratio is not weak, and it is comparable with BIC when the signal-to-noise ratio is weak and the sample size is large.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号