
1.

An improved C_p criterion for spline smoothing

Chun-Shu Chen, Hsin-Cheng Huang 《Journal of Statistical Planning and Inference》2011,141(1):445-452

Spline smoothing is a popular technique for curve fitting, in which the selection of the smoothing parameter is crucial. Many methods, such as Mallows’ C_p, generalized maximum likelihood (GML), and the extended exponential (EE) criterion, have been proposed to select this parameter. Although C_p is asymptotically optimal, it is usually outperformed by other selection criteria for small to moderate sample sizes because of its high variability. On the other hand, GML and EE are more stable than C_p, but they do not possess the same asymptotic optimality. Instead of selecting the smoothing parameter directly with C_p, we propose to select among a small class of selection criteria based on Stein's unbiased risk estimate (SURE). Because of this selection effect, the spline estimate obtained from a criterion in this class is nonlinear. Thus, the effective degrees of freedom in SURE contain an adjustment term in addition to the trace of the smoothing matrix, and this term cannot be ignored for small to moderate sample sizes. The resulting criterion, which we call adaptive C_p, has an analytic expression and hence can be computed efficiently. Moreover, adaptive C_p is shown in a simulation study to outperform, and to be more stable than, commonly used selection criteria, and it is also shown to possess the same asymptotic optimality as C_p.
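As a rough illustration of the baseline that adaptive C_p improves on, the sketch below picks a smoothing parameter by classical Mallows' C_p for a linear smoother, C_p(λ) = ‖y − S(λ)y‖²/n + 2σ² tr S(λ)/n. It assumes a known noise variance σ² and uses a second-difference penalized least-squares smoother on equally spaced points as a stand-in for a cubic smoothing spline; the function names are hypothetical and the SURE adjustment term of adaptive C_p is not implemented.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Hat matrix S(lam) = (I + lam * D'D)^{-1} of a second-difference penalized
    least-squares smoother (a discrete stand-in for a cubic smoothing spline)."""
    D = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference operator
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def mallows_cp(y, lam, sigma2):
    """Classical C_p for a linear smoother: RSS/n + 2 * sigma2 * tr(S) / n."""
    n = len(y)
    S = smoother_matrix(n, lam)
    resid = y - S @ y
    return resid @ resid / n + 2.0 * sigma2 * np.trace(S) / n

rng = np.random.default_rng(0)
n, sigma = 100, 0.3
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, n)

lams = 10.0 ** np.arange(-2, 6)  # candidate smoothing parameters
scores = [mallows_cp(y, lam, sigma**2) for lam in lams]
best_lam = lams[int(np.argmin(scores))]
print(best_lam, min(scores))
```

Here tr S(λ) plays the role of the effective degrees of freedom: it equals n when λ = 0 and decreases toward 2 (a straight-line fit) as λ grows.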

2.

Selection of Variables in Multivariate Regression Models for Large Dimensions

《Communications in Statistics - Theory and Methods》2012,41(13-14):2465-2489

The Akaike information criterion (AIC) and Mallows’ C_p statistic have been proposed for selecting a smaller number of regressors in multivariate regression models with a fully unknown covariance matrix. Both criteria, however, rest on the implicit assumption that the sample size is substantially larger than the dimension of the covariance matrix: a stable estimator of the covariance matrix requires the dimension to be much smaller than the sample size. When the dimension is close to the sample size, ridge-type estimators of the covariance matrix are needed. In this article, we use a ridge-type estimator for the covariance matrix and derive a modified AIC and a modified C_p statistic under the asymptotic framework in which both the sample size and the dimension go to infinity. Numerical results show that these modified procedures perform very well at selecting the true model in high-dimensional cases.
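To see why a ridge-type estimator is needed, the sketch below draws n = 30 observations of a p = 40 dimensional vector: the sample covariance matrix then has rank at most n − 1 and is singular, whereas adding λI makes it positive definite. The form S + λI, the value λ = 0.1, and the function name are illustrative assumptions; the abstract does not specify the ridge estimator used in the article.

```python
import numpy as np

def ridge_covariance(Y, lam):
    """Sample covariance of the rows of Y plus lam * I (lam > 0 is a tuning constant)."""
    S = np.cov(Y, rowvar=False)  # columns of Y are the variables
    return S + lam * np.eye(S.shape[0])

rng = np.random.default_rng(0)
n, p = 30, 40
Y = rng.normal(size=(n, p))
S = np.cov(Y, rowvar=False)
print(np.linalg.matrix_rank(S))           # at most n - 1 = 29 < p, so S is singular
S_ridge = ridge_covariance(Y, lam=0.1)
print(np.linalg.eigvalsh(S_ridge).min())  # at least 0.1, so S_ridge is invertible
```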

3.

A modified C_p statistic in a system-of-equations model

Vichit Lorchirachoonkul, Jirawan Jitthavech 《Journal of Statistical Planning and Inference》2012

A new statistic, SΓ(p), is developed for variable selection in a system-of-equations model. The standardized total mean square error in the SΓ(p) statistic is weighted by the covariance matrix of the dependent variables, instead of by the error covariance matrix of the true model as in the original definition. The new statistic can also be used for model selection among non-nested models. The estimate of SΓ(p), denoted SC(p), is derived and shown to reduce to SC_ε(p), which has a form similar to C_p in a single-equation model, when the covariance matrix of the sampled dependent variables is replaced by the error covariance matrix under the full model.

4.

Linear model selection by cross-validation

《Journal of Statistical Planning and Inference》2005,128(1):231-240

We consider the problem of model (or variable) selection in the classical regression model based on cross-validation with an added penalty term for penalizing overfitting. Under some weak conditions, the new criterion is shown to be strongly consistent in the sense that, with probability one, for all sufficiently large n, the criterion chooses the smallest true model. The penalty function, denoted by C_n, depends on the sample size n and is chosen to ensure consistency in the selection of the true model. Various choices of C_n have been suggested in the model selection literature. In this paper we show that a particular data-based choice of C_n, which makes it random, preserves the consistency property and improves performance over a fixed choice of C_n.
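The cross-validation part of such a criterion is cheap for least squares: the leave-one-out residual of observation i equals e_i/(1 − h_ii), where h_ii is the i-th diagonal entry of the hat matrix, so no refitting is needed. The sketch below uses this identity and adds a penalty c_n·k·σ²/n; that penalty form, the function names, and the choice c_n = log n in the example are hypothetical, since the abstract gives neither the exact criterion nor the data-based choice of C_n.

```python
import numpy as np

def loo_residuals(X, y):
    """Leave-one-out residuals e_i / (1 - h_ii) of an OLS fit, from a single fit."""
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix X (X'X)^{-1} X'
    e = y - H @ y
    return e / (1.0 - np.diag(H))

def penalized_cv(X, y, cn, sigma2):
    """Hypothetical penalized criterion: mean squared LOO error + cn * k * sigma2 / n."""
    n, k = X.shape
    press = np.sum(loo_residuals(X, y) ** 2)
    return press / n + cn * k * sigma2 / n

# Example: compare nested models that use the first k columns of X.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X[:, :2] @ np.array([1.0, 2.0]) + rng.normal(size=n)
sigma2 = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2) / (n - X.shape[1])
scores = {k: penalized_cv(X[:, :k], y, np.log(n), sigma2) for k in range(1, 5)}
print(min(scores, key=scores.get), scores)
```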

5.

Model selection with data-oriented penalty

《Journal of Statistical Planning and Inference》1999,77(1):103-117

We consider the problem of model (or variable) selection in the classical regression model using the GIC (general information criterion). In this method, the maximized likelihood is combined with a penalty function, denoted by C_n, that depends on the sample size n and is chosen to ensure consistency in the selection of the true model. Various choices of C_n have been suggested in the model selection literature. In this paper we show that a particular data-based choice of C_n, which makes it random, preserves the consistency property and improves performance over a fixed choice of C_n.
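For Gaussian linear regression, the GIC of a candidate model with k coefficients can be written, up to an additive constant, as n·log(RSS/n) + C_n·k; C_n = 2 gives AIC and C_n = log n gives BIC. The sketch below evaluates this over all subsets of candidate regressors (an intercept is always kept) for fixed choices of C_n only. The data-based choice of C_n studied in the paper is not reproduced, and the function names and simulated data are illustrative.

```python
import itertools
import numpy as np

def gic(X, y, cols, cn):
    """n * log(RSS / n) + cn * k for the model with an intercept and columns `cols`."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + cn * Z.shape[1]

def select_subset(X, y, cn):
    """Exhaustive search for the subset of columns minimizing the GIC."""
    p = X.shape[1]
    subsets = [c for r in range(p + 1) for c in itertools.combinations(range(p), r)]
    return min(subsets, key=lambda c: gic(X, y, c, cn))

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 4))  # only columns 0 and 1 affect y
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
best_aic = select_subset(X, y, cn=2.0)
best_bic = select_subset(X, y, cn=np.log(n))
print(best_aic, best_bic)
```

A larger C_n can never select a larger model than a smaller C_n on the same data, which is why the choice of C_n governs the trade-off between overfitting and consistency.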

6.

Alternatives to post‐processing posterior predictive p values

Jørund Gåsemyr, Ida Scheel 《Scandinavian Journal of Statistics》2019,46(4):1252-1273

The posterior predictive p value (ppp) was introduced as a Bayesian counterpart to classical p values. The methodology can be applied to discrepancy measures involving both data and parameters and can hence be targeted to check various modeling assumptions. Interpretation can, however, be difficult, since the distribution of the ppp value under the modeling assumptions varies substantially between cases. A calibration procedure has been suggested that treats the ppp value as a test statistic in a prior predictive test. In this paper, we suggest that a prior predictive test may instead be based on the expected posterior discrepancy, which is somewhat simpler both conceptually and computationally. Since both of these methods require simulating a large posterior parameter sample for each member of an equally large prior predictive data sample, we further suggest looking for ways to match the given discrepancy with a computation-saving conflict measure. This approach is also based on simulation but only requires sampling from two different distributions representing two contrasting sources of information about a model parameter. The conflict measure methodology is also more flexible in that it handles non-informative priors without difficulty. We compare the approaches theoretically in some simple models and in a more complex applied example.
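A minimal Monte Carlo sketch of the ordinary (uncalibrated) ppp value, assuming a normal model y_i ~ N(θ, σ²) with known σ, a conjugate N(μ₀, τ₀²) prior on θ, and the data-and-parameter discrepancy T(y, θ) = Σ(y_i − θ)². Each posterior draw of θ generates one replicated data set, and the ppp value is the proportion of draws with T(y_rep, θ) ≥ T(y, θ). The model, prior, discrepancy, and function name are illustrative choices; the calibration and conflict-measure procedures discussed in the paper are not shown.

```python
import numpy as np

def ppp_value(y, sigma, mu0, tau0, n_draws=4000, seed=0):
    """Posterior predictive p value for T(y, theta) = sum((y - theta)^2)
    in the model y_i ~ N(theta, sigma^2), theta ~ N(mu0, tau0^2)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    post_prec = 1.0 / tau0**2 + n / sigma**2            # conjugate posterior precision
    post_mean = (mu0 / tau0**2 + y.sum() / sigma**2) / post_prec
    theta = rng.normal(post_mean, 1.0 / np.sqrt(post_prec), size=n_draws)
    y_rep = rng.normal(theta[:, None], sigma, size=(n_draws, n))
    t_obs = ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
    t_rep = ((y_rep - theta[:, None]) ** 2).sum(axis=1)
    return float(np.mean(t_rep >= t_obs))

rng = np.random.default_rng(42)
y_ok = rng.normal(0.0, 1.0, size=50)   # consistent with sigma = 1
y_bad = rng.normal(0.0, 3.0, size=50)  # variance much larger than assumed
print(ppp_value(y_ok, 1.0, 0.0, 10.0), ppp_value(y_bad, 1.0, 0.0, 10.0))
```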

7.

Discrepancy for uniform design of experiments with mixtures

Jian-Hui Ning, Yong-Dao Zhou, Kai-Tai Fang 《Journal of Statistical Planning and Inference》2011,141(4):1487-1496

The uniform design is a space-filling design that is robust against model misspecification, and it has been widely applied to experiments with mixtures. In this paper, we propose a new discrepancy, the DM₂-discrepancy, as a criterion for measuring the uniformity of designs with mixtures. A computational formula for the new discrepancy, derived by the functional method, is also given. This property overcomes the main disadvantage of the previously proposed discrepancies.

8.

Using a Truncated C_p Statistic for Variable Selection in Multiple Linear Regression

D. W. Uys, S. J. Steel 《Communications in Statistics - Simulation and Computation》2013,42(2):420-432

In multiple linear regression analysis, each lower-dimensional subspace L of a known linear subspace M of ℝⁿ corresponds to a nonempty subset of the columns of the regressor matrix. For a fixed subspace L, the C_p statistic is an unbiased estimator of the mean square error when the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the C_p statistic that can also be used to estimate this mean square error. The C_p statistic and its truncated versions are compared on two example data sets, illustrating that the truncated versions may select models different from those selected by standard C_p.
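A sketch of the standard (untruncated) statistic, C_p = RSS_L/s² − n + 2p, where RSS_L is the residual sum of squares after projecting y onto the span of the p selected columns and s² = RSS_M/(n − P) is the error-variance estimate from the full model with P columns. The function names and data are illustrative, and the truncated versions studied in the article are not reproduced. A useful sanity check is that the full model always has C_p = P.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares projection of y onto span(X)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def mallows_cp(X, y, cols):
    """C_p for the submodel using columns `cols` of the full regressor matrix X."""
    n, P = X.shape
    s2 = rss(X, y) / (n - P)  # full-model error variance estimate
    return rss(X[:, cols], y) / s2 - n + 2 * len(cols)

rng = np.random.default_rng(3)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X[:, :2] @ np.array([0.5, 1.0]) + rng.normal(size=n)
for cols in ([0], [0, 1], [0, 1, 2], [0, 1, 2, 3]):
    print(cols, round(mallows_cp(X, y, cols), 2))
```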

9.

On detecting influential data and selecting regression variables

《Journal of Statistical Planning and Inference》1996,53(3):421-435

The analysis of residuals may reveal various functional forms suitable for the regression model. In this paper, we investigate criteria for selecting important regression variables, using statistical selection and ranking procedures. From these, we derive an appropriate criterion to measure the influence and bias of the reduced models. We show that the reduced models are based on certain noncentrality parameters, which provide a measure of goodness of fit for the fitted models. We also discuss the relationship between influence diagnostics and the statistic proposed earlier by Gupta and Huang (J. Statist. Plann. Inference 20 (1988) 155–167), and we introduce a new measure for detecting influential data as an alternative to Cook's measure.
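For reference, Cook's measure, to which the article proposes an alternative, can be computed from a single least-squares fit as D_i = e_i² h_ii / (p s² (1 − h_ii)²), with e_i the residuals, h_ii the leverages, p the number of coefficients, and s² = RSS/(n − p). The sketch below computes it and flags a planted high-leverage outlier; it does not implement the article's new measure, whose form is not given in the abstract.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for every observation of an OLS fit of y on X."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
    h = np.diag(H)                         # leverages
    e = y - H @ y                          # residuals
    s2 = e @ e / (n - p)
    return e**2 * h / (p * s2 * (1.0 - h) ** 2)

rng = np.random.default_rng(4)
x = np.append(np.linspace(0.0, 1.0, 30), 3.0)  # last point has high leverage
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
y[-1] = -10.0                                  # and lies far off the line
X = np.column_stack([np.ones_like(x), x])
D = cooks_distance(X, y)
print(int(np.argmax(D)), D.max())
```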

10.

Resistant selection of the smoothing parameter for smoothing splines

Eva Cantoni, Elvezio Ronchetti 《Statistics and Computing》2001,11(2):141-146

Robust automatic selection techniques for the smoothing parameter of a smoothing spline are introduced. They are based on a robust predictive error criterion and can be viewed as robust versions of C_p and cross-validation. They lead to smoothing splines that are stable and reliable in terms of mean squared error over a large spectrum of model distributions.

11.

Bounds on dispersion of order statistics based on dependent symmetrically distributed random variables

Krzysztof Jasiński, Tomasz Rychlik 《Journal of Statistical Planning and Inference》2012

We consider a fixed number of arbitrarily dependent random variables with a common symmetric marginal distribution. For each order statistic based on these variables, we determine a common optimal bound, depending in a simple way on the sample size and the index of the order statistic, for various measures of dispersion of the order statistics, expressed in terms of the same dispersion measure of a single original variable. The dispersion measures are connected with the notion of the M-functional of location of a random variable with respect to a symmetric and convex loss function. The measure is defined as the expected loss incurred by the discrepancy between the M-functional and the variable. The most popular examples are the median absolute deviation and the variance.
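The definition of dispersion as an expected loss can be illustrated empirically: for a convex loss ρ, the M-functional m minimizes the mean of ρ(X − m), and the dispersion is the minimized value. With ρ(u) = u² this gives the mean and the variance; with ρ(u) = |u| it gives the median and the mean absolute deviation about the median. The sketch below, with a hypothetical function name, finds m by ternary search, which is valid because the objective is convex in m.

```python
import numpy as np

def m_dispersion(x, loss, tol=1e-10):
    """Return (m, dispersion): m minimizes mean(loss(x - m)) and the dispersion is
    that minimized mean loss. Ternary search is valid for any convex loss, and for
    a symmetric convex loss the minimizer lies in [min(x), max(x)]."""
    x = np.asarray(x, dtype=float)
    objective = lambda m: float(np.mean(loss(x - m)))
    lo, hi = float(x.min()), float(x.max())
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    m = 0.5 * (lo + hi)
    return m, objective(m)

rng = np.random.default_rng(5)
x = rng.standard_t(df=5, size=1001)         # symmetric, heavier tails than normal
m_sq, var_hat = m_dispersion(x, np.square)  # mean and variance
m_abs, mad_hat = m_dispersion(x, np.abs)    # median and mean |x - median|
print(m_sq, var_hat, m_abs, mad_hat)
```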

12.

Generalized Least Squares Model Averaging

Qingfeng Liu, Arihiro Yoshimura 《Econometric Reviews》2016,35(8-10):1692-1752

In this article, we propose a method of averaging generalized least squares estimators for linear regression models with heteroskedastic errors. The averaging weights are chosen to minimize a Mallows’ C_p-like criterion. We show that the weight vector selected by our method is optimal. This optimality is also shown to hold when the variances of the error terms are estimated and the feasible generalized least squares estimators are averaged. The variances can be estimated parametrically or nonparametrically. Monte Carlo simulation results are encouraging. An empirical example illustrates that the proposed method is useful for predicting a measure of firms’ performance.
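A minimal sketch of Mallows-type weight selection in the simplest setting, assuming two nested models fitted by ordinary least squares with homoskedastic errors; the GLS and FGLS averaging of the paper is not reproduced, and the function names are hypothetical. The criterion C(w) = ‖y − (w ŷ₁ + (1 − w) ŷ₂)‖² + 2σ̂²(w k₁ + (1 − w) k₂), with σ̂² from the larger model, is a convex quadratic in w, so its minimizer over [0, 1] is the clipped stationary point.

```python
import numpy as np

def ols_fitted(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def mallows_weight(X_small, X_big, y):
    """Weight w on the small model (1 - w on the big one) minimizing C(w).
    Returns the weight and the criterion function."""
    n, k_big = X_big.shape
    k_small = X_small.shape[1]
    f_small, f_big = ols_fitted(X_small, y), ols_fitted(X_big, y)
    sigma2 = np.sum((y - f_big) ** 2) / (n - k_big)

    def criterion(w):
        f = w * f_small + (1.0 - w) * f_big
        return np.sum((y - f) ** 2) + 2.0 * sigma2 * (w * k_small + (1.0 - w) * k_big)

    d, r = f_small - f_big, y - f_big
    if d @ d == 0.0:  # identical fits: only the penalty matters
        return (1.0 if k_small <= k_big else 0.0), criterion
    w = (d @ r - sigma2 * (k_small - k_big)) / (d @ d)  # stationary point of C(w)
    return float(np.clip(w, 0.0, 1.0)), criterion

rng = np.random.default_rng(6)
n = 100
X_big = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X_big @ np.array([1.0, 0.8, 0.15]) + rng.normal(size=n)
w, crit = mallows_weight(X_big[:, :2], X_big, y)
print(w, crit(w))
```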

13.

Optimal U-type design for Bayesian nonparametric multiresponse prediction

Rong-Xian Yue, Hong Qin, Kashinath Chatterjee 《Journal of Statistical Planning and Inference》2011,141(7):2472-2479

This paper extends the work of Yue and Chatterjee (2010) on U-type designs for Bayesian nonparametric response prediction. We consider a nonparametric Bayesian regression model with p responses. We use U-type designs with n runs, m factors, and q levels for nonparametric multiresponse prediction based on the asymptotic Bayesian criterion. A lower bound for the proposed criterion is established, and some optimal and nearly optimal designs for the illustrative models are given.

14.

Probability-Based Process Capability Indices

K. G. Khadse 《Communications in Statistics - Simulation and Computation》2013,42(4):884-904

This article deals with alternatives to the traditional basic process capability indices (PCIs) C_p, C_pk, and C_pm, based on different fraction-conforming-type probabilities. In view of the various problems of constructing capability indices in univariate as well as multivariate setups, these alternative PCIs are more useful than C_p, C_pk, and C_pm. Computational aspects of the proposed PCIs are discussed for normal and non-normal processes, with both symmetric and asymmetric process tolerances. A generalization of these PCIs to the multivariate setup is also discussed. Simulation results and real-life problems are given to illustrate applications of the proposed PCIs.
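One way to see why probability-based indices can be more informative than C_p: under normality, the fraction conforming is Φ((USL − μ)/σ) − Φ((LSL − μ)/σ), which reacts to the process mean, whereas C_p = (USL − LSL)/(6σ) does not. For a process centered at the midpoint of the tolerances, the fraction nonconforming equals 2Φ(−3C_p). The sketch below, with hypothetical function names, checks this; it does not reproduce the specific indices proposed in the article.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_conforming(mu, sigma, lsl, usl):
    """P(LSL <= X <= USL) for X ~ N(mu, sigma^2)."""
    return normal_cdf((usl - mu) / sigma) - normal_cdf((lsl - mu) / sigma)

def cp_index(sigma, lsl, usl):
    return (usl - lsl) / (6.0 * sigma)

lsl, usl, sigma = 8.0, 12.0, 0.5
for mu in (10.0, 11.0):  # centered, then shifted toward USL: C_p stays the same
    print(mu, cp_index(sigma, lsl, usl), fraction_conforming(mu, sigma, lsl, usl))
```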

15.

Estimated confidence under ancillary statistic everywhere-valid constraint

《Journal of Statistical Planning and Inference》1998,67(1):123-135

Consider the problem of estimating the coverage function of the usual confidence interval for a randomly chosen linear combination of the elements of the mean vector of a p-dimensional normal distribution. The usual constant coverage probability estimator is shown to be admissible under the ancillary statistic everywhere-valid constraint. Note that this estimator is not admissible in the usual sense if p ⩾ 5. Since admissibility under the ancillary statistic everywhere-valid constraint is a reasonable criterion, the common acceptance of the constant coverage probability estimator is justified.

16.

A Test of Independence Based on the Likelihood of Cut-Points

Reza Modarres 《Communications in Statistics - Simulation and Computation》2013,42(4):817-825

We define a test statistic C_n as the sum of the likelihood ratio statistics for testing independence in the 2 × 2 tables defined at the n sample cut-points (X_i, Y_i). Given the cut-points, the asymptotic distribution of C_n is that of a sum of dependent χ² variables, each with one degree of freedom. We use the bootstrap to obtain the distribution of C_n. We compare the performance of several tests of bivariate independence, including the Pearson, Spearman, and Kendall correlations, the Blum–Kiefer–Rosenblatt statistic, and C_n, under several copulas and given marginal distributions.
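A sketch of computing C_n: at each sample point (X_i, Y_i), the data are split into a 2 × 2 table by X ≤ X_i and Y ≤ Y_i, and the likelihood-ratio statistic G² for independence is summed over the n tables. Counting the cut point itself in the "≤" cells is an assumption, the function names are hypothetical, and the null distribution here is approximated by permuting Y, which is swapped in for the bootstrap used in the article.

```python
import numpy as np

def g2_2x2(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * log(O / E)) for a 2 x 2 table,
    with E the expected counts under independence and 0 * log 0 taken as 0."""
    t = np.asarray(table, dtype=float)
    expected = t.sum(axis=1, keepdims=True) * t.sum(axis=0, keepdims=True) / t.sum()
    nz = t > 0
    return 2.0 * float(np.sum(t[nz] * np.log(t[nz] / expected[nz])))

def c_n(x, y):
    """Sum of G^2 over the 2 x 2 tables obtained by cutting at each sample point."""
    total = 0.0
    for xi, yi in zip(x, y):
        left, low = x <= xi, y <= yi  # the cut point itself counts as <= (assumption)
        table = [[np.sum(left & low), np.sum(left & ~low)],
                 [np.sum(~left & low), np.sum(~left & ~low)]]
        total += g2_2x2(table)
    return total

def permutation_pvalue(x, y, n_perm=199, seed=0):
    """Permutation p value: shuffling y breaks any dependence on x."""
    rng = np.random.default_rng(seed)
    observed = c_n(x, y)
    exceed = sum(c_n(x, rng.permutation(y)) >= observed for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(7)
x = rng.normal(size=40)
y_dep = x + 0.3 * rng.normal(size=40)
y_ind = rng.normal(size=40)
print(permutation_pvalue(x, y_dep), permutation_pvalue(x, y_ind))
```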

17.

A note on the generalized degrees of freedom under the L₁ loss function

Xiaoli Gao, Yixin Fang 《Journal of Statistical Planning and Inference》2011,141(2):677-686

18.

A general class of capability indices in the case of asymmetric tolerances

Kerstin Vännman 《Communications in Statistics - Theory and Methods》2013,42(8):2049-2072

Vännman has earlier studied a class of capability indices, containing the indices C_p, C_pk, C_pm, and C_pmk, for symmetric tolerances. We study the properties of this class when the tolerances are asymmetric and suggest a new, enlarged class of indices. Under the assumption of normality, an explicit form of the distribution of the new class of estimated indices is provided. Numerical investigations are made to explore the behavior of the estimators of the indices for different parameter values. Based on the estimator, a decision rule is provided that can be used to determine whether the process can be considered capable, and suitable criteria for choosing an index from the family are suggested.
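For the symmetric-tolerance case that the article starts from, Vännman's family is commonly written as C_p(u, v) = (d − u|μ − M|) / (3√(σ² + v(μ − T)²)), with d = (USL − LSL)/2, M = (USL + LSL)/2, and target T; (u, v) = (0, 0), (1, 0), (0, 1), and (1, 1) give C_p, C_pk, C_pm, and C_pmk. The sketch below evaluates this population-level expression only, with a hypothetical function name; the enlarged asymmetric-tolerance class and its estimators are not reproduced.

```python
import math

def cp_uv(mu, sigma, lsl, usl, target, u, v):
    """Vannman-type index C_p(u, v) for process mean mu and standard deviation sigma."""
    d = (usl - lsl) / 2.0  # half-width of the tolerance interval
    m = (usl + lsl) / 2.0  # midpoint of the tolerance interval
    return (d - u * abs(mu - m)) / (3.0 * math.sqrt(sigma**2 + v * (mu - target) ** 2))

# Symmetric tolerances: the target equals the midpoint (here 7.0).
lsl, usl, target, mu, sigma = 4.0, 10.0, 7.0, 8.0, 1.0
for name, (u, v) in {"C_p": (0, 0), "C_pk": (1, 0), "C_pm": (0, 1), "C_pmk": (1, 1)}.items():
    print(name, round(cp_uv(mu, sigma, lsl, usl, target, u, v), 4))
```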

19.

Asymptotic properties of a goodness-of-fit test based on maximum correlations

Aurea Grané, Anna V. Tchirina 《Statistics》2013,47(1):202-215

We study the efficiency properties of the goodness-of-fit test based on the Q_n statistic introduced in Fortiana and Grané [Goodness-of-fit tests based on maximum correlations and their orthogonal decompositions, J. R. Stat. Soc. B 65 (2003), pp. 115–126], using the concepts of Bahadur asymptotic relative efficiency and Bahadur asymptotic optimality. We compare the test based on this statistic with those based on the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling statistics. We also describe the distribution families for which the test based on Q_n is locally asymptotically optimal in the Bahadur sense and, as an application, use this test to detect the presence of hidden periodicities in a stationary time series.

20.

Inference for Random Coefficient INAR(1) Process Based on Frequency Domain Analysis

Haixiang Zhang 《Communications in Statistics - Simulation and Computation》2015,44(4):1078-1100

In this article, we present explicit expressions for the higher-order moments and cumulants of the first-order random coefficient integer-valued autoregressive (RCINAR(1)) process. The spectral and bispectral density functions, which characterize the RCINAR(1) process in the frequency domain, are also obtained. We use a frequency domain approach, the Whittle criterion, to estimate the parameters of the process. We propose a test statistic, also based on the frequency domain approach, for the hypothesis test H₀: α = 0 versus H₁: 0 < α < 1, where α is the mean of the random coefficient in the process. The asymptotic distribution of the test statistic is obtained. A numerical simulation comparing the proposed test statistic with other statistics for testing serial dependence in count time series indicates that the proposed test has good power.
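Because E[α_t ∘ X_{t−1} | X_{t−1}] = α X_{t−1} and α_t is independent of the past, the RCINAR(1) process has the autocorrelation structure of an AR(1) with coefficient α, so its spectral density is proportional to 1/(1 − 2α cos ω + α²). The sketch below simulates the process and estimates α by minimizing a Whittle objective built from the periodogram with this second-order spectral shape. The Beta-distributed coefficient, the Poisson innovations, the grid search, and the function names are illustrative assumptions; the article's estimator, which may also use the higher-order spectra, is not reproduced.

```python
import numpy as np

def simulate_rcinar1(n, a, b, lam, seed=0, burn=500):
    """X_t = alpha_t o X_{t-1} + eps_t, with binomial thinning 'o',
    alpha_t ~ Beta(a, b) i.i.d. (mean a / (a + b)), and eps_t ~ Poisson(lam)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + burn, dtype=np.int64)
    x[0] = rng.poisson(lam)
    for t in range(1, n + burn):
        x[t] = rng.binomial(x[t - 1], rng.beta(a, b)) + rng.poisson(lam)
    return x[burn:]

def whittle_alpha(x, grid=np.linspace(0.01, 0.99, 99)):
    """Grid-search Whittle estimate of alpha, using the AR(1)-shaped spectral
    density f(w) = c / (1 - 2 alpha cos w + alpha^2) with the scale c profiled out."""
    n = len(x)
    z = x - x.mean()
    m = (n - 1) // 2                               # Fourier frequencies in (0, pi)
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    pgram = np.abs(np.fft.fft(z)[1:m + 1]) ** 2 / (2.0 * np.pi * n)
    best_alpha, best_val = None, np.inf
    for alpha in grid:
        g = 1.0 / (1.0 - 2.0 * alpha * np.cos(freqs) + alpha**2)
        c = np.mean(pgram / g)                     # profiled scale of f = c * g
        val = m * np.log(c) + np.sum(np.log(g))    # Whittle objective up to constants
        if val < best_val:
            best_alpha, best_val = float(alpha), val
    return best_alpha

x = simulate_rcinar1(5000, a=2.0, b=2.0, lam=2.0, seed=8)
print(whittle_alpha(x))  # the true mean coefficient is 2 / (2 + 2) = 0.5
```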