1.

A modified C_p statistic in a system-of-equations model

Vichit Lorchirachoonkul Jirawan Jitthavech 《Journal of statistical planning and inference》2012

A new statistic, SΓ(p), is developed for variable selection in a system-of-equations model. The standardized total mean square error in the SΓ(p) statistic is weighted by the covariance matrix of the dependent variables instead of the error covariance matrix of the true model, as in the original definition. The new statistic can also be used for model selection among non-nested models. The estimate of SΓ(p), SC(p), is derived and shown to become SC_ε(p), which has a form similar to C_p in a single-equation model, when the covariance matrix of the sampled dependent variables is replaced by the error covariance matrix under the full model.

2.

Resistant selection of the smoothing parameter for smoothing splines

Eva Cantoni Elvezio Ronchetti 《Statistics and Computing》2001,11(2):141-146

Robust automatic selection techniques for the smoothing parameter of a smoothing spline are introduced. They are based on a robust predictive error criterion and can be viewed as robust versions of C_p and cross-validation. They lead to smoothing splines which are stable and reliable in terms of mean squared error over a large spectrum of model distributions.

3.

Weighted L1-estimates for the First-order Bifurcating Autoregressive Model

Tamer M. Elbayoumi Jeff Terpstra 《Communications in Statistics - Simulation and Computation》2016,45(8):2991-3013

We developed robust estimators that minimize a weighted L₁ norm for the first-order bifurcating autoregressive model. When all of the weights are fixed, our estimate is an L₁ estimate that is robust against outlying points in the response space and more efficient than the least squares estimate for heavy-tailed error distributions. When the weights are random and depend on the points in the factor space, the weighted L₁ estimate is robust against outlying points in the factor space. Simulated and artificial examples are presented. The behavior of the proposed estimate is modeled through a Monte Carlo study.

4.

Lower confidence bounds as precision measure for truncated processes

Chia Huang Wu Wen Lea Pearn Pi Chuan Lin 《Communications in Statistics - Simulation and Computation》2017,46(2):1461-1480

The process capability index C_p has been the most popular index used in the manufacturing industry to provide a numerical measure of process precision. For normally distributed processes with fully automatic inspection, the inspected processes follow truncated normal distributions. In this article, we provide the formulae of the moments used for the Edgeworth approximation of the precision measure C_p for truncated normally distributed processes. Based on the developed moments, lower confidence bounds for various sample sizes and confidence levels are provided and tabulated. Consequently, practitioners can use the lower confidence bounds to determine whether their manufacturing processes meet preset precision requirements.
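
As background for this entry, the conventional point estimate of the process capability index is C_p = (USL - LSL) / (6s), where USL and LSL are the upper and lower specification limits and s is the sample standard deviation. The sketch below computes only that point estimate; it does not implement the Edgeworth-based lower confidence bounds for truncated processes described in the abstract. The function name `capability_index`, the measurements, and the specification limits are illustrative assumptions.

```python
import statistics


def capability_index(sample, lsl, usl):
    """Point estimate of C_p = (USL - LSL) / (6 * s).

    Hypothetical helper for illustration; s is the sample standard
    deviation with the usual n - 1 denominator.
    """
    if usl <= lsl:
        raise ValueError("usl must be greater than lsl")
    s = statistics.stdev(sample)
    return (usl - lsl) / (6.0 * s)


# Made-up measurements and specification limits.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2]
cp_hat = capability_index(measurements, lsl=9.0, usl=11.0)
print(round(cp_hat, 4))
```

A larger value indicates a process whose spread is small relative to the specification width; the estimate itself is subject to sampling error, which is why the article studies lower confidence bounds.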

5.

Remarks on the L1 distance in statistical data analysis

Robert J. Budzyński Witold Kondracki 《Communications in Statistics - Theory and Methods》2017,46(19):9355-9363

We propose the L₁ distance between the distribution of a binned data sample and a probability distribution from which it is hypothetically drawn as a statistic for testing agreement between the data and a model. We study the distribution of this distance for N-element samples drawn from k bins of equal probability and derive asymptotic formulae for the mean and dispersion of L₁ in the large-N limit. We argue that the L₁ distance is asymptotically normally distributed, with the mean and dispersion being accurately reproduced by asymptotic formulae even for moderately large values of N and k.
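
A minimal sketch of the statistic described in this abstract, assuming the L₁ distance is taken as the sum over bins of |n_i/N - p_i|, where n_i is the observed count in bin i and p_i the hypothesized bin probability. The normalization (for instance, whether a factor of 1/2 is applied) is an assumption here, since the abstract does not state it. The function name `l1_distance` and the sample are hypothetical.

```python
from collections import Counter


def l1_distance(bin_indices, probs):
    """Sum over bins of |observed relative frequency - model probability|.

    bin_indices: bin label (0 .. k-1) for each of the N observations.
    probs: hypothesized probability of each of the k bins.
    """
    n = len(bin_indices)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(bin_indices)
    return sum(abs(counts.get(i, 0) / n - p) for i, p in enumerate(probs))


# N = 8 observations in k = 4 bins of equal probability (made-up data).
k = 4
sample_bins = [0, 0, 1, 2, 2, 2, 3, 3]
d = l1_distance(sample_bins, [1.0 / k] * k)
print(d)
```

Under this convention the distance is 0 when the observed frequencies match the model exactly and grows as the binned sample departs from it.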

6.

An alternate version of the conceptual predictive statistic based on a symmetrized discrepancy measure

Joseph E. Cavanaugh Andrew A. Neath Simon L. Davies 《Journal of statistical planning and inference》2010

The conceptual predictive statistic, C_p, is a widely used criterion for model selection in linear regression. C_p serves as an estimator of a discrepancy, a measure that reflects the disparity between the generating model and a fitted candidate model. This discrepancy, based on scaled squared error loss, is asymmetric: an alternate measure is obtained by reversing the roles of the two models in the definition of the measure. We propose a variant of the C_p statistic based on estimating a symmetrized version of the discrepancy targeted by C_p. We claim that the resulting criterion provides better protection against overfitting than C_p, since the symmetric discrepancy is more sensitive towards detecting overspecification than its asymmetric counterpart. We illustrate our claim by presenting simulation results. Finally, we demonstrate the practical utility of the new criterion by discussing a modeling application based on data collected in a cardiac rehabilitation program at University of Iowa Hospitals and Clinics.
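
For readers unfamiliar with the baseline criterion, the sketch below computes the classical Mallows C_p = SSE_p / s² - n + 2p, where SSE_p is the residual sum of squares of a candidate model with p coefficients and s² is the error variance estimate from the full model. It illustrates the standard statistic only, not the symmetrized variant proposed in the article. All function names and the data are hypothetical, and the small normal-equations solver is an assumption chosen to keep the example dependency-free; it is not numerically robust for ill-conditioned designs.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def sse(columns, y):
    """Residual sum of squares of the least-squares fit of y on the columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][t] for j in range(k))
              for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def mallows_cp(candidate, full, y):
    """Classical C_p; p counts every candidate column, intercept included."""
    n, k_full, p = len(y), len(full), len(candidate)
    sigma2 = sse(full, y) / (n - k_full)  # error variance from the full model
    return sse(candidate, y) / sigma2 - n + 2 * p


# Made-up data: intercept plus two regressors.
ones = [1.0] * 8
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1]
full = [ones, x1, x2]
cp_full = mallows_cp(full, full, y)
cp_sub = mallows_cp([ones, x1], full, y)
print(cp_full, cp_sub)
```

By construction the full model always has C_p equal to its number of coefficients, so the statistic is informative only for comparing proper submodels; candidates with C_p close to p are conventionally regarded as adequate.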

7.

Selection of Variables in Multivariate Regression Models for Large Dimensions

《Communications in Statistics - Theory and Methods》2012,41(13-14):2465-2489

The Akaike information criterion, AIC, and Mallows’ C_p statistic have been proposed for selecting a smaller number of regressors in multivariate regression models with a fully unknown covariance matrix. All of these criteria are, however, based on the implicit assumption that the sample size is substantially larger than the dimension of the covariance matrix. To obtain a stable estimator of the covariance matrix, it is required that the dimension of the covariance matrix is much smaller than the sample size. When the dimension is close to the sample size, it is necessary to use ridge-type estimators for the covariance matrix. In this article, we use a ridge-type estimator for the covariance matrix and obtain the modified AIC and modified C_p statistic under the asymptotic theory that both the sample size and the dimension go to infinity. It is numerically shown that these modified procedures perform very well in the sense of selecting the true model in large dimensional cases.

8.

A MORE GENERAL CRITERION FOR SUBSET SELECTION IN MULTIPLE LINEAR REGRESSION

《Communications in Statistics - Theory and Methods》2013,42(5):795-811

In this article, we propose a more general criterion, called the S_p-criterion, for subset selection in the multiple linear regression model. Many subset selection methods are based on the least squares (LS) estimator of β, but whenever the data contain an influential observation or the distribution of the error variable deviates from normality, the LS estimator performs ‘poorly’ and hence a method based on this estimator (for example, Mallows’ C_p-criterion) tends to select a ‘wrong’ subset. The proposed method overcomes this drawback, and its main feature is that it can be used with any type of estimator of β (either the LS estimator or any robust estimator) without any need to modify the proposed criterion. Moreover, this technique is operationally simple to implement compared to other existing criteria. The method is illustrated with examples.

9.

Bootstrapping in a high dimensional but very low-sample size problem

《Journal of Statistical Computation and Simulation》2012,82(8):825-840

This article is concerned with testing multiple hypotheses, one for each of a large number of small data sets. Such data are sometimes referred to as high-dimensional, low-sample size data. Our model assumes that each observation within a randomly selected small data set follows a mixture of C shifted and rescaled versions of an arbitrary density f. A novel kernel density estimation scheme, in conjunction with clustering methods, is applied to estimate f. The Bayes information criterion and a new criterion, the weighted mean of within-cluster variances, are used to estimate C, which is the number of mixture components or clusters. These results are applied to the multiple testing problem. The null sampling distribution of each test statistic is determined by f, and hence a bootstrap procedure that resamples from an estimate of f is used to approximate this null distribution.

10.

Estimation of the common mean of two univariate normal populations

Bimal Kumar Sinha Omar Mouqadem 《Communications in Statistics - Theory and Methods》2013,42(14):1603-1614

The problem of estimating the common mean μ of two univariate normal populations with unknown and unequal variances is considered from a decision-theoretic point of view. We restrict our attention to an appropriate class C and its three subclasses C₀, C₁, and C₂ of unbiased estimates of μ. We consider the usual estimate μ⁰ of μ, which is the weighted linear combination of the sample means with weights as reciprocals of the sample variances. Its admissibility in C₀ and extended admissibility in C is proved. Admissible estimates in C₁ and C₂ are also obtained. The loss is always assumed to be squared error. The question of admissibility of μ⁰ in the class of all estimators is still open.

11.

Nonparametric tests for conditional independence using conditional distributions

Taoufik Bouezmarni 《Journal of nonparametric statistics》2014,26(4):697-719

The concept of causality is naturally defined in terms of conditional distributions; however, almost all empirical work focuses on causality in mean. This paper proposes a nonparametric statistic to test conditional independence and Granger non-causality between two variables conditionally on another one. The test statistic is based on the comparison of conditional distribution functions using an L₂ metric. We use the Nadaraya–Watson method to estimate the conditional distribution functions. We establish the asymptotic size and power properties of the test statistic and we motivate the validity of the local bootstrap. We ran a simulation experiment to investigate the finite sample properties of the test, and we illustrate its practical relevance by examining the Granger non-causality between S&P 500 Index returns and the VIX volatility index. Contrary to the conventional t-test, which is based on a linear mean-regression, we find that the VIX index predicts excess returns at both short and long horizons.

12.

Estimation suroptimale de la densité par projection

Denis Bosq 《Revue canadienne de statistique》2005,33(1):21-37

Superefficiency of a projection density estimator. The author constructs a projection density estimator with a data-driven truncation index. This estimator reaches the superoptimal rates 1/n in mean integrated square error and (ln ln n / n)^{1/2} in uniform almost sure convergence over a given subspace which is dense in the class of all possible densities; the rate of the estimator is quasi-optimal everywhere else. The subspace in question may be chosen a priori by the statistician.

13.

A class of modified stein estimators with easily computable risk functions

Samuel D. Oman 《Journal of statistical planning and inference》1983,7(4):359-369

Consider the problem of estimating the mean of a p (≥3)-variate multi-normal distribution with identity variance-covariance matrix and with unweighted sum of squared error loss. A class of minimax, noncomparable (i.e. no estimate in the class dominates any other estimate in the class) estimates is proposed; the class contains rules dominating the simple James-Stein estimates. The estimates are essentially smoothed versions of the scaled, truncated James-Stein estimates studied by Efron and Morris. Explicit and analytically tractable expressions for their risks are obtained and are used to give guidelines for selecting estimates within the class.
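
As context for the "simple James-Stein estimates" mentioned in this abstract, the sketch below shows the textbook rule for a single observation x from a p-variate normal distribution with identity covariance: the estimate (1 - (p - 2)/‖x‖²) x, together with its positive-part version, which clips the shrinkage factor at zero. It does not implement the modified class proposed in the article or the scaled, truncated rules of Efron and Morris. The function name and the handling of x = 0 are assumptions made for illustration.

```python
def james_stein(x, positive_part=False):
    """James-Stein shrinkage of x toward the origin (identity covariance).

    Requires p >= 3. With positive_part=True the shrinkage factor is
    clipped at zero so the estimate never flips sign.
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage requires p >= 3")
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        # Assumption for this sketch: return the origin when x is exactly 0,
        # where the shrinkage factor is undefined.
        return [0.0] * p
    shrink = 1.0 - (p - 2) / norm2
    if positive_part:
        shrink = max(0.0, shrink)
    return [shrink * v for v in x]


print(james_stein([2.0, 0.0, 0.0, 0.0]))
print(james_stein([0.5, 0.5, 0.5, 0.5]))
print(james_stein([0.5, 0.5, 0.5, 0.5], positive_part=True))
```

The second call shows why truncation matters: when ‖x‖² is smaller than p - 2 the plain rule over-shrinks and reverses the sign of every coordinate, whereas the positive-part rule returns the origin.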

14.

Comparison of computer programs for simple linear L₁ regression

《Journal of Statistical Computation and Simulation》2012,82(1-2):63-68

A number of efficient computer codes are available for the simple linear L₁ regression problem. However, a number of these codes can be made more efficient by utilizing the least squares solution. In fact, a couple of available computer programs already do so.

We report the results of a computational study comparing several openly available computer programs for solving the simple linear L₁ regression problem with and without computing and utilizing a least squares solution.
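
To make the problem concrete, the sketch below solves simple linear L₁ (least absolute deviations) regression by brute force. It relies on the linear-programming fact that, when at least two x values differ, some optimal line passes through at least two data points, so it is enough to test every pair. This O(n³) enumeration is an assumption chosen for clarity; it is not one of the efficient codes compared in the article and it does not use a least squares starting solution. The function name and the data are hypothetical.

```python
from itertools import combinations


def l1_line(xs, ys):
    """Return (loss, intercept, slope) minimizing the sum of |y - a - b*x|.

    Enumerates the lines through every pair of points with distinct x.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(abs(y - intercept - slope * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best


# Four collinear points plus one gross outlier (made-up data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 40.0]
loss, intercept, slope = l1_line(xs, ys)
print(loss, intercept, slope)
```

On this data the L₁ fit stays on the line through the four collinear points and absorbs the outlier as a single large residual, which is the robustness property that motivates L₁ regression.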

15.

Empirical Comparison of Nonparametric Regression Estimates on Real Data

Daniel Jones Michael Kohler Alexander Richter 《Communications in Statistics - Simulation and Computation》2016,45(7):2309-2319

The performance of nine different nonparametric regression estimates is empirically compared on ten different real datasets. The number of data points in the real datasets varies between 7,900 and 18,000, where each real dataset contains between 5 and 20 variables. The nonparametric regression estimates include kernel, partitioning, nearest neighbor, additive spline, neural network, penalized smoothing splines, local linear kernel, regression trees, and random forests estimates. The main result is a table containing the empirical L₂ risks of all nine nonparametric regression estimates on the evaluation part of the different datasets. The neural networks and random forests are the two estimates performing best. The datasets are publicly available, so that any new regression estimate can be easily compared with all nine estimates considered in this article by just applying it to the publicly available data and by computing its empirical L₂ risks on the evaluation part of the datasets.

16.

Influential subsets on the variable selection

Choongrak Kim Soonyoung Hwang 《Communications in Statistics - Theory and Methods》2013,42(2):335-347

When one or a few observations are deleted in the multiple linear regression model, they can affect the variable selection. In this paper we derive the formula for the Mallows C_p criterion when k observations are deleted and express it as a function of basic building blocks such as residuals and leverages. Also, two real data sets are used to see how the selected model changes as a few observations are deleted.

17.

Generalized Least Squares Model Averaging

Qingfeng Liu Arihiro Yoshimura 《Econometric Reviews》2016,35(8-10):1692-1752

In this article, we propose a method of averaging generalized least squares estimators for linear regression models with heteroskedastic errors. The averaging weights are chosen to minimize Mallows’ C_p-like criterion. We show that the weight vector selected by our method is optimal. It is also shown that this optimality holds even when the variances of the error terms are estimated and the feasible generalized least squares estimators are averaged. The variances can be estimated parametrically or nonparametrically. Monte Carlo simulation results are encouraging. An empirical example illustrates that the proposed method is useful for predicting a measure of firms’ performance.

18.

Small-sample intervals for regression

M. A. Tingley 《Revue canadienne de statistique》1992,20(3):271-280

For the general linear regression model Y = Xη + e, we construct small-sample exponentially tilted empirical confidence intervals for a linear parameter θ = a^Tη and for nonlinear functions of η. The coverage error for the intervals is O_p(1/n), as shown in Tingley and Field (1990). The technique, though sample-based, does not require bootstrap resampling. The first step is calculation of an estimate for η. We have used a Mallows estimate. The algorithm applies whenever η is estimated as the solution of a system of equations having expected value 0. We include calculations of the relative efficiency of the estimator (compared with the classical least-squares estimate). The intervals are compared with asymptotic intervals as found, for example, in Hampel et al. (1986). We demonstrate that the procedure gives sensible intervals for small samples.

19.

Reducing the variance by smoothing

《Journal of statistical planning and inference》1997,57(1):29-38

In this paper we show that versions of statistical functionals which are obtained by smoothing the corresponding empirical d.f. with an appropriate kernel can reduce the variance and the mean square error of the statistic. This is shown by studying the influence function of the functional. The smaller variance is achieved when the influence function is either discontinuous or piecewise linear with convexity towards the x-axis. Examples for M- and L-estimators are given.

20.

$\mathcal{L}_p$ loss functions: a robust Bayesian approach

J. P. Arias-Nicolás J. Martín A. Suárez-Llorens 《Statistical Papers》2009,50(3):501-509

In Bayesian inference, the Bayes estimator is the alternative with the minimum expected loss. In most cases, the loss function measures the distance between the alternative and the parameter. Therefore, any distance can lead to a loss function. Among the best known distance functions is the L_p distance, where the choice of the value of p may be difficult and arbitrary. This paper examines robust models where the loss function is modelled by the L_p family. Our solution concept is the non-dominated alternative. We characterize the non-dominated set by having the posterior distribution function satisfy a particular asymmetry property. We also include an example to illustrate the methodology described.