Similar Documents
20 similar documents found.
1.
Ridge regression addresses multicollinearity by introducing a biasing parameter, called the ridge parameter, which shrinks the estimates and their standard errors in order to reach acceptable results. The ridge parameter has traditionally been selected by a variety of subjective and objective techniques, each targeting a particular criterion. In this study, the selection of the ridge parameter draws on additional statistical measures in order to reach a better value. The proposed selection technique is based on a mathematical programming model, and its results are evaluated in a simulation study. The proposed method performs well when the error variance is greater than or equal to one, the sample consists of 20 observations, the model contains two explanatory variables, and the two explanatory variables are very strongly correlated.
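For orientation, a minimal numpy sketch of the ridge estimator (X'X + kI)⁻¹X'y in a setting similar to the one studied (n = 20, two strongly correlated regressors). The data-driven choice of k shown here is the classical Hoerl–Kennard–Baldwin rule, not the mathematical-programming rule of the paper.

```python
import numpy as np

def ridge_estimate(X, y, k):
    """Ridge estimator: beta_hat = (X'X + k*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Toy data mimicking the simulation setting: n = 20, two nearly collinear regressors.
rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # very strong correlation with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

# Classical data-driven choice (Hoerl-Kennard-Baldwin): k = p * sigma2 / ||beta_ols||^2.
beta_ols = ridge_estimate(X, y, 0.0)
sigma2 = np.sum((y - X @ beta_ols) ** 2) / (n - X.shape[1])
k = X.shape[1] * sigma2 / np.sum(beta_ols ** 2)
print("k =", k, "ridge estimate:", ridge_estimate(X, y, k))
```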

2.
This paper proposes a number of procedures for developing new biased estimators of the seemingly unrelated regression (SUR) parameters when the explanatory variables are affected by multicollinearity. Several ridge parameters are proposed and then compared in terms of the trace mean squared error (TMSE) and the proportion of replication (PR) criteria, where PR is the proportion of replications (out of 1,000) for which the SUR version of the generalized least squares (SGLS) estimator has a smaller TMSE than the others. The study was performed using Monte Carlo simulations in which the number of equations in the system, the number of observations, the correlation among equations, and the correlation between explanatory variables were varied; for each model, 1,000 replications were performed. The results show that under certain conditions some of the proposed SUR ridge parameters (R_Sgeom, R_Skmed, R_Sqarith, and R_Sqmax) performed well, in terms of the TMSE and PR criteria, compared with other proposed and popular existing ridge parameters. In large samples, and when the collinearity between the explanatory variables is not high, the unbiased SUR estimator (SGLS) performed better than the ridge alternatives.

3.
To bootstrap a regression problem, pairs of response and explanatory variables or residuals can be resampled, according to whether we believe that the explanatory variables are random or fixed. In the latter case, different residuals have been proposed in the literature, including the ordinary residuals (Efron 1979), standardized residuals (Bickel & Freedman 1983) and Studentized residuals (Weber 1984). Freedman (1981) has shown that the bootstrap from ordinary residuals is asymptotically valid when the number of cases increases and the number of variables is fixed. Bickel & Freedman (1983) have shown the asymptotic validity for ordinary residuals when the number of variables and the number of cases both increase, provided that the ratio of the two converges to zero at an appropriate rate. In this paper, the authors introduce the use of BLUS (Best Linear Unbiased with Scalar covariance matrix) residuals in bootstrapping regression models. The main advantage of the BLUS residuals, introduced in Theil (1965), is that they are uncorrelated. The main disadvantage is that only n − p residuals can be computed for a regression problem with n cases and p variables. The asymptotic results of Freedman (1981) and Bickel & Freedman (1983) for the ordinary (and standardized) residuals are generalized to the BLUS residuals. A small simulation study shows that even though only n − p residuals are available, in small samples bootstrapping BLUS residuals can be as good as, and sometimes better than, bootstrapping from standardized or Studentized residuals.
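For context, a minimal sketch of the fixed-design residual bootstrap from ordinary residuals (the Efron 1979 variant mentioned above); the BLUS construction itself is not implemented here.

```python
import numpy as np

def residual_bootstrap(X, y, n_boot=1000, seed=0):
    """Bootstrap regression coefficients by resampling ordinary residuals (fixed design)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted                       # ordinary residuals
    betas = np.empty((n_boot, p))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return betas

# Usage: bootstrap standard errors for a toy model.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([2.0, -1.0]) + rng.normal(size=50)
print(residual_bootstrap(X, y).std(axis=0))   # bootstrap SEs of the coefficients
```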

4.
In this article, a robust variable selection procedure based on the weighted composite quantile regression (WCQR) is proposed. Compared with the composite quantile regression (CQR), WCQR is robust to heavy-tailed errors and outliers in the explanatory variables. For the choice of the weights in the WCQR, we employ a weighting scheme based on the principal component method. To select variables with grouping effect, we consider WCQR with SCAD-L2 penalization. Furthermore, under some suitable assumptions, the theoretical properties, including the consistency and oracle property of the estimator, are established with a diverging number of parameters. In addition, we study the numerical performance of the proposed method in the case of ultrahigh-dimensional data. Simulation studies and real examples are provided to demonstrate the superiority of our method over the CQR method when there are outliers in the explanatory variables and/or the random error is from a heavy-tailed distribution.
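For reference, the standard composite quantile regression objective (a well-known formulation, stated here for orientation; the weighted variant studied above attaches weights to these terms, chosen by a principal component method):

```latex
\hat{\beta}_{\mathrm{CQR}}
  = \arg\min_{b_1,\dots,b_K,\,\beta}\;
    \sum_{k=1}^{K}\sum_{i=1}^{n}
      \rho_{\tau_k}\!\left(y_i - b_k - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right),
\quad \tau_k = \frac{k}{K+1}.
```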

5.
In the causal analysis of survival data, a time-based response is related to a set of explanatory variables. Defining the relation between the survival time and the covariates can be a difficult task, particularly at the preliminary stage, when information is limited. Through a nonparametric approach, we propose to estimate the survival function in a way that allows the relative importance of each potential explanatory variable to be evaluated in a simple, explanatory fashion. To achieve this aim, each of the explanatory variables is used to partition the observed survival times, and the observations are assumed to be partially exchangeable according to this partition. We then consider, conditionally on each partition, a hierarchical nonparametric Bayesian model on the hazard functions, and we define and compare different prior distributions for the hazard functions.

6.
Consider developing a regression model in a context where substantive theory is weak. To focus on an extreme case, suppose that in fact there is no relationship between the dependent variable and the explanatory variables. Even so, if there are many explanatory variables, the R² will be high. If explanatory variables with small t statistics are dropped and the equation refitted, the R² will stay high and the overall F will become highly significant. This is demonstrated by simulation and by asymptotic calculation.
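A minimal simulation sketch of this phenomenon (often called Freedman's paradox); the dimensions (n = 100, p = 50) and the |t| > 1 dropping rule are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from scipy import stats

def fit_ols(X, y):
    """OLS with intercept; return R^2, overall-F p-value and per-slope t statistics."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    k, dof = Z.shape[1] - 1, len(y) - Z.shape[1]
    tss, rss = np.sum((y - y.mean()) ** 2), np.sum(resid ** 2)
    f = ((tss - rss) / k) / (rss / dof)
    se = np.sqrt(rss / dof * np.diag(np.linalg.inv(Z.T @ Z)))
    return 1 - rss / tss, stats.f.sf(f, k, dof), (beta / se)[1:]

rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                       # no true relationship at all

r2, pval, t = fit_ols(X, y)
keep = np.abs(t) > 1.0                       # drop variables with small |t|, refit
r2_kept, pval_kept, _ = fit_ols(X[:, keep], y)
print(f"full model:   R^2 = {r2:.2f}, overall-F p = {pval:.2f}")
print(f"after drops:  R^2 = {r2_kept:.2f}, overall-F p = {pval_kept:.2g}")
```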

7.
Based on the mutual information between the explanatory variables and the response variable, this paper proposes a new variable selection method, MI-SIS. The method can handle ultrahigh-dimensional problems in which the number of explanatory variables p is far larger than the sample size n, namely p = O(exp(n^ε)) for some ε > 0. Moreover, it is a variable selection method that does not rely on model assumptions. Numerical simulations and an empirical study show that MI-SIS can effectively detect weak signals in small-sample settings.
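A minimal sketch of mutual-information-based marginal screening in the spirit of MI-SIS, using scikit-learn's k-nearest-neighbour MI estimator; the estimator and the cutoff d = ⌊n / log n⌋ are assumptions borrowed from the screening literature, not necessarily the paper's exact choices.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_screen(X, y, d=None):
    """Rank explanatory variables by estimated mutual information with y
    and keep the top d (default: n / log(n), rounded down)."""
    n = len(y)
    if d is None:
        d = int(n / np.log(n))
    mi = mutual_info_regression(X, y, random_state=0)
    return np.argsort(mi)[::-1][:d]          # indices of the retained variables

# Toy ultrahigh-dimensional example: p >> n, only the first 3 variables matter.
rng = np.random.default_rng(0)
n, p = 80, 2000
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - X[:, 1] + np.sin(X[:, 2]) + 0.5 * rng.normal(size=n)
selected = mi_screen(X, y)
print(sorted(selected)[:10])                  # the active variables should appear
```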

8.
The Forward Search is a powerful general method, incorporating flexible data-driven trimming, for the detection of outliers and unsuspected structure in data, and hence for building robust models. Starting from a small subset of the data, observations that are close to the fitted model are added to the observations used in parameter estimation. As this subset grows, we monitor parameter estimates, test statistics and measures of fit such as residuals. The paper surveys theoretical developments in work on the Forward Search over the last decade. The main illustration is a regression example with 330 observations and 9 potential explanatory variables. Mention is also made of procedures for multivariate data, including clustering, time series analysis and fraud detection.
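A minimal sketch of the regression Forward Search loop (fit on a subset, rank every observation by its squared residual, grow the subset by one, monitor); the initial subset here is chosen by a simplified rule rather than the robust start used in the literature.

```python
import numpy as np

def forward_search(X, y, m0=None):
    """Forward Search for regression: starting from a small subset, repeatedly
    refit OLS and enlarge the subset with the observations closest to the fit.
    Returns the maximum squared residual inside the subset at each step,
    a quantity often monitored to reveal outliers entering late."""
    n, p = X.shape
    if m0 is None:
        m0 = p + 1
    # Simplified start: the m0 cases with smallest residuals from a full-data fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    subset = np.argsort((y - X @ beta) ** 2)[:m0]
    trace = []
    for m in range(m0, n + 1):
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        d2 = (y - X @ beta) ** 2             # squared residuals for ALL cases
        trace.append(d2[subset].max())
        if m < n:
            subset = np.argsort(d2)[:m + 1]  # next subset: the m+1 closest cases
    return np.array(trace)
```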

9.
Ashley (1983) gave a simple condition for determining when a forecast of an explanatory variable (X_t) is sufficiently inaccurate that directly replacing X_t by the forecast yields worse forecasts of the dependent variable than does respecifying the equation to omit X_t. Many available macroeconomic forecasts were shown to be of limited usefulness in direct replacement. Direct replacement, however, is not optimal if the forecast's distribution is known. Here, optimal linear forms in commercial forecasts of several macroeconomic variables are obtained by using estimates of their distributions. Although they improve on the raw forecasts (direct replacement), these optimal forms are still too inaccurate to be useful in replacing the actual explanatory variables in forecasting models. The results strongly indicate that optimal forms involving several commercial forecasts will not be very useful either. Thus Ashley's (1983) sufficient condition retains its value in gauging the usefulness of a forecast of an explanatory variable in a forecasting model, even though it focuses on direct replacement.

10.
Robust variable selection with application to quality of life research
A large database containing socioeconomic data from 60 communities in Austria and Germany has been built, stemming from 18,000 citizens’ responses to a survey, together with data from official statistical institutes about these communities. This paper describes a procedure for extracting a small set of explanatory variables to explain response variables such as the cognition of quality of life. For better interpretability, the set of explanatory variables needs to be very small and the dependencies among the selected variables need to be low. Because of possible inhomogeneities within the data set, it is further required that the solution be robust to outliers and deviating points. To achieve these goals, a robust model selection method is developed, combined with a strategy to reduce the number of selected predictor variables to a necessary minimum. This context-sensitive method is then applied to identify the factors responsible for quality of life in communities.

11.
This paper considers the estimation of coefficients in a linear regression model with missing observations in the independent variables, and introduces a modification of the standard first-order regression method for imputing missing values. The modification provides stochastic values for imputation and, as an extension, makes use of the principle of weighted mixed regression. The proposed procedures are compared with two popular procedures: one that uses only the complete observations, and one that employs the standard first-order regression imputation method for missing values. A simulation experiment is conducted to evaluate the gain in efficiency and to examine interesting issues such as the impact of a varying degree of multicollinearity in the explanatory variables. Some work on the case of discrete regressor variables is in progress and will be reported in a future article.

12.
Two practical degrees of complexity may arise when designing an experiment for a model of a real-life case. First, some explanatory variables may not be under the control of the practitioner. Second, the responses may be correlated. In this paper three real-life cases of this kind are considered. Different covariance structures are studied, and some designs are computed by adapting the theory of marginally restricted designs for correlated observations. Brimkulov's exchange algorithm is also adapted to marginally restricted D-optimality and applied to a complex situation.

13.
Functional principal component analysis (FPCA), as a data reduction technique for a finite number T of functions, can be used to identify the dominant modes of variation of numeric three-way data.

We carry out FPCA on multidimensional probability density functions, relate this method to other standard methods, and define its centered and standardized versions. Building on the relationships between the FPCA of densities, the FPCA of their corresponding characteristic functions, the PCA of the MacLaurin expansions of these characteristic functions, and the dual STATIS method applied to their variance matrices, we propose a method for interpreting the results of the FPCA of densities. This method is based on investigating the relationships between the FPCA scores and the moments associated with the densities.

The method is illustrated using known Gaussian densities. In practice, FPCA of densities deals with observations of multidimensional variables on T occasions. These observations can be used to estimate the T associated densities either (i) by estimating the parameters of these densities, assuming that they are Gaussian, or (ii) by using the Gaussian kernel method and choosing the bandwidth matrix by the normal reference rule. The FPCA estimate is then derived from these density estimates, and the interpretation method is carried out to explore the dominant modes of variation of the types of three-way data encountered in sensory analysis and archaeology.

14.
In some physical systems, where the goal is to describe behaviour over an entire field using scattered observations, a multiple regression model can be derived from the discretization of a continuous process. These models often have more parameters than observations. We propose a technique for constructing smoothed estimators in this situation. Our method assumes the model has random explanatory and response variables, and imposes a smoothness penalty based on the signal-to-noise ratio of the model. Results are presented using a known value for the ratio, and a method for estimating the ratio is discussed. The procedure is applied to modelling temperature measurements taken in the California Current.

15.
NIPALS and SIMPLS are the most commonly used algorithms for partial least squares analysis. When the number of objects, N, is much larger than the number of explanatory variables, K, and/or response variables, M, the NIPALS algorithm can be time consuming. Even though SIMPLS is not as time consuming as NIPALS and may be preferred over it, there are kernel algorithms developed especially for the cases where N is much larger than the number of variables. In this study, the NIPALS, SIMPLS and some kernel algorithms have been used to build partial least squares regression models, and their performances have been compared in terms of the total CPU time spent calculating latent variables, leave-one-out cross-validation and bootstrap methods. According to the numerical results, one of the kernel algorithms suggested by Dayal and MacGregor (J Chemom 11:73–85, 1997) is the fastest.
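For orientation, a bare-bones sketch of the NIPALS algorithm for PLS (a standard formulation; deflation and normalization details vary across implementations, and this is not the kernel algorithm of Dayal and MacGregor):

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS for PLS regression: extract components by alternating
    regressions between X-scores and Y-scores, deflating X after each one."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    T, W, P = [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                        # initialize the Y-score vector
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)          # X-weights
            w /= np.linalg.norm(w)
            t = X @ w                        # X-scores
            q = Y.T @ t / (t.T @ t)          # Y-loadings
            u_new = Y @ q / (q.T @ q)        # updated Y-scores
            if np.linalg.norm(u_new - u) < tol:
                break
            u = u_new
        p = X.T @ t / (t.T @ t)              # X-loadings
        X = X - t @ p.T                      # deflate X
        T.append(t); W.append(w); P.append(p)
    return [np.hstack(m) for m in (T, W, P)]

# Usage on toy data (Y must be 2-D):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
Y = X[:, :2] @ np.array([[1.0], [-1.0]]) + 0.1 * rng.normal(size=(100, 1))
T, W, P = nipals_pls(X, Y, n_components=2)
print(T.shape, W.shape, P.shape)   # (100, 2) (6, 2) (6, 2)
```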

16.
We propose a method for specifying the distribution of random effects included in a model for cluster data. The class of models we consider includes mixed models and frailty models whose random effects and explanatory variables are constant within clusters. The method is based on cluster residuals obtained by assuming that the random effects are equal between clusters. We exhibit an asymptotic relationship between the cluster residuals and variations of the random effects as the number of observations increases and the variance of the random effects decreases. The asymptotic relationship is used to specify the random-effects distribution. The method is applied to a frailty model and a model used to describe the spread of plant diseases.

17.
In this note, we consider the problem of estimating regression coefficients when there are missing observations of some explanatory variables. Following Dagenais (1973), Gourieroux and Monfort (1981), and Conniffe (1983a, 1983b), we assume that auxiliary relationships exist among the explanatory variables. Several estimators and their interrelationships are discussed. We begin with the model of Gourieroux and Monfort (1981).

18.
In past decades, the number of variables explaining observations in practical applications has grown steadily. This has led to heavy computational tasks, despite the widespread use of provisional variable selection methods in data processing. Consequently, more methodological techniques have appeared that reduce the number of explanatory variables without losing much of the information. Among these techniques, two distinct approaches are apparent: ‘shrinkage regression’ and ‘sufficient dimension reduction’. Surprisingly, there has been little communication or comparison between these two methodological categories, and it is not clear when each approach is appropriate. In this paper, we fill some of this gap by first reviewing each category briefly, paying special attention to its most commonly used methods. We then compare commonly used methods from both categories on their accuracy, computation time, and ability to select effective variables, and present a simulation study on the performance of the methods in each category. The selected methods are also tested on two sets of real data, which allows us to recommend conditions under which each approach is more appropriate for high-dimensional data.
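To make the comparison concrete, a toy sketch with one representative of each approach: the lasso (shrinkage regression, via scikit-learn) and a bare-bones sliced inverse regression (sufficient dimension reduction). The slicing scheme and the single-index toy model are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced Inverse Regression: estimate e.d.r. directions from the
    covariance of slice means of the standardized predictors."""
    n, p = X.shape
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)         # whitening via eigendecomposition
    root_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Z = (X - mu) @ root_inv
    order = np.argsort(y)
    M = np.zeros((p, p))
    for chunk in np.array_split(order, n_slices):
        m = Z[chunk].mean(axis=0)
        M += (len(chunk) / n) * np.outer(m, m)
    _, eigvecs = np.linalg.eigh(M)
    eta = eigvecs[:, -n_dirs:]               # leading eigenvectors
    return root_inv @ eta                    # back to the original X scale

# Shrinkage (lasso) and SDR (SIR) on the same single-index toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X @ np.array([1, -1] + [0] * 8)) ** 3 + 0.1 * rng.normal(size=200)
print("lasso coefs:  ", LassoCV(cv=5).fit(X, y).coef_.round(2))
print("SIR direction:", sir_directions(X, y).ravel().round(2))
```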

19.
In many experiments, not all explanatory variables can be controlled. When the units arise sequentially, different approaches may be used. The authors study a natural sequential procedure for “marginally restricted” D‐optimal designs. They assume that one set of explanatory variables (x1) is observed sequentially, and that the experimenter responds by choosing an appropriate value of the explanatory variable x2. In order to solve the sequential problem a priori, the authors consider the problem of constructing optimal designs with a prior marginal distribution for x1. This eliminates the influence of units already observed on the next unit to be designed. They give explicit designs for various cases in which the mean response follows a linear regression model; they also consider a case study with a nonlinear logistic response. They find that the optimal strategy often consists of randomizing the assignment of the values of x2.

20.
Bayesian model building techniques are developed for data with a strong time series structure and possibly exogenous explanatory variables that have strong explanatory and predictive power. The emphasis is on determining whether, when the data have a strong time series structure, there are any explanatory variables that should also be included in the model. We use a time series model that is linear in past observations and that can capture both stochastic and deterministic trend, seasonality and serial correlation. We propose plotting absolute predictive error against predictive standard deviation; a series of such plots is used to determine which of several nested and non-nested models is optimal in terms of minimizing the dispersion of the predictive distribution and restricting predictive outliers. We apply the techniques to modelling monthly counts of fatal road crashes in Australia, where economic, consumption and weather variables are available, and we find that three such variables should be included in addition to the time series filter. The approach leads to graphical techniques for determining the strength of the relationships between the dependent variable and the covariates and for detecting model inadequacy, as well as providing useful numerical summaries.
