Similar Articles
1.
ABSTRACT

In the stepwise procedure of selection of a fixed or a random explanatory variable in a mixed quantitative linear model with errors following a Gaussian stationary autocorrelated process, we have studied the efficiency of five estimators relative to Generalized Least Squares (GLS): Ordinary Least Squares (OLS), Maximum Likelihood (ML), Restricted Maximum Likelihood (REML), First Differences (FD), and First-Difference Ratios (FDR). We have also studied the validity and power of seven derived testing procedures, to assess the significance of the slope of the candidate explanatory variable x2 to enter the model in which there is already one regressor x1. In addition to five testing procedures from the literature, we considered the FDR t-test with n − 3 df and the modified t-test with n′ − 3 df for partial correlations, where n′ is Dutilleul's effective sample size. Efficiency, validity, and power were analyzed by Monte Carlo simulations, as functions of the nature, fixed vs. random (purely random or autocorrelated), of x1 and x2, the sample size, and the autocorrelation of random terms in the regression model. We report extensive results for the autocorrelation structure of first-order autoregressive [AR(1)] type, and discuss results obtained for other autocorrelation structures, such as the spherical semivariogram, first-order moving average [MA(1)] and ARMA(1,1), which could not be presented because of space constraints. Overall, we found that:
  1. the efficiency of slope estimators and the validity of testing procedures depend primarily on the nature of x2, but not on that of x1;

  2. FDR is the most inefficient slope estimator, regardless of the nature of x1 and x2;

  3. REML is the most efficient of the slope estimators compared relative to GLS, provided the specified autocorrelation structure is correct and the sample size is large enough to ensure the convergence of its optimization algorithm;

  4. the FDR t-test, the modified t-test and the REML t-test are the most valid of the testing procedures compared, despite the inefficiency of the FDR and OLS slope estimators for the former two;

  5. the FDR t-test, however, suffers from a lack of power that varies with the nature of x1 and x2; and

  6. the modified t-test for partial correlations, which does not require the specification of an autocorrelation structure, can be recommended when x1 is fixed or random and x2 is random, whether purely random or autocorrelated. Our results are illustrated by the environmental data that motivated our work.
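As a concrete companion to the comparison described in this abstract, the following minimal Monte Carlo sketch (not the authors' code; the sample size, ρ = 0.6, AR(1) errors, and a single random regressor are illustrative assumptions) estimates the slope by OLS and by GLS with the true AR(1) covariance matrix, and reports the efficiency of OLS relative to GLS as a ratio of mean squared errors.

```python
# Minimal sketch: relative efficiency of OLS vs GLS for the slope under AR(1) errors.
import numpy as np

rng = np.random.default_rng(0)
n, rho, sigma, n_rep = 50, 0.6, 1.0, 2000
beta0, beta1 = 1.0, 0.5
x = rng.normal(size=n)                      # one purely random regressor (x2 in the abstract's notation)

# AR(1) error covariance: Sigma[i, j] = sigma^2 * rho^|i-j| / (1 - rho^2)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = sigma**2 * rho**lags / (1.0 - rho**2)
Sigma_inv = np.linalg.inv(Sigma)
X = np.column_stack([np.ones(n), x])

ols_slopes, gls_slopes = [], []
for _ in range(n_rep):
    # generate stationary AR(1) errors recursively
    e = np.empty(n)
    e[0] = rng.normal(scale=sigma / np.sqrt(1 - rho**2))
    for t in range(1, n):
        e[t] = rho * e[t - 1] + rng.normal(scale=sigma)
    y = beta0 + beta1 * x + e
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
    ols_slopes.append(b_ols[1])
    gls_slopes.append(b_gls[1])

mse_ols = np.mean((np.array(ols_slopes) - beta1) ** 2)
mse_gls = np.mean((np.array(gls_slopes) - beta1) ** 2)
print(f"efficiency of OLS relative to GLS: {mse_gls / mse_ols:.3f}")
```

The REML, FD, and FDR estimators and the associated tests studied in the article would be further branches of the same simulation loop.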


2.
Abstract

In a quantitative linear model with errors following a stationary Gaussian, first-order autoregressive or AR(1) process, Generalized Least Squares (GLS) on raw data and Ordinary Least Squares (OLS) on prewhitened data are efficient methods of estimation of the slope parameters when the autocorrelation parameter of the error AR(1) process, ρ, is known. In practice, ρ is generally unknown. In the so-called two-stage estimation procedures, ρ is then estimated first before using the estimate of ρ to transform the data and estimate the slope parameters by OLS on the transformed data. Different estimators of ρ have been considered in previous studies. In this article, we study nine two-stage estimation procedures for their efficiency in estimating the slope parameters. Six of them (i.e., three noniterative, three iterative) are based on three estimators of ρ that have been considered previously. Two more (i.e., one noniterative, one iterative) are based on a new estimator of ρ that we propose: it is provided by the sample autocorrelation coefficient of the OLS residuals at lag 1, denoted r(1). Lastly, Restricted Maximum Likelihood (REML) represents a different type of two-stage estimation procedure whose efficiency has not yet been compared to the others. We also study the validity of the testing procedures derived from GLS and the nine two-stage estimation procedures. Efficiency and validity are analyzed in a Monte Carlo study. Three types of explanatory variable x in a simple quantitative linear model with AR(1) errors are considered in the time domain: Case 1, x is fixed; Case 2, x is purely random; and Case 3, x follows an AR(1) process with the same autocorrelation parameter value as the error AR(1) process. In a preliminary step, the number of inadmissible estimates and the efficiency of the different estimators of ρ are compared empirically, whereas their approximate expected value in finite samples and their asymptotic variance are derived theoretically. Thereafter, the efficiency of the estimation procedures and the validity of the derived testing procedures are discussed in terms of the sample size and the magnitude and sign of ρ. The noniterative two-stage estimation procedure based on the new estimator of ρ is shown to be more efficient for moderate values of ρ at small sample sizes. With the exception of small sample sizes, REML and its derived F-test perform the best overall. The asymptotic equivalence of two-stage estimation procedures, besides REML, is observed empirically. Differences related to the nature, fixed or random (uncorrelated or autocorrelated), of the explanatory variable are also discussed.
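To make the two-stage idea concrete, here is a minimal sketch (illustrative only, not the article's code) of a noniterative procedure of the kind described: ρ is estimated by the lag-1 sample autocorrelation r(1) of the OLS residuals, the data are then given a Prais-Winsten-type AR(1) transformation, and the slopes are re-estimated by OLS on the transformed data. The simulated data-generating process and ρ = 0.5 are arbitrary choices.

```python
# Minimal sketch of a noniterative two-stage procedure for a linear model with AR(1) errors.
import numpy as np

def two_stage_ar1(X, y):
    """Stage 1: r(1) from OLS residuals; Stage 2: OLS on AR(1)-transformed data."""
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols
    r1 = np.sum(e[1:] * e[:-1]) / np.sum(e**2)      # lag-1 sample autocorrelation of residuals

    Xt = np.empty_like(X, dtype=float)
    yt = np.empty_like(y, dtype=float)
    Xt[0] = np.sqrt(1 - r1**2) * X[0]               # Prais-Winsten treatment of the first observation
    yt[0] = np.sqrt(1 - r1**2) * y[0]
    Xt[1:] = X[1:] - r1 * X[:-1]                    # quasi-differencing of the remaining rows
    yt[1:] = y[1:] - r1 * y[:-1]
    b_2s = np.linalg.lstsq(Xt, yt, rcond=None)[0]
    return b_2s, r1

# Example with simulated AR(1) errors (rho = 0.5 and the slopes are arbitrary choices)
rng = np.random.default_rng(1)
n, rho = 60, 0.5
x = rng.normal(size=n)
e = np.empty(n)
e[0] = rng.normal(scale=1 / np.sqrt(1 - rho**2))
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()
y = 2.0 + 0.8 * x + e
X = np.column_stack([np.ones(n), x])
b, r1 = two_stage_ar1(X, y)
print("estimated intercept and slope:", b, " r(1):", r1)
```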

3.
Many estimation procedures for quantitative linear models with autocorrelated errors have been proposed in the literature. A number of these procedures have been compared in various ways for different sample sizes and autocorrelation parameter values and for structured or random explanatory variables. In this paper, we revisit three situations that were considered to some extent in previous studies, by comparing ten estimation procedures: Ordinary Least Squares (OLS), Generalized Least Squares (GLS), estimated Generalized Least Squares (six procedures), Maximum Likelihood (ML), and First Differences (FD). The six estimated GLS procedures and the ML procedure differ in the way the error autocovariance matrix is estimated. The three situations can be defined as follows: Case 1, the explanatory variable x in the simple linear regression is fixed; Case 2, x is purely random; and Case 3, x is first-order autoregressive. Following a theoretical presentation, the ten estimation procedures are compared in a Monte Carlo study conducted in the time domain, where the errors are first-order autoregressive in Cases 1–3. The measure of comparison for the estimation procedures is their efficiency relative to OLS. It is evaluated as a function of the time series length and the magnitude and sign of the error autocorrelation parameter. Overall, knowledge of the model of the time series process generating the errors enhances efficiency in estimated GLS. Differences in the efficiency of estimation procedures between Case 1 and Cases 2 and 3, as well as differences in efficiency among procedures in a given situation, are observed and discussed.

4.
Econometric Reviews, 2013, 32(3): 215–228
Abstract

Decisions based on econometric model estimates may not have the expected effect if the model is misspecified. Thus, specification tests should precede any analysis. Bierens' specification test is consistent and has optimality properties against some local alternatives. A shortcoming is that the test statistic is not distribution free, even asymptotically. This makes the test infeasible in practice. There have been many suggestions to circumvent this problem, including the use of upper bounds for the critical values. However, these suggestions lead to tests that lose power and optimality against local alternatives. In this paper we show that bootstrap methods allow us to recover the power and optimality of Bierens' original test. The bootstrap also provides reliable p-values, which have a central role in Fisher's theory of hypothesis testing. The paper also includes a discussion of the properties of the bootstrap Nonlinear Least Squares Estimator under local alternatives.
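The sketch below illustrates the bootstrap idea in this setting with a deliberately simplified integrated-conditional-moment-type statistic (a sup over a small grid of exponential weights of the scaled residual moments); it is not Bierens' exact statistic, and the data-generating process, weight grid, and number of bootstrap replicates are illustrative assumptions. The point is only how a residual bootstrap under the fitted null model yields a p-value instead of a conservative upper bound.

```python
# Hedged sketch: bootstrap p-value for a simplified ICM-type specification statistic.
import numpy as np

rng = np.random.default_rng(2)

def icm_like_stat(x, e, grid):
    """sup over a grid of weights of |n^{-1/2} * sum_i e_i * exp(xi * x_i)| (simplified)."""
    return max(abs(np.sum(e * np.exp(xi * x))) / np.sqrt(len(e)) for xi in grid)

# data generated under the null (correctly specified linear model); illustrative DGP
n = 100
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
grid = np.linspace(0.5, 3.0, 6)
t_obs = icm_like_stat(x, e, grid)

# residual bootstrap under the fitted null model
B, count = 499, 0
for _ in range(B):
    y_b = X @ b + rng.choice(e, size=n, replace=True)
    b_b = np.linalg.lstsq(X, y_b, rcond=None)[0]
    e_b = y_b - X @ b_b
    count += icm_like_stat(x, e_b, grid) >= t_obs

print("bootstrap p-value:", (count + 1) / (B + 1))
```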

5.

In this article, the validity of procedures for testing the significance of the slope in quantitative linear models with one explanatory variable and first-order autoregressive [AR(1)] errors is analyzed in a Monte Carlo study conducted in the time domain. Two cases are considered for the regressor: fixed and trended versus random and AR(1). In addition to the classical t-test using the Ordinary Least Squares (OLS) estimator of the slope and its standard error, we consider seven t-tests with n − 2 df built on the Generalized Least Squares (GLS) estimator or an estimated GLS estimator, three variants of the classical t-test with different variances of the OLS estimator, two asymptotic tests built on the Maximum Likelihood (ML) estimator, the F-test for fixed effects based on the Restricted Maximum Likelihood (REML) estimator in the mixed-model approach, two t-tests with n − 2 df based on first differences (FD) and first-difference ratios (FDR), and four modified t-tests using various corrections of the number of degrees of freedom. The FDR t-test, the REML F-test and the modified t-test using Dutilleul's effective sample size are the most valid among the testing procedures that do not assume the complete knowledge of the covariance matrix of the errors. However, modified t-tests are not applicable and the FDR t-test suffers from a lack of power when the regressor is fixed and trended (i.e., FDR is the same as FD in this case when observations are equally spaced), whereas the REML algorithm fails to converge at small sample sizes. The classical t-test is valid when the regressor is fixed and trended and autocorrelation among errors is predominantly negative, and when the regressor is random and AR(1), like the errors, and autocorrelation is moderately negative or positive. We discuss the results graphically, in terms of the circularity condition defined in repeated measures ANOVA and of the effective sample size used in correlation analysis with autocorrelated sample data. An example with environmental data is presented.
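A minimal sketch of the kind of validity check reported above: the empirical rejection rate (type I error) of the classical OLS t-test of a zero slope at the nominal 5% level when the regressor is fixed and trended and the errors are AR(1) with positive autocorrelation. The sample size, ρ = 0.5, and the number of replicates are illustrative assumptions.

```python
# Minimal sketch: empirical size of the classical t-test under positively autocorrelated errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, rho, n_rep, alpha = 30, 0.5, 5000, 0.05
x = np.arange(n, dtype=float)                 # fixed, trended regressor
X = np.column_stack([np.ones(n), x])
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)

reject = 0
for _ in range(n_rep):
    e = np.empty(n)
    e[0] = rng.normal(scale=1 / np.sqrt(1 - rho**2))
    for t in range(1, n):
        e[t] = rho * e[t - 1] + rng.normal()
    y = 1.0 + 0.0 * x + e                     # true slope is zero
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)
    se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    reject += abs(b[1] / se_b1) > tcrit

print("empirical size of the classical t-test:", reject / n_rep)
# With positive autocorrelation and a trended regressor this typically exceeds the nominal 5%,
# which is the kind of invalidity documented in the abstract.
```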

6.
This article evaluates the performance of two estimators, namely the Maximum Likelihood Estimator (MLE) and Whittle's Estimator (WE), through a simulation study of the Generalised Autoregressive (GAR) model.

As expected, it is found that for the parameters α and σ², the MLE and WE perform better than the Method of Moments (MOM) estimator. For the parameter δ, MOM sometimes appears to perform slightly better than the MLE and WE, possibly due to truncation approximations associated with the hypergeometric functions used to calculate the autocorrelation function. However, the MLE and WE can be used in practice without loss of efficiency.

7.
Regression Analysis (RA) is one of the most frequently used tools for forecasting. The Ordinary Least Squares (OLS) technique is the basic instrument of RA, and many regression techniques are based on OLS. This paper introduces a new regression approach, called Least Squares Ratio (LSR), and compares OLS and LSR according to the mean square errors of estimation of the theoretical regression parameters (mse β) and of the dependent variable (mse y).

8.
The relative merits of ten estimators for the variance component of the balanced and unbalanced one-way random effects models are compared. Six of the estimators are nonnegative, two of which are obtained by modifying the Minimum Variance Quadratic Unbiased Estimator (MIVQUE) and the Weighted Least Squares Estimator (WLS), and two more from the positive parts of these estimators. The Minimum Norm Quadratic Estimator (MINQE), which is nonnegative, is adjusted to reduce its bias. The nonnegative Minimum Mean Square Error Estimator (MIMSQE), the Analysis of Variance (ANOVA) estimator and the Unweighted Sums of Squares (USS) estimator are also included.
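For reference, here is a minimal sketch of the classical ANOVA estimator of the variance components in the balanced one-way random effects model, together with its "positive part" (truncated) version, which is one simple way to obtain a nonnegative estimator. The group structure and variances below are illustrative, and this is not one of the article's modified estimators.

```python
# Minimal sketch: ANOVA variance-component estimators for y_ij = mu + a_i + e_ij (balanced).
import numpy as np

def anova_variance_components(y):
    """y: array of shape (k groups, n observations per group), balanced design."""
    k, n = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    msa = n * np.sum((group_means - grand_mean) ** 2) / (k - 1)      # between-group mean square
    mse = np.sum((y - group_means[:, None]) ** 2) / (k * (n - 1))    # within-group mean square
    sigma2_e = mse                          # error variance estimate
    sigma2_a = (msa - mse) / n              # ANOVA estimator of the between-group component; can be negative
    sigma2_a_pos = max(sigma2_a, 0.0)       # truncated ("positive part") nonnegative version
    return sigma2_e, sigma2_a, sigma2_a_pos

rng = np.random.default_rng(4)
k, n, sa, se = 8, 5, 1.0, 2.0
a = rng.normal(scale=sa, size=k)
y = 10.0 + a[:, None] + rng.normal(scale=se, size=(k, n))
print(anova_variance_components(y))
```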

9.
ABSTRACT

In this paper, we investigate the objective function and deflation process for sparse Partial Least Squares (PLS) regression with multiple components. While many have considered variations on the objective for sparse PLS, the deflation process for sparse PLS has not received as much attention. Our work highlights a flaw in the Statistically Inspired Modification of Partial Least Squares (SIMPLS) deflation method when applied in sparse PLS regression. We also consider the Nonlinear Iterative Partial Least Squares (NIPALS) deflation in sparse PLS regression. To remedy the flaw in the SIMPLS method, we propose a new sparse PLS method wherein the direction vectors are constrained to be sparse and lie in a chosen subspace. We give insight into this new PLS procedure and show through examples and simulation studies that the proposed technique can outperform alternative sparse PLS techniques in coefficient estimation. Moreover, our analysis reveals a simple renormalization step that can be used to improve the estimation of sparse PLS direction vectors generated using any convex relaxation method.
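To fix ideas about the deflation step discussed above, here is a minimal dense (non-sparse) PLS1 sketch with the usual NIPALS-style deflation of X; it is a generic textbook version, not the sparse method proposed in the article, and the simulated data are illustrative.

```python
# Minimal sketch: NIPALS-style PLS1 regression with X-deflation after each component.
import numpy as np

def pls1_nipals(X, y, n_components):
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, Q = [], [], []
    Xk = X.copy()
    for _ in range(n_components):
        w = Xk.T @ y
        w = w / np.linalg.norm(w)             # direction (weight) vector
        t = Xk @ w                            # score
        p = Xk.T @ t / (t @ t)                # X-loading
        q = (y @ t) / (t @ t)                 # y-loading
        Xk = Xk - np.outer(t, p)              # deflation of X
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # regression coefficients expressed on the original (centered) predictors
    return W @ np.linalg.inv(P.T @ W) @ Q

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 6))
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.3, 0.0])
y = X @ beta + 0.1 * rng.normal(size=40)
print(pls1_nipals(X, y, n_components=3))
```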

10.
In the application of the linear regression model there continues to be widespread use of the Least Squares Estimator (LSE) due to its theoretical optimality. For example, it is well known that the LSE is the best unbiased estimator under normality, while it remains the best linear unbiased estimator (BLUE) when the normality assumption is dropped. In this paper we extend an approach given in Knautz (1993) that allows improvement of the LSE in the context of nonnormal and nonsymmetric error distributions. It will be shown that there exist linear plus quadratic (LPQ) estimators, consisting of linear and quadratic terms in the dependent variable, which dominate the LSE, depending on the second, third and fourth moments of the error distribution. A simulation study illustrates that this remains true when the moments have to be estimated from the data. Computation of confidence intervals using bootstrap methods reveals significant improvement compared with inference based on the LSE, especially for nonsymmetric distributions of the error term.

11.
Following the paper by Genton and Loperfido [Generalized skew-elliptical distributions and their quadratic forms, Ann. Inst. Statist. Math. 57 (2005), pp. 389–401], we say that Z has a generalized skew-normal distribution if its probability density function (p.d.f.) is given by f(z) = 2φ_p(z; ξ, Ω)π(z − ξ), z ∈ ℝ^p, where φ_p(·; ξ, Ω) is the p-dimensional normal p.d.f. with location vector ξ and scale matrix Ω, ξ ∈ ℝ^p, Ω > 0, and π is a skewing function from ℝ^p to ℝ, that is, 0 ≤ π(z) ≤ 1 and π(−z) = 1 − π(z), ∀ z ∈ ℝ^p. First, the distributions of linear transformations of Z are studied, and some moments of Z and its quadratic forms are derived. Next we obtain the joint moment-generating functions (m.g.f.'s) of linear and quadratic forms of Z and then investigate conditions for their independence. Finally, explicit forms for the above distributions, m.g.f.'s and moments are derived when π(z) = κ(α′z), where α ∈ ℝ^p and κ is the normal, Laplace, logistic or uniform distribution function.
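A small numerical sketch of this density with the normal choice of κ, i.e. the skewing function π(z) = Φ(α′z): it evaluates f(z) = 2φ_p(z; ξ, Ω)Φ(α′(z − ξ)) on a bivariate grid and checks that the density integrates to approximately 1. The parameter values are arbitrary.

```python
# Minimal sketch: evaluating a generalized skew-normal density with pi(z) = Phi(alpha' z).
import numpy as np
from scipy.stats import multivariate_normal, norm

def gsn_pdf(z, xi, Omega, alpha):
    """f(z) = 2 * phi_p(z; xi, Omega) * Phi(alpha'(z - xi)); z may be an (N, p) array of points."""
    return 2.0 * multivariate_normal.pdf(z, mean=xi, cov=Omega) * norm.cdf((z - xi) @ alpha)

xi = np.array([0.0, 1.0])
Omega = np.array([[1.0, 0.3], [0.3, 2.0]])
alpha = np.array([2.0, -1.0])

# crude Riemann-sum check that the bivariate density integrates to ~1 over a wide grid
grid = np.linspace(-8.0, 10.0, 181)
h = grid[1] - grid[0]
uu, vv = np.meshgrid(grid, grid)
pts = np.column_stack([uu.ravel(), vv.ravel()])
dens = gsn_pdf(pts, xi, Omega, alpha)
print("integral over the grid:", dens.sum() * h**2)   # should be close to 1
```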

12.
ABSTRACT

In this article, we propose a more general criterion, called the Sp-criterion, for subset selection in the multiple linear regression model. Many subset selection methods are based on the Least Squares (LS) estimator of β, but whenever the data contain an influential observation or the distribution of the error variable deviates from normality, the LS estimator performs 'poorly' and hence a method based on this estimator (for example, Mallows' Cp-criterion) tends to select a 'wrong' subset. The proposed method overcomes this drawback and its main feature is that it can be used with any type of estimator (either the LS estimator or any robust estimator) of β without any need for modification of the proposed criterion. Moreover, this technique is operationally simple to implement as compared to other existing criteria. The method is illustrated with examples.
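The Sp-criterion itself is not spelled out in this abstract, so the sketch below shows only the LS-based benchmark it is contrasted with: an all-subsets search scored by Mallows' Cp. The data and the true sparse model are illustrative assumptions.

```python
# Minimal sketch: all-subsets selection scored by Mallows' Cp (the LS-based benchmark).
import numpy as np
from itertools import combinations

def best_subset_cp(X_full, y):
    n, k = X_full.shape
    # sigma^2 estimated from the full model (with intercept)
    Xf = np.column_stack([np.ones(n), X_full])
    rss_full = np.sum((y - Xf @ np.linalg.lstsq(Xf, y, rcond=None)[0]) ** 2)
    sigma2 = rss_full / (n - k - 1)
    best = None
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            Xs = np.column_stack([np.ones(n), X_full[:, idx]])
            rss = np.sum((y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]) ** 2)
            p = size + 1                              # parameters including the intercept
            cp = rss / sigma2 - n + 2 * p             # Mallows' Cp
            if best is None or cp < best[0]:
                best = (cp, idx)
    return best

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 5))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=60)
print(best_subset_cp(X, y))   # expect the subset containing columns 0 and 2
```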

13.
Estimating multivariate location and scatter with both affine equivariance and positive breakdown has always been difficult. A well-known estimator which satisfies both properties is the Minimum Volume Ellipsoid Estimator (MVE). Computing the exact MVE is often not feasible, so one usually resorts to an approximate algorithm. In the regression setup, algorithms for positive-breakdown estimators like Least Median of Squares typically recompute the intercept at each step, to improve the result. This approach is called intercept adjustment. In this paper we show that a similar technique, called location adjustment, can be applied to the MVE. For this purpose we use the Minimum Volume Ball (MVB), in order to lower the MVE objective function. An exact algorithm for calculating the MVB is presented. As an alternative to MVB location adjustment we propose L1 location adjustment, which does not necessarily lower the MVE objective function but yields more efficient estimates for the location part. Simulations compare the two types of location adjustment. We also obtain the maxbias curves of L1 and the MVB in the multivariate setting, revealing the superiority of L1.
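For context, the L1 location estimate referred to here is the multivariate (spatial) median. A minimal sketch of the standard Weiszfeld iteration for computing it is given below; this is the generic algorithm, not the authors' adjusted-MVE code, and the contaminated sample is illustrative.

```python
# Minimal sketch: Weiszfeld iteration for the L1 (spatial) median.
import numpy as np

def spatial_median(X, tol=1e-8, max_iter=500):
    m = X.mean(axis=0)                      # starting value
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.where(d < 1e-12, 1e-12, d)   # guard against division by zero at a data point
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(size=(95, 2)), rng.normal(loc=8.0, size=(5, 2))])  # 5% outliers
print("mean:          ", X.mean(axis=0))
print("spatial median:", spatial_median(X))   # much less affected by the outliers
```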

14.
Abstract

This article presents a non-stochastic version of the Generalized Ridge Regression estimator that arises from a discussion of the properties of a Generalized Ridge Regression estimator whose shrinkage parameters are found to be close to their upper bounds. The resulting estimator takes the form of a shrinkage estimator that is superior to both the Ordinary Least Squares estimator and the James-Stein estimator under certain conditions. A numerical study is provided to investigate the range of signal-to-noise ratios under which the new estimator dominates the James-Stein estimator with respect to the prediction mean square error.
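To fix notation, here is a minimal sketch of the generalized ridge form β̂(K) = (X′X + K)⁻¹X′y with one shrinkage parameter per coefficient; the particular non-stochastic choice of K proposed in the article is not reproduced here, and the data and shrinkage values are illustrative assumptions.

```python
# Minimal sketch: generalized ridge regression with a diagonal shrinkage matrix K.
import numpy as np

def generalized_ridge(X, y, k):
    """k: vector of shrinkage parameters, one per column of X, placed on the diagonal of K."""
    K = np.diag(k)
    return np.linalg.solve(X.T @ X + K, X.T @ y)

rng = np.random.default_rng(8)
n, p = 50, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.5, -0.5, 0.0])
y = X @ beta + rng.normal(size=n)

print("OLS:              ", np.linalg.lstsq(X, y, rcond=None)[0])
print("generalized ridge:", generalized_ridge(X, y, k=np.array([0.1, 1.0, 5.0, 20.0])))
```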

15.
Econometric Reviews, 2013, 32(1): 29–58
Abstract

Approximation formulae are developed for the bias of ordinary and generalized Least Squares Dummy Variable (LSDV) estimators in dynamic panel data models. Results from Kiviet [Kiviet, J. F. (1995), On bias, inconsistency, and efficiency of various estimators in dynamic panel data models, J. Econometrics 68:53–78; Kiviet, J. F. (1999), Expectations of expansions for estimators in a dynamic panel data model: some results for weakly exogenous regressors, In: Hsiao, C., Lahiri, K., Lee, L-F., Pesaran, M. H., eds., Analysis of Panels and Limited Dependent Variables, Cambridge: Cambridge University Press, pp. 199–225] are extended to higher-order dynamic panel data models with general covariance structure. The focus is on estimation of both short- and long-run coefficients. The results show that proper modelling of the disturbance covariance structure is indispensable. The bias approximations are used to construct bias-corrected estimators which are then applied to quarterly data from 14 European Union countries. Money demand functions for M1, M2 and M3 are estimated for the EU area as a whole for the period 1991:I–1995:IV. Significant spillovers between countries are found, reflecting the dependence of domestic money demand on foreign developments. The empirical results show that in general plausible long-run effects are obtained by the bias-corrected estimators. Moreover, finite sample bias, although of moderate magnitude, is present, underlining the importance of more refined estimation techniques. Also the efficiency gains from exploiting the heteroscedasticity and cross-correlation patterns between countries are sometimes considerable.
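To see why bias approximations are needed at all, here is a minimal simulation sketch (illustrative N, T, and γ; no covariance structure) of the familiar small-T downward bias of the uncorrected LSDV (within) estimator in a first-order dynamic panel. The article's correction formulae are not implemented here.

```python
# Minimal sketch: small-T (Nickell-type) bias of the LSDV / within estimator in a dynamic panel.
import numpy as np

rng = np.random.default_rng(9)
N, T, gamma, n_rep = 100, 8, 0.5, 500
estimates = []
for _ in range(n_rep):
    alpha = rng.normal(size=N)                            # unit fixed effects
    y = np.zeros((N, T + 1))
    y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N)    # start near the stationary level
    for t in range(1, T + 1):
        y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

    # within (LSDV) estimator: demean current and lagged y within each unit, then regress
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    y_lag_d = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_d = y_cur - y_cur.mean(axis=1, keepdims=True)
    estimates.append(np.sum(y_lag_d * y_cur_d) / np.sum(y_lag_d**2))

print("true gamma:", gamma, " mean LSDV estimate:", np.mean(estimates))
# The clear downward bias visible here is what bias-corrected estimators aim to remove.
```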

16.
ABSTRACT

Background: Instrumental variables (IVs) have become much easier to find in the “Big data era”, which has increased the number of applications of the Two-Stage Least Squares model (TSLS). With the increased availability of IVs, the possibility that these IVs are weak has increased. Prior work has suggested a ‘rule of thumb’ that IVs with a first-stage F statistic of at least ten will avoid a relative bias in point estimates greater than 10%. We investigated whether or not this threshold was also an efficient guarantee of low false rejection rates of the null hypothesis test in TSLS applications with many IVs.

Objective: To test how the ‘rule of thumb’ for weak instruments performs in predicting low false rejection rates in the TSLS model when the number of IVs is large.

Method: We used a Monte Carlo approach to create 28 original data sets for different models, with the number of IVs varying from 3 to 30. For each model, we generated 2000 observations per iteration and ran 50,000 iterations to reach convergence in the rejection rates. The true parameter value was set to 0, and the probability of rejecting this null hypothesis was recorded for each model as a measure of the false rejection rate. The relationship between the endogenous variable and the IVs was carefully adjusted so that the F statistic for the first-stage model equaled ten, thus simulating the ‘rule of thumb.’

Results: We found that the false rejection rates (type I errors) increased as the number of IVs in the TSLS model increased while the F statistic for the first-stage model was held equal to 10. The false rejection rate exceeded 10% when the TSLS model had 24 IVs and exceeded 15% when it had 30 IVs.

Conclusion: When more instrumental variables were included in the model, the ‘rule of thumb’ was no longer an efficient guarantee of good performance in hypothesis testing. A stricter threshold for the first-stage F statistic is recommended to replace the ‘rule of thumb,’ especially when the number of instrumental variables is large.
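A minimal sketch of this kind of experiment (illustrative values throughout: 30 instruments, 2000 observations, 500 replicates, first-stage coefficients roughly calibrated so the mean first-stage F is near 10; not the authors' exact design): it records the first-stage F statistic and the rate at which TSLS falsely rejects a true zero effect at the 5% level.

```python
# Minimal sketch: false rejection rate of TSLS with many weak instruments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n, n_iv, n_rep = 2000, 30, 500
pi = np.full(n_iv, 0.075)      # weak first-stage coefficients (roughly calibrated so mean F ~ 10)
beta = 0.0                     # the true effect is zero, so every rejection is a type I error

rejections, f_stats = 0, []
for _ in range(n_rep):
    Z = rng.normal(size=(n, n_iv))
    u = rng.normal(size=n)                    # structural error
    v = 0.5 * u + rng.normal(size=n)          # first-stage error, correlated with u (endogeneity)
    x = Z @ pi + v
    y = beta * x + u

    # first stage with intercept; F statistic for H0: all instrument coefficients are zero
    Z1 = np.column_stack([np.ones(n), Z])
    b1 = np.linalg.lstsq(Z1, x, rcond=None)[0]
    rss_u = np.sum((x - Z1 @ b1) ** 2)
    rss_r = np.sum((x - x.mean()) ** 2)       # intercept-only first stage
    f_stats.append(((rss_r - rss_u) / n_iv) / (rss_u / (n - n_iv - 1)))

    # second stage: regress y on [1, x_hat]; standard errors use the structural residuals
    x_hat = Z1 @ b1
    W = np.column_stack([np.ones(n), x_hat])
    b2 = np.linalg.lstsq(W, y, rcond=None)[0]
    resid = y - b2[0] - b2[1] * x
    s2 = resid @ resid / (n - 2)
    se = np.sqrt(s2 * np.linalg.inv(W.T @ W)[1, 1])
    rejections += abs(b2[1] / se) > stats.norm.ppf(0.975)

print("mean first-stage F:  ", np.mean(f_stats))
print("false rejection rate:", rejections / n_rep)
```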

17.
ABSTRACT

Statistical methods are effectively used in the evaluation of pharmaceutical formulations instead of laborious liquid chromatography. However, signal overlapping, nonlinearity, multicollinearity and the presence of outliers deteriorate the performance of statistical methods. Partial Least Squares Regression (PLSR) is a very popular method for the quantification of high-dimensional, spectrally overlapped drug formulations. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality for a PLSR model. We study the performance of the proposed diagnostics on two everyday-use, highly overlapping drug systems: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

18.
It has been found that, for a variety of probability distributions, there is a surprising linear relation between mode, mean, and median. In this article, the relation between the mode, mean, and median regression functions is assumed to follow a simple parametric model. We propose a semiparametric conditional mode (mode regression) estimation for an unknown (unimodal) conditional distribution function in the context of the regression model, so that any m-step-ahead mean and median forecasts can then be substituted into the resultant model to deliver m-step-ahead mode predictions. In the semiparametric model, Least Squares Estimators (LSEs) for the model parameters and the simultaneous estimation of the unknown mean and median regression functions by the local linear kernel method are combined to make inference about the parametric and nonparametric components of the proposed model. The asymptotic normality of these estimators is derived, and the asymptotic distribution of the parameter estimates is also given and shown to follow the usual parametric rates in spite of the presence of the nonparametric component in the model. These results are applied to obtain a data-based test for the dependence of mode regression on mean and median regression under a regression model.

19.
This paper presents a methodology for model fitting and inference in the context of Bayesian models of the type f(Y | X, θ)f(X | θ)f(θ), where Y is the (set of) observed data, θ is a set of model parameters and X is an unobserved (latent) stationary stochastic process induced by the first-order transition model f(X^(t+1) | X^(t), θ), where X^(t) denotes the state of the process at time (or generation) t. The crucial feature of the above type of model is that, given θ, the transition model f(X^(t+1) | X^(t), θ) is known but the distribution of the stochastic process in equilibrium, that is f(X | θ), is, except in very special cases, intractable, hence unknown. A further point to note is that the data Y has been assumed to be observed when the underlying process is in equilibrium. In other words, the data is not collected dynamically over time. We refer to such a specification as a latent equilibrium process (LEP) model. It is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model. As a hierarchical specification, it is natural to fit the LEP within a Bayesian framework. Fitting such models is usually done via Markov chain Monte Carlo (MCMC). However, we demonstrate that, in the case of LEP models, implementation of MCMC is far from straightforward. The main contribution of this paper is to provide a methodology to implement MCMC for LEP models. We demonstrate our approach in population genetics problems with both simulated and real data sets. The resultant model fitting is computationally intensive and thus, we also discuss parallel implementation of the procedure in special cases.

20.
This article considers both Partial Least Squares (PLS) and Ridge Regression (RR) methods to combat the multicollinearity problem. A simulation study has been conducted to compare their performance with respect to Ordinary Least Squares (OLS). With varying degrees of multicollinearity, it is found that both the PLS and RR estimators produce significant reductions in the Mean Square Error (MSE) and Prediction Mean Square Error (PMSE) over OLS. From the simulation study it is also evident that RR performs better when the error variance is large, whereas the PLS estimator achieves its best results when the model includes more variables. However, the advantage of the ridge regression method over PLS is that it can provide 95% confidence intervals for the regression coefficients, while PLS cannot.
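A minimal sketch of the kind of comparison summarized above: simulate a regression with highly collinear predictors and compare OLS and ridge regression by the mean squared error of the coefficient estimates. The degree of collinearity, the ridge parameter, and the error variance are illustrative choices, and PLS is omitted to keep the sketch short.

```python
# Minimal sketch: OLS vs ridge regression under strong multicollinearity.
import numpy as np

rng = np.random.default_rng(11)
n, p, lam, n_rep = 50, 4, 5.0, 1000
beta = np.array([1.0, 0.5, -1.0, 2.0])

mse_ols, mse_ridge = 0.0, 0.0
for _ in range(n_rep):
    z = rng.normal(size=(n, 1))
    X = z + 0.05 * rng.normal(size=(n, p))        # columns are nearly identical, i.e. collinear
    y = X @ beta + 2.0 * rng.normal(size=n)       # large error variance
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    mse_ols += np.sum((b_ols - beta) ** 2)
    mse_ridge += np.sum((b_ridge - beta) ** 2)

print("MSE(OLS):  ", mse_ols / n_rep)
print("MSE(ridge):", mse_ridge / n_rep)   # typically much smaller under strong collinearity
```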
