Similar Articles
 A total of 20 similar articles found (search time: 578 ms)
1.
In this article we evaluate the performance of a randomization test for a subset of regression coefficients in a linear model. This randomization test is based on random permutations of the independent variables. It is shown that the method maintains its level of significance, except for extreme situations, and has power that approximates the power of another randomization test, which is based on the permutation of residuals from the reduced model. We also show, via an example, that the method of permuting independent variables is more valuable than other randomization methods because it can be used in connection with the downweighting of outliers.
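As a rough sketch of the idea (not the authors' exact procedure), the following Python fragment permutes the rows of the tested block of independent variables jointly and uses the reduction in residual sum of squares over the reduced model as the test statistic; the function name, the choice of statistic and the add-one Monte Carlo p-value are illustrative assumptions.

```python
import numpy as np

def permute_x_subset_test(X_keep, X_test, y, n_perm=999, seed=0):
    """Randomization test for a subset of regression coefficients obtained by
    permuting the rows of the tested independent variables (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.sum((y - X @ beta) ** 2)

    rss_reduced = rss(X_keep)                               # reduced model only
    observed = rss_reduced - rss(np.column_stack([X_keep, X_test]))
    count = 0
    for _ in range(n_perm):
        Xp = X_test[rng.permutation(len(y))]                # permute the tested block jointly
        if rss_reduced - rss(np.column_stack([X_keep, Xp])) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)                       # Monte Carlo p-value
```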

2.
To carry out a permutation test we have to examine the n! permutations of the observations. In order to make the permutation test feasible, Dwass (1957) proposed to examine only a sample of these permutations. With the help of sequential methods, we obtain a test which is never less efficient than that proposed by Dwass or the permutation test itself, in the sense that it is as powerful and never requires more permutations to make a decision. In practice, we can expect to gain much efficiency.
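For context, a minimal sketch of the sampled-permutation idea that Dwass proposed, shown here for a two-sample difference in means (the statistic, the number of permutations and the add-one p-value correction are illustrative choices; the sequential refinement described in the abstract is not shown):

```python
import numpy as np

def sampled_permutation_test(x, y, n_perm=999, seed=0):
    """Two-sample permutation test using a random sample of permutations
    instead of all n! rearrangements (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = np.mean(x) - np.mean(y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = perm[:n_x].mean() - perm[n_x:].mean()
        if abs(stat) >= abs(observed):
            count += 1
    # add-one correction keeps the sampled test valid at the nominal level
    return (count + 1) / (n_perm + 1)
```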

3.
A class of tests due to Shoemaker (Commun Stat Simul Comput 28:189–205, 1999) for differences in scale, valid for a variety of both skewed and symmetric distributions when location is known or unknown, is considered. The class is based on the interquantile range and requires that the population variances be finite. In this paper, we first propose a permutation version of it that does not require the condition of finite variances and is remarkably more powerful than the original. Second, we address the question of which quantile to choose by proposing a combined interquantile test based on our permutation version of the Shoemaker tests. Shoemaker showed that the more extreme interquantile range tests are more powerful than the less extreme ones, unless the underlying distributions are very highly skewed. Since in practice one may not know whether the underlying distributions are very highly skewed, the question arises. The combined interquantile test resolves this question, is robust, and is more powerful than the stand-alone tests. Third, we conducted a much more detailed simulation study than that of Shoemaker (1999), which compared his tests to the F and squared rank tests and showed that his tests are better. Since the F and squared rank tests are not good for differences in scale, his results suffer from this drawback; for this reason, instead of considering the squared rank test we consider, following the suggestions of several authors, tests due to Brown–Forsythe (J Am Stat Assoc 69:364–367, 1974), Pan (J Stat Comput Simul 63:59–71, 1999), O’Brien (J Am Stat Assoc 74:877–880, 1979) and Conover et al. (Technometrics 23:351–361, 1981).
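A bare-bones sketch of a permutation test built on an interquantile range, in the spirit described above; the quantile level, the median-centring used to handle unknown location, and the log-ratio statistic are illustrative assumptions, not necessarily Shoemaker's statistic or the authors' exact permutation scheme.

```python
import numpy as np

def interquantile_scale_test(x, y, p=0.05, n_perm=999, seed=0):
    """Permutation test for a difference in scale based on the interquantile
    range (illustrative sketch; no finite-variance condition is needed)."""
    rng = np.random.default_rng(seed)
    xc, yc = x - np.median(x), y - np.median(y)     # remove unknown locations

    def iqr(z):
        return np.quantile(z, 1 - p) - np.quantile(z, p)

    observed = np.log(iqr(xc) / iqr(yc))
    pooled = np.concatenate([xc, yc])
    n_x = len(xc)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = np.log(iqr(perm[:n_x]) / iqr(perm[n_x:]))
        if abs(stat) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)
```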

4.
Permutation tests are often used to analyze data since they may require no assumptions about the form of the distribution beyond random and independent sample selection. We initially considered a permutation test to assess the treatment effect on computed tomography (CT) lesion volume in the National Institute of Neurological Disorders and Stroke (NINDS) t-PA Stroke Trial, which has highly skewed data. However, we encountered difficulties in summarizing the permutation test results on the lesion volume. In this paper, we discuss some aspects of permutation tests and illustrate our findings. This experience with the NINDS t-PA Stroke Trial data emphasizes that permutation tests are useful for clinical trials and can be used to validate the assumptions underlying an observed test statistic. The permutation test places fewer restrictions on the underlying distribution but is not always distribution-free or an exact test, especially for ill-behaved data. Quasi-likelihood estimation using the generalized estimating equation (GEE) approach on transformed data seems to be a good choice for analyzing CT lesion data, based on both its corresponding permutation test and its clinical interpretation.

5.
One way to cope with high-dimensional data even in small samples is the use of pairwise distance measures—such as the Euclidean distance—between the sample vectors. This is usually done with permutation tests. Here we propose the application of exact parametric rotation tests, which are no longer restricted by the finite number of possible permutations of a sample. The method is presented in the framework of the general linear model.
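A minimal sketch of the rotation idea in a simple two-group setting: instead of permuting the n sample vectors, the data are rotated by a Haar-distributed orthogonal matrix within the space orthogonal to the intercept, which remains valid under multivariate normality even when the number of distinct permutations is small. The two-group statistic and the handling of the intercept are illustrative assumptions; the article develops the tests for the general linear model.

```python
import numpy as np

def rotation_test(Y, groups, n_rot=999, seed=0):
    """Rotation test sketch for a two-group difference of mean vectors in a
    high-dimensional response Y (n x p). Illustrative only; groups is a 0/1 array."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    # orthonormal basis of the space orthogonal to the intercept column
    Qfull, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, :-1]]))
    B = Qfull[:, 1:]                                  # n x (n-1)
    Z = B.T @ Y                                       # coordinates of the centred data

    def stat(Yr):
        d = Yr[groups == 1].mean(axis=0) - Yr[groups == 0].mean(axis=0)
        return np.sum(d ** 2)                         # squared Euclidean distance of means

    observed = stat(B @ Z)
    count = 0
    for _ in range(n_rot):
        Q, R = np.linalg.qr(rng.standard_normal((n - 1, n - 1)))
        Q = Q * np.sign(np.diag(R))                   # Haar-distributed rotation
        if stat(B @ (Q @ Z)) >= observed:
            count += 1
    return (count + 1) / (n_rot + 1)
```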

6.
The t-statistic used in the existing literature for testing the significance of linear multiple regression coefficients has only limited use in testing the marginal significance of explanatory variables, although it is also used in testing partial significance. This article identifies the t-statistic appropriate for testing partial significance.
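For orientation only, the conventional t-statistic reported for the j-th coefficient of a fitted multiple regression is recalled below; the statistic the article proposes for partial significance is not reproduced here.

```latex
t_j \;=\; \frac{\hat{\beta}_j}{\widehat{\operatorname{se}}(\hat{\beta}_j)}
     \;=\; \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2\,\bigl[(X^{\top}X)^{-1}\bigr]_{jj}}},
\qquad
\hat{\sigma}^2 \;=\; \frac{(y - X\hat{\beta})^{\top}(y - X\hat{\beta})}{n - p}.
```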

7.
Inference for the general linear model makes several assumptions, including independence of errors, normality, and homogeneity of variance. Departure from the latter two of these assumptions may indicate the need for data transformation or removal of outlying observations. Informal procedures such as diagnostic plots of residuals are frequently used to assess the validity of these assumptions or to identify possible outliers. A simulation-based approach is proposed, which facilitates the interpretation of various diagnostic plots by adding simultaneous tolerance bounds. Several tests exist for normality or homoscedasticity in simple random samples. These tests are often applied to residuals from a linear model fit. The resulting procedures are approximate in that correlation among residuals is ignored. The simulation-based approach accounts for the correlation structure of residuals in the linear model, allows simultaneous checking for possible outliers, non-normality, and heteroscedasticity, and does not rely on formal testing.

[Supplementary materials are available for this article. Go to the publisher's online edition of Communications in Statistics—Simulation and Computation® for the following three supplemental resources: a Word file containing figures illustrating the mode of operation of the bisectional algorithm, QQ-plots, and a residual plot for the mussels data.]
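A minimal sketch of the simulation idea for one such diagnostic plot, a normal QQ-plot of residuals: simulate errors under the fitted model, pass them through the residual-maker matrix, and take quantiles of the sorted simulated residuals. The bounds below are pointwise rather than the simultaneous tolerance bounds developed in the article, and the function and parameter names are illustrative.

```python
import numpy as np

def qq_envelope(X, residuals, sigma_hat, n_sim=1000, level=0.95, seed=0):
    """Simulation envelope for a normal QQ-plot of linear-model residuals
    (pointwise bounds; illustrative sketch of the general idea)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix of the fitted model
    M = np.eye(n) - H                            # residual-maker matrix
    sims = np.empty((n_sim, n))
    for b in range(n_sim):
        e = rng.normal(scale=sigma_hat, size=n)  # errors under the fitted normal model
        sims[b] = np.sort(M @ e)                 # sorted simulated residuals
    alpha = (1 - level) / 2
    lower = np.quantile(sims, alpha, axis=0)
    upper = np.quantile(sims, 1 - alpha, axis=0)
    return np.sort(residuals), lower, upper      # plot observed order statistics vs bounds
```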

8.
A sequential method for approximating a general permutation test (SAPT) is proposed and evaluated. Permutations are randomly generated from some set G, and a sequential probability ratio test (SPRT) is used to determine whether an observed test statistic falls sufficiently far in the tail of the permutation distribution to warrant rejecting some hypothesis. An estimate and bounds on the power function of the SPRT are used to find bounds on the effective significance level of the SAPT. Guidelines are developed for choosing parameters in order to obtain a desired significance level and minimize the number of permutations needed to reach a decision. A theoretical estimate of the average number of permutations under the null hypothesis is given, along with simulation results demonstrating the power and average number of permutations for various alternatives. The sequential approximation retains the generality of the permutation test, while avoiding the computational complexities that arise in attempting to compute the full permutation distribution exactly.
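A rough sketch of the mechanism: each sampled permutation yields a Bernoulli observation (does its statistic reach the observed one?), and Wald's SPRT for the exceedance probability decides between a "small p" and a "large p" hypothesis. The bracketing values p0 and p1 around the nominal level, the error rates and the truncation rule are illustrative choices, not the parameter guidelines derived in the paper.

```python
import numpy as np

def sapt_sketch(stat_fn, data, observed, p0=0.04, p1=0.06,
                err0=0.05, err1=0.05, max_perm=100000, seed=0):
    """Sequential approximation to a permutation test via Wald's SPRT on the
    exceedance probability p (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    upper = np.log((1 - err1) / err0)            # crossing it -> accept p = p1 (large)
    lower = np.log(err1 / (1 - err0))            # crossing it -> accept p = p0 (small)
    llr = 0.0                                    # log-likelihood ratio of p1 versus p0
    for m in range(1, max_perm + 1):
        exceed = stat_fn(rng.permutation(data)) >= observed
        llr += np.log(p1 / p0) if exceed else np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "do not reject", m            # p appears larger than the level
        if llr <= lower:
            return "reject", m                   # p appears smaller than the level
    return "inconclusive", max_perm
```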

9.
For longitudinal data, the within-subject dependence structure and covariance parameters may be of practical and theoretical interest. The estimation of covariance parameters has received much attention and has been studied mainly in the framework of generalized estimating equations (GEEs). The GEE method, however, is sensitive to outliers. In this paper, an alternative set of robust generalized estimating equations for both the mean and covariance parameters is proposed in the partial linear model for longitudinal data. The asymptotic properties of the proposed estimators of the regression parameters, non-parametric function and covariance parameters are obtained. Simulation studies are conducted to evaluate the performance of the proposed estimators under different contaminations. The proposed method is illustrated with a real data analysis.

10.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables while taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered a robust alternative approach. Missing data are inevitable in many situations, and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

11.
A new family of statistics is proposed to test for the presence of serial correlation in linear regression models. The tests are based on partial sums of lagged cross-products of regression residuals that define a class of interesting Gaussian processes. These processes are characterized in terms of the regressor functions, the serial-correlation structure, the distribution of the noise process, and the order of the lag of the cross-products of residuals. It is shown that these four factors affect the lagged residual processes independently. Large-sample distributional results are presented for the test statistics under the null hypothesis of no serial correlation and for alternatives from a range of interesting hypotheses. Some indication of the circumstances to which the asymptotic results apply in finite-sample situations, and of those to which they should be applied with some caution, is obtained through a simulation study. Tables of selected quantiles of the proposed tests are also given. The tests are illustrated with two examples taken from the empirical literature. It is also proposed that plots of lagged residual processes be used as diagnostic tools to gain insight into the correlation structure of residuals derived from regression fits.
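To fix ideas, a sketch of the kind of process these statistics are built from: the standardised partial sums of lag-k cross-products of fitted residuals (the exact standardisation and the functionals applied to the process in the paper may differ):

```python
import numpy as np

def lagged_partial_sum_process(residuals, lag=1):
    """Standardised partial-sum process of lag-k cross-products of regression
    residuals: S_m = sum_{t=lag+1}^{m} e_t * e_{t-lag} / (sqrt(n) * sigma_hat^2)."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    sigma2 = np.mean(e ** 2)                     # residual variance estimate
    products = e[lag:] * e[:-lag]                # e_t * e_{t-lag}
    return np.cumsum(products) / (np.sqrt(n) * sigma2)

# One natural functional of the process is its maximum absolute value:
# ks_type_stat = np.max(np.abs(lagged_partial_sum_process(resid, lag=1)))
```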

12.
Statistics that usually accompany the regression model do not provide insight into the quality of the data or the potential influence of individual observations on the estimates. In this study, the Q² statistic is used as a criterion for detecting influential observations or outliers. The statistic is derived from the jackknifed residuals, whose sum of squares is generally known as the prediction sum of squares, or PRESS. This article compares R² with Q² and suggests that the latter be used as part of the data-quality check. It is shown, for two separate data sets obtained from regional cost-of-living and U.S. food industry studies, that in the presence of outliers the Q² statistic can be negative, because it is sensitive to the choice of regressors and the inclusion of influential observations. Once the outliers are dropped from the sample, the discrepancy between the Q² and R² values is negligible.
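For readers unfamiliar with the statistic, a short sketch of how Q² is obtained from the jackknifed residuals via the standard leave-one-out (PRESS) shortcut; the intercept handling and the comparison with R² follow the usual textbook definitions rather than anything specific to the article's data sets.

```python
import numpy as np

def q2_and_r2(X, y):
    """Q^2 and R^2 for a linear regression fit, with PRESS computed from the
    jackknifed (leave-one-out) residuals via the hat-matrix shortcut.
    Q^2 can be negative when influential observations inflate PRESS."""
    Xd = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)    # hat matrix
    e = y - H @ y                                # ordinary residuals
    e_loo = e / (1 - np.diag(H))                 # jackknifed residuals
    press = np.sum(e_loo ** 2)                   # prediction sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - press / ss_tot, 1 - np.sum(e ** 2) / ss_tot
```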

13.
14.
Latent variable structural models and the partial least-squares (PLS) estimation procedure have found increased interest since being used in the context of customer satisfaction measurement. The well-known property that the estimates of the inner structural model are inconsistent implies biased estimates for finite sample sizes. A simplified version of the structural model that is used for the Swedish Customer Satisfaction Index (SCSI) system has been used to generate simulated data and to study the PLS algorithm in the presence of three inadequacies: (i) skew instead of symmetric distributions for the manifest variables; (ii) multi-collinearity within blocks of manifest variables and between latent variables; and (iii) misspecification of the structural model (omission of regressors). The simulation results show that the PLS method is quite robust against these inadequacies. The bias caused by the inconsistency of PLS estimates is substantially increased only for extremely skewed distributions and for the erroneous omission of a highly relevant latent regressor variable. The estimated scores of the latent variables are always in very good agreement with the true values and seem to be unaffected by the inadequacies under investigation.

15.
The L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve the optimization problems arising from L1-type regularization, and the coordinate descent algorithm in particular has been shown to be effective in sparse regression modeling. Although the algorithm shows remarkable performance in solving these optimization problems, it suffers from outliers, since the procedure is based on the inner product of the predictor variables and partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing especially on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
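As background for the modification described above, a plain (non-robust) coordinate-descent update for the lasso: each coefficient is refreshed by soft-thresholding the inner product of its predictor with the current partial residual, which is exactly the non-robust quantity the abstract points to. This is the standard algorithm, not the authors' robust version.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    The inner product X[:, j] @ r_j is the outlier-sensitive ingredient
    that a robust variant would replace."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]      # partial residual
            z = X[:, j] @ r_j / n                       # non-robust inner product
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j] / n)
    return beta
```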

16.
Large-scale data analysis, for example of microarray data, which includes hypothesis testing for equality of means in order to discover differentially expressed genes, often deals with a large number of features and only a few replicates. Furthermore, some genes are differentially expressed and others are not, so the usual permutation method applied in these situations estimates the p-value poorly because the two types of genes are mixed. To overcome this obstacle, null permutation samples have been suggested in the literature. We propose a modified uniformly most powerful unbiased test for testing the null hypothesis.

17.
Ordinal data, such as students' grades or customer satisfaction surveys, are widely used in daily life. We can fit a probit or logistic regression model to ordinal data using software such as SAS and obtain estimates of the regression parameters. However, it is hard to define residuals and detect outliers because the estimated probabilities of an observation falling in each category form a vector instead of a scalar. With the help of latent variables and latent residuals, a Bayesian perspective on detecting outliers is explored and several methods are proposed in this article. Several figures are also given.

18.
Partial specification of a prior distribution can be appealing to an analyst, but there is no conventional way to update a partial prior. In this paper, we show how a framework for Bayesian updating with data can be based on the Dirichlet(α) process. Within this framework, partial-information predictors generalize standard minimax predictors and have interesting multiple-point shrinkage properties. Approximations to partial-information estimators for squared error loss are defined straightforwardly, and an estimate of the mean shrinks the sample mean. The proposed updating of the partial prior is a consequence of four natural requirements when the Dirichlet parameter α is continuous. Namely, the updated partial posterior should be calculable from knowledge of only the data and the partial prior, it should be faithful to the full posterior distribution, it should assign positive probability to every observed event {X_i}, and it should not assign probability to unobserved events not included in the partial prior specification.

19.
Because highly correlated data arise in many scientific fields, we investigate parameter estimation in a semiparametric regression model with a diverging number of predictors that are highly correlated. For this, we first develop a distribution-weighted least squares estimator that can recover directions in the central subspace, then use the distribution-weighted least squares estimator as a seed vector and project it onto a Krylov space by partial least squares to avoid computing the inverse of the covariance of the predictors. Thus, distribution-weighted partial least squares can handle the cases with high-dimensional and highly correlated predictors. Furthermore, we also suggest an iterative algorithm for obtaining a better initial value before implementing partial least squares. For the theoretical investigation, we obtain strong consistency and asymptotic normality when the dimension p of predictors is of convergence rate O{n^{1/2}/log(n)} and o(n^{1/3}) respectively, where n is the sample size. When there are no other constraints on the covariance of the predictors, the rates n^{1/2} and n^{1/3} are optimal. We also propose a Bayesian information criterion type of criterion to estimate the dimension of the Krylov space in the partial least squares procedure. Illustrative examples with a real data set and comprehensive simulations demonstrate that the method is robust to non-ellipticity and works well even in 'small n, large p' problems.

20.
This paper considers residuals for time series regression. Despite much literature on visual diagnostics for uncorrelated data, there is little on the autocorrelated case. To examine various aspects of the fitted time series regression model, three residuals are considered. The fitted regression model can be checked using orthogonal residuals; the time series error model can be analysed using marginal residuals; and the white noise error component can be tested using conditional residuals. When used together, these residuals allow identification of outliers, model mis-specification and mean shifts. Due to the sensitivity of conditional residuals to model mis-specification, it is suggested that the orthogonal and marginal residuals be examined first.
