Similar Literature
20 similar documents found.
1.
Inference for the general linear model makes several assumptions, including independence of errors, normality, and homogeneity of variance. Departure from the latter two of these assumptions may indicate the need for data transformation or removal of outlying observations. Informal procedures such as diagnostic plots of residuals are frequently used to assess the validity of these assumptions or to identify possible outliers. A simulation-based approach is proposed, which facilitates the interpretation of various diagnostic plots by adding simultaneous tolerance bounds. Several tests exist for normality or homoscedasticity in simple random samples. These tests are often applied to residuals from a linear model fit. The resulting procedures are approximate in that correlation among residuals is ignored. The simulation-based approach accounts for the correlation structure of residuals in the linear model, allows simultaneous checking for possible outliers, non-normality, and heteroscedasticity, and does not rely on formal testing.
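As a rough illustration of the simulation idea, the following Python sketch adds a simulated envelope to the ordered residuals of a linear model. Everything here (simulated data, seed, dimensions, and the pointwise rather than truly simultaneous bounds) is an assumption for illustration, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_sim = 100, 3, 2000

# Simulated data from a linear model.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix
M = np.eye(n) - H                           # residual maker: resid = M y
resid = M @ y
s = np.sqrt(resid @ resid / (n - p))        # residual standard error

# Simulate residual vectors e* = s * M z with z ~ N(0, I): this reproduces
# the correlation structure (I - H) of the residuals under the null model.
sims = np.sort(s * (M @ rng.normal(size=(n, n_sim))), axis=0)
lower, upper = np.quantile(sims, [0.025, 0.975], axis=1)

# Pointwise 95% envelope for the ordered residuals; the paper's simultaneous
# tolerance bounds would widen these pointwise limits.
outside = (np.sort(resid) < lower) | (np.sort(resid) > upper)
print(f"{outside.sum()} of {n} ordered residuals outside the envelope")
```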

[Supplementary materials are available for this article. Go to the publisher's online edition of Communications in Statistics—Simulation and Computation® for the following three supplemental resources: a Word file containing figures illustrating the mode of operation of the bisectional algorithm, QQ-plots, and a residual plot for the mussels data.]

2.
In many applications, a single Box–Cox transformation cannot simultaneously achieve normality, constancy of variance, and linearity of systematic effects. In this paper, by establishing a heterogeneous linear regression model for the Box–Cox transformed response, we propose a hybrid strategy in which variable selection is employed to reduce the dimension of the explanatory variables in joint mean and variance models, and the Box–Cox transformation is applied to remedy the response. We propose a unified procedure that can simultaneously select significant variables in the joint mean and variance models of the Box–Cox transformation, which provides a useful extension of ordinary normal linear regression models. With an appropriate choice of the tuning parameters, we establish the consistency of this procedure and the oracle property of the obtained estimators. Moreover, we also consider the maximum profile likelihood estimator of the Box–Cox transformation parameter. Simulation studies and a real example are used to illustrate the application of the proposed methods.

3.
In this study, we construct a feasible region, over which the likelihood function is maximized, using the Shapiro–Wilk and Bartlett test statistics to obtain the Box–Cox power transformation parameter, addressing non-normality and/or heterogeneity of variances in analysis of variance (ANOVA). Simulation studies illustrate that the proposed approach is more successful in attaining normality and variance stabilization, and is at least as good as the usual maximum likelihood estimation (MLE) in estimating the transformation parameter, across a range of conditions. The proposed method is illustrated on two real-life datasets, and the algorithm is released in the R package AID under the name "boxcoxfr".
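A minimal Python sketch of the feasible-region idea, assuming a one-way layout; it is not the AID::boxcoxfr implementation, and the pooled profile likelihood used here is a simplification of the ANOVA likelihood in the paper:

```python
import numpy as np
from scipy import stats

def boxcox_feasible(groups, lambdas=np.linspace(-2, 2, 81), alpha=0.05):
    """Maximize the Box-Cox log-likelihood over the 'feasible region' where
    Shapiro-Wilk (each group) and Bartlett's test (across groups) are both
    non-significant. A sketch of the idea, not the AID::boxcoxfr code."""
    pooled = np.concatenate(groups)
    best_lam, best_ll = None, -np.inf
    for lam in lambdas:
        t = [stats.boxcox(g, lmbda=lam) for g in groups]
        if (all(stats.shapiro(tg).pvalue > alpha for tg in t)
                and stats.bartlett(*t).pvalue > alpha):
            ll = stats.boxcox_llf(lam, pooled)  # pooled profile log-likelihood
            if ll > best_ll:
                best_lam, best_ll = lam, ll
    return best_lam  # None if the feasible region is empty

rng = np.random.default_rng(1)
groups = [rng.lognormal(mean=m, sigma=0.4, size=30) for m in (0.0, 0.3, 0.6)]
print("selected lambda:", boxcox_feasible(groups))
```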

4.
Consider estimation of the population mean of a response variable when observations are missing at random with respect to the covariate. Two common approaches to imputing the missing values are the nonparametric regression weighting method and the Horvitz–Thompson (HT) inverse weighting approach. The regression approach includes kernel regression imputation and nearest neighbor imputation. The HT approach, employing inverse kernel-estimated weights, includes the basic estimator, the ratio estimator, and the estimator using inverse kernel-weighted residuals. Asymptotic normality of the nearest neighbor imputation estimators is derived and compared with that of the kernel regression imputation estimator under standard regularity conditions on the regression function and the missing pattern function. A comprehensive simulation study shows that the basic HT estimator is most sensitive to discontinuity in the missing data patterns, and that the nearest neighbor estimators can be insensitive to missing data patterns unbalanced with respect to the distribution of the covariate. Empirical studies show that the nearest neighbor imputation method is the most effective of these imputation methods for estimating a finite population mean and for classifying the species in the iris flower data.
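The following sketch contrasts nearest neighbor imputation with the basic HT estimator on simulated data that are missing at random given the covariate; the response model, bandwidth, and kernel are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 1, n)
y = 2 * x + rng.normal(scale=0.3, size=n)   # true population mean is 1.0
p_resp = 0.3 + 0.6 * x                      # response probability (MAR given x)
obs = rng.uniform(size=n) < p_resp

# Nearest neighbor imputation: fill each missing y with the observed y whose
# covariate value is closest.
x_obs, y_obs = x[obs], y[obs]
nn = np.abs(x[~obs, None] - x_obs[None, :]).argmin(axis=1)
y_imp = y.copy()
y_imp[~obs] = y_obs[nn]

# Basic Horvitz-Thompson estimator with kernel-estimated response
# probabilities pi_hat(x_i).
h = 0.1                                     # bandwidth, an assumed value
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
pi_hat = (K @ obs.astype(float)) / K.sum(axis=1)
mean_ht = np.mean(np.where(obs, y, 0.0) / pi_hat)
print(f"NN imputation {y_imp.mean():.3f} | basic HT {mean_ht:.3f}")
```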

5.
In this article, a new class of variance function estimators is proposed in the setting of heteroscedastic nonparametric regression models. To obtain a variance function estimator, the main proposal is to smooth the product of the response variable and residuals as opposed to the squared residuals. The asymptotic properties of the proposed methodology are investigated in order to compare its asymptotic behavior with that of the existing methods. The finite sample performance of the proposed estimator is studied through simulation studies. The effect of the curvature of the mean function on its finite sample behavior is also discussed.
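A small sketch of the proposed target, under assumed data and bandwidths: both smoothing y·ê and smoothing ê² estimate Var(Y|X), since E[Y e | X] = E[e² | X] when E[e | X] = 0:

```python
import numpy as np

def nw(xq, x, t, h):
    """Nadaraya-Watson smoother of targets t at query points xq."""
    w = np.exp(-0.5 * ((xq[:, None] - x[None, :]) / h) ** 2)
    return (w @ t) / w.sum(axis=1)

rng = np.random.default_rng(3)
n = 400
x = np.sort(rng.uniform(0, 1, n))
sd = 0.2 + 0.8 * x                          # true standard deviation function
y = np.sin(2 * np.pi * x) + sd * rng.normal(size=n)

resid = y - nw(x, x, y, h=0.05)             # residuals from a smoothed mean

# Smooth the product y * resid (the proposed target) and, for comparison,
# the squared residuals (the classical target); both estimate Var(Y | X).
grid = np.linspace(0.1, 0.9, 5)
v_prod = nw(grid, x, y * resid, h=0.1)
v_sq = nw(grid, x, resid ** 2, h=0.1)
true_var = (0.2 + 0.8 * grid) ** 2
print(np.round(np.c_[true_var, v_prod, v_sq], 3))
```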

6.
It is important to detect variance heterogeneity in regression models, because efficient inference requires that heteroscedasticity be taken into account when it is present. For varying-coefficient partially linear regression models, however, the problem of detecting heteroscedasticity has received very little attention. In this paper, we present two classes of tests of heteroscedasticity for varying-coefficient partially linear regression models. The first test statistic is constructed from the residuals, under the assumption that the error term is normally distributed. The second is motivated by the idea that testing heteroscedasticity is equivalent to testing the pseudo-residuals for a constant mean. Asymptotic normality is established, with different rates corresponding to the null hypothesis of homoscedasticity and the alternative. Monte Carlo simulations are conducted to investigate the finite sample performance of the proposed tests, and the test methodologies are illustrated with a real data set.

7.
The authors consider the problem of simultaneous transformation and variable selection for linear regression. They propose a fully Bayesian solution to the problem, which allows averaging over all models considered, including transformations of the response and predictors. The authors use the Box–Cox family of transformations to transform the response and each predictor. To deal with the change of scale induced by the transformations, the authors propose to focus on new quantities rather than the estimated regression coefficients. These quantities, referred to as generalized regression coefficients, have a similar interpretation to the usual regression coefficients on the original scale of the data, but do not depend on the transformations. This allows probabilistic statements about the size of the effect associated with each variable, on the original scale of the data. In addition to variable and transformation selection, there is also uncertainty involved in the identification of outliers in regression. Thus, the authors also propose a more robust model to account for such outliers, based on a t-distribution with unknown degrees of freedom. Parameter estimation is carried out using an efficient Markov chain Monte Carlo algorithm, which permits moves around the space of all possible models. Using three real data sets and a simulation study, the authors show that there is considerable uncertainty about variable selection, choice of transformation, and outlier identification, and that there is an advantage in dealing with all three simultaneously. The Canadian Journal of Statistics 37: 361–380; 2009 © 2009 Statistical Society of Canada

8.
Quantitative model validation plays an increasingly important role in performance and reliability assessment of complex systems whenever computer modelling and simulation are involved. The foci of this paper are to pursue a Bayesian probabilistic approach to quantitative model validation with non-normal data, accounting for data uncertainty, and to investigate the impact of the normality assumption on validation accuracy. The Box–Cox transformation method is employed to convert the non-normal data, with the purpose of facilitating the overall validation assessment of computational models with higher accuracy. Explicit expressions for the interval hypothesis testing-based Bayes factor are derived for the transformed data in the univariate and multivariate cases. A Bayesian confidence measure is presented based on the Bayes factor metric. A generalized procedure is proposed to implement the probabilistic methodology for model validation of complicated systems. The classical hypothesis testing method is employed to conduct a comparison study. The impact of the data normality assumption and of decision threshold variation on model assessment accuracy is investigated using both classical and Bayesian approaches. The proposed methodology and procedure are demonstrated with a univariate stochastic damage accumulation model, a multivariate heat conduction problem, and a multivariate dynamic system.

9.
In this paper, we propose a quantile approach to the multi-index semiparametric model for an ordinal response variable. Permitting non-parametric transformation of the response, the proposed method achieves a root-n rate of convergence and has attractive robustness properties. Further, the proposed model allows additional indices to model the remaining correlations between covariates and the residuals from the single-index, considerably reducing the error variance and thus leading to more efficient prediction intervals (PIs). The utility of the model is demonstrated by estimating PIs for functional status of the elderly based on data from the second longitudinal study of aging. It is shown that the proposed multi-index model provides significantly narrower PIs than competing models. Our approach can be applied to other areas in which the distribution of future observations must be predicted from ordinal response data.

10.
The rank transform procedure is often used in the analysis of variance when observations are not consistent with normality. The data are ranked and the analysis of variance is applied to the ranked data. Often the rank residuals will be consistent with normality, and a valid analysis results. Here we find that the rank transform procedure is equivalent to applying the intended analysis of variance to first-order orthonormal polynomials in the rank proportions. Using higher-order orthonormal polynomials extends the analysis to higher-order effects, roughly detecting dispersion, skewness, etc. differences between treatment ranks. Using orthonormal polynomials on the original observations yields the usual analysis of variance for the first-order polynomial, and higher-order extensions for subsequent polynomials. Again, the first order reflects location differences, while higher orders roughly detect dispersion, skewness, etc. differences between the treatments.
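A sketch of this idea, with the orthonormal polynomials built generically by QR decomposition of a Vandermonde matrix (the paper's normalization may differ by a constant factor, which does not change the F statistics); the first-order scores reproduce a rank-transform ANOVA, and the second-order scores roughly target dispersion:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = [rng.exponential(scale=s, size=25) for s in (1.0, 1.0, 2.0)]
y = np.concatenate(groups)
g = np.repeat(np.arange(3), 25)

# Orthonormal polynomials in the rank proportions, via QR of a Vandermonde
# matrix; columns correspond to polynomial orders 0..3.
u = stats.rankdata(y) / (len(y) + 1)
Q, _ = np.linalg.qr(np.vander(u, 4, increasing=True))

# First-order scores: location effects (rank-transform ANOVA).
# Second-order scores: roughly dispersion differences.
for order, label in [(1, "location"), (2, "dispersion")]:
    f, p = stats.f_oneway(*(Q[g == k, order] for k in range(3)))
    print(f"{label}: F = {f:.2f}, p = {p:.3f}")
```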

11.
Exponentially weighted moving average (EWMA) control charts with variable sampling intervals (VSIs) have been shown to be substantially quicker than fixed sampling interval (FSI) EWMA control charts in detecting process mean shifts. The usual assumption in designing a control chart is that the data or measurements are normally distributed. However, this assumption may not hold for some processes. In the present paper, the performances of the EWMA and combined X̄–EWMA control charts with VSIs are evaluated under non-normality. It is shown that adding the VSI feature to the EWMA control charts results in very substantial decreases in the expected time to detect shifts in the process mean under both normality and non-normality. However, both the false alarm rate and the detection ability of the combined X̄–EWMA chart are affected if the process data are not normally distributed.
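A minimal simulation of an EWMA chart with a two-interval VSI rule; the smoothing weight, control and warning limits, and interval lengths below are assumed values, not the paper's chart design:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, L, w = 0.2, 2.86, 1.0        # EWMA weight, control and warning multipliers
d_short, d_long = 0.1, 1.9        # the two sampling intervals of the VSI rule
shift = 1.0                       # process mean shift to detect (sigma units)

sigma_z = np.sqrt(lam / (2 - lam))  # asymptotic sd of the EWMA statistic
z, t, i = 0.0, 0.0, 0
while True:
    i += 1
    x = rng.normal(loc=shift)       # observation after the shift
    z = lam * x + (1 - lam) * z
    if abs(z) > L * sigma_z:        # out-of-control signal
        break
    # VSI rule: sample again quickly when z is in the warning zone,
    # slowly when it is near the center line.
    t += d_short if abs(z) > w * sigma_z else d_long
print(f"signal at sample {i}, elapsed time {t:.2f}")
```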

12.
Considered are tests for normality of the errors in ridge regression. If an intercept is included in the model, it is shown that test statistics based on the empirical distribution function of the ridge residuals have the same limiting distribution as in the one-sample test for normality with estimated mean and variance. The result holds with weak assumptions on the behavior of the independent variables; asymptotic normality of the ridge estimator is not required.
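As a sketch, an EDF-based normality test (Anderson–Darling here, as one member of that family) applied to ridge residuals; the data, penalty value, and centering convention are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, k = 200, 5, 1.0                 # k: ridge penalty, an assumed value
X = rng.normal(size=(n, p))
y = 1.0 + X @ rng.normal(size=p) + rng.normal(size=n)

# Ridge fit with an unpenalized intercept (handled by centering).
Xc = X - X.mean(axis=0)
b = np.linalg.solve(Xc.T @ Xc + k * np.eye(p), Xc.T @ (y - y.mean()))
resid = y - y.mean() - Xc @ b

# EDF-based normality test applied to the ridge residuals.
res = stats.anderson(resid, dist="norm")
print("A^2 =", round(res.statistic, 3))
print("critical values:", res.critical_values)
```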

13.
The appropriate interpretation of measurements often requires standardization for concomitant factors. For example, standardization of weight for both height and age is important in obesity research and in failure-to-thrive research in children. Regression quantiles from a reference population afford one intuitive and popular approach to standardization. Current methods for the estimation of regression quantiles can be classified as nonparametric with respect to distributional assumptions or as fully parametric. We propose a semiparametric method in which we model the mean and variance as flexible regression spline functions and allow the unspecified distribution to vary smoothly as a function of covariates. Similarly to Cole and Green, our approach provides separate estimates and summaries for location, scale, and distribution. However, similarly to Koenker and Bassett, we do not assume any parametric form for the distribution. Estimation for either cross-sectional or longitudinal samples is obtained by using estimating equations for the location and scale functions and through local kernel smoothing of the empirical distribution function for standardized residuals. Using this technique with data on weight, height, and age for females under 3 years of age, we find that there is a close relationship between quantiles of weight for height and age and quantiles of body mass index (BMI = weight/height²) for age in this cohort.

14.
This work presents an optimal value to be used in the power transformation that transforms the exponential distribution to near-normality for statistical process control (SPC) applications. The optimal value is found by minimizing the sum of absolute differences between two distinct cumulative probability functions. Based on this criterion, a numerical search yields a proposed value of 3.5142, so that the transformed distribution is well approximated by the normal distribution. Two examples are presented to demonstrate the effectiveness of the transformation method and its applications in SPC. The transformed data are almost normally distributed, and the performance of the individual charts is satisfactory. Compared with charts that use the original exponential data and probability control limits, the individual charts constructed using the transformed distribution are superior in appearance, ease of interpretation, and ease of implementation by practitioners.
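A short sketch of the proposed transformation applied to exponential data, followed by individuals-chart limits computed from the average moving range (using the standard d₂ = 1.128 constant for moving ranges of size 2); the scale, sample size, and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=500)

# Proposed power transformation: x**(1/3.5142) is approximately normal.
y = x ** (1.0 / 3.5142)

# Individuals-chart limits on the transformed scale (moving-range estimate).
mr = np.abs(np.diff(y)).mean()
center, sd_hat = y.mean(), mr / 1.128     # d2 = 1.128 for moving ranges of 2
ucl, lcl = center + 3 * sd_hat, center - 3 * sd_hat
print(f"I-chart on transformed data: LCL={lcl:.3f}, CL={center:.3f}, UCL={ucl:.3f}")
print("points outside limits:", int(((y < lcl) | (y > ucl)).sum()))
```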

15.
The issue of estimating usual nutrient intake distributions and the prevalence of inadequate nutrient intakes is of interest in nutrition studies. Box–Cox transformations coupled with the normal distribution are usually employed for modeling nutrient intake data. When the data present a highly asymmetric distribution or include outliers, this approach may lead to implausible estimates. Additionally, it does not allow interpretation of the parameters in terms of characteristics of the original data, and it requires back-transformation of the transformed data to the original scale. This paper proposes an alternative approach for estimating usual nutrient intake distributions and the prevalence of inadequate nutrient intakes through a Box–Cox t model with random intercept. The proposed model is flexible enough for modeling highly asymmetric data even when outliers are present. Unlike the usual approach, the proposed model does not require a transformation of the data. A simulation study suggests that the Box–Cox t model with random intercept estimates the usual intake distribution satisfactorily, and that it should be preferred to the usual approach, particularly for highly asymmetric heavy-tailed data. In applications to data sets on the intake of 19 micronutrients, the Box–Cox t model provided a better fit than its competitors in most cases.

16.
17.
It is well known that ignoring heteroscedasticity in regression analysis adversely affects the efficiency of estimation and renders the usual procedure for constructing prediction intervals inappropriate. In some applications, such as off-line quality control, knowledge of the variance function is also of considerable interest in its own right. Thus the modeling of variance constitutes an important part of regression analysis. A common practice in modeling variance is to assume that a certain function of the variance can be closely approximated by a function of a known parametric form. The logarithm link function is often used even if it does not fit the observed variation satisfactorily, as other alternatives may yield negative estimated variances. In this paper we propose a rich class of link functions for more flexible variance modeling, which alleviates the major difficulty of negative variances. We also suggest an alternative analysis for heteroscedastic regression models that exploits the principle of "separation" discussed in Box (Signal-to-Noise Ratios, Performance Criteria and Transformation. Technometrics 1988, 30, 1–31). The proposed method does not require any distributional assumptions once an appropriate link function for modeling variance has been chosen. Unlike the analysis in Box (1988), the estimated variances and their associated asymptotic variances are found in the original metric (although a transformation has been applied to achieve separation on a different scale), making interpretation of the results considerably easier.

18.
Longitudinal or clustered response data arise in many applications such as biostatistics, epidemiology and environmental studies. The repeated responses cannot in general be assumed to be independent. One method of analysing such data is by using the generalized estimating equations (GEE) approach. The current GEE method for estimating regression effects in longitudinal data focuses on the modelling of the working correlation matrix assuming a known variance function. However, correct choice of the correlation structure may not necessarily improve estimation efficiency for the regression parameters if the variance function is misspecified [Wang YG, Lin X. Effects of variance-function misspecification in analysis of longitudinal data. Biometrics. 2005;61:413–421]. In this connection two problems arise: finding a correct variance function and estimating the parameters of the chosen variance function. In this paper, we study the problem of estimating the parameters of the variance function assuming that the form of the variance function is known and then the effect of a misspecified variance function on the estimates of the regression parameters. We propose a GEE approach to estimate the parameters of the variance function. This estimation approach borrows the idea of Davidian and Carroll [Variance function estimation. J Amer Statist Assoc. 1987;82:1079–1091] by solving a nonlinear regression problem where residuals are regarded as the responses and the variance function is regarded as the regression function. A limited simulation study shows that the proposed method performs at least as well as the modified pseudo-likelihood approach developed by Wang and Zhao [A modified pseudolikelihood approach for analysis of longitudinal data. Biometrics. 2007;63:681–689]. Both these methods perform better than the GEE approach.

19.
We consider a method of moments approach for dealing with censoring at zero for data expressed in levels when researchers would like to take logarithms. A Box–Cox transformation is employed. We explore this approach in the context of linear regression where both dependent and independent variables are censored. We contrast this method with two others: (1) dropping records containing censored values, and (2) assuming normality for the censored observations and the residuals in the model. Across the methods considered, where researchers are interested primarily in the slope parameter, estimation bias is consistently reduced using the method of moments approach.

20.
The conditional distribution given complete sufficient statistics is used along with the Rao-Blackwell theorem to obtain uniformly minimum variance unbiased (UMVU) estimators after a transformation to normality has been applied to data. The estimators considered are for the mean, the variance and the cumulative distribution of the original non-normal data. Previous procedures to obtain UMVU estimators have used Laplace transforms, Taylor expansions and the jackknife. An integration method developed in this paper requires only integrability of the normalizing transformation function. This method is easy to employ and it is always possible to obtain a numerical result.
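A plug-in analogue of the integration idea, for the log transformation: the numerical integration below runs against N(μ̂, σ̂²) for simplicity, whereas the paper's UMVU estimator instead integrates against the conditional distribution of one observation given the complete sufficient statistics:

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(8)
x = rng.lognormal(mean=1.0, sigma=0.5, size=50)   # non-normal data
z = np.log(x)                                     # normalizing transformation
mu, sd = z.mean(), z.std(ddof=1)

# Numerically integrate the inverse transformation exp(z) against a normal
# density. Plugging in (mu, sd) gives a simple plug-in estimator; the UMVU
# version would use the conditional distribution given (mu, sd) instead.
est, _ = integrate.quad(
    lambda t: np.exp(t) * stats.norm.pdf(t, mu, sd),
    mu - 10 * sd, mu + 10 * sd)
print(f"estimated mean {est:.3f}  (true mean {np.exp(1.0 + 0.5**2 / 2):.3f})")
```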
