Similar Documents (20 results)
1.
To bootstrap a regression problem, pairs of response and explanatory variables or residuals can be resampled, according to whether we believe that the explanatory variables are random or fixed. In the latter case, different residuals have been proposed in the literature, including the ordinary residuals (Efron 1979), standardized residuals (Bickel & Freedman 1983) and Studentized residuals (Weber 1984). Freedman (1981) has shown that the bootstrap from ordinary residuals is asymptotically valid when the number of cases increases and the number of variables is fixed. Bickel & Freedman (1983) have shown the asymptotic validity for ordinary residuals when the number of variables and the number of cases both increase, provided that the ratio of the two converges to zero at an appropriate rate. In this paper, the authors introduce the use of BLUS (Best Linear Unbiased with Scalar covariance matrix) residuals in bootstrapping regression models. The main advantage of the BLUS residuals, introduced in Theil (1965), is that they are uncorrelated. The main disadvantage is that only n − p residuals can be computed for a regression problem with n cases and p variables. The asymptotic results of Freedman (1981) and Bickel & Freedman (1983) for the ordinary (and standardized) residuals are generalized to the BLUS residuals. A small simulation study shows that even though only n − p residuals are available, in small samples bootstrapping BLUS residuals can be as good as, and sometimes better than, bootstrapping from standardized or Studentized residuals.
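As a rough illustration of the fixed-design residual bootstrap discussed above (the ordinary-residual variant of Efron 1979, not the BLUS version proposed in the paper), the following sketch resamples OLS residuals to approximate the sampling distribution of the coefficients; the function name and toy data are illustrative, and numpy is assumed.

```python
import numpy as np

def residual_bootstrap(X, y, n_boot=1000, seed=0):
    """Bootstrap OLS coefficients by resampling ordinary residuals (fixed design)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted                       # ordinary residuals (Efron 1979)
    boot_betas = np.empty((n_boot, p))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        boot_betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return boot_betas

# Toy usage
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)
draws = residual_bootstrap(X, y)
print(draws.std(axis=0))   # bootstrap standard errors of the coefficients
```

Swapping in standardized, Studentized or BLUS residuals would only change how `resid` is computed before resampling.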

2.
The admissibility results of Hoffmann (1977), proved in the context of a nonsingular covariance matrix, are extended to the situation where the covariance matrix is singular. Admissible linear estimators in the Gauss-Markoff model are characterised and admissibility of the Best Linear Unbiased Estimator is investigated.

3.
Correspondence analysis is a versatile statistical technique that allows the user to graphically identify the association that may exist between variables of a contingency table. For two categorical variables, the classical approach involves applying singular value decomposition to the Pearson residuals of the table. These residuals allow one to use a simple test to determine those cells that deviate from what is expected under independence. However, the assumptions concerning these residuals are not always satisfied, and such results can lead to questionable conclusions. One may instead consider an adjustment of the Pearson residual, which is known to have properties associated with the standard normal distribution. This paper explores the application of these adjusted residuals to correspondence analysis and determines how they impact upon the configuration of points in the graphical display.
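A minimal sketch of the classical step described above, assuming numpy: form the Pearson residuals of a two-way table and take their singular value decomposition. It uses the unadjusted residuals only; the adjusted residuals studied in the paper would replace the matrix S.

```python
import numpy as np

def pearson_residuals(table):
    """Standardized (Pearson) residuals of a two-way contingency table (proportions)."""
    N = table.sum()
    P = table / N
    r = P.sum(axis=1, keepdims=True)   # row masses
    c = P.sum(axis=0, keepdims=True)   # column masses
    expected = r @ c
    return (P - expected) / np.sqrt(expected)

table = np.array([[20, 30, 10],
                  [15, 25, 40]], dtype=float)
S = pearson_residuals(table)
U, d, Vt = np.linalg.svd(S, full_matrices=False)   # classical CA decomposition
print(d ** 2)   # principal inertias; their sum times the grand total N is the chi-squared statistic
```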

4.
In this study, we develop adjusted deviance residuals for the gamma regression model (GRM) following Cordeiro's (2004) method. These adjusted deviance residuals under the GRM are used for influence diagnostics. A comparative analysis is carried out between our proposed adjusted deviance residuals and an existing method for influence diagnostics. The results are illustrated by a simulation study and a real data set; they are presented for different values of the dispersion and sample size and indicate the significant role of the adjusted residuals in GRM inference.
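For context, here is a hedged sketch of the ordinary (unadjusted) deviance residuals for a gamma GLM, which are the starting point for Cordeiro-type adjustments; the adjustment itself is not implemented, and the fitted means below are placeholders rather than output of any particular fitting routine.

```python
import numpy as np

def gamma_deviance_residuals(y, mu):
    """Ordinary deviance residuals for a gamma GLM (before any adjustment).
    Unit deviance: 2 * [ (y - mu)/mu - log(y/mu) ], which is nonnegative."""
    unit_dev = 2.0 * (-np.log(y / mu) + (y - mu) / mu)
    return np.sign(y - mu) * np.sqrt(unit_dev)

y  = np.array([1.2, 0.8, 2.5, 3.1])
mu = np.array([1.0, 1.0, 2.0, 3.0])   # assumed fitted means from some gamma regression
print(gamma_deviance_residuals(y, mu))
```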

5.
The error contrasts from an experimental design can be constructed from uncorrelated residuals normally associated with the linear model. In this paper uncorrelated residuals are defined for the linear model whose design matrix is less than full rank, as is typical of many experimental design representations. It transpires in this setting that, for certain choices of uncorrelated residuals corresponding to recursive-type residuals, there is a natural partition of information when two variance components are known to be present. Under an assumption of normality of errors this leads to construction of appropriate F-tests for testing heteroscedasticity. The test, which can be optimal, is applied to two well known data sets to illustrate its usefulness.
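To make the recursive-residual idea concrete, here is a hedged sketch for the full-rank case (the paper's setting is the rank-deficient design, which needs a generalised inverse and a particular choice of residual set); the names and the final variance-ratio statistic are illustrative and not the paper's optimal test.

```python
import numpy as np

def recursive_residuals(X, y):
    """Standardised recursive residuals for a full-rank linear model.
    Under homoscedastic normal errors they are uncorrelated with common variance,
    which makes them convenient building blocks for F-tests."""
    n, p = X.shape
    w = np.empty(n - p)
    for t in range(p, n):
        Xt, yt = X[:t], y[:t]
        XtX_inv = np.linalg.inv(Xt.T @ Xt)
        beta_t = XtX_inv @ Xt.T @ yt          # fit on the first t observations
        x_new = X[t]
        pred_var = 1.0 + x_new @ XtX_inv @ x_new
        w[t - p] = (y[t] - x_new @ beta_t) / np.sqrt(pred_var)
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(60), rng.normal(size=60)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=60)
w = recursive_residuals(X, y)
# A Goldfeld-Quandt-style variance-ratio statistic from the two halves
half = len(w) // 2
print((w[half:] ** 2).mean() / (w[:half] ** 2).mean())
```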

6.
The Extended Growth Curve model is considered. The estimated mean of the model is the projection of the observations on the space generated by the design matrices, which is the sum of two tensor product spaces. The orthogonal complement of this space is decomposed into four orthogonal spaces and residuals are defined by projecting the observation matrix on the resulting components. The residuals are interpreted and some remarks are given as to why we should not use ordinary residuals, what kind of information our residuals give and how this information might be used to validate model assumptions and detect outliers and influential observations. It is shown that the residuals are symmetrically distributed around zero and are uncorrelated with each other. The covariance between the residuals and the estimated model as well as the dispersion matrices for the residuals are also given.

7.
We investigate mixed analysis of covariance models for the 'one-step' assessment of conditional QT prolongation. Initially, we consider three different covariance structures for the data, where between-treatment covariance of repeated measures is modelled respectively through random effects, random coefficients, and through a combination of random effects and random coefficients. In all three of those models, an unstructured covariance pattern is used to model within-treatment covariance. In a fourth model, proposed earlier in the literature, between-treatment covariance is modelled through random coefficients but the residuals are assumed to be independent identically distributed (i.i.d.). Finally, we consider a mixed model with saturated covariance structure. We investigate the precision and robustness of those models by fitting them to a large group of real data sets from thorough QT studies. Our findings suggest: (i) Point estimates of treatment contrasts from all five models are similar. (ii) The random coefficients model with i.i.d. residuals is not robust; the model potentially leads to both under- and overestimation of standard errors of treatment contrasts and therefore cannot be recommended for the analysis of conditional QT prolongation. (iii) The combined random effects/random coefficients model does not always converge; in the cases where it converges, its precision is generally inferior to the other models considered. (iv) Both the random effects and the random coefficients model are robust. (v) The random effects, the random coefficients, and the saturated model have similar precision and all three models are suitable for the one-step assessment of conditional QT prolongation.

8.
Traditionally, sphericity (i.e., independence and homoscedasticity for raw data) is put forward as the condition to be satisfied by the variance–covariance matrix of at least one of the two observation vectors analyzed for correlation, for the unmodified t test of significance to be valid under the Gaussian and constant population mean assumptions. In this article, the author proves that the sphericity condition is too strong and a weaker (i.e., more general) sufficient condition for valid unmodified t testing in correlation analysis is circularity (i.e., independence and homoscedasticity after linear transformation by orthonormal contrasts), to be satisfied by the variance–covariance matrix of one of the two observation vectors. Two other conditions (i.e., compound symmetry for one of the two observation vectors; absence of correlation between the components of one observation vector, combined with a particular pattern of joint heteroscedasticity in the two observation vectors) are also considered and discussed. When both observation vectors possess the same variance–covariance matrix up to a positive multiplicative constant, the circularity condition is shown to be necessary and sufficient. “Observation vectors” may designate partial realizations of temporal or spatial stochastic processes as well as profile vectors of repeated measures. From the proof, it follows that an effective sample size appropriately defined can measure the discrepancy from the more general sufficient condition for valid unmodified t testing in correlation analysis with autocorrelated and heteroscedastic sample data. The proof is complemented by a simulation study. Finally, the differences between the role of the circularity condition in the correlation analysis and its role in the repeated measures ANOVA (i.e., where it was first introduced) are scrutinized, and the link between the circular variance–covariance structure and the centering of observations with respect to the sample mean is emphasized.
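A small sketch of what the circularity condition means computationally, assuming numpy: transform the variance–covariance matrix by a set of orthonormal contrasts (Helmert contrasts here) and check proportionality to the identity. The compound-symmetry example illustrates one of the special cases mentioned above; the contrast basis and tolerance are illustrative choices.

```python
import numpy as np

def helmert_contrasts(n):
    """(n-1) x n matrix of orthonormal contrasts (rows orthonormal, orthogonal to 1)."""
    H = np.zeros((n - 1, n))
    for i in range(1, n):
        H[i - 1, :i] = 1.0
        H[i - 1, i] = -i
        H[i - 1] /= np.linalg.norm(H[i - 1])
    return H

def is_circular(sigma, tol=1e-8):
    """Circularity: C @ sigma @ C.T proportional to the identity for orthonormal contrasts C."""
    n = sigma.shape[0]
    C = helmert_contrasts(n)
    M = C @ sigma @ C.T
    lam = np.trace(M) / (n - 1)
    return np.allclose(M, lam * np.eye(n - 1), atol=tol)

# Compound symmetry is a special case of circularity
sigma_cs = 0.4 * np.ones((4, 4)) + 0.6 * np.eye(4)
print(is_circular(sigma_cs))   # True
```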

9.
Beta regression is often used to model the relationship between a dependent variable that assumes values on the open interval (0, 1) and a set of predictor variables. An important challenge in beta regression is to find residuals whose distribution is well approximated by the standard normal distribution. Two previous works compared residuals in beta regression, but the authors did not include the quantile residual. Using Monte Carlo simulation techniques, this article studies the behavior of certain residuals in beta regression in several scenarios. Overall, the results suggest that the distribution of the quantile residual is better approximated by the standard normal distribution than that of the other residuals in most scenarios. Three applications illustrate the effectiveness of the quantile residual.
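A hedged sketch of the quantile residual for beta regression under the usual mean/precision parameterisation (shape parameters mu*phi and (1 − mu)*phi); the fitted values below are placeholders rather than output of any particular fitting routine, and scipy is assumed.

```python
import numpy as np
from scipy import stats

def beta_quantile_residuals(y, mu, phi):
    """Quantile residuals for beta regression: probability integral transform of y
    under the fitted beta distribution, mapped to the standard normal scale."""
    a = mu * phi
    b = (1.0 - mu) * phi
    u = stats.beta.cdf(y, a, b)
    return stats.norm.ppf(u)

y   = np.array([0.12, 0.45, 0.78, 0.33])
mu  = np.array([0.15, 0.40, 0.70, 0.35])  # assumed fitted means
phi = 30.0                                 # assumed fitted precision
print(beta_quantile_residuals(y, mu, phi))
```

If the model fits well, these residuals should look approximately standard normal, which is exactly the property compared across residual types in the article.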

10.
Testing the equality of variances of two linear models with a common β-parameter is considered. A test based on least squares residuals (the ASR test) is proposed, and it is shown that this test is invariant under the group of scale and translation changes. For some special cases, it is also proved that the test has a monotone power function. Finding the exact critical values of the test is not easy; an approximation is given to facilitate their computation. The powers of the BLUS test, the F-test and the new test are computed for various alternatives and compared in a particular case. The proposed test seems to be locally more powerful than the alternative tests.

11.
On the Relation between Edge and Vertex Modelling in Shape Analysis
Objects in the plane with no obvious landmarks can be described by either vertex transformation vectors or edge transformation vectors. In this paper we provide the relation between the two transformation vectors. Grenander & Miller (1994) use a multivariate normal distribution with a block circulant covariance matrix to model the edge transformation vector. This type of model is also feasible for the vertex transformation vector and in certain cases the free parameters of the two models match up in a simple way. A vertex model and an edge model are applied to a data set of sand particles to explore shape variability.

12.
The class of joint mean–covariance models uses the modified Cholesky decomposition of the within-subject covariance matrix in order to arrive at an unconstrained, statistically meaningful reparameterisation. The new parameterisation of the covariance matrix has two sets of parameters that separately describe the variances and the correlations. Thus, together with the mean or regression parameters, these models have three distinct sets of parameters. In order to alleviate the problem of inefficient estimation and downward bias in the variance estimates, inherent in the maximum likelihood estimation procedure, the usual REML estimation procedure adjusts for the degrees of freedom lost due to the estimation of the mean parameters. Because of the parameterisation of the joint mean–covariance models, it is possible to adapt the usual REML procedure so as to estimate the variance (correlation) parameters by taking into account the degrees of freedom lost by the estimation of both the mean and the correlation (variance) parameters. To this end, we propose adjustments to the estimation procedures based on the modified and adjusted profile likelihoods. The methods are illustrated by an application to a real data set and by simulation studies. The Canadian Journal of Statistics 40: 225–242; 2012 © 2012 Statistical Society of Canada
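For readers unfamiliar with the modified Cholesky decomposition used by these models, the following sketch (assuming numpy) computes a unit lower-triangular matrix T and a diagonal matrix D of innovation variances with T Σ T′ = D; the subsequent regression-style reparameterisation and the REML adjustments discussed in the paper are not shown.

```python
import numpy as np

def modified_cholesky(sigma):
    """Modified Cholesky decomposition T @ sigma @ T.T = D,
    with T unit lower triangular and D diagonal (innovation variances)."""
    L = np.linalg.cholesky(sigma)   # sigma = L @ L.T
    d = np.diag(L)
    T = np.linalg.inv(L / d)        # rescale columns to unit diagonal, then invert
    D = np.diag(d ** 2)
    return T, D

sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.6],
                  [0.3, 0.6, 1.0]])
T, D = modified_cholesky(sigma)
print(np.allclose(T @ sigma @ T.T, D))   # True: dependence (T) and variances (D) are separated
```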

13.
Many geophysical regression problems require the analysis of large (more than 10^4 values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator to downweight data corresponding to extreme regression residuals with removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this, the exact distribution of the hat matrix diagonal elements p_ii for complex multivariate Gaussian predictor data is shown to be β(p_ii; m, N − m), where N is the number of data and m is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study which exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of looking at both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.
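As an illustration of the leverage-screening step, here is a sketch that computes the hat-matrix diagonals and flags values beyond a high quantile of the Beta(m, N − m) reference distribution quoted above; the cut-off level and synthetic data are illustrative, and the full bounded-influence M-estimation loop is omitted.

```python
import numpy as np
from scipy import stats

def leverage_flags(X, alpha=0.99):
    """Hat-matrix diagonals and a leverage flag based on the beta reference
    distribution quoted in the abstract, used here only as an illustrative cut-off."""
    N, m = X.shape
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)                  # diagonal of X (X'X)^{-1} X'
    cutoff = stats.beta.ppf(alpha, m, N - m)    # high quantile of the reference law
    return h, h > cutoff

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] *= 8.0                                     # inject one leverage point
h, flags = leverage_flags(X)
print(flags.sum(), h.max())
```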

14.
Often the unknown covariance structure of a stationary, dependent, Gaussian error sequence can be simply parametrised. The error sequence can either be directly observed or observed only through a random sequence containing a deterministic regression model. The method of scoring is used here, in conjunction with recursive estimation techniques, to effect the maximum likelihood estimation of the covariance parameters. Sequences of recursive residuals, useful in model diagnostics and data analysis, are obtained in the estimation procedure.

15.
Khuri (1989) tests for the intraclass covariance structure implied by the balanced two-way mixed analysis of variance model by computing Wilks' likelihood ratio test statistic using the sample covariance matrix of the vectors of treatment means. In the unbalanced case he uses a linear transformation to augment the treatment-mean vectors to vectors which are expected to satisfy the intraclass structure, and then computes Wilks' statistic using these augmented vectors. We point out that the augmentation process is in fact equivalent to deleting observations until the design is balanced, so that the augmented test actually uses less information than that contained in the original sample means.

16.
A p-component set of responses has been constructed by a location-scale transformation of a p-component set of error variables, the covariance matrix of the error variables having the intra-class covariance structure: all variances equal to unity and all covariances equal to the intra-class correlation ρ. A sample of size n has been described as a conditional structural model, conditional on the value of the intra-class correlation coefficient ρ. The conditional technique of structural inference provides the marginal likelihood function of ρ based on the standardized residuals. For the normal case, the marginal likelihood function of ρ is seen to depend on the standardized residuals only through the sample intra-class correlation coefficient. By the likelihood modulation technique, the nonnull distribution of the sample intra-class correlation coefficient has also been obtained.
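For concreteness, a short Python snippet constructing the intra-class covariance matrix described above: unit variances and a common covariance equal to the intra-class correlation ρ (the dimension and value of ρ are illustrative).

```python
import numpy as np

def intraclass_cov(p, rho):
    """Intra-class covariance matrix: unit variances, common covariance rho."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

print(intraclass_cov(4, 0.3))
```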

17.
Simple principal components
We introduce an algorithm for producing simple approximate principal components directly from a variance–covariance matrix. At the heart of the algorithm is a series of 'simplicity preserving' linear transformations. Each transformation seeks a direction within a two-dimensional subspace that has maximum variance. However, the choice of directions is limited so that the direction can be represented by a vector of integers whenever the subspace can also be represented by vectors of integers. The resulting approximate components can therefore always be represented by integers. Furthermore, the elements of these integer vectors are often small, particularly for the first few components. We demonstrate the performance of this algorithm on two data sets and show that good approximations to the principal components, which are also clearly simple and interpretable, can result.

18.
We first consider the estimation of the finite rate of population increase, or population growth rate, u_i, using capture-recapture data from open populations. We review estimation and modelling of u_i under three main approaches to modelling open-population data: the classic approach of Jolly (1965) and Seber (1965), the superpopulation approach of Crosbie & Manly (1985) and Schwarz & Arnason (1996), and the temporal symmetry approach of Pradel (1996). Next, we consider the contributions of different demographic components to u_i using a probabilistic approach based on the composition of the population at time i + 1 (Nichols et al., 2000b). The parameters of interest are identical to the seniority parameters, n_i, of Pradel (1996). We review estimation of n_i under the classic, superpopulation, and temporal symmetry approaches. We then compare these direct estimation approaches for u_i and n_i with analogues computed using projection matrix asymptotics. We also discuss various extensions of the estimation approaches to multistate applications and to joint likelihoods involving multiple data types.

19.
We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high-dimensional setting where p ≫ n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule obtained from LDA, since it involves all p features. We propose penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach in order to efficiently optimize it when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods in a simulation study and on three gene expression data sets. We also survey past methods for extending LDA to the high-dimensional setting, and explore their relationships with our proposal.

20.
Generalized least squares estimation of a system of seemingly unrelated regressions is usually a two-stage method: (1) estimation of the cross-equation covariance matrix from ordinary least squares residuals, used to transform the data, and (2) application of least squares to the transformed data. In the presence of multicollinearity, ridge regression is conventionally applied at stage 2. We investigate the use of ridge residuals at stage 1, and show analytically that the covariance matrix based on the least squares residuals does not always result in a more efficient estimator. A simulation study and an application to a system of firms' gross investment support our finding.
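A hedged sketch of the two-stage procedure, with a switch for using ridge rather than OLS residuals at stage 1; the ridge penalty value, synthetic data, and function name are illustrative, and no claim is made that this matches the paper's simulation design. numpy and scipy are assumed.

```python
import numpy as np
from scipy.linalg import block_diag

def sur_fgls(X_list, y_list, ridge_k=0.0):
    """Two-stage SUR/FGLS: stage 1 uses ridge residuals (ridge_k > 0) or OLS
    residuals (ridge_k = 0) to estimate the cross-equation covariance; stage 2
    applies GLS with that estimate on the stacked system."""
    n = len(y_list[0])
    # Stage 1: per-equation ridge (or OLS) fits and their residuals
    resid = []
    for X, y in zip(X_list, y_list):
        b = np.linalg.solve(X.T @ X + ridge_k * np.eye(X.shape[1]), X.T @ y)
        resid.append(y - X @ b)
    sigma = np.cov(np.vstack(resid), bias=True)        # cross-equation covariance
    # Stage 2: GLS with Omega = sigma kron I_n
    X_big = block_diag(*X_list)
    y_big = np.concatenate(y_list)
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n))
    return np.linalg.solve(X_big.T @ omega_inv @ X_big, X_big.T @ omega_inv @ y_big)

rng = np.random.default_rng(0)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
e = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
y1 = X1 @ np.array([1.0, 2.0]) + e[:, 0]
y2 = X2 @ np.array([0.5, -1.0]) + e[:, 1]
print(sur_fgls([X1, X2], [y1, y2], ridge_k=0.5))
```

Setting `ridge_k=0.0` reproduces the conventional OLS-residual stage 1, which is the baseline the paper compares against.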
