Similar Articles
1.
Simple principal components (total citations: 3; self-citations: 0; citations by others: 3)
We introduce an algorithm for producing simple approximate principal components directly from a variance–covariance matrix. At the heart of the algorithm is a series of 'simplicity preserving' linear transformations. Each transformation seeks a direction of maximum variance within a two-dimensional subspace. However, the choice of directions is restricted so that the direction can be represented by a vector of integers whenever the subspace can also be represented by vectors of integers. The resulting approximate components can therefore always be represented by integers. Furthermore, the elements of these integer vectors are often small, particularly for the first few components. We demonstrate the performance of this algorithm on two data sets and show that it can yield good approximations to the principal components that are also clearly simple and interpretable.
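The central step, picking a maximum-variance direction inside a two-dimensional subspace while keeping integer coefficients, can be illustrated by brute force. This is a hypothetical sketch, not the authors' simplicity-preserving transformation: it assumes the subspace is spanned by two integer vectors `u` and `v`, searches only coefficients bounded by a hypothetical `max_coef`, and scores each candidate `w` by the variance ratio w'Sw / w'w.

```python
from itertools import product


def quad_form(w, S):
    """Return w' S w for a vector w and a square matrix S (nested lists)."""
    n = len(w)
    return sum(w[i] * S[i][j] * w[j] for i in range(n) for j in range(n))


def best_integer_direction(u, v, S, max_coef=3, tol=1e-12):
    """Search integer combinations w = a*u + b*v of two integer vectors and
    return (w, ratio) maximizing the variance ratio w'Sw / w'w.

    Hypothetical helper: ties are broken in favour of the shorter vector,
    and only one of w and -w is tried because they give the same direction.
    """
    best_w, best_ratio, best_norm2 = None, float("-inf"), None
    for a, b in product(range(0, max_coef + 1), range(-max_coef, max_coef + 1)):
        if a == 0 and b <= 0:
            continue  # skip w = 0 and sign-flipped duplicates
        w = [a * ui + b * vi for ui, vi in zip(u, v)]
        norm2 = sum(x * x for x in w)
        if norm2 == 0:
            continue  # u and v may be linearly dependent
        ratio = quad_form(w, S) / norm2
        better = ratio > best_ratio + tol
        tie_shorter = abs(ratio - best_ratio) <= tol and norm2 < best_norm2
        if better or tie_shorter:
            best_w, best_ratio, best_norm2 = w, ratio, norm2
    return best_w, best_ratio
```

For S = [[2, 1], [1, 2]] with u = [1, 0] and v = [0, 1], the search returns w = [1, 1] with ratio 3, which here coincides with the exact leading eigenvector; in general the integer restriction trades some variance for simplicity.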

2.
Traditionally, sphericity (i.e., independence and homoscedasticity of the raw data) has been put forward as the condition that the variance–covariance matrix of at least one of the two observation vectors analyzed for correlation must satisfy for the unmodified t test of significance to be valid under the Gaussian and constant-population-mean assumptions. In this article, the author proves that the sphericity condition is too strong and that a weaker (i.e., more general) sufficient condition for valid unmodified t testing in correlation analysis is circularity (i.e., independence and homoscedasticity after linear transformation by orthonormal contrasts), to be satisfied by the variance–covariance matrix of one of the two observation vectors. Two other conditions are also considered and discussed: compound symmetry for one of the two observation vectors, and absence of correlation between the components of one observation vector combined with a particular pattern of joint heteroscedasticity in the two observation vectors. When both observation vectors possess the same variance–covariance matrix up to a positive multiplicative constant, the circularity condition is shown to be necessary and sufficient. “Observation vectors” may designate partial realizations of temporal or spatial stochastic processes as well as profile vectors of repeated measures. It follows from the proof that an appropriately defined effective sample size can measure the discrepancy from the more general sufficient condition for valid unmodified t testing in correlation analysis with autocorrelated and heteroscedastic sample data. The proof is complemented by a simulation study.
Finally, the differences between the role of the circularity condition in correlation analysis and its role in repeated-measures ANOVA (where it was first introduced) are scrutinized, and the link between the circular variance–covariance structure and the centering of observations with respect to the sample mean is emphasized.
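For reference, the unmodified t test discussed here uses the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A minimal sketch follows; the function names are ours, and the note about an effective sample size is only a placeholder, since the abstract does not give its definition.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """Unmodified t statistic for H0: rho = 0, referred to t with n - 2 df.

    Under autocorrelation, n could be replaced by an effective sample size;
    how to define that quantity is the subject of the paper and is not
    reproduced here.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```

For x = [1, 2, 3, 4, 5] and y = [2, 1, 4, 3, 5], r = 0.8 and t is about 2.31 on 3 degrees of freedom.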

3.
Given two jointly observed random vectors Y and Z of the same dimension, let Y* be a reordered version of Y and Z* the resulting vector of concomitants of order statistics. When X is a covariate of interest, also jointly observed with Y, the authors obtain the joint covariance structure of (X, Y*, Z*) and the related correlation parameters explicitly, under the assumption that the vector (X, Y, Z) is normal and that its joint covariance structure is permutation symmetric. They also discuss extensions to elliptically contoured distributions.
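To make the construction concrete: sorting the components of Y and carrying the paired components of Z along produces the concomitants. This small sketch assumes "reordered" means sorted in ascending order (the abstract allows a general reordering); the function name is hypothetical.

```python
def order_with_concomitants(y, z):
    """Sort y in ascending order and permute z the same way.

    Returns (ordered y, concomitants of z): the i-th concomitant is the
    z value observed together with the i-th smallest y. Ties in y keep
    their original order because Python's sort is stable.
    """
    if len(y) != len(z):
        raise ValueError("y and z must have the same length")
    order = sorted(range(len(y)), key=lambda i: y[i])
    return [y[i] for i in order], [z[i] for i in order]
```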

4.
In this paper we examine the properties of four types of residual vectors arising from fitting a linear regression model to a set of data by least squares. The four types are (i) the Stepwise residuals (Hedayat and Robson, 1970), (ii) the Recursive residuals (Brown, Durbin, and Evans, 1975), (iii) the Sequentially Adjusted residuals (defined herein), and (iv) the BLUS residuals (Theil, 1965, 1971). We also study the relationships among the four residual vectors. We find that, for any given sequence of observations, (i) the first three sets of residuals are identical, and (ii) this common set is a member of Theil's (1965, 1971) family of residuals; specifically, the residuals are Linear Unbiased with a Scalar covariance matrix (LUS) but not Best Linear Unbiased with a Scalar covariance matrix (BLUS). We derive the explicit form of the transformation matrix and show that the first three sets of residual vectors can be written as an orthogonal transformation of the BLUS residual vector. These and other properties may prove useful in the statistical analysis of residuals.
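As an illustration of one of the four types, the recursive residuals of Brown, Durbin, and Evans (1975) can be computed for a simple linear regression y = b0 + b1*x + e as w_r = (y_r - x_r' b_(r-1)) / sqrt(1 + x_r' (X_(r-1)' X_(r-1))^(-1) x_r), where b_(r-1) is the least-squares fit to the observations before r. This sketch restricts itself to two regressors (intercept and slope) so the 2 x 2 inverse can be written out by hand; it does not reproduce the paper's transformation matrix.

```python
import math


def recursive_residuals(x, y):
    """Recursive residuals for the model y = b0 + b1*x + e (p = 2 regressors).

    For each observation after the first two, fit OLS to the observations
    before it and standardize the one-step-ahead prediction error.
    """
    w = []
    for r in range(2, len(x)):
        xs, ys = x[:r], y[:r]
        n = len(xs)
        sx, sxx = sum(xs), sum(v * v for v in xs)
        sy, sxy = sum(ys), sum(a * b for a, b in zip(xs, ys))
        det = n * sxx - sx * sx  # determinant of X'X = [[n, sx], [sx, sxx]]
        if det == 0:
            raise ValueError("the first observations must have distinct x values")
        inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
        b0 = inv[0][0] * sy + inv[0][1] * sxy
        b1 = inv[1][0] * sy + inv[1][1] * sxy
        xr = (1.0, x[r])
        # leverage-type term x_r' (X'X)^{-1} x_r
        h = sum(xr[i] * inv[i][j] * xr[j] for i in range(2) for j in range(2))
        w.append((y[r] - (b0 + b1 * x[r])) / math.sqrt(1.0 + h))
    return w
```

A standard property gives a quick check: the squared recursive residuals sum to the residual sum of squares of the full least-squares fit.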

5.
An alternative graphical method, called the SSR plot, is proposed for use with a multiple regression model. The method uses the fact that the sum of squares for regression (SSR) of two explanatory variables can be partitioned into the SSR of one variable and the increment in SSR due to the addition of the second variable. The SSR plot represents each explanatory variable as a vector in a half circle. In the plot, explanatory variables whose vectors lie closer to the horizontal axis have stronger effects on the response variable. Furthermore, for a regression model with two explanatory variables, the magnitude of the angle between the two vectors can be used to identify suppression.
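The partition the SSR plot builds on can be computed directly: SSR(x1, x2) = SSR(x1) + [SSR(x1, x2) - SSR(x1)]. A minimal sketch for two explanatory variables with an intercept, fitted by least squares on centered data; it does not draw the half-circle plot, and the function names are hypothetical.

```python
def _center(v):
    """Subtract the sample mean from each element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def ssr_one(x, y):
    """Regression sum of squares for y on a single explanatory variable x."""
    xc, yc = _center(x), _center(y)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    return sxy * sxy / sxx


def ssr_two(x1, x2, y):
    """Regression sum of squares for y on x1 and x2 (with an intercept)."""
    a, b, yc = _center(x1), _center(x2), _center(y)
    s11 = sum(u * u for u in a)
    s22 = sum(u * u for u in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * v for u, v in zip(a, yc))
    s2y = sum(u * v for u, v in zip(b, yc))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule on the normal equations
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1 * s1y + b2 * s2y  # SSR = beta' X'y for centered data


def ssr_increment(x1, x2, y):
    """Extra SSR from adding x2 to a model that already contains x1."""
    return ssr_two(x1, x2, y) - ssr_one(x1, y)
```

As context only, and not as the paper's angle-based criterion: one common definition of suppression, R^2 > r_y1^2 + r_y2^2, is equivalent to ssr_increment(x1, x2, y) exceeding ssr_one(x2, y).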

6.
One major aspect of medical research is relating the survival times of patients to the relevant covariates or explanatory variables. The proportional hazards model has been used extensively in the past decades, with the assumption that the covariate effects act multiplicatively on the hazard function, independent of time. If the patients become more homogeneous over time, for example if the treatment effects decrease with time or eventually fade out, then a proportional odds model may be more appropriate. In the proportional odds model, the odds ratio between patients can be expressed as a function of their covariate vectors, and the hazard ratio between individuals converges to unity in the long run. In this paper, we consider estimation of the regression parameter in a semiparametric proportional odds model in which the baseline odds function is an arbitrary non-decreasing function that is left unspecified. Instead of the exact survival times, only the rank order information among patients is used. A Monte Carlo method is used to approximate the marginal likelihood function of the rank-invariant transformation of the survival times, which preserves the information about the regression parameter. The method can be applied to other transformation models with censored data, such as the proportional hazards model and the generalized probit model, among others. The proposed method is applied to the Veterans Administration lung cancer trial data.
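The claim that the hazard ratio converges to unity can be checked directly. Assume, as one common parameterization (sign conventions differ across papers), that the odds of failure by time t for covariate z are exp(beta*z)*G(t), where G is the non-decreasing baseline odds function. Then S(t | z) = 1 / (1 + exp(beta*z)*G(t)), the hazard is exp(beta*z)*G'(t) / (1 + exp(beta*z)*G(t)), and the hazard ratio between z = 1 and z = 0 is exp(beta)*(1 + G(t)) / (1 + exp(beta)*G(t)). A minimal sketch with hypothetical function names:

```python
import math


def po_hazard_ratio(beta, baseline_odds):
    """Hazard ratio of z = 1 versus z = 0 under a proportional odds model.

    Assumes the odds of failure by time t are exp(beta * z) * G(t);
    `baseline_odds` is the value G(t) >= 0 at the time of interest.
    """
    e = math.exp(beta)
    return e * (1.0 + baseline_odds) / (1.0 + e * baseline_odds)


def hazard_ratio_path(beta, baseline_odds_values):
    """Hazard ratios along a sequence of baseline odds values G(t1), G(t2), ..."""
    return [po_hazard_ratio(beta, g) for g in baseline_odds_values]
```

With beta = log 2, the ratio equals 2 at G(t) = 0 (time zero), equals 4/3 at G(t) = 1, and decreases toward 1 as G(t) grows.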

7.
We introduce a notion of generator multigraph as an alternative to interaction graphs for the study of hierarchical loglinear models. Generator multigraphs are defined directly from the generator class of the model and are shown to be natural for recognizing decomposable models, obtaining maximum likelihood estimators, and finding conditional independencies in a model. The graph theory involved focuses on maximum spanning trees and edge cutsets (rather than on chordal graphs and minimal vertex separators as with interaction graphs).

8.
A simple method based on sliced inverse regression (SIR) is proposed to explore an effective dimension reduction (EDR) vector for the single index model. We avoid the principal component analysis step of the original SIR by using the two sample mean vectors in two slices of the response variable and their difference vector. The theory becomes simpler, the method is equivalent to multiple linear regression with a dichotomized response, and the estimator can be expressed in closed form, although the objective function might be an unknown nonlinear function. The method can be applied when the number of covariates is large, and it requires no matrix operations or iterative calculations.
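The two-slice idea can be sketched in a few lines: split the sample at the median of the response, take the covariate mean vector in each half, and use their normalized difference as the estimated direction. This sketch assumes the covariates already have identity covariance (for example, independent standardized covariates); with correlated covariates the difference would first need to be premultiplied by the inverse covariance matrix. The median split and the function name are our choices, not necessarily the paper's.

```python
import math
import random


def two_slice_direction(X, y):
    """Unit vector along the difference of covariate means between the upper
    and lower halves of the response (split at the sample median).

    Assumes the columns of X have (approximately) identity covariance.
    """
    n, p = len(y), len(X[0])
    cut = sorted(y)[n // 2]
    upper = [row for row, v in zip(X, y) if v >= cut]
    lower = [row for row, v in zip(X, y) if v < cut]
    diff = [
        sum(r[j] for r in upper) / len(upper) - sum(r[j] for r in lower) / len(lower)
        for j in range(p)
    ]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


if __name__ == "__main__":
    # Simulated single index model y = (x1 + 2*x2)^3 + noise, independent N(0, 1) covariates.
    random.seed(0)
    X = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20000)]
    y = [(r[0] + 2.0 * r[1]) ** 3 + random.gauss(0.0, 1.0) for r in X]
    print(two_slice_direction(X, y))  # expected to be close to (1, 2, 0) / sqrt(5)
```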

9.
Although multivariate statistical process control has received well-deserved attention in the literature, little work has been done on multi-attribute processes. While the NORTA algorithm generates an arbitrary multi-dimensional random vector by transforming a multi-dimensional standard normal vector, in this article we use the inverse transformation method to first transform a multi-attribute random vector so that the marginal probability distributions of the transformed random variables are approximately normal. We then estimate the covariance matrix of the transformed vector via simulation. Finally, we apply the well-known T² control chart to the transformed vector. Simulation experiments illustrate the proposed method and compare its performance with that of the deleted-Y method. The results show that the proposed method outperforms the deleted-Y method in terms of the out-of-control average run length criterion.

10.
An alternative approximation to the variance of the transformation score is given, based on an asymptotic expansion of the transformation estimator. It is then compared with the variance approximation given by Lawrance (1987) in terms of standardized scores. Simulations show that the two standardized scores behave very similarly when the model error standard deviation is small. However, when the error standard deviation is not small, the new standardized score outperforms that of Lawrance, especially in the structured models.

11.
This article considers pairwise-difference rank estimators of the coefficient vector in a transformation model. These estimators, like other existing rank estimators, require no subjective bandwidth choice. Monte Carlo simulations, numerical asymptotic efficiency comparisons, and two empirical applications suggest that the proposed estimators perform well in comparison with existing semiparametric estimators.

12.
There is an increasing number of goodness-of-fit tests whose test statistics measure deviations between the empirical characteristic function and an estimated characteristic function of the distribution in the null hypothesis. With the aim of overcoming certain computational difficulties with the calculation of some of these test statistics, a transformation of the data is considered. To apply such a transformation, the data are assumed to be continuous with arbitrary dimension, but we also provide a modification for discrete random vectors. Practical considerations leading to analytic formulas for the test statistics are studied, as well as theoretical properties such as the asymptotic null distribution, validity of the corresponding bootstrap approximation, and consistency of the test against fixed alternatives. Five applications are provided in order to illustrate the theory. These applications also include numerical comparison with other existing techniques for testing goodness-of-fit.
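For orientation, the empirical characteristic function of a sample x_1, ..., x_n is phi_n(t) = (1/n) * sum_j exp(i*t*x_j). This sketch compares it with the standard normal characteristic function exp(-t^2/2) after standardizing the data, using a weighted L2 distance with weight exp(-a*t^2) approximated by a Riemann sum on a grid. The weight, grid, and function names are illustrative assumptions; this is neither the paper's data transformation nor the analytic formulas the abstract mentions.

```python
import cmath
import math
import random


def ecf(data, t):
    """Empirical characteristic function (1/n) * sum_j exp(i * t * x_j)."""
    return sum(cmath.exp(1j * t * x) for x in data) / len(data)


def std_normal_cf(t):
    """Characteristic function of N(0, 1), which is real: exp(-t^2 / 2)."""
    return math.exp(-0.5 * t * t)


def weighted_ecf_distance(data, a=1.0, t_max=6.0, step=0.05):
    """Riemann-sum approximation of the integral over [-t_max, t_max] of
    |ecf(t) - exp(-t^2/2)|^2 * exp(-a * t^2), computed on standardized data."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    z = [(x - m) / s for x in data]
    k = int(round(t_max / step))
    total = 0.0
    for i in range(-k, k + 1):
        t = i * step
        diff = ecf(z, t) - std_normal_cf(t)
        total += abs(diff) ** 2 * math.exp(-a * t * t) * step
    return total


if __name__ == "__main__":
    random.seed(1)
    normal_sample = [random.gauss(0.0, 1.0) for _ in range(500)]
    skewed_sample = [random.expovariate(1.0) for _ in range(500)]
    print(weighted_ecf_distance(normal_sample), weighted_ecf_distance(skewed_sample))
```

Run as a script, it should print a much smaller distance for the Gaussian sample than for the exponential one.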

13.
We derive an exact F-test for random effects in the nested-error regression model. The derivation uses a matrix decomposition that transforms the response vector into two independent subvectors. When the random effects are absent, the test statistic reduces to a ratio of two independent residual sums of squares, computed by fitting a regression model to each subvector. A small simulation study compares the power of the F-test with various recent tests and shows that the proposed test performs competitively with both small and large numbers of clusters.

14.
Khuri (1989) tests for the intraclass covariance structure implied by the balanced two-way mixed analysis of variance model by computing Wilks' likelihood ratio test statistic from the sample covariance matrix of the vectors of treatment means. In the unbalanced case, he uses a linear transformation to augment the treatment-mean vectors to vectors that are expected to satisfy the intraclass structure, and then computes Wilks' statistic using these augmented vectors. We point out that the augmentation process is in fact equivalent to deleting observations until the design is balanced, so the augmented test actually uses less information than is contained in the original sample means.

15.
When a vector of sample proportions is not obtained through simple random sampling, its covariance matrix can differ substantially from the one corresponding to the multinomial model (Wilson 1989). For example, clustering effects or subject effects in repeated-measures experiments can make the variances of the observed proportions much larger than the variances under the multinomial model. This phenomenon is generally referred to as overdispersion. Tallis (1962) proposed a model for identically distributed multinomials with a common measure of correlation and referred to it as the generalized multinomial model. In this article, the generalized multinomial model is extended to account for overdispersion by allowing the vectors of proportions to vary according to a Dirichlet distribution. The generalized Dirichlet-multinomial model (as it is referred to here) allows for a second order of pairwise correlation among units, a type of assumption found reasonable in some biological data (Kupper and Haseman 1978) and introduced here to business data. An alternative derivation allowing for two kinds of variation is also considered. The asymptotic normality of the parameter estimators is used to construct Wald statistics for testing hypotheses. The methods are illustrated with applications to monthly performance evaluation data and an integrated circuit yield analysis.
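The overdispersion mechanism can be quantified for the ordinary Dirichlet-multinomial distribution (not the generalized model proposed in the paper): with n trials and Dirichlet parameters alpha_1, ..., alpha_K summing to alpha_0, the count in category i has variance n*p_i*(1 - p_i)*(n + alpha_0)/(1 + alpha_0), where p_i = alpha_i/alpha_0. The inflation factor equals 1 + (n - 1)*rho with intra-cluster correlation rho = 1/(1 + alpha_0). A short sketch with hypothetical function names:

```python
def multinomial_variance(n, p):
    """Variance of a single multinomial cell count with probability p."""
    return n * p * (1.0 - p)


def dirichlet_multinomial_variance(n, alphas, i):
    """Variance of cell count i under the Dirichlet-multinomial(n, alphas) model."""
    a0 = sum(alphas)
    p = alphas[i] / a0
    return multinomial_variance(n, p) * (n + a0) / (1.0 + a0)


def overdispersion_factor(n, alphas):
    """Ratio of Dirichlet-multinomial to multinomial variance: 1 + (n - 1) * rho."""
    rho = 1.0 / (1.0 + sum(alphas))
    return 1.0 + (n - 1) * rho
```

For n = 10 and alphas = (2, 3, 5), the variance of the first count is inflated by a factor of 20/11 (about 1.82) relative to the multinomial; with n = 1 there is no inflation.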

16.
Using logarithmic and integral transformations, we transform a continuous covariate frailty model into a polynomial regression model with a random effect. The responses of this mixed model can be ‘estimated’ via conditional hazard function estimation. The random error in this model does not have zero mean, and its variance is not constant along the covariate; consequently, these two quantities have to be estimated. Since the asymptotic expression for the bias is complicated, the two-large-bandwidth trick is proposed to estimate the bias. The proposed transformation is very useful for clustered incomplete data subject to left truncation and right censoring (and for complex clustered data in general). Indeed, in this case no standard software is available to fit the frailty model, whereas for the transformed model, standard software for mixed models can be used to estimate the unknown parameters of the original frailty model. A small simulation study illustrates the good behavior of the proposed method, which is then applied to a bladder cancer data set.

17.
Most parametric statistical methods are based on a set of assumptions: normality, linearity and homoscedasticity. Transforming a metric response is a popular way to meet these assumptions; in particular, the response of a linear model is often transformed in an attempt to satisfy the Gaussian assumptions on the error components. A particular problem with common transformations such as the logarithm or the Box–Cox family is that negative and zero data values cannot be transformed. This paper proposes a new transformation that allows negative and zero data values. The method for estimating the transformation parameter uses an objective criterion based on skewness and kurtosis to achieve normality. The use of the new transformation and of the estimation method is illustrated with three data sets.
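The abstract does not give the new transformation's formula, so this sketch substitutes a different, well-known family that also accepts negative and zero values, the Yeo–Johnson transformation, and pairs it with one possible skewness-and-kurtosis criterion, skewness^2 + (kurtosis - 3)^2, minimized over a grid of lambda values. Both the criterion and the grid are our assumptions, not the paper's estimation method.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a single value; defined for any real y."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def skewness_kurtosis(values):
    """Moment-based sample skewness and (non-excess) kurtosis."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def normality_criterion(data, lam):
    """Assumed objective: squared skewness plus squared excess kurtosis."""
    s, k = skewness_kurtosis([yeo_johnson(y, lam) for y in data])
    return s * s + (k - 3.0) ** 2


def choose_lambda(data, grid):
    """Grid value of lambda minimizing the assumed normality criterion."""
    return min(grid, key=lambda lam: normality_criterion(data, lam))
```

With lambda = 1 the Yeo–Johnson transform is the identity, so including 1 in the grid guarantees the chosen value is never worse than leaving the data untransformed under this criterion.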

18.
The behavior of the Box-Cox estimate of the power transformation is further examined. Through asymptotic expansions and small-σ approximations, the exact nature of the dependence of transformation estimation on the model structure, the spread of the means and the error variance is revealed. The results are shown to be useful in assessing what Box and Cox called the transformation potential of a particular data set.
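For concreteness, the Box–Cox estimate of the power lambda maximizes a profile log-likelihood. A minimal sketch for the simplest structure, an i.i.d. normal model with a common mean after transformation (the paper studies more general model structures): up to an additive constant, the profile log-likelihood is -(n/2)*log(sigma_hat^2(lambda)) + (lambda - 1)*sum(log y_i), where sigma_hat^2(lambda) is the maximum likelihood variance of the transformed data. The data must be positive; the function names and the grid search are our choices.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a positive value y."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def profile_loglik(data, lam):
    """Profile log-likelihood of lambda (up to a constant) for an i.i.d.
    normal model with a common mean on the transformed scale."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    m = sum(z) / n
    sigma2 = sum((v - m) ** 2 for v in z) / n  # MLE of the variance
    jacobian = (lam - 1.0) * sum(math.log(y) for y in data)
    return -0.5 * n * math.log(sigma2) + jacobian


def box_cox_estimate(data, grid):
    """Grid search for the lambda maximizing the profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(data, lam))
```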

19.
Compositional time series are multivariate time series which at each time point are proportions that sum to a constant. Accurate inference for such series which occur in several disciplines such as geology, economics and ecology is important in practice. Usual multivariate statistical procedures ignore the inherent constrained nature of these observations as parts of a whole and may lead to inaccurate estimation and prediction. In this article, a regression model with vector autoregressive moving average (VARMA) errors is fit to the compositional time series after an additive log ratio (ALR) transformation. Inference is carried out in a hierarchical Bayesian framework using Markov chain Monte Carlo techniques. The approach is illustrated on compositional time series of mortality events in Los Angeles in order to investigate dependence of different categories of mortality on air quality.
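The additive log ratio transformation maps a composition (x_1, ..., x_D) with positive parts to the unconstrained vector (log(x_1/x_D), ..., log(x_(D-1)/x_D)), to which standard multivariate time series models can be fitted; the inverse maps fitted values back to the simplex. A minimal sketch, assuming the last part is used as the reference (any part can serve); the VARMA fitting and the Bayesian MCMC step are beyond its scope.

```python
import math


def alr(x):
    """Additive log ratio of a composition with positive parts, last part as reference."""
    ref = x[-1]
    return [math.log(v / ref) for v in x[:-1]]


def alr_inverse(z, total=1.0):
    """Map an ALR vector back to a composition summing to `total`."""
    parts = [math.exp(v) for v in z] + [1.0]
    s = sum(parts)
    return [total * p / s for p in parts]
```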

20.
Generalised estimating equations (GEE) for regression problems with vector‐valued responses are examined. When the response vectors are of mixed type (e.g. continuous–binary response pairs), the GEE approach is a semiparametric alternative to full‐likelihood copula methods, and is closely related to Prentice & Zhao's mean‐covariance estimation equations approach. When the response vectors are of the same type (e.g. measurements on left and right eyes), the GEE approach can be viewed as a ‘plug‐in’ to existing methods, such as the vglm function from the state‐of‐the‐art VGAM package in R. In either scenario, the GEE approach offers asymptotically correct inferences on model parameters regardless of whether the working variance–covariance model is correctly or incorrectly specified. The finite‐sample performance of the method is assessed using simulation studies based on a burn injury dataset and a sorbinil eye trial dataset. The method is applied to data analysis examples using the same two datasets, as well as to a trivariate binary dataset on three plant species in the Hunua ranges of Auckland.

