Similar Literature
20 similar documents found (search time: 462 ms)
1.
This paper discusses a pre-test regression estimator which uses the least squares estimate when it is “large” and a ridge regression estimate for “small” regression coefficients, where the preliminary test is applied separately to each regression coefficient in turn to determine whether it is “large” or “small.” For orthogonal regressors, the exact finite-sample bias and mean squared error of the pre-test estimator are derived. The pre-test estimator is less biased than a ridge estimator, and over much of the parameter space it has smaller mean squared error than least squares. In many situations the pre-test estimator also has smaller mean squared error than a ridge estimator, and at worst it is only slightly less efficient than the ridge estimator at commonly used significance levels.
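The per-coefficient rule described above can be sketched as follows. The test threshold `c`, the ridge constant `k`, and the orthonormal-design simplification are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def pretest_ridge(X, y, k=1.0, c=2.0):
    """Pre-test estimator sketch: keep the OLS coefficient when its
    |t|-statistic exceeds c ("large"), otherwise apply ridge shrinkage.
    Assumes orthonormal columns of X; c and k are illustrative choices."""
    n, p = X.shape
    beta_ols = X.T @ y                     # OLS under X'X = I
    resid = y - X @ beta_ols
    s2 = resid @ resid / (n - p)           # error-variance estimate
    t = beta_ols / np.sqrt(s2)             # t-statistics (unit-scale design)
    beta_ridge = beta_ols / (1.0 + k)      # ridge shrinkage under X'X = I
    return np.where(np.abs(t) > c, beta_ols, beta_ridge)
```

Large coefficients pass the test and keep their least squares values; small ones are shrunk as ridge would shrink them.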

2.
Principal component regression is widely used, but whether it actually reduces the error of parameter estimates has no clear theoretical answer. Taking three hypothetical models as examples, we study principal component regression by simulation and find that the error of the principal component regression estimate can be either smaller or larger than that of the ordinary least squares estimate, depending on the actual model.
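The abstract above describes a Monte Carlo comparison of principal component regression and ordinary least squares. A minimal sketch of such a comparison follows; the AR(1)-correlated design and all settings are illustrative assumptions, not the paper's models.

```python
import numpy as np

def pcr_vs_ols_mse(beta, n=100, rho=0.9, n_comp=2, reps=200, seed=0):
    """Monte Carlo sketch comparing coefficient MSE of OLS and principal
    component regression under a correlated (AR(1)) design."""
    rng = np.random.default_rng(seed)
    p = len(beta)
    cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    L = np.linalg.cholesky(cov)
    se_ols = se_pcr = 0.0
    for _ in range(reps):
        X = rng.normal(size=(n, p)) @ L.T
        y = X @ beta + rng.normal(size=n)
        b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        gamma = (U[:, :n_comp].T @ y) / s[:n_comp]   # regression on components
        b_pcr = Vt[:n_comp].T @ gamma                # map back to beta scale
        se_ols += np.sum((b_ols - beta) ** 2)
        se_pcr += np.sum((b_pcr - beta) ** 2)
    return se_ols / reps, se_pcr / reps
```

Running this with different true coefficient vectors reproduces the qualitative finding: which estimator wins depends on the model.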

3.
New robust estimates for variance components are introduced. Two simple models are considered: the balanced one-way classification model with a random factor and the balanced mixed model with one random factor and one fixed factor. However, the method of estimation proposed can be extended to more complex models. The new method of estimation we propose is based on the relationship between the variance components and the coefficients of the least-mean-squared-error predictor between two observations of the same group. This relationship enables us to transform the problem of estimating the variance components into the problem of estimating the coefficients of a simple linear regression model. The variance-component estimators derived from the least-squares regression estimates are shown to coincide with the maximum-likelihood estimates. Robust estimates of the variance components can be obtained by replacing the least-squares estimates by robust regression estimates. In particular, a Monte Carlo study shows that for outlier-contaminated normal samples, the estimates of variance components derived from GM regression estimates and the derived test outperform other robust procedures.
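A rough sketch of the core relationship under the balanced one-way model: the slope of the regression between two observations of the same group estimates the intraclass correlation, from which both variance components can be recovered. This is an illustration of the relationship the abstract describes, not the authors' exact estimator.

```python
import numpy as np

def variance_components_via_regression(Y):
    """Balanced one-way model Y[i, j] = mu + a_i + e_ij assumed.
    The slope of z on x over within-group pairs estimates
    sigma_a^2 / (sigma_a^2 + sigma_e^2); a least-squares illustration
    (robust versions would replace the slope estimate)."""
    g, r = Y.shape
    x, z = [], []
    for j in range(r):                    # all ordered within-group pairs
        for k in range(r):
            if j != k:
                x.append(Y[:, j]); z.append(Y[:, k])
    x = np.concatenate(x); z = np.concatenate(z)
    slope = np.cov(x, z)[0, 1] / np.var(x)
    total = Y.var(ddof=1)                 # rough total-variance estimate
    sa2 = slope * total                   # between-group component
    se2 = total - sa2                     # within-group component
    return sa2, se2
```

Replacing the least-squares slope with a robust regression slope yields the robust variance-component estimates the abstract proposes.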

4.
In “stepwise” regression analysis, the usual procedure enters or removes variables at each “step” on the basis of testing whether certain partial correlation coefficients are zero. An alternative method suggested in this paper involves testing the hypothesis that the mean square error of prediction does not decrease from one step to the next. This is equivalent to testing that the partial correlation coefficient is equal to a certain nonzero constant. For sample sizes sufficiently large, Fisher's z transformation can be used to obtain an asymptotically UMP unbiased test. The two methods are contrasted with an example involving actual data.
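The alternative criterion, testing that a correlation equals a nonzero constant via Fisher's z transformation, can be sketched as follows. The sqrt(n - 3) variance factor is the simple (unadjusted) approximation; for a partial correlation the sample size would be reduced by the number of partialled-out covariates.

```python
import math

def fisher_z_test(r, rho0, n):
    """Two-sided test of H0: correlation = rho0 (a nonzero constant)
    using Fisher's z transformation. Returns (statistic, p-value)."""
    z = math.atanh(r)                        # Fisher z of sample correlation
    zeta0 = math.atanh(rho0)                 # transformed null value
    stat = (z - zeta0) * math.sqrt(n - 3)    # approx N(0, 1) under H0
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2))))
    return stat, p
```

The stepwise variant would apply this at each step with rho0 chosen so that H0 corresponds to "prediction MSE does not decrease."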

5.
Matrix analogues are given for a known scalar identity which relates certain expectations with respect to the Wishart distribution. (The scalar identity was independently derived by C. Stein and L. Haff.) The matrix analogues are more aptly called “matrix extensions.” They can be derived by using the scalar identity; nevertheless, they are seen (in quite elementary terms) to be more general than the latter. A method of doing multivariate calculations is developed from the identities, and several examples are worked in detail. We compute the first two moments of the regression coefficients and another matrix arising in regression analysis. Also, we give a new result for the matrix analogue of squared multiple correlation: the bias correction of Ezekiel (1930), a result often used in model building, is extended to the case of two or more dependent variables.

6.
7.
We investigate the effect of measurement error on principal component analysis in the high-dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly in terms of increased variability and not bias. In practice, the error-induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute to a large variability in component loadings, relative to the loading values, such that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.
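The simulation idea, adding i.i.d. Gaussian measurement error and comparing the leading principal component before and after, can be sketched like this (all settings are illustrative, not the microarray study's):

```python
import numpy as np

def pca_noise_effect(X, sigma, seed=0):
    """Add i.i.d. Gaussian measurement error to X and compare the leading
    singular value and first principal component scores with the originals.
    Returns (original sv1, noisy sv1, |correlation of PC1 scores|)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    Xn = Xc + rng.normal(scale=sigma, size=X.shape)
    Xn -= Xn.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    _, sn, Vnt = np.linalg.svd(Xn, full_matrices=False)
    score_corr = np.corrcoef(Xc @ Vt[0], Xn @ Vnt[0])[0, 1]
    return s[0], sn[0], abs(score_corr)
```

For a dominant first component, the score correlation stays close to one, consistent with the abstract's conclusion that scores of the leading components are barely affected.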

8.
In some clinical trials and epidemiologic studies, investigators are interested in knowing whether the variability of a biomarker is independently predictive of clinical outcomes. This question is often addressed via a naïve approach where a sample-based estimate (e.g., standard deviation) is calculated as a surrogate for the “true” variability and then used in regression models as a covariate assumed to be free of measurement error. However, it is well known that measurement error in covariates causes underestimation of the true association. The underestimation can be substantial when the precision is low because of a limited number of measures per subject. The joint analysis of survival data and longitudinal data enables one to account for the measurement error in longitudinal data and has received substantial attention in recent years. In this paper we propose a joint model to assess the predictive effect of biomarker variability. The joint model consists of two linked sub-models, a linear mixed model with patient-specific variance for longitudinal data and a fully parametric Weibull distribution for survival data, and the association between the two sub-models is induced by a latent Gaussian process. Parameters in the joint model are estimated under a Bayesian framework and implemented using Markov chain Monte Carlo (MCMC) methods with WinBUGS software. The method is illustrated in the Ocular Hypertension Treatment Study to assess whether the variability of intraocular pressure is an independent risk factor for primary open-angle glaucoma. The performance of the method is also assessed by simulation studies.

9.
The name “multicollinearity” was first introduced by Ragnar Frisch [2]. In his original formulation the economic variables are supposed to be composed of two parts, a systematic or “true” component and an “error” component. There are at least two other cases where the same type of indeterminacy of the estimates arises for different reasons. Considerable attention has been given to this problem, which arises when some or all of the variables in a regression equation are highly intercorrelated, so that it becomes almost impossible to separate their influences and obtain the corresponding estimates of the regression coefficients. Consider a linear regression model

10.
In this article, we introduce the restricted principal components regression (RPCR) estimator by combining the approaches followed in obtaining the restricted least squares estimator and the principal components regression estimator. The performance of the RPCR estimator with respect to the matrix mean square error and the generalized mean square error is examined. We also suggest a testing procedure for linear restrictions in principal components regression by using singly and doubly non-central F distributions.

11.
The Joy of Copulas: Bivariate Distributions with Uniform Marginals
We describe a class of bivariate distributions whose marginals are uniform on the unit interval. Such distributions are often called “copulas.” The particular copulas we present are especially well suited for use in undergraduate mathematical statistics courses, as many of their basic properties can be derived using elementary calculus. In particular, we show how these copulas can be used to illustrate the existence of distributions with singular components and to give a geometric interpretation to Kendall's tau.
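As one concrete example of a copula family whose properties are friendly to elementary calculus, here is the Farlie-Gumbel-Morgenstern (FGM) family. This is our illustrative choice, not necessarily the family the article presents; for FGM, Kendall's tau equals 2θ/9.

```python
import numpy as np

def sample_fgm(theta, n, seed=0):
    """Rejection sampler for the FGM copula with density
    c(u, v) = 1 + theta * (1 - 2u) * (1 - 2v), bounded by 1 + |theta|."""
    rng = np.random.default_rng(seed)
    out, bound = [], 1 + abs(theta)
    while len(out) < n:
        u, v, w = rng.uniform(size=3)
        if w * bound <= 1 + theta * (1 - 2 * u) * (1 - 2 * v):
            out.append((u, v))
    return np.array(out)

def kendall_tau(xy):
    """Sample Kendall's tau; the FGM population value is 2 * theta / 9."""
    n = len(xy)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign((xy[i, 0] - xy[j, 0]) * (xy[i, 1] - xy[j, 1]))
    return 2 * s / (n * (n - 1))
```

Sampling with theta = 1 and computing the sample tau recovers a value near 2/9, illustrating the geometric interpretation the article develops.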

12.
In statistical practice, rectangular tables of numeric data are commonplace, and are often analyzed using dimension-reduction methods like the singular value decomposition and its close cousin, principal component analysis (PCA). This analysis produces score and loading matrices representing the rows and the columns of the original table and these matrices may be used for both prediction purposes and to gain structural understanding of the data. In some tables, the data entries are necessarily nonnegative (apart, perhaps, from some small random noise), and so the matrix factors meant to represent them should arguably also contain only nonnegative elements. This thinking, and the desire for parsimony, underlies such techniques as rotating factors in a search for “simple structure.” These attempts to transform score or loading matrices of mixed sign into nonnegative, parsimonious forms are, however, indirect and at best imperfect. The recent development of nonnegative matrix factorization, or NMF, is an attractive alternative. Rather than attempt to transform a loading or score matrix of mixed signs into one with only nonnegative elements, it directly seeks matrix factors containing only nonnegative elements. The resulting factorization often leads to substantial improvements in interpretability of the factors. We illustrate this potential by synthetic examples and a real dataset. The question of exactly when NMF is effective is not fully resolved, but some indicators of its domain of success are given. It is pointed out that the NMF factors can be used in much the same way as those coming from PCA for such tasks as ordination, clustering, and prediction. Supplementary materials for this article are available online.
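A minimal sketch of NMF via the classical Lee-Seung multiplicative updates, one common algorithm for the factorization described above (the article does not necessarily use this variant):

```python
import numpy as np

def nmf(V, r, iters=200, seed=0):
    """Factor nonnegative V (n x m) as W (n x r) @ H (r x m), W, H >= 0,
    minimizing ||V - WH||_F with Lee-Seung multiplicative updates.
    A minimal sketch, not a production implementation."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, size=(n, r))
    H = rng.uniform(0.1, 1.0, size=(r, m))
    eps = 1e-12                               # guard against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H, stays nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W, stays nonnegative
    return W, H
```

Because the updates are multiplicative, nonnegative starting values stay nonnegative, which is the property that makes the factors directly interpretable.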

13.
The problem of consistent estimation of regression coefficients in a multivariate linear ultrastructural measurement error model is considered in this article when some additional information on regression coefficients is available a priori. Such additional information is expressible in the form of stochastic linear restrictions. Utilizing stochastic restrictions given a priori, some methodologies are presented to obtain consistent estimators of regression coefficients under two types of additional information separately, viz., the covariance matrix of measurement errors and the reliability matrix associated with the explanatory variables. The measurement errors are not assumed to be normally distributed. The asymptotic properties of the proposed estimators are derived and analyzed analytically as well as numerically through a Monte Carlo simulation experiment.

14.
We consider the joint analysis of two matched matrices which have common rows and columns, for example multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal components analysis for interval-scaled data, correspondence analysis for frequency data, log-ratio analysis of compositional data and linear biplots in general, all of which depend on the singular value decomposition. A simple result in matrix algebra shows that by setting up two matched matrices in a particular block format, matrix sum and difference components can be analysed using a single application of the singular value decomposition algorithm. The methodology is applied to data from the International Social Survey Program comparing male and female attitudes on working wives across eight countries. The resulting biplots optimally display the overall cross-cultural differences as well as the male-female differences. The case of more than two matched matrices is also discussed.
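The block-format result can be checked numerically: for the block matrix [[A, B], [B, A]], an orthogonal change of basis splits it into the sum (A+B) and difference (A-B) components, so its singular values are exactly the union of theirs.

```python
import numpy as np

def matched_block_svd(A, B):
    """Return the sorted singular values of [[A, B], [B, A]] and the sorted
    union of the singular values of A + B and A - B; the two lists coincide,
    which is the matrix-algebra result behind the single-SVD analysis."""
    block = np.block([[A, B], [B, A]])
    s_block = np.linalg.svd(block, compute_uv=False)
    s_sum = np.linalg.svd(A + B, compute_uv=False)
    s_diff = np.linalg.svd(A - B, compute_uv=False)
    return (np.sort(s_block)[::-1],
            np.sort(np.concatenate([s_sum, s_diff]))[::-1])
```

The change of basis is Q = (1/sqrt(2)) [[I, I], [I, -I]] applied on both sides, which block-diagonalizes the matrix into A+B and A-B.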

15.
Consider the linear regression model Y = Xθ + ε, where Y denotes a vector of n observations on the dependent variable, X is a known matrix, θ is a vector of parameters to be estimated, and ε is a random vector of uncorrelated errors. If X'X is nearly singular, that is, if the smallest characteristic root of X'X is small, then a small perturbation in the elements of X, such as that due to measurement errors, induces considerable variation in the least squares estimate of θ. In this paper we examine, for the asymptotic case when n is large, the effect of perturbation on the bias and mean squared error of the estimate.
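The sensitivity phenomenon can be illustrated numerically: perturb X slightly and compare least-squares solutions; a near-singular X'X amplifies the change. The near-collinear design in the usage below is an illustrative construction.

```python
import numpy as np

def ols_perturbation(X, y, delta_scale, seed=0):
    """Fit least squares before and after adding small i.i.d. noise of
    scale delta_scale to X; return the norm of the change in the estimate."""
    rng = np.random.default_rng(seed)
    b0, *_ = np.linalg.lstsq(X, y, rcond=None)
    Xp = X + delta_scale * rng.normal(size=X.shape)
    b1, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    return np.linalg.norm(b1 - b0)
```

With two nearly collinear columns, the same tiny perturbation moves the coefficient vector far more than it does under a well-conditioned design.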

16.
Increasing attention is being given to problems involving binary outcomes with covariates subject to measurement error. Here, we consider the two group normal discriminant model where a subset of the continuous variates are subject to error and will typically be replaced by a vector of surrogates, perhaps of different dimension. Correcting for the measurement error is made possible by a double sampling scheme in which the surrogates are collected on all units and true values are obtained on a random subset of units. Such a scheme allows us to consider a rich set of measurement error models which extend the traditional additive error model. Maximum likelihood estimators and their asymptotic properties are derived under a variety of models for the relationship between true values and the surrogates. Specific attention is given to the coefficients in the resulting logistic regression model. Optimal allocations are derived which minimize the variance of the estimated slope subject to cost constraints for the case where there is a univariate covariate but a possibly multivariate surrogate.

17.
In this paper, we propose a bias-corrected estimate of the regression coefficient for the generalized probit regression model when the covariates are subject to measurement error and the responses are subject to interval censoring. The main improvement of our method is that it removes most of the bias of the naive estimates. The great advantage of our method is that it is baseline and censoring distribution free, in the sense that the investigator does not need to calculate the baseline or the censoring distribution to obtain the estimator of the regression coefficient, an important property shared with the Cox regression model. A sandwich estimator for the variance is also proposed. Our procedure can be generalized to general measurement error distributions as long as the first four moments of the measurement error are known. The results of extensive simulations show that our approach is very effective in eliminating the bias when the measurement error is not too large relative to the error term of the regression model.

18.
Principal component regression (PCR) has two steps: estimating the principal components and performing the regression using these components. These steps generally are performed sequentially. In PCR, a crucial issue is the selection of the principal components to be included in regression. In this paper, we build a hierarchical probabilistic PCR model with a dynamic component selection procedure. A latent variable is introduced to select promising subsets of components based upon the significance of the relationship between the response variable and principal components in the regression step. We illustrate this model using real and simulated examples. The simulations demonstrate that our approach outperforms some existing methods in terms of root mean squared error of the regression coefficient.

19.
In comparison to other experimental studies, multicollinearity appears frequently in mixture experiments, a special study area of response surface methodology, due to the constraints on the components composing the mixture. In the analysis of mixture experiments using a special generalized linear model, the logistic regression model, multicollinearity causes precision problems in the maximum-likelihood logistic regression estimate. Therefore, effects due to multicollinearity can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased estimators for the estimation of the coefficients. In this paper, we suggest the use of the logistic ridge regression (RR) estimator in cases where there is multicollinearity during the analysis of mixture experiments using logistic regression. Also, for the selection of the biasing parameter, we use fraction of design space plots to evaluate the effect of the logistic RR estimator with respect to the scaled mean squared error of prediction. The suggested graphical approaches are illustrated on the tumor incidence data set.
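A minimal sketch of a ridge-penalized logistic fit via Newton's method, with `k` playing the role of the biasing parameter; this illustrates the general logistic RR idea, not the paper's exact estimator or its biasing-parameter selection.

```python
import numpy as np

def logistic_ridge(X, y, k=1.0, iters=50):
    """Ridge-penalized logistic regression: maximize the log-likelihood
    minus (k/2)||beta||^2 by Newton's method. k is the biasing parameter."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))             # fitted probabilities
        W = mu * (1 - mu)                       # IRLS weights
        grad = X.T @ (y - mu) - k * beta        # penalized score
        H = X.T @ (X * W[:, None]) + k * np.eye(p)  # penalized information
        beta = beta + np.linalg.solve(H, grad)
    return beta
```

The ridge term keeps the penalized information matrix well conditioned, which is exactly what fails for maximum likelihood under the mixture-constraint multicollinearity described above.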

20.
The role of Wikipedia for learning has been debated because it does not conform to the usual standards. Despite this, people use it, due to the ubiquity of Wikipedia entries in the outcomes from popular search engines. It is important for academic disciplines, including statistics, to ensure they are correctly represented in a medium where anyone can assume the role of discipline expert. In this context, we first develop a tool for evaluating Wikipedia articles for topics with a procedural component. Then, using this tool, five Wikipedia articles on basic statistical concepts are critiqued from the point of view of a self-learner: “arithmetic mean,” “standard deviation,” “standard error,” “confidence interval,” and “histogram.” We find that the articles, in general, are poor, and some articles contain inaccuracies. We propose that Wikipedia be actively discouraged for self-learning (using, for example, a classroom activity) except to give a brief overview; that in more formal learning environments, teachers be explicit about not using Wikipedia as a learning resource for course content; and, because Wikipedia is used regardless of considered advice or the organizational protocols in place, teachers move away from minimal contact with Wikipedia towards more constructive engagement.
