Similar literature
 Found 20 similar documents (search time: 132 ms)
1.
In this paper, two new multiple influential observation detection methods, GCD.GSPR and mCD*, are introduced for logistic regression. The proposed diagnostic measures are compared with the generalized difference in fits (GDFFITS) and the generalized squared difference in beta (GSDFBETA), which are multiple influential diagnostics. The simulation study is conducted with one, two and five independent variable logistic regression models. The performance of the diagnostic measures is examined for a single contaminated independent variable for each model and in the case where all the independent variables are contaminated with certain contamination rates and intensity. In addition, the performance of the diagnostic measures is compared in terms of the correct identification rate and swamping rate via a frequently referred to data set in the literature.

2.
Since the seminal paper by Cook (1977) in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most of the popular diagnostic measures that are based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper we propose a further modification of Cook's distance for the identification of a single influential observation. We then propose new measures for the identification of multiple influential observations, which are not affected by the masking and swamping problems. The efficiency of the new statistics is presented through several well-known data sets and a simulation study.
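As a point of reference for the single-case deletion measure that this abstract builds on, the following is a minimal sketch of the classical Cook's distance for a simple linear regression with an intercept and one predictor. It is not the modified statistic proposed in the paper. The function name `cooks_distance` and the toy data are hypothetical, and the closed-form leverage used here holds only for the one-predictor case.

```python
def cooks_distance(x, y):
    """Classical Cook's distance for each case in the fit y = a + b * x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate with n - p degrees of freedom.
    s2 = sum(e * e for e in residuals) / (n - p)
    # Leverage of case i in simple linear regression (hat-matrix diagonal).
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2)
    return [e * e * h / (p * s2 * (1.0 - h) ** 2)
            for e, h in zip(residuals, leverages)]


# Toy data (hypothetical): the last case is far out in x and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
distances = cooks_distance(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
```

Because each distance is computed by deleting one case at a time, a cluster of similar unusual cases can hide one another, which is the masking problem the abstract refers to.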

3.
In this paper, we translate variable selection for linear regression into multiple testing, and select significant variables according to the test results. New variable selection procedures are proposed based on the optimal discovery procedure (ODP) in multiple testing. Owing to the optimality of the ODP, for a guaranteed number of significant variables included, it includes fewer nonsignificant variables than marginal p-value based methods. Consistency of our procedures is established in theory and confirmed in simulation. Simulation results suggest that procedures based on multiple testing improve on procedures based on selection criteria, and that our new procedures perform better than marginal p-value based procedures.

4.
Rong Zhu, Xinyu Zhang. Statistics, 2018, 52(1): 205-227
The theories and applications of model averaging have been developed comprehensively in the past two decades. In this paper, we consider model averaging for multivariate multiple regression models. In order to make use of the correlation information of the dependent variables sufficiently, we propose a model averaging method based on Mahalanobis distance which is related to the correlation of the dependent variables. We prove the asymptotic optimality of the resulting Mahalanobis Mallows model averaging (MMMA) estimators under certain assumptions. In the simulation study, we show that the proposed MMMA estimators compare favourably with model averaging estimators based on AIC and BIC weights and the Mallows model averaging estimators from the single dependent variable regression models. We further apply our method to the real data on urbanization rate and the proportion of non-agricultural population in ethnic minority areas of China.

5.
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detection of high leverage observations or points is crucial because they are responsible for masking outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression, the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, and that creates problems in the correct identification of the cases. Attempts have been made to identify the HLP based on the median distances from the mean, but since they are designed for the identification of a single high leverage point they may not be very effective in the presence of multiple HLP due to their masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression where the suspect cases are identified by a robust group deletion technique and they are confirmed using diagnostic techniques. The usefulness of the proposed method is then investigated through several well-known examples and a Monte Carlo simulation.
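The effect described in this abstract can be sketched numerically. The example below assumes the standard weighted hat matrix for logistic regression, whose diagonal is h_i = w_i x_i' (X'WX)^{-1} x_i with w_i = p_i(1 - p_i), and takes that weight to be the component the abstract mentions. The coefficients are fixed by hand rather than fitted, and the function name `logistic_leverages` and the data are hypothetical; the paper's group deletion method is not reproduced here.

```python
import math


def logistic_leverages(x, beta0, beta1):
    """Hat-matrix diagonals for a logistic model with intercept and one covariate."""
    probs = [1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) for xi in x]
    weights = [p * (1.0 - p) for p in probs]
    # Entries of the 2x2 matrix X'WX = [[s0, s1], [s1, s2]].
    s0 = sum(weights)
    s1 = sum(w * xi for w, xi in zip(weights, x))
    s2 = sum(w * xi * xi for w, xi in zip(weights, x))
    det = s0 * s2 - s1 * s1
    # h_i = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'
    return [w * (s2 - 2.0 * xi * s1 + xi * xi * s0) / det
            for w, xi in zip(weights, x)]


# Hypothetical covariate values: the last point is far from the rest.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 12.0]
# Assumed (not estimated) coefficients, for illustration only.
leverages = logistic_leverages(x, beta0=0.0, beta1=1.0)
```

Under these assumed coefficients the fitted probability at the extreme point is almost 1, so its weight is close to zero and its leverage falls below that of every other case, even though it is the most remote point in the covariate space.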

6.
Penalized likelihood methods provide a range of practical modelling tools, including spline smoothing, generalized additive models and variants of ridge regression. Selecting the correct weights for penalties is a critical part of using these methods and in the single-penalty case the analyst has several well-founded techniques to choose from. However, many modelling problems suggest a formulation employing multiple penalties, and here general methodology is lacking. A wide family of models with multiple penalties can be fitted to data by iterative solution of the generalized ridge regression problem: minimize $\| W^{1/2} (Xp - y) \|^2 \rho + \sum_{i=1}^{m} \theta_i \, p' S_i \, p$ ($p$ is a parameter vector, $X$ a design matrix, $S_i$ a non-negative definite coefficient matrix defining the $i$th penalty with associated smoothing parameter $\theta_i$, $W$ a diagonal weight matrix, $y$ a vector of data or pseudodata and $\rho$ an 'overall' smoothing parameter included for computational efficiency). This paper shows how smoothing parameter selection can be performed efficiently by applying generalized cross-validation to this problem and how this allows non-linear, generalized linear and linear models to be fitted using multiple penalties, substantially increasing the scope of penalized modelling methods. Examples of non-linear modelling, generalized additive modelling and anisotropic smoothing are given.
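To make the role of generalized cross-validation concrete, here is a deliberately reduced sketch for the single-penalty special case, not the multiple-penalty algorithm of the paper. It assumes one coefficient with no intercept, W equal to the identity, ρ = 1, m = 1 and S_1 = 1, so the objective is ||xp - y||^2 + θ p^2. The GCV score used is the common form n RSS / (n - tr A)^2, where A is the influence matrix; the function name `gcv_score`, the data and the grid of candidate θ values are all hypothetical.

```python
def gcv_score(x, y, theta):
    """GCV score for the ridge fit y ~ p * x with penalty theta * p^2."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Minimizer of sum((y_i - p * x_i)^2) + theta * p^2.
    p_hat = sxy / (sxx + theta)
    rss = sum((yi - p_hat * xi) ** 2 for xi, yi in zip(x, y))
    # Trace of the influence matrix A = x x' / (sxx + theta).
    trace_a = sxx / (sxx + theta)
    return n * rss / (n - trace_a) ** 2


# Hypothetical data and a coarse grid of candidate smoothing parameters.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.4, 1.3, 1.2, 2.4, 2.2, 3.3]
grid = [10.0 ** k for k in range(-3, 3)]
scores = {theta: gcv_score(x, y, theta) for theta in grid}
best_theta = min(grid, key=lambda theta: scores[theta])
```

With several penalties a grid search of this kind grows exponentially in the number of smoothing parameters, which is one reason an efficient selection method is needed for the multiple-penalty problem.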

7.
Ordered multiple categorical (MC) variables have been widely studied as response variables, but few studies have carefully considered them as predictors in linear regression. In that setting, the existence of some pseudo-categories may result in overfitting, and detecting those pseudo-categories by hypothesis tests on all dummy variables might have low specificity. In this paper, we propose a transformation method of dummy variables for such ordered MC predictors, after which a model selection method combined with BIC is elaborated. Theoretical consistency of our model selection method is established under some common assumptions. Both simulation studies and real data analysis of a medical survey indicate that our method provides good performance and is applicable to a wide range of biomedical research.

8.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments through several well-referred data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

9.
We consider the estimation of a regression coefficient in a linear regression when observations are missing due to nonresponse. Response is assumed to be determined by a nonobservable variable which is linearly related to an observable variable. The values of the observable variable are assumed to be available for the whole sample but the variable is not included in the regression relationship of interest. Several alternative estimators have been proposed for this situation under various simplifying assumptions. A sampling theory approach provides three alternative estimators by considering the observations as obtained from a sub-sample, selected on the basis of the fully observable variable, as formulated by Nathan and Holt (1980). Under an econometric approach, Heckman (1979) proposed a two-stage (probit and OLS) estimator which is consistent under specific conditions. A simulation comparison of the four estimators and the ordinary least squares estimator, under multivariate normality of all the variables involved, indicates that the econometric approach estimator is not robust to departures from the conditions underlying its derivation, while two of the other estimators exhibit a similar degree of stable performance over a wide range of conditions. Simulations for a non-normal distribution show that gains in performance can be obtained if observations on the independent variable are available for the whole population.

10.
We study variable selection in quantile regression with multiple responses. Instead of applying conventional penalized quantile regression to each response separately, it is desired to solve them simultaneously when the sparsity patterns of the regression coefficients for different responses are similar, which is often the case in practice. In this paper, we propose employing a hierarchical penalty that enables us to detect a common sparsity pattern shared between different responses as well as additional sparsity patterns within the selected variables. We establish the oracle property of the proposed method and demonstrate it offers better performance than existing approaches.

11.
Bayesian selection of variables is often difficult to carry out because of the challenge in specifying prior distributions for the regression parameters for all possible models, specifying a prior distribution on the model space and computations. We address these three issues for the logistic regression model. For the first, we propose an informative prior distribution for variable selection. Several theoretical and computational properties of the prior are derived and illustrated with several examples. For the second, we propose a method for specifying an informative prior on the model space, and for the third we propose novel methods for computing the marginal distribution of the data. The new computational algorithms only require Gibbs samples from the full model to facilitate the computation of the prior and posterior model probabilities for all possible models. Several properties of the algorithms are also derived. The prior specification for the first challenge focuses on the observables in that the elicitation is based on a prior prediction $y_0$ for the response vector and a quantity $a_0$ quantifying the uncertainty in $y_0$. Then, $y_0$ and $a_0$ are used to specify a prior for the regression coefficients semi-automatically. Examples using real data are given to demonstrate the methodology.

12.
Simple Transformation Techniques for Improved Non-parametric Regression   (Total citations: 2; self-citations: 0; citations by others: 2)
We propose and investigate two new methods for achieving less bias in non-parametric regression. We show that the new methods have bias of order $h^4$, where $h$ is a smoothing parameter, in contrast to the basic kernel estimator's order $h^2$. The methods are conceptually very simple. At the first stage, perform an ordinary non-parametric regression on $\{x_i, Y_i\}$ to obtain $\hat{m}(x_i)$ (we use local linear fitting). In the first method, at the second stage, repeat the non-parametric regression but on the transformed dataset $\{\hat{m}(x_i), Y_i\}$, taking the estimator at $x$ to be this second stage estimator at $\hat{m}(x)$. In the second, and more appealing, method, again perform non-parametric regression on $\{\hat{m}(x_i), Y_i\}$, but this time make the kernel weights depend on the original $x$ scale rather than using the $\hat{m}(x)$ scale. We concentrate more of our effort in this paper on the latter because of its advantages over the former. Our emphasis is largely theoretical, but we also show that the latter method has practical potential through some simulated examples.

13.
This paper presents a method for assessing the sensitivity of predictions in Bayesian regression analyses. In parametric Bayesian analyses there is a family $s_0$ of regression functions, parametrized by a finite-dimensional vector B. The family $s_0$ is a subset of R, the set of all possible regression functions. A prior $\pi_0$ on B induces a prior on R. This paper assesses sensitivity by computing bounds on the predictive probability of a fixed set K over a class of priors, $\Gamma$, induced by a class of families of regression functions, $\Gamma_s$, and a class of priors, $\Gamma_\pi$. This paper is divided into three parts which (1) define $\Gamma$, (2) describe an algorithm for finding accurate bounds on predictive probabilities over $\Gamma$ and (3) illustrate the method with two examples. It is found that sensitivity to the family of regression functions can be much more important than sensitivity to $\pi_0$.

14.
The detection of outliers and influential observations has received a great deal of attention in the statistical literature in the context of least-squares (LS) regression. However, the explanatory variables can be correlated with each other, and alternatives to LS have emerged to address outliers/influential observations and multicollinearity simultaneously. This paper proposes new influence measures based on the affine combination type regression for the detection of influential observations in the linear regression model when multicollinearity exists. Approximate influence measures are also proposed for the affine combination type regression. Since the affine combination type regression includes the ridge, the Liu and the shrunken regressions as special cases, influence measures under the ridge, the Liu and the shrunken regressions are also examined to see the possible effect that multicollinearity can have on the influence of an observation. The Longley data set is used to illustrate the influence measures in affine combination type regression and also in ridge, Liu and shrunken regressions, so that the performance of different biased regressions in detecting and assessing influential observations can be examined.

15.
Interaction is very common in reality, but has received little attention in the logistic regression literature. This is especially true for higher-order interactions. In conventional logistic regression, interactions are typically ignored. We propose a model selection procedure by implementing an association rules analysis. We do this by (1) exploring the combinations of input variables which have significant impacts on the response (via association rules analysis); (2) selecting the potential (low- and high-order) interactions; (3) converting these potential interactions into new dummy variables; and (4) performing variable selection among all the input variables and the newly created dummy variables (interactions) to build up the optimal logistic regression model. Our model selection procedure establishes the optimal combination of main effects and potential interactions. The comparisons are made through thorough simulations. It is shown that the proposed method outperforms the existing methods in all cases. A real-life example is discussed in detail to demonstrate the proposed method.

16.
In the estimators $t_3$, $t_4$, $t_5$ of Mukerjee, Rao & Vijayan (1987), $b_{yx}$ and $b_{yz}$ are partial regression coefficients of $y$ on $x$ and $z$, respectively, based on the smaller sample. With the above interpretation of $b_{yx}$ and $b_{yz}$ in $t_3$, $t_4$, $t_5$, all the calculations in Mukerjee et al. (1987) are correct. In this connection, we also wish to make it explicit that $b_{xz}$ in $t_5$ is an ordinary and not a partial regression coefficient. The 'corrected' MSEs of $t_3$, $t_4$, $t_5$, as given in Ahmed (1998, Section 3), are computed assuming that our $b_{yx}$ and $b_{yz}$ are ordinary and not partial regression coefficients. Indeed, we had no intention of giving estimators using the corresponding ordinary regression coefficients, which would lead to estimators inferior to those given by Kiregyera (1984). We accept responsibility for any notational confusion created by us and express regret to readers who have been confused by our notation. Finally, in consideration of the above, it may be noted that Tripathi & Ahmed's (1995) estimator $t_0$, quoted also in Ahmed (1998), is no better than $t_5$ of Mukerjee et al. (1987).

17.
The power of the classical F-test for testing the regression coefficient of a general linear model with an elliptic t error variable depends on the degrees of freedom of the t-distribution. In this note it is shown that the power of the F-test based on the t-distribution is greater than that of the normal-based test at smaller levels of significance.

18.
We analyze Poisson regression when covariates contain measurement errors and when multiple potential instrumental variables are available. Without empirical knowledge to select the most suitable variable as an instrument, we propose a novel model-averaging approach to resolve this issue. We prescribe an implementation and establish its optimality in terms of minimizing prediction risk. We further show that, as long as one model is correctly specified among all potential instrumental variable models, our method will lead to consistent prediction. The performance of our method is illustrated through simulations and a movie sales example.

19.
We use Owen's (1988, 1990) empirical likelihood method in upgraded mixture models. Two groups of independent observations are available. One is $z_1, \ldots, z_n$, which is observed directly from a distribution $F(z)$. The other one is $x_1, \ldots, x_m$, which is observed indirectly from $F(z)$, where the $x_i$'s have density $\int p(x \mid z)\, dF(z)$ and $p(x \mid z)$ is a conditional density function. We are interested in testing $H_0$: $p(x \mid z) = p(x \mid z; \theta)$, for some specified smooth density function. A semiparametric likelihood ratio based statistic is proposed and it is shown that it converges to a chi-squared distribution. This is a simple method for doing goodness of fit tests, especially when $x$ is a discrete variable with finitely many values. In addition, we discuss estimation of $\theta$ and $F(z)$ when $H_0$ is true. The connection between upgraded mixture models and general estimating equations is pointed out.

20.
The joint effect of the deletion of the ith and jth cases is given by Gray and Ling (1984), who discussed influence measures for influential subsets in linear regression analysis. The present paper is concerned with multiple sets of deletion measures in the linear regression model. In particular, we are interested in the effects of joint and conditional influence analysis for the detection of two influential subsets.
