Similar literature
Found 20 similar documents (search time: 0 ms)
1.
2.
Logistic regression using conditional maximum likelihood estimation has recently gained widespread use. Many of the applications of logistic regression have been in situations in which the independent variables are collinear. It is shown that collinearity among the independent variables seriously affects the conditional maximum likelihood estimator, in that the variance of this estimator is inflated in much the same way that collinearity inflates the variance of the least squares estimator in multiple regression. Drawing on the similarities between multiple and logistic regression, several alternative estimators, which reduce the effect of the collinearity and are easy to obtain in practice, are suggested and compared in a simulation study.
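The least squares analogue mentioned in this abstract can be made concrete with the variance inflation factor (VIF). The sketch below is not taken from the paper: it assumes the simplest case of exactly two predictors, where the VIF of either coefficient is 1 / (1 - r^2) and r is the sample correlation between the predictors. The function names `pearson_r` and `vif_two_predictors` are illustrative only.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Least squares VIF when the model has exactly two predictors.

    With two predictors, the R^2 from regressing one on the other equals
    r^2, so the variance of each slope is inflated by 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Uncorrelated predictors give no inflation; nearly collinear ones inflate heavily.
print(vif_two_predictors([-1, 1, -1, 1], [-1, -1, 1, 1]))
print(vif_two_predictors([1, 2, 3], [1, 2, 4]))
```

As the correlation approaches 1 the factor grows without bound, which is the least squares behaviour the abstract says carries over to the conditional maximum likelihood estimator.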

3.
A regression predictor is well-calibrated if the predictions it gives are equal to the average responses that would be observed in an independent sample. The usual least squares predictor does not have this property, but its calibration can be improved by shrinking the predictions by a factor which depends on the signal-to-noise ratio of the regression model. We suggest a semi-Bayesian approach to estimating this factor, giving an estimate closely related to the so-called Stein shrinkage factor. The results are illustrated on a large medical data set.

4.
ABSTRACT

In this paper, we investigate the cross-validation measures OCV, GCV and Cp under linear regression models in which the error structure is autocorrelated and the regressors are correlated. The best-performing ridge regression estimator is obtained by choosing the ridge parameter that minimizes these measures. A Monte Carlo simulation study shows how the optimal ridge parameter is affected by autocorrelation and by the strength of multicollinearity.
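Choosing a ridge parameter by minimizing GCV can be sketched as follows. This is not the paper's procedure: it uses the ordinary GCV criterion, GCV(k) = n RSS(k) / (n - tr H(k))^2 with H(k) = X (X'X + kI)^{-1} X', and ignores the autocorrelated error structure the abstract studies. It also assumes the columns of X are already centred, so no intercept is fitted. The helper names `invert`, `ridge_gcv` and `best_ridge_parameter` are hypothetical.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_gcv(X, y, k):
    """Return (GCV score, ridge coefficients) for ridge parameter k."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    m = invert([[xtx[i][j] + (k if i == j else 0.0) for j in range(p)]
                for i in range(p)])
    beta = [sum(m[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((y[r] - sum(X[r][j] * beta[j] for j in range(p))) ** 2
              for r in range(n))
    # tr(H) = tr((X'X + kI)^{-1} X'X), the effective degrees of freedom.
    trace_h = sum(m[i][j] * xtx[j][i] for i in range(p) for j in range(p))
    return n * rss / (n - trace_h) ** 2, beta


def best_ridge_parameter(X, y, grid):
    """Pick the value of k on a user-supplied grid that minimizes GCV."""
    return min(grid, key=lambda k: ridge_gcv(X, y, k)[0])
```

A grid search is used here only for simplicity; any one-dimensional minimizer over k would serve the same purpose.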

5.
In this paper we address the problem of estimating a vector of regression parameters in the Weibull censored regression model. Our main objective is to provide natural adaptive estimators that significantly improve upon the classical procedures in the situation where some of the predictors may or may not be associated with the response. In the context of two competing Weibull censored regression models (full model and candidate submodel), we consider an adaptive shrinkage estimation strategy that shrinks the full model maximum likelihood estimate in the direction of the submodel maximum likelihood estimate. We develop the properties of these estimators using the notion of asymptotic distributional risk. The shrinkage estimators are shown to have higher efficiency than the classical estimators for a wide class of models. Further, we consider a LASSO-type estimation strategy and compare its relative performance with that of the shrinkage estimators. Monte Carlo simulations reveal that when the true model is close to the candidate submodel, the shrinkage strategy performs better than the LASSO strategy when, and only when, there are many inactive predictors in the model. Shrinkage and LASSO strategies are applied to a real data set from the Veterans' Administration (VA) lung cancer study to illustrate the usefulness of the procedures in practice.

6.
The purpose of this note is to gain insight into the performance of two well-known operational ridge regression estimators by deriving the moments of their stochastic shrinkage parameters. We also show that, under certain conditions, one of them has bounded moments.

7.
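Using panel data on the 2001 and 2002 marketization indices of the regions of Guangdong Province, as measured by the research group, this article applies least median of squares (LMS) regression to analyze empirically the relationship between marketization and economic growth across those regions, taking full account of the influence of outliers on the regression model. It concludes that the level of marketization in each region is significantly correlated with both the level of economic development and economic growth, whereas short-term changes in the level of marketization are not significantly related to economic growth.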
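The idea behind LMS regression can be sketched for a single predictor. The code below is an illustration, not the article's implementation: it approximates the LMS line by trying the line through every pair of points and keeping the one with the smallest median squared residual. The function name `lms_line` is hypothetical, and the exhaustive pair search is only practical for small samples.

```python
from itertools import combinations
from statistics import median


def lms_line(xs, ys):
    """Approximate least median of squares fit of y = intercept + slope * x.

    Candidate lines are those passing through each pair of observations;
    the one minimizing the median squared residual is returned as
    (slope, intercept, criterion).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = a + b * x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        criterion = median((y - (intercept + slope * x)) ** 2
                           for x, y in zip(xs, ys))
        if best is None or criterion < best[0]:
            best = (criterion, slope, intercept)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2], best[0]


# Six points lie on y = 1 + 2x; the last observation is an outlier.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 11, 40]
print(lms_line(xs, ys))
```

Because the criterion is a median rather than a sum, a minority of outlying observations cannot pull the fitted line away from the bulk of the data, which is why the method suits data with outliers.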

8.
The bias of maximum likelihood estimators of the standard deviation of the response in location/scale regression models is considered. Results are obtained for a very wide family of densities for the response variable. These are used to propose point estimators with improved mean square error properties and to demonstrate the importance of bias correction in statistical inference when samples are moderately small.

9.
Estimation for the log-logistic and Weibull distributions can be performed by using the equations used for probability plotting, and this technique often outperforms maximum likelihood (ML) estimation in small samples. This leads to a highly heteroskedastic regression problem. Exact expressions for the variances of the residuals are derived which can be used to perform weighted regression. In large samples, ML performs best, but it is shown that in smaller samples, the weighted regression outperforms ML estimation with respect to bias and mean square error.
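A minimal sketch of probability-plot estimation for the Weibull case follows. It is not the paper's weighted method: it uses ordinary (unweighted) least squares and assumes Bernard's median-rank approximation p_i = (i - 0.3) / (n + 0.4) for the plotting positions, both of which are choices made here for illustration. With F(t) = 1 - exp(-(t / scale)^shape), the plot is linear because ln(-ln(1 - F)) = shape * ln t - shape * ln(scale). The function name `weibull_plot_fit` is hypothetical.

```python
import math


def weibull_plot_fit(times):
    """Estimate Weibull (shape, scale) by least squares on the probability plot.

    Regresses y_i = ln(-ln(1 - p_i)) on x_i = ln(t_(i)), where t_(i) are the
    ordered observations and p_i are median-rank plotting positions.
    """
    t = sorted(times)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        p = (i - 0.3) / (n + 0.4)  # assumed plotting position
        xs.append(math.log(ti))
        ys.append(math.log(-math.log(1.0 - p)))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    shape = sxy / sxx                      # slope of the plot
    intercept = mean_y - shape * mean_x    # equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return shape, scale
```

The residual variances in this regression differ strongly across the order statistics, which is the heteroskedasticity the abstract addresses with weights; supplying those weights would turn the two sums above into weighted sums.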

10.
In this paper, by considering an appropriate transformation of the Lindley distribution, we propose the unit-Lindley distribution and investigate some of its statistical properties. An important fact associated with this new distribution is that it is possible to obtain an analytical expression for the bias correction of the maximum likelihood estimator. Moreover, it belongs to the exponential family. This distribution allows us to incorporate covariates directly in the mean and consequently to quantify their influence on the average of the response variable. Finally, a practical application is presented to show that our model fits much better than Beta regression.

11.
Summary. We consider the problem of estimating the noise variance in homoscedastic nonparametric regression models. For low dimensional covariates t ∈ R^d, d = 1, 2, difference-based estimators have been investigated in a series of papers. For a given length of such an estimator, difference schemes which minimize the asymptotic mean-squared error can be computed for d = 1 and d = 2. However, from numerical studies it is known that for finite sample sizes the performance of these estimators may be deficient owing to a large finite sample bias. We provide theoretical support for these findings. In particular, we show that with increasing dimension d this becomes more drastic. If d ≥ 4, these estimators even fail to be consistent. A different class of estimators is discussed which allow better control of the bias and remain consistent when d ≥ 4. These estimators are compared numerically with kernel-type estimators (which are asymptotically efficient), and some guidance is given about when their use becomes necessary.
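For readers unfamiliar with difference-based estimators, the simplest one-dimensional case can be sketched as below. This is the classical first-order difference estimator, shown here as an illustration and not as one of the paper's optimal difference schemes. It assumes d = 1 and that the responses are already sorted by their design points. The function name `first_difference_variance` is hypothetical.

```python
def first_difference_variance(y):
    """First-order difference-based estimate of the noise variance.

    For y_i = m(t_i) + e_i with smooth m and closely spaced, ordered t_i,
    successive differences nearly cancel m, and E[(y_{i+1} - y_i)^2] is
    approximately 2 * sigma^2. Averaging and halving gives the estimate.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((y[i + 1] - y[i]) ** 2 for i in range(n - 1)) / (2.0 * (n - 1))


# A constant signal has no noise; an alternating 0/1 signal gives 0.5.
print(first_difference_variance([3, 3, 3, 3]))
print(first_difference_variance([0, 1, 0, 1, 0]))
```

The bias of such estimators comes from the part of m that the differences fail to cancel, which is the finite sample effect the abstract says worsens as the dimension d grows.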

12.
The shrinkage preliminary test ridge regression estimators (SPTRRE) based on the Wald (W), the likelihood ratio (LR) and the Lagrangian multiplier (LM) tests are considered in this paper. The bias and the risk functions of the proposed estimators are derived. The regions of optimality of the estimators are determined under the quadratic risk function. Under the null hypothesis, the SPTRRE based on the LM test has the smallest risk, followed by the estimators based on the LR and W tests. However, the SPTRRE based on the W test performs best, followed by the LR- and LM-based estimators, when the parameter moves away from the subspace of the restrictions. The conditions of superiority of the proposed estimator for both ridge and departure parameters are discussed. The optimum choice of the level of significance becomes the traditional choice by using the W test for all non-negative ridge parameters.

13.
In this paper we study the property of linearity of backward regression for non-adjacent records. In the case of weak records, a characterization of the geometric distribution is obtained. It also appears that a related characterization for ordinary records does not hold, showing the difference in conditional behaviour between weak and ordinary records.

14.
15.
ABSTRACT

This paper is concerned with the problem of estimating the mean of the selected population from two normal populations with unknown means and common known variance in a Bayesian framework. The empirical Bayes estimator, when additional observations are available, is derived and its bias and risk function are computed. The expected bias and risk of the empirical Bayes estimator and the intuitive estimator are compared. It is shown that the empirical Bayes estimator is asymptotically optimal and, in particular, dominates the intuitive estimator in terms of Bayes risk with respect to any normal prior. Also, the Bayesian correlations between the mean of the selected population (a random parameter) and some estimators of interest are obtained and compared.

16.
Summary. A new estimator of the regression parameters is introduced in a multivariate multiple-regression model in which both the vector of explanatory variables and the vector of response variables are assumed to be random. The affine equivariant estimate matrix is constructed using the sign covariance matrix (SCM) where the sign concept is based on Oja's criterion function. The influence function and asymptotic theory are developed to consider robustness and limiting efficiencies of the SCM regression estimate. The estimate is shown to be consistent with a limiting multinormal distribution. The influence function, as a function of the length of the contamination vector, is shown to be linear in elliptic cases; for the least squares (LS) estimate it is quadratic. The asymptotic relative efficiencies with respect to the LS estimate are given in the multivariate normal as well as the t-distribution cases. The SCM regression estimate is highly efficient in the multivariate normal case and, for heavy-tailed distributions, it performs better than the LS estimate. Simulations are used to consider finite sample efficiencies with similar results. The theory is illustrated with an example.

17.
A multiple regression model is considered in which the density of the response variable is a member of a very wide family which includes many well-known distributions. Schemes of observation in which the response observations are grouped or type 1 right censored are examined. Results on the asymptotic variance efficiencies of the maximum likelihood estimators of the regression coefficients and standard deviation of the error distribution are presented for the two schemes.

18.
One of the main problems that the drug discovery research field confronts is to identify small molecules, modulators of protein function, which are likely to be therapeutically useful. Common practices rely on the screening of vast libraries of small molecules (often 1–2 million molecules) in order to identify a molecule, known as a lead molecule, which specifically inhibits or activates the protein function. To search for the lead molecule, we investigate the molecular structure, which generally consists of an extremely large number of fragments. Presence or absence of particular fragments, or groups of fragments, can strongly affect molecular properties. We study the relationship between a molecule's properties and its fragment composition by building a regression model, in which predictors, represented by binary variables indicating the presence or absence of fragments, are grouped in subsets and a bi-level penalization term is introduced to handle the high dimensionality of the problem. We evaluate the performance of this model in two simulation studies, comparing different penalization terms and different clustering techniques to derive the best predictor subset structure. Both studies are characterized by small sets of data relative to the number of predictors under consideration. From the results of these simulation studies, we show that our approach can generate models able to identify key features and provide accurate predictions. The good performance of these models is then exhibited with real data about the MMP–12 enzyme.

19.
This paper considers linear and nonlinear regression with a response variable that is allowed to be “missing at random”. The only structural assumptions on the distribution of the variables are that the errors have mean zero and are independent of the covariates. The independence assumption is important. It enables us to construct an estimator for the response density that uses all the observed data, in contrast to the usual local smoothing techniques, and which therefore permits a faster rate of convergence. The idea is to write the response density as a convolution integral which can be estimated by an empirical version, with a weighted residual-based kernel estimator plugged in for the error density. For an appropriate class of regression functions, and a suitably chosen bandwidth, this estimator is consistent and converges with the optimal parametric rate n^{1/2}. Moreover, the estimator is proved to be efficient (in the sense of Hájek and Le Cam) if an efficient estimator is used for the regression parameter.

20.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.
