Found 20 similar articles (search time: 15 ms)
Whittemore (1981) proposed an approach for calculating the sample size needed to test hypotheses with specified significance and power against a given alternative for logistic regression with small response probability. Based on the distribution of the covariate, which may be either discrete or continuous, this approach first provides a simple closed-form approximation to the asymptotic covariance matrix of the maximum likelihood estimates, and then uses it to calculate the sample size needed to test a hypothesis about the parameter. Self et al. (1992) described a general approach for power and sample size calculations within the framework of generalized linear models, which include logistic regression as a special case. Their approach is based on an approximation to the distribution of the likelihood ratio statistic. Unlike the Whittemore approach, it is not limited to situations of small response probability. However, it is restricted to models with a finite number of covariate configurations. This study compares the accuracy of these two approaches for power and sample size calculations in logistic regression models with various response probabilities and covariate distributions. The results indicate that the Whittemore approach has a slight advantage in achieving the nominal power in only one case, with small response probability. It is outperformed in all other cases with larger response probabilities. In general, the approach proposed by Self et al. (1992) is recommended for all values of the response probability. However, its extension to logistic regression models with an infinite number of covariate configurations involves an arbitrary decision for categorization and leads to a discrete approximation. As shown in this paper, the examined discrete approximations appear to be sufficiently accurate for practical purposes.

In epidemiologic studies where the outcome is binary, the data often arise as clusters, as when siblings, friends or neighbors are used as matched controls in a case-control study. Conditional logistic regression (CLR) is typically used for such studies to estimate the odds ratio for an exposure of interest. However, CLR assumes the exposure coefficient is the same in every cluster, and CLR-based inference can be badly biased when homogeneity is violated. Existing methods for testing goodness-of-fit for CLR are not designed to detect such violations. Good alternative methods of analysis exist if one suspects there is heterogeneity across clusters. However, routine use of alternative robust approaches when there is no appreciable heterogeneity could cause loss of precision and be computationally difficult, particularly if the clusters are small. We propose a simple non-parametric test, the test of heterogeneous susceptibility (THS), to assess the assumption of homogeneity of a coefficient across clusters. The test is easy to apply and provides guidance as to the appropriate method of analysis. Simulations demonstrate that the THS has reasonable power to reveal violations of homogeneity. We illustrate by applying the THS to a study of periodontal disease.

The problems of existence and uniqueness of maximum likelihood estimates for logistic regression were completely solved by Silvapulle in 1981 and Albert and Anderson in 1984. In this paper, we extend the well-known results by Silvapulle and by Albert and Anderson to weighted logistic regression. We analytically prove the equivalence between the overlap condition used by Albert and Anderson and that used by Silvapulle. We show that the maximum likelihood estimate of weighted logistic regression does not exist if there is complete or quasicomplete separation of the data points, and that it exists and is unique if the data points overlap. Our proofs and results for weighted logistic regression also apply to unweighted logistic regression.
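
To make the separation condition concrete, the following is a minimal sketch for the simplest case of a single covariate and unweighted data. It is not taken from the paper: the function names and the toy data are illustrative assumptions, and the check covers only complete separation in one dimension, not quasicomplete separation or the general multivariate overlap condition. Under complete separation the log-likelihood keeps increasing toward 0 as the slope grows, so no finite maximizer exists.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_likelihood(beta0, beta1, xs, ys):
    """Unweighted logistic log-likelihood for one covariate (illustrative helper)."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta0 + beta1 * x
        # log p = -softplus(-eta) for y = 1, log(1 - p) = -softplus(eta) for y = 0
        total -= softplus(-eta) if y == 1 else softplus(eta)
    return total

def is_completely_separated(xs, ys):
    """True if a single threshold on x splits the classes perfectly (1-D case only).

    Assumes both classes are present in ys.
    """
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    return max(x0) < min(x1) or max(x1) < min(x0)

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [0, 0, 1, 1]  # toy data, separated at x = 2.5
    print(is_completely_separated(xs, ys))
    # Scale the separating direction: the log-likelihood keeps rising toward 0.
    for s in (1.0, 10.0, 100.0):
        print(s, log_likelihood(-2.5 * s, s, xs, ys))
```

Running the script shows the log-likelihood approaching its supremum of 0 along the separating direction without ever attaining it, which is the sense in which the estimate fails to exist.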

This paper develops alternatives to maximum likelihood estimators (MLE) for logistic regression models and compares the mean squared error (MSE) of the estimators. The MLE for the vector of underlying success probabilities has low MSE only when the true probabilities are extreme (i.e., near 0 or 1). Extreme probabilities correspond to logistic regression parameter vectors which are large in norm. A competing “restricted” MLE and an empirical version of it are suggested as estimators with better performance than the MLE for central probabilities. An approximate EM-algorithm for estimating the restriction is described. As in the case of normal theory ridge estimators, the proposed estimators are shown to be formally derivable by Bayes and empirical Bayes arguments. The small sample operating characteristics of the proposed estimators are compared to the MLE via a simulation study; both the estimation of individual probabilities and of logistic parameters are considered.

This paper presents a method for Bayesian inference for the regression parameters in a linear model with independent and identically distributed errors that does not require the specification of a parametric family of densities for the error distribution. This method first selects a nonparametric kernel density estimate of the error distribution which is unimodal and based on the least-squares residuals. Once the error distribution is selected, the Metropolis algorithm is used to obtain the marginal posterior distribution of the regression parameters. The methodology is illustrated with data sets, and its performance relative to standard Bayesian techniques is evaluated using simulation results.

Various methods have been suggested in the literature to handle a missing covariate in the presence of surrogate covariates. These methods belong to one of two paradigms. In the imputation paradigm, Pepe and Fleming (1991) and Reilly and Pepe (1995) suggested filling in missing covariates using the empirical distribution of the covariate obtained from the observed data. We can proceed one step further by imputing the missing covariate using nonparametric maximum likelihood estimates (NPMLE) of the density of the covariate. Recently Murphy and Van der Vaart (1998a) showed that such an approach yields a consistent, asymptotically normal, and semiparametric efficient estimate for the logistic regression coefficient. In the weighting paradigm, Zhao and Lipsitz (1992) suggested an estimating function using completely observed records after weighting inversely by the probability of observation. An extension of this weighting approach designed to achieve the semiparametric efficiency bound is considered by Robins, Hsieh and Newey (RHN) (1995). The two ends of each paradigm (NPMLE and RHN) attain the efficiency bound and are asymptotically equivalent. However, both require a substantial amount of computation. A question arises whether and when, in practical situations, this extensive computation is worthwhile. In this paper we investigate the performance of single and multiple imputation estimates, weighting estimates, semiparametric efficient estimates, and two new imputation estimates. Simulation studies suggest that the sample size should be substantially large (e.g. n = 2000) for NPMLE and RHN to be more efficient than simpler imputation estimates. When the sample size is moderately large (n ≤ 1500), simpler imputation estimates have as small a variance as semiparametric efficient estimates.

Consider the nonparametric heteroscedastic regression model Y = m(X) + σ(X)ε, where m(·) is an unknown conditional mean function and σ(·) is an unknown conditional scale function. In this paper, the limit distribution of the quantile estimate for the scale function σ(X) is derived. Since the limit distribution depends on the unknown density of the errors, an empirical likelihood ratio statistic based on the quantile estimator is proposed. This statistic is used to construct confidence intervals for the variance function. Under certain regularity conditions, it is shown that the quantile estimate of the scale function converges to a Brownian motion and the empirical likelihood ratio statistic converges to a chi-squared random variable. Simulation results demonstrate the superiority of the proposed method over the least squares procedure when the underlying errors have heavy tails.

Logistic regression is estimated by maximizing a log-likelihood objective function formulated under the assumption of maximizing overall accuracy. That assumption does not suit imbalanced data. The resulting models tend to be biased towards the majority class (i.e. non-event), which can bring great loss in practice. One strategy for mitigating such bias is to penalize the misclassification costs of observations differently in the log-likelihood function. Existing solutions require either difficult hyperparameter estimation or high computational complexity. We propose a novel penalized log-likelihood function by including penalty weights as decision variables for observations in the minority class (i.e. event) and learning them from data along with model coefficients. In the experiments, the proposed logistic regression model is compared with the existing ones on the area under the receiver operating characteristic (ROC) curve from 10 public datasets and 16 simulated datasets, as well as on training time. A detailed analysis is conducted on an imbalanced credit dataset to examine the estimated probability distributions, additional performance measurements (i.e. type I error and type II error) and model coefficients. The results demonstrate that both the discrimination ability and computation efficiency of logistic regression models are improved using the proposed log-likelihood function as the learning objective.
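
As a point of reference for the weighting strategy mentioned above, here is a minimal sketch of a log-likelihood in which minority-class (event) observations carry a fixed weight. This is the simpler fixed-weight variant, not the authors' proposal, which treats the penalty weights as decision variables learned from the data. The function name, the `event_weight` parameter and the toy data are assumptions made for illustration.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def weighted_log_likelihood(beta, rows, labels, event_weight=1.0):
    """Logistic log-likelihood with a fixed weight on event (y = 1) observations.

    beta[0] is the intercept; beta[1:] pairs with the covariates in each row.
    event_weight = 1.0 recovers the ordinary log-likelihood.
    """
    total = 0.0
    for x, y in zip(rows, labels):
        eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        if y == 1:
            total -= event_weight * softplus(-eta)  # weight * log p
        else:
            total -= softplus(eta)                  # log(1 - p)
    return total

if __name__ == "__main__":
    rows = [[0.0], [1.0], [2.0]]   # toy covariates
    labels = [0, 0, 1]             # one event among three observations
    beta = [-1.0, 0.5]
    print(weighted_log_likelihood(beta, rows, labels))
    print(weighted_log_likelihood(beta, rows, labels, event_weight=2.0))
```

With an integer weight, the weighted objective equals the ordinary log-likelihood on a dataset in which each event is replicated that many times, which is one way to see why up-weighting shifts the fit toward the minority class.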


We propose to compare population means and variances under a semiparametric density ratio model. The proposed method is easy to implement using the logistic regression procedures available in many statistical software packages, and it often works very well when the data are not normal. In this paper, we construct semiparametric estimators of the differences of two population means and variances, and derive their asymptotic distributions. We prove that the proposed semiparametric estimators are asymptotically more efficient than the corresponding nonparametric ones. In addition, a simulation study and the analysis of two real data sets are presented. Finally, a short discussion is provided.
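
The abstract does not spell out the model, so the following sketch assumes the commonly used form of a density ratio model, g1(x) = exp(α + βx) g0(x), with a linear term only. Under that assumption it checks numerically a standard special case: two normal densities with a common variance satisfy the model exactly, with β = (μ1 − μ0)/σ² and α = (μ0² − μ1²)/(2σ²). The function names and parameter values are illustrative, not from the paper.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def density_ratio_coefficients(mu0, mu1, sigma):
    """(alpha, beta) such that log(g1(x) / g0(x)) = alpha + beta * x
    for two normal densities sharing the standard deviation sigma."""
    beta = (mu1 - mu0) / sigma ** 2
    alpha = (mu0 ** 2 - mu1 ** 2) / (2.0 * sigma ** 2)
    return alpha, beta

if __name__ == "__main__":
    mu0, mu1, sigma = 0.0, 1.5, 2.0  # hypothetical parameter values
    alpha, beta = density_ratio_coefficients(mu0, mu1, sigma)
    for x in (-3.0, 0.0, 0.7, 4.0):
        log_ratio = math.log(normal_pdf(x, mu1, sigma) / normal_pdf(x, mu0, sigma))
        print(x, log_ratio, alpha + beta * x)
```

The two printed columns agree up to rounding. The log density ratio is linear in x in the same way that the log-odds are linear in a logistic regression, which is consistent with the abstract's remark that standard logistic regression procedures can be used for fitting.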

This article presents methods for the construction of two-sided and one-sided simultaneous hyperbolic bands for the logistic and probit regression models when the predictor variable is restricted to a given interval. The bands are constructed based on the asymptotic properties of the maximum likelihood estimators. Past articles have considered building two-sided asymptotic confidence bands for the logistic model, such as Piegorsch and Casella (1988) [Piegorsch, W.W. and Casella, G., 1988, Confidence bands for logistic regression with restricted predictor variables. Biometrics, 44, 739–750.]. However, the confidence bands given by Piegorsch and Casella are conservative under a single interval restriction, and it is shown in this article that their bands can be sharpened using the methods proposed here. Furthermore, no method has yet appeared in the literature for constructing one-sided confidence bands for the logistic model, and no work has been done for building confidence bands for the probit model, over a limited range of the predictor variable. This article provides methods for computing critical points in these areas.

In this paper, we propose a semiparametric method of estimating receiver operating characteristic (ROC) surfaces for continuous diagnostic tests under density ratio models. Implementation of our method is easy since the usual polytomous logistic regression procedures in many statistical software packages can be employed. A simulated example is provided to facilitate the implementation of our method. Simulation results show that the proposed semiparametric ROC surface estimator is more efficient than the nonparametric counterpart and the parametric counterpart whether the normality assumption of data holds or not. Moreover, some simulation results on the underlying semiparametric distribution function estimators are also reported. In addition, some discussions on the proposed method as well as analysis of a real data set are provided.

Goodness-of-fit tests for logistic regression models using extreme residuals are considered. Approximations to the moments of the Pearson residuals are given for model fits made by maximum likelihood, minimum chi-square and weighted least squares and used to define modified residuals. Approximations to the critical values of the extreme statistics based on the ordinary and modified Pearson residuals are developed and assessed for the case of a single explanatory variable.

We present a variational estimation method for the mixed logistic regression model. The method is based on a lower bound approximation of the logistic function [Jaakkola, T.S. and Jordan, M.I., 2000, Bayesian parameter estimation via variational methods. Statistics & Computing, 10, 25–37.]. Based on the approximation, an EM algorithm can be derived that results in a considerable simplification of the maximization problem in that it does not require the numerical evaluation of integrals over the random effects. We assess the performance of the variational method for the mixed logistic regression model in a simulation study and an empirical data example, and compare it to Laplace's method. The results indicate that the variational method is a viable choice for estimating the fixed effects of the mixed logistic regression model under the condition that the number of outcomes within each cluster is sufficiently high.

Monotonic transformations of explanatory continuous variables are often used to improve the fit of the logistic regression model to the data. However, no analytic studies have been done to study the impact of such transformations. In this paper, we study invariant properties of the logistic regression model under monotonic transformations. We prove that the maximum likelihood estimates, information value, mutual information, Kolmogorov–Smirnov (KS) statistics, and lift table are all invariant under certain monotonic transformations.
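
One of the listed invariances is easy to illustrate for a single variable. The sketch below computes the two-sample KS statistic between the event and non-event distributions of a variable, then repeats the computation after a strictly increasing transformation (here the logarithm). Because the statistic depends only on the ordering of the values, the result is unchanged. This is a toy demonstration, not the paper's proof. The function name and data are assumptions, and the paper's exact conditions on the transformations are not reproduced here.

```python
import math

def ks_statistic(values, labels):
    """Two-sample KS statistic between values with label 1 and label 0.

    Assumes both classes are present. Ties are handled by updating both
    empirical CDFs before comparing them at each distinct value.
    """
    n1 = sum(labels)
    n0 = len(labels) - n1
    pairs = sorted(zip(values, labels))
    c1 = c0 = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        v = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == v:
            if pairs[i][1] == 1:
                c1 += 1
            else:
                c0 += 1
            i += 1
        best = max(best, abs(c1 / n1 - c0 / n0))
    return best

if __name__ == "__main__":
    x = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0]   # toy positive covariate
    y = [0, 0, 1, 0, 1, 1]
    print(ks_statistic(x, y))
    print(ks_statistic([math.log(v) for v in x], y))
```

Both calls print the same value, since taking logarithms preserves the rank order of the observations.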

One feature of the usual polychotomous logistic regression model for categorical outcomes is that a covariate must be included in all the regression equations. If a covariate is not important in all of them, the procedure will estimate unnecessary parameters. More flexible approaches allow different subsets of covariates in different regressions. One alternative uses individualized regressions which express the polychotomous model as a series of dichotomous models. Another uses a model in which a reduced set of parameters is simultaneously estimated for all the regressions. Large-sample efficiencies of these procedures were compared in a variety of circumstances in which there was a common baseline category for the outcome and the covariates were normally distributed. For a correctly specified model, the reduced estimates were over 100% efficient for nonzero slope parameters and up to 500% efficient when the baseline frequency and the effect of interest were small. The individualized estimates could have efficiencies less than 50% when the effect of interest was large, but were also up to 130% efficient when the baseline frequency was large and the effect of interest was small. Efficiency was usually enhanced by correlation among the covariates. For an underspecified reduced model, asymptotic bias in the reduced estimates was approximately proportional to the magnitude of the omitted parameter and to the reciprocal of the baseline frequency.

We consider asymptotic properties of the maximum likelihood and related estimators in a clustered logistic joinpoint model with an unknown joinpoint. Sufficient conditions are given for the consistency of confidence bounds produced by the parametric bootstrap; one of the conditions required is that the true location of the joinpoint is not at one of the observation times. A simulation study is presented to illustrate the lack of consistency of the bootstrap confidence bounds when the joinpoint is an observation time. A removal algorithm is presented which corrects this problem, but at the price of an increased mean square error. Finally, the methods are applied to data on yearly cancer mortality in the US for individuals age 65 and over.

A general class of minimum distance estimators for logistic regression models based on the ϕ-divergence measures is introduced: the minimum ϕ-divergence estimator, which is seen to be a generalization of the maximum likelihood estimator. Its asymptotic properties are studied, as well as its behaviour in small samples through a simulation study. This work was supported partially by Grant DGI (BMF2003-00892).

Logistic regression is often confronted with the separation-of-likelihood problem, especially with an unbalanced success–failure distribution. We propose to address this issue by drawing a ranked set sample (RSS). Simulation studies illustrate the advantages of logistic regression models fitted with RSS samples of small size, regardless of the distribution of the binary response. As sample size increases, RSS eventually becomes comparable to simple random sampling (SRS), but still has the advantage over SRS in mitigating the problem of separation of likelihood. Even in the presence of ranking errors, models from RSS samples yield higher predictive ability than their SRS counterparts.

We are interested in comparing logistic regressions for several test treatments or populations with a logistic regression for a standard treatment or population. The research was motivated by some real life problems, which are discussed as data examples. We propose a step-down likelihood ratio method for declaring differences between the test treatments or populations and the standard treatment or population. Competitors based on the sequentially rejective Bonferroni Wald statistic, sequentially rejective exact Wald statistic and Reiersøl's statistic are also discussed. It is shown that the proposed method asymptotically controls the probability of type I error. A Monte Carlo simulation shows that the proposed method performs well for relatively small sample sizes, outperforming its competitors.

In some applications, quality engineers cannot monitor a process at the beginning of production because the process parameters are unknown and there are not enough initial samples to estimate them. Self-starting control charts are applied to monitor processes at the start-up stage when there are not enough initial samples. In this paper, we propose three self-starting control charts to monitor a logistic regression profile, which models the relationship between a binomial response variable and explanatory variables. We also compare the proposed control charts with each other through simulation studies in terms of the average run length (ARL) criterion.
