Similar Literature
20 similar documents found.
1.
Interaction is very common in practice but has received little attention in the logistic regression literature; this is especially true for higher-order interactions. In conventional logistic regression, interactions are typically ignored. We propose a model selection procedure that implements an association rules analysis. We do this by (1) exploring the combinations of input variables that have significant impacts on the response (via association rules analysis); (2) selecting the potential (low- and high-order) interactions; (3) converting these potential interactions into new dummy variables; and (4) performing variable selection among all the input variables and the newly created dummy variables (interactions) to build the optimal logistic regression model. Our model selection procedure establishes the optimal combination of main effects and potential interactions. Comparisons are made through thorough simulations, which show that the proposed method outperforms the existing methods in all cases. A real-life example is discussed in detail to demonstrate the proposed method.
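Below is a minimal, hedged sketch of this general workflow on synthetic binary inputs. The simple support/confidence-style screen is a stand-in for a full association rules analysis, and all thresholds, sample sizes, and variable names are illustrative rather than taken from the paper.

```python
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.integers(0, 2, size=(n, p)).astype(float)          # binary input variables
# response driven by one main effect and one 2-way interaction
logit = -1.0 + 1.5 * X[:, 0] + 2.0 * X[:, 1] * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Steps 1-2: screen low-order interactions with a simple rule-style measure:
# keep subsets whose "all ones" pattern raises P(y = 1) well above the base rate.
base_rate = y.mean()
candidates = []
for k in (2, 3):
    for subset in itertools.combinations(range(p), k):
        mask = X[:, list(subset)].all(axis=1)
        if mask.sum() >= 30 and y[mask].mean() > base_rate + 0.10:
            candidates.append(subset)

# Step 3: convert the retained interactions into new dummy variables
dummies = [X[:, list(s)].prod(axis=1) for s in candidates]
X_full = np.column_stack([X] + dummies) if dummies else X

# Step 4: variable selection over main effects plus interaction dummies (L1 logistic regression)
model = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5).fit(X_full, y)
kept = np.flatnonzero(np.abs(model.coef_[0]) > 1e-8)
print("kept interactions:", [candidates[i - p] for i in kept if i >= p])
```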

2.
3.
This study considers the binary classification of functional data collected in the form of curves. In particular, we assume a situation in which the curves are highly mixed over the entire domain, so that global discriminant analysis based on the entire domain is not effective. This study proposes an interval-based classification method for functional data: the informative intervals for classification are selected and used to separate the curves into two classes. The proposed method, functional logistic regression with a fused lasso penalty, combines functional logistic regression as the classifier with the fused lasso for selecting discriminant segments. The method automatically selects the most informative segments of the functional data by employing the fused lasso penalty and simultaneously classifies the data based on the selected segments using functional logistic regression. The effectiveness of the proposed method is demonstrated with simulated and real data examples.
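A compact sketch of the core idea on discretized curves: a logistic loss over the evaluation grid plus a fused-lasso (total-variation) penalty on the coefficient function, solved here with cvxpy. The grid size, penalty weights, and synthetic curves are illustrative; the paper works with a genuine functional (basis-expansion) formulation rather than this raw-grid simplification.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, T = 200, 100                                  # n curves observed on a grid of T points
t = np.linspace(0, 1, T)
X = rng.normal(size=(n, T)).cumsum(axis=1) / np.sqrt(T)   # rough random curves
y = rng.integers(0, 2, size=n)
X[y == 1, 40:60] += 1.0                          # classes differ only on one sub-interval

beta = cp.Variable(T)
b0 = cp.Variable()
margin = cp.multiply(2.0 * y - 1.0, X @ beta + b0)
loss = cp.sum(cp.logistic(-margin))              # logistic (negative log-likelihood) loss
penalty = 5.0 * cp.norm1(cp.diff(beta)) + 0.5 * cp.norm1(beta)   # fused lasso penalty
cp.Problem(cp.Minimize(loss + penalty)).solve()

support = np.flatnonzero(np.abs(beta.value) > 1e-3)
if support.size:
    print(f"selected interval: t in [{t[support].min():.2f}, {t[support].max():.2f}]")
```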

4.
A table of expected success rates under a normally distributed success logit, used in conjunction with logistic regression analysis, enables easy calculation of the expected win from betting on the success of a future dichotomous trial.
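As a worked illustration of the quantity being tabulated, the expected success rate when the logit is N(mu, sigma^2) can be computed by Gauss-Hermite quadrature; the values of mu and sigma, and the even-odds betting example, are arbitrary choices rather than entries from the article's table.

```python
import numpy as np
from scipy.special import expit   # inverse logit

def expected_success_rate(mu, sigma, n_nodes=40):
    """E[expit(Z)] for Z ~ N(mu, sigma^2), via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = mu + np.sqrt(2.0) * sigma * nodes
    return float(np.sum(weights * expit(z)) / np.sqrt(np.pi))

# e.g. a logit centred at 0.5 with standard deviation 1.2
p = expected_success_rate(0.5, 1.2)
print(f"expected success rate: {p:.4f}")
# expected win for a unit-stake, even-odds bet on success: 2p - 1
print(f"expected win per unit stake at even odds: {2 * p - 1:.4f}")
```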

5.
In the multinomial regression model, we consider the methodology for simultaneous model selection and parameter estimation using the shrinkage and LASSO (least absolute shrinkage and selection operator) [R. Tibshirani, Regression shrinkage and selection via the LASSO, J. R. Statist. Soc. Ser. B 58 (1996), pp. 267–288] strategies. The shrinkage estimators (SEs) provide significant improvement over their classical counterparts when some of the predictors may or may not be active for the response of interest. The asymptotic properties of the SEs are developed using the notion of asymptotic distributional risk. We then compare the relative performance of the LASSO estimator with two SEs in terms of simulated relative efficiency. A simulation study shows that the shrinkage and LASSO estimators dominate the full model estimator. Further, both SEs perform better than the LASSO estimator when there are many inactive predictors in the model. A real-life data set is used to illustrate the suggested shrinkage and LASSO estimators.
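A minimal sketch of the LASSO component only (the shrinkage estimators require the restricted and unrestricted fits developed in the article and are not reproduced here): a multinomial logistic regression with an L1 penalty in scikit-learn, on synthetic data with several inactive predictors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(2)
n, p_active, p_noise = 500, 3, 7
X = rng.normal(size=(n, p_active + p_noise))
coef = np.zeros((p_active + p_noise, 3))
coef[:p_active] = rng.normal(scale=2.0, size=(p_active, 3))      # only 3 active predictors
y = (X @ coef + rng.gumbel(size=(n, 3))).argmax(axis=1)          # 3-category response

# multinomial logistic regression with an L1 (LASSO) penalty, C chosen by cross-validation
fit = LogisticRegressionCV(penalty="l1", solver="saga", cv=5, max_iter=5000).fit(X, y)
kept = np.flatnonzero(np.abs(fit.coef_).sum(axis=0) > 1e-8)
print("predictors kept by the LASSO:", kept)                     # ideally columns 0, 1, 2
```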

6.
Much research has been performed in the area of multiple linear regression, with the result that the field is well developed. This is not true of logistic regression, however. The latter presents special problems because the response is not continuous, among them: the difficulty of developing a suitable R2 statistic, the possibly poor results produced by the method of maximum likelihood, and the challenge of developing suitable graphical techniques. We describe recent work in some of these directions and discuss the need for additional research.

7.
8.
Mediation is a hypothesized causal chain among three variables. Mediation analysis for continuous response variables is well developed in the literature, where it can be shown that the indirect effect equals the total effect minus the direct effect. However, mediation analysis for categorical responses is still not fully developed. The purpose of this article is to propose a simpler method of analysing the mediation effect among three variables when the dependent and mediator variables are both dichotomous. We propose using a latent variable technique, which adjusts for the necessary condition that the indirect effect equals the total effect minus the direct effect. An intensive simulation study is conducted to compare the proposed method with other methods in the literature. Our theoretical derivation and simulation study show that the proposed approach is simpler to use and at least as good as other approaches in the literature. We illustrate our approach by testing for potential mediators of the relationship between depression and obesity among children and adolescents, compared with the method of Winship and Mare, using national children's health survey data for 2011–2012.
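A small sketch of the quantities involved, not the authors' latent-variable adjustment: with a binary mediator and a binary outcome, the "difference" approach contrasts the exposure coefficient from logistic models fitted with and without the mediator. The synthetic data and coefficient values are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)                                              # exposure
m = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x))))                   # binary mediator
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.4 * x + 1.0 * m))))  # binary outcome

X_total = sm.add_constant(x)                     # model without the mediator
X_direct = sm.add_constant(np.column_stack([x, m]))                 # model with the mediator

total = sm.Logit(y, X_total).fit(disp=0).params[1]    # total effect of x (log-odds scale)
direct = sm.Logit(y, X_direct).fit(disp=0).params[1]  # direct effect of x, adjusting for m
print(f"total = {total:.3f}, direct = {direct:.3f}, difference = {total - direct:.3f}")
# On the log-odds scale this difference generally does NOT equal the indirect effect
# without a rescaling step -- the non-collapsibility issue the article addresses.
```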

9.
In this article, we propose a new penalized-likelihood method for model selection in finite mixture of regression models. Penalties are imposed on both the mixing proportions and the regression coefficients, so that order selection for the mixture and variable selection within each component can be conducted simultaneously. The consistency of order selection and the consistency of variable selection are investigated. A modified EM algorithm is proposed to maximize the penalized log-likelihood function. Numerical simulations demonstrate the finite-sample performance of the estimation procedure, and the proposed methodology is further illustrated via real data analysis.
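A bare-bones EM sketch for an unpenalized two-component mixture of linear regressions, showing only the backbone the penalized procedure modifies; the penalties on the mixing proportions and coefficients, and the order/variable selection, are not implemented here, and all settings are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 600
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.integers(0, 2, size=n)                       # latent component labels
true_beta = np.array([[0.0, 2.0], [3.0, -1.0]])
y = (X * true_beta[z]).sum(axis=1) + rng.normal(scale=0.5, size=n)

K = 2
pi = np.full(K, 1.0 / K)
beta = rng.normal(size=(K, 2))
sigma = np.ones(K)

for _ in range(200):
    # E-step: posterior component responsibilities
    dens = np.column_stack([pi[k] * norm.pdf(y, X @ beta[k], sigma[k]) for k in range(K)])
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted least squares and variance update per component
    for k in range(K):
        w = resp[:, k]
        Xw = X * w[:, None]
        beta[k] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        sigma[k] = np.sqrt(np.sum(w * (y - X @ beta[k]) ** 2) / w.sum())
    pi = resp.mean(axis=0)

print("mixing proportions:", np.round(pi, 2))
print("component coefficients:\n", np.round(beta, 2))
```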

10.
Classification models can demonstrate apparent prediction accuracy even when there is no underlying relationship between the predictors and the response. Variable selection procedures can lead to false-positive variable selections and overestimation of true model performance. A simulation study was conducted using logistic regression with forward stepwise, best subsets, and LASSO variable selection methods, with varying total sample sizes (20, 50, 100, 200) and numbers of random noise predictor variables (3, 5, 10, 15, 20, 50). Using the critical values we derive can help reduce needless follow-up on variables that have no true association with the outcome.
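A scaled-down version of the kind of simulation described: pure-noise predictors, an independent binary response, and a cross-validated L1 logistic regression standing in for the stepwise, best-subsets, and LASSO procedures compared in the article. Counting how many noise variables survive selection illustrates the false-positive problem; the replication count and sizes are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(5)
n_reps, n, p_noise = 50, 100, 20

false_positives = []
for _ in range(n_reps):
    X = rng.normal(size=(n, p_noise))            # noise predictors only
    y = rng.integers(0, 2, size=n)               # response independent of X
    fit = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5).fit(X, y)
    false_positives.append(int((np.abs(fit.coef_[0]) > 1e-8).sum()))

# stepwise and best-subsets procedures typically retain even more spurious variables
print("mean number of noise variables selected:", np.mean(false_positives))
```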

11.
12.
In this article, we propose a novel robust data-analytic procedure, dynamic quantile regression (DQR), for model selection. It is robust in the sense that it can simultaneously estimate the coefficients and the error distribution over a large class of error distributions, including heavy-tailed distributions that may not possess finite variances or even means; and it is easy to implement in the sense that one does not need to decide in advance which quantile(s) should be used. Asymptotic properties of the related estimators are derived. Simulations and illustrative real examples are also given.
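Not the DQR procedure itself, but a sketch of the plain quantile regressions it builds on: fitting a grid of quantiles with statsmodels on Cauchy (heavy-tailed) errors, where least squares would be unreliable. The quantile grid and data-generating values are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.standard_cauchy(size=n)   # heavy-tailed errors: no finite mean
X = sm.add_constant(x)

for q in (0.1, 0.25, 0.5, 0.75, 0.9):
    res = sm.QuantReg(y, X).fit(q=q)
    print(f"tau = {q:.2f}: intercept = {res.params[0]:6.2f}, slope = {res.params[1]:5.2f}")
```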

13.
14.
Stepwise variable selection procedures are computationally inexpensive methods for constructing useful regression models for a single dependent variable. At each step a variable is entered into or deleted from the current model, based on the criterion of minimizing the error sum of squares (SSE). When there is more than one dependent variable, the situation is more complex. In this article we propose variable selection criteria for multivariate regression which generalize the univariate SSE criterion. Specifically, we suggest minimizing some function of the estimated error covariance matrix: the trace, the determinant, or the largest eigenvalue. The computations associated with these criteria may be burdensome. We develop a computational framework based on the use of the SWEEP operator which greatly reduces these calculations for stepwise variable selection in multivariate regression.
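A small sketch of forward selection for multivariate regression using the determinant of the estimated error covariance matrix as the criterion (the trace or largest eigenvalue could be substituted on the marked line). It refits by ordinary least squares at each step rather than using the SWEEP-operator updates the article develops, and the data and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, q = 200, 8, 3                              # 8 candidate predictors, 3 responses
X = rng.normal(size=(n, p))
B = np.zeros((p, q))
B[:3] = rng.normal(size=(3, q))                  # only the first 3 predictors matter
Y = X @ B + rng.normal(scale=0.5, size=(n, q))

def error_cov(cols):
    """Estimated error covariance matrix after regressing Y on the given columns of X."""
    Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    resid = Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
    return resid.T @ resid / n

selected, remaining = [], list(range(p))
for _ in range(3):                               # add three variables, one per step
    crit = {c: np.linalg.det(error_cov(selected + [c])) for c in remaining}   # det criterion
    best = min(crit, key=crit.get)
    selected.append(best)
    remaining.remove(best)
    print(f"step {len(selected)}: added x{best}, det(S) = {crit[best]:.4f}")
```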

15.
A semiparametric logistic regression model is proposed in which the nonparametric component is approximated with fixed-knot cubic B-splines. To assess the linearity of the nonparametric component, we construct a penalized likelihood ratio test statistic. When the number of knots is fixed, the null distribution of the test statistic is shown to be asymptotically that of a linear combination of independent chi-squared random variables, each with one degree of freedom. The smoothing parameter is determined by setting the asymptotic null expectation of the test statistic equal to a specified value. Monte Carlo experiments are conducted to investigate the performance of the proposed test, and its practical use is illustrated with a real-life example.
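A sketch of the modelling component only, using scikit-learn's cubic B-spline basis inside a (practically unpenalized) logistic regression; the penalized likelihood ratio test for linearity and the smoothing-parameter rule described in the abstract are not reproduced. The knot count and simulated curve are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(8)
n = 1000
x = rng.uniform(-3, 3, size=n)
p_true = 1 / (1 + np.exp(-np.sin(1.5 * x)))       # clearly nonlinear logit
y = rng.binomial(1, p_true)

# cubic B-splines with fixed knots, then a logistic fit with a very large C (~no penalty)
model = make_pipeline(
    SplineTransformer(degree=3, n_knots=8, include_bias=False),
    LogisticRegression(C=1e6, max_iter=2000),
)
model.fit(x.reshape(-1, 1), y)

grid = np.linspace(-3, 3, 7).reshape(-1, 1)
print(np.round(model.predict_proba(grid)[:, 1], 3))   # should track expit(sin(1.5 x))
```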

16.
High-dimensional models, which involve very many parameters relative to a moderate amount of data, are attracting attention from diverse research fields. Model selection is an important issue in such high-dimensional data analysis. Recent literature on the theoretical understanding of high-dimensional models covers a wide range of penalized methods, including the LASSO and SCAD. This paper presents a systematic overview of recent developments in high-dimensional statistical models. We provide a brief review of the recent development of theory, methods, and guidelines for the application of several penalized methods, covering, for each reviewed method, the appropriate settings for implementation and its limitations along with potential solutions. In particular, we provide a systematic review of the statistical theory of high-dimensional methods by considering a unified high-dimensional modelling framework together with high-level conditions; this framework includes (generalized) linear regression and quantile regression as special cases. We hope our review helps researchers in this field gain a better understanding of the area and provides useful information for future study.

17.
Generalized linear models now have many applications. Among those most widely used in practice are models with random effects, that is, models in which some of the unknown parameters are treated as random variables. In this article, this situation is considered for logistic regression models with a random intercept having an exponential distribution. The aim is to obtain the Bayesian D-optimal design, so the method is to maximize the Bayesian D-optimality criterion. For the model considered here, this criterion is a function of the quasi-information matrix, which depends on the unknown parameters of the model. In the Bayesian D-optimality criterion, the expectation is taken with respect to the prior distributions assumed for the unknown parameters, so the criterion is a function only of the experimental settings (support points) and their weights. Uniform and normal prior distributions are considered for the fixed parameters. The Bayesian D-optimal design is finally calculated numerically using R 3.1.1.
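A rough sketch of evaluating a Bayesian D-type criterion by Monte Carlo for a fixed-effects logistic model only; the random intercept and the quasi-information matrix of the article are not handled, and the priors, candidate designs, and weights below are purely illustrative. Parameters are drawn from the priors, the logistic information matrix is computed for each candidate design, and the log-determinants are averaged.

```python
import numpy as np

def info_matrix(design_x, weights, beta0, beta1):
    """Fisher information of a simple fixed-effects logistic model, eta = beta0 + beta1 * x."""
    M = np.zeros((2, 2))
    for x, w in zip(design_x, weights):
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
        f = np.array([1.0, x])
        M += w * p * (1.0 - p) * np.outer(f, f)
    return M

def bayesian_d_criterion(design_x, weights, n_draws=2000, seed=9):
    """Monte Carlo estimate of E_theta[log det M(design, theta)] under the illustrative priors."""
    rng = np.random.default_rng(seed)              # same draws for every design compared
    beta0 = rng.uniform(-1.0, 1.0, size=n_draws)   # uniform prior on the intercept
    beta1 = rng.normal(1.0, 0.5, size=n_draws)     # normal prior on the slope
    return np.mean([np.linalg.slogdet(info_matrix(design_x, weights, b0, b1))[1]
                    for b0, b1 in zip(beta0, beta1)])

# compare two equally weighted two-point candidate designs
for design in ([-1.0, 1.0], [-2.0, 2.0]):
    print(design, round(bayesian_d_criterion(design, [0.5, 0.5]), 3))
```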

18.
Beta regression models are commonly used by practitioners to model variables that take values in the standard unit interval (0, 1). In this paper, we consider variable selection for beta regression models with varying dispersion (VBRM), in which both the mean and the dispersion depend on predictor variables. Based on a penalized likelihood method, the consistency and the oracle property of the penalized estimators are established. Following the coordinate descent algorithm idea used for generalized linear models, we develop a new variable selection procedure for the VBRM, which can efficiently and simultaneously estimate parameters and select important variables in both the mean model and the dispersion model. Simulation studies and a body fat data analysis are presented to illustrate the proposed methods.
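A minimal sketch of the varying-dispersion beta regression likelihood itself, with no penalization or variable selection: the mean uses a logit link, the precision a log link, and the log-likelihood is maximized with scipy under the usual mean/precision parameterization (y ~ Beta(mu*phi, (1-mu)*phi)). The covariates and true coefficients are synthetic.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

rng = np.random.default_rng(10)
n = 800
x = rng.normal(size=n)
z = rng.normal(size=n)
mu = expit(0.3 + 0.9 * x)                  # mean model (logit link)
phi = np.exp(1.5 + 0.7 * z)                # dispersion/precision model (log link)
y = rng.beta(mu * phi, (1 - mu) * phi)

def negloglik(theta):
    b0, b1, g0, g1 = theta
    m = expit(b0 + b1 * x)
    p = np.exp(g0 + g1 * z)
    a, b = m * p, (1 - m) * p
    ll = gammaln(a + b) - gammaln(a) - gammaln(b) + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y)
    return -ll.sum()

fit = minimize(negloglik, x0=np.zeros(4), method="BFGS")
print("mean-model coefficients:      ", np.round(fit.x[:2], 2))   # true (0.3, 0.9)
print("dispersion-model coefficients:", np.round(fit.x[2:], 2))   # true (1.5, 0.7)
```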

19.
In epidemiologic studies where the outcome is binary, the data often arise as clusters, as when siblings, friends or neighbors are used as matched controls in a case-control study. Conditional logistic regression (CLR) is typically used for such studies to estimate the odds ratio for an exposure of interest. However, CLR assumes the exposure coefficient is the same in every cluster, and CLR-based inference can be badly biased when homogeneity is violated. Existing methods for testing goodness-of-fit for CLR are not designed to detect such violations. Good alternative methods of analysis exist if one suspects there is heterogeneity across clusters. However, routine use of alternative robust approaches when there is no appreciable heterogeneity could cause loss of precision and be computationally difficult, particularly if the clusters are small. We propose a simple non-parametric test, the test of heterogeneous susceptibility (THS), to assess the assumption of homogeneity of a coefficient across clusters. The test is easy to apply and provides guidance as to the appropriate method of analysis. Simulations demonstrate that the THS has reasonable power to reveal violations of homogeneity. We illustrate by applying the THS to a study of periodontal disease.

20.
Sparsity-inducing penalties are useful tools for variable selection and are also effective for regression problems where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in multiclass logistic regression models for functional data, using sparse regularization. The parameters of the functional logistic regression model are estimated in the framework of the penalized likelihood method with a sparse group lasso-type penalty, and the tuning parameters for the model are selected using a model selection criterion. The effectiveness of the proposed method is investigated through simulation studies and the analysis of a gene expression data set.
