Similar Documents
20 similar documents found (search time: 31 ms)
1.
Screening procedures play an important role in data analysis, especially in high-throughput biological studies where the datasets contain more covariates than independent subjects. In this article, a Bayesian screening procedure is introduced for binary response models with logit and probit links. In contrast to many screening rules based on marginal information involving one or a few covariates, the proposed Bayesian procedure simultaneously models all covariates and uses closed-form screening statistics. Specifically, we use the posterior means of the regression coefficients as screening statistics; by imposing a generalized g-prior on the regression coefficients, we derive the analytical form of their posterior means and compute the screening statistics without any Markov chain Monte Carlo implementation. We evaluate the utility of the proposed Bayesian screening method using simulations and real data analysis. The simulation results suggest improved performance, with comparable computational cost, when the sample size is small.
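To make the closed-form idea concrete, here is a minimal sketch in Python of g-prior-based screening on a working linear model. The function name and the unit-information choice g = n are illustrative assumptions; the paper's generalized g-prior is derived for the logit/probit likelihoods and handles singular X'X, which this toy version only approximates with a ridge jitter.

```python
import numpy as np

def gprior_screening_stats(X, y, g=None):
    """Posterior means of regression coefficients under a Zellner-type
    g-prior on a working linear model (illustrative sketch):
    beta_post = g/(1+g) * (X'X)^{-1} X'y, computed without any MCMC."""
    n, p = X.shape
    if g is None:
        g = float(n)                      # unit-information prior (assumption)
    XtX = X.T @ X + 1e-8 * np.eye(p)      # ridge jitter so p > n is workable
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    return g / (1.0 + g) * beta_hat       # posterior means = screening stats

# toy example: rank covariates by |posterior mean|
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))        # more covariates than subjects
y = (X[:, 0] - X[:, 1] + rng.standard_normal(50) > 0).astype(float)
stats = gprior_screening_stats(X, y - y.mean())
print("top-ranked covariates:", np.argsort(-np.abs(stats))[:10])
```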

2.
Binary dynamic fixed and mixed logit models are extensively studied in the literature. These models are developed to examine the effects of certain fixed covariates through a parametric regression function that forms part of the model. However, there are situations where one may wish to include additional covariates whose direct effect is not of interest. In this paper we propose a generalization of the existing binary dynamic logit (BDL) models to the semi-parametric longitudinal setup to address this issue of additional covariates. The regression function in such a semi-parametric BDL model contains (i) a parametric linear regression function in some primary covariates, and (ii) a non-parametric function in certain secondary covariates. We use a simple semi-parametric conditional quasi-likelihood approach for consistent estimation of the non-parametric function, and a semi-parametric likelihood approach for the joint estimation of the main regression and dynamic dependence parameters of the model. The finite sample performance of the estimation approaches is examined through a simulation study, and the asymptotic properties of the estimators are also discussed. The proposed model and estimation approaches are illustrated by reanalysing a longitudinal infectious disease dataset.
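For intuition, the parametric core of a binary dynamic logit model can be simulated and fitted in a few lines. The sketch below uses hypothetical names and omits the paper's semi-parametric component; each response is conditioned on the lagged response through a single dynamic dependence parameter gamma.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_loglik(params, x, y):
    """Negative log-likelihood of a binary dynamic logit model:
    P(y_t = 1 | y_{t-1}, x_t) = expit(x_t * beta + gamma * y_{t-1})."""
    beta, gamma = params
    eta = x[1:] * beta + gamma * y[:-1]        # condition on lagged response
    p = expit(eta)
    return -np.sum(y[1:] * np.log(p) + (1 - y[1:]) * np.log1p(-p))

rng = np.random.default_rng(1)
T, beta_true, gamma_true = 2000, 0.8, 1.2
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):                          # sequential simulation
    y[t] = rng.random() < expit(beta_true * x[t] + gamma_true * y[t - 1])
fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(x, y))
print("estimates (beta, gamma):", fit.x)
```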

3.
We examine the finite sample properties of the maximum likelihood estimator for the binary logit model with random covariates. Previous studies have either relied on large-sample asymptotics or have assumed non-random covariates. We derive analytic expressions for the first-order bias and second-order mean squared error of the maximum likelihood estimator in this model, and undertake numerical evaluations to illustrate these analytic results for the single-covariate case. For various data distributions, the bias of the estimator has the same sign as the covariate's coefficient, and both the absolute bias and the mean squared error increase symmetrically with the absolute value of that parameter. The behaviour of a bias-adjusted maximum likelihood estimator, constructed by subtracting the (maximum likelihood) estimate of the first-order bias from the original estimator, is examined in a Monte Carlo experiment. This bias correction is effective in all of the cases considered and is recommended whenever this logit model is estimated by maximum likelihood from small samples.
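The paper's bias-adjusted estimator subtracts an estimate of the first-order bias from the MLE. As an illustration in the same spirit, the sketch below applies the standard Cox-Snell-type first-order correction for logistic regression; it is not the paper's exact analytic expression for random covariates, and the function name is hypothetical.

```python
import numpy as np
import statsmodels.api as sm

def bias_corrected_logit(X, y):
    """Logit MLE minus an estimate of its first-order bias, using the
    standard Cox-Snell-type form b = (X'WX)^{-1} X' xi with
    xi_i = h_i (p_i - 1/2), where h_i are leverages of the weighted design."""
    fit = sm.Logit(y, X).fit(disp=0)
    p = fit.predict(X)
    W = p * (1 - p)
    XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
    # h_i = w_i * x_i' (X'WX)^{-1} x_i
    h = np.einsum('ij,jk,ik->i', X, XtWX_inv, X) * W
    bias = XtWX_inv @ (X.T @ (h * (p - 0.5)))
    return fit.params - bias, fit.params

rng = np.random.default_rng(2)
n = 40                                        # small sample, where bias matters
X = sm.add_constant(rng.standard_normal(n))
y = (rng.random(n) < 1 / (1 + np.exp(-(0.5 + 1.5 * X[:, 1])))).astype(float)
corrected, mle = bias_corrected_logit(X, y)
print("MLE:", mle, " bias-corrected:", corrected)
```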

4.
Conventional parametric multinomial logit models are in general not sufficient for capturing the complex structure of electorates. In this paper, we use a semiparametric multinomial logit model to analyse party preferences as a function of individual characteristics, using a sample of the German electorate in 2006. Germany is a particularly strong case for more flexible nonparametric approaches in this context: owing to reunification and the different political histories that preceded it, the composition of the electorate is complex and nuanced. Our analysis reveals strong interactions between the covariates age and income, and highly nonlinear covariate effects on each party's probability of being supported. Notably, we develop and provide a smoothed likelihood estimator for semiparametric multinomial logit models, which can also be applied in other fields, such as marketing.
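As a rough illustration of a smoothed (local) likelihood for a multinomial logit, the sketch below maximises a kernel-weighted multinomial log-likelihood around a target value of the smooth covariate. All names and tuning values are hypothetical, and this is not the paper's exact smoothed-likelihood estimator.

```python
import numpy as np
from scipy.optimize import minimize

def local_multinomial_fit(z0, z, x_lin, y, bandwidth=1.0):
    """Locally smoothed multinomial logit: maximise a kernel-weighted
    log-likelihood around z0, so the effect of the smooth covariate z is
    estimated non-parametrically. Categories 0..K-1; 0 is the reference."""
    K = int(y.max()) + 1
    w = np.exp(-0.5 * ((z - z0) / bandwidth) ** 2)   # Gaussian kernel weights

    def nll(theta):
        a = theta.reshape(K - 1, 2)                  # intercept, slope per class
        eta = np.column_stack([np.zeros_like(x_lin)] +
                              [a[k, 0] + a[k, 1] * x_lin for k in range(K - 1)])
        log_p = eta - np.logaddexp.reduce(eta, axis=1, keepdims=True)
        return -np.sum(w * log_p[np.arange(len(y)), y])

    fit = minimize(nll, np.zeros(2 * (K - 1)), method="BFGS")
    return fit.x.reshape(K - 1, 2)

# toy data: 3 parties, preference varies non-linearly with age (z) and income
rng = np.random.default_rng(3)
n = 1500
age, income = rng.uniform(20, 80, n), rng.standard_normal(n)
eta1, eta2 = np.sin(age / 15) + 0.5 * income, 0.02 * age - 0.5 * income
p = np.exp(np.column_stack([np.zeros(n), eta1, eta2]))
p /= p.sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in p])
print(local_multinomial_fit(z0=40.0, z=age, x_lin=income, y=y, bandwidth=5.0))
```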

5.
Item response theory (IRT) models provide an important contribution to the analysis of polytomous items, such as Likert scale items in survey data. We propose a bifactor generalized partial credit model (bifac-GPC model) with flexible link functions (probit, logit and complementary log-log) for the analysis of ordered polytomous item scale data. To estimate the parameters of the proposed model, we use a Bayesian approach through the NUTS algorithm and show the advantages of implementing IRT models in the Stan language. We present an application to marketing scale data. Specifically, we apply the model to a dataset of non-users of a mobile banking service in order to highlight the advantages of this model. The results show important managerial implications arising from consumer perceptions. We provide a discussion of the methodology for this type of data and its extensions. Code is available for practitioners and researchers to replicate the application.
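For reference, the category probabilities of a (logit-link) generalized partial credit model can be computed directly; the bifactor extension in the paper would add a general and a specific latent trait to the linear predictor. A minimal sketch with illustrative parameter values:

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities of a generalized partial credit model:
    P(Y = k | theta) proportional to exp( sum_{j<=k} a*(theta - b_j) ),
    with an empty sum (numerator 1) for k = 0."""
    steps = np.concatenate([[0.0], np.cumsum(a * (theta - np.asarray(b)))])
    e = np.exp(steps - steps.max())           # stabilised softmax
    return e / e.sum()

# item with discrimination a = 1.2 and three step difficulties (4 categories)
print(gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5]))
```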

6.
Proportion differences are often used to estimate and test treatment effects in clinical trials with binary outcomes. In order to adjust for other covariates or for intra-subject correlation among repeated measures, logistic regression or longitudinal models such as generalized estimating equations or generalized linear mixed models may be used. However, these models are usually based on the logit link, so parameter estimates and comparisons are on the log-odds ratio scale rather than the proportion difference scale. A two-step method has been proposed in the literature to approximate confidence intervals for the proportion difference using the concept of effective sample sizes, but its performance was not investigated in the original paper. We therefore examine the properties of the two-step method and propose an adjustment to the effective sample size formula based on Bayesian information theory. Simulations are conducted to evaluate the performance and to show that the modified effective sample size improves the coverage of the confidence intervals.
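The effective sample size idea can be sketched as follows: model-based proportion estimates and standard errors (e.g. back-transformed from a logit-link fit) are converted to n_eff = p(1-p)/se^2, and (p, n_eff) is then fed into a standard interval for the difference. The Newcombe hybrid score interval used here and the function name are illustrative assumptions, not the paper's modified formula.

```python
import numpy as np
from scipy.stats import norm

def newcombe_ci_effective_n(p1, se1, p0, se0, alpha=0.05):
    """Two-step-style CI for a proportion difference: convert model-based
    (p, se) pairs into effective sample sizes, then apply a Newcombe
    hybrid score interval built from per-arm Wilson intervals."""
    z = norm.ppf(1 - alpha / 2)

    def wilson(p, n):                        # Wilson score interval, one arm
        centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
        half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
        return centre - half, centre + half

    n1, n0 = p1 * (1 - p1) / se1**2, p0 * (1 - p0) / se0**2
    l1, u1 = wilson(p1, n1)
    l0, u0 = wilson(p0, n0)
    d = p1 - p0
    return d, (d - np.hypot(p1 - l1, u0 - p0), d + np.hypot(u1 - p1, p0 - l0))

print(newcombe_ci_effective_n(p1=0.62, se1=0.045, p0=0.48, se0=0.050))
```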

7.
We propose a test for state dependence in binary panel data with individual covariates. To this end, we rely on a quadratic exponential model in which the association between the response variables is accounted for differently from more standard formulations. The level of association is measured by a single parameter that may be estimated by a conditional maximum likelihood (CML) approach. Under the dynamic logit model, the conditional estimator of this parameter converges to zero when the hypothesis of no state dependence is true. It is therefore possible to implement a t-test for this hypothesis which is very simple to perform and attains the nominal significance level under several structures of the individual covariates. Through an extensive simulation study, we find that our test has good finite sample properties and is more robust to the presence of (autocorrelated) covariates in the model specification than other existing testing procedures for state dependence. The proposed approach is illustrated by two empirical applications: the first uses data from the Panel Study of Income Dynamics and concerns employment and fertility; the second uses the Health and Retirement Study and concerns self-reported health status.

8.
In the analysis of medical survival data, semiparametric proportional hazards models are widely used. When the proportional hazards assumption is not tenable, these models are not suitable and other models for covariate effects can be useful. In particular, we consider accelerated life models, in which the effect of covariates is to scale the quantiles of the baseline distribution. Solomon and Hutton have suggested that there is some robustness to misspecification of survival regression models: they showed that the relative importance of covariates is preserved under misspecification, assuming small coefficients and an orthogonal transformation of the covariates. We elucidate these results by applications to data from five trials comparing two common anti-epileptic drugs (carbamazepine versus sodium valproate monotherapy for epilepsy) and to survival of a cohort of people with cerebral palsy. Robustness against model misspecification depends on the assumption of small coefficients and on the underlying distribution of the data: the results hold for the cerebral palsy data but not for the epilepsy data, which have high early hazard rates. The orthogonality of the coefficients is not important. However, the choice of model matters for estimating the magnitude of effects, particularly if the baseline shape parameter indicates high initial hazard rates.
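The quantile-scaling property that defines accelerated life models can be written compactly: if covariates act multiplicatively on time, every quantile of the survival distribution is scaled by the same factor, e.g.

```latex
S(t \mid x) = S_0\!\bigl(t\, e^{-x^\top \beta}\bigr)
\quad\Longrightarrow\quad
Q(u \mid x) = e^{x^\top \beta}\, Q_0(u) \ \ \text{for all } u \in (0,1),
\qquad
\log T = x^\top \beta + \varepsilon .
```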

9.
In the analysis of time-to-event data with multiple causes using a competing risks Cox model, the cause of failure is often unknown for some cases. The probability of a missing cause is typically assumed to be independent of the cause, given the time of the event and covariates measured before the event occurred. In practice, however, this underlying missing-at-random assumption does not necessarily hold. Motivated by a colorectal cancer molecular pathological epidemiology analysis, we develop a method for valid analysis when additional auxiliary variables are available for cases only. We consider a weaker missing-at-random assumption, in which the missingness pattern depends on the observed quantities, including the auxiliary covariates. We use an informative likelihood approach that yields consistent estimates even when the underlying model for the missing cause of failure is misspecified. The superiority of our method over naive methods in finite samples is demonstrated by simulation results. We illustrate its use in an analysis of colorectal cancer data from the Nurses' Health Study cohort, where the traditional missing-at-random assumption apparently fails to hold.

10.
In this paper, we consider the Bayesian analysis of competing risks data that are partially complete in both time and type of failure. The latent failure times are assumed to have independent Weibull distributions with a common shape parameter but different scale parameters. When the shape parameter is known, the scale parameters are assumed to have Beta-Gamma priors; in this case, the Bayes estimates and the associated credible intervals can be obtained in explicit forms. When the shape parameter is also unknown, it is assumed to have a very flexible log-concave prior density function, and the Bayes estimates and associated credible intervals can no longer be obtained in explicit forms. We propose a Markov chain Monte Carlo sampling technique to compute the Bayes estimates and the associated credible intervals. We further consider the case where covariates are also present. The analysis of two competing risks data sets, one with covariates and one without, is presented for illustrative purposes. The proposed model is very flexible, and the method is easy to implement in practice.
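To see why the known-shape case is explicit, note that with a common known shape alpha the transformed times t^alpha are exponential, so conjugate Gamma priors on the cause-specific rates give closed-form posteriors. The sketch below uses plain independent Gamma priors as a simplification of the paper's Beta-Gamma prior; names and data are illustrative.

```python
import numpy as np

def weibull_cr_posterior(t, cause, alpha_shape, a, b):
    """Closed-form posteriors for two-cause competing risks with Weibull
    latent failure times sharing a known shape: under Gamma(a_j, b_j)
    priors on the cause-specific rates, the posterior of rate j is
    Gamma(a_j + d_j, b_j + sum_i t_i^alpha)."""
    s = np.sum(t ** alpha_shape)                # total transformed exposure
    post = []
    for j in (1, 2):
        d_j = np.sum(cause == j)                # failures from cause j
        a_post, b_post = a[j - 1] + d_j, b[j - 1] + s
        post.append((a_post, b_post, a_post / b_post))  # mean = Bayes estimate
    return post

t = np.array([1.2, 0.7, 2.1, 1.5, 0.9, 3.0])
cause = np.array([1, 2, 1, 1, 2, 2])
print(weibull_cr_posterior(t, cause, alpha_shape=1.5, a=(1, 1), b=(1, 1)))
```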

11.
In many clinical studies more than one observer may rate a characteristic measured on an ordinal scale. For example, a study may involve a group of physicians rating a feature seen on a pathology specimen or a computed tomography scan. In clinical studies of this kind, the weighted κ coefficient is a popular measure of agreement for ordinally scaled ratings. Our research stems from a study in which the severity of inflammatory skin disease was rated. The investigators wished to determine and evaluate the strength of agreement between a variable number of observers, taking into account patient-specific (age and gender) as well as rater-specific (board certification in dermatology) characteristics. This suggested modelling κ as a function of these covariates. We propose the use of generalized estimating equations to estimate the weighted κ coefficient. This approach also accommodates unbalanced data, which arise when some subjects are not judged by the same set of observers; currently, no estimate of overall κ is available for a simple unbalanced data set with more than two observers and no covariates. In the inflammatory skin disease study none of the covariates were significantly associated with κ, enabling the calculation of an overall weighted κ for this unbalanced data set. In the second motivating example (multiple sclerosis), geographic location was significantly associated with κ. We also compared the results of our method with current methods for testing heterogeneity of weighted κ coefficients across strata (geographic location), which are available only for balanced data sets.
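As a baseline for the covariate-modelling approach, the ordinary weighted κ for two raters can be computed directly. A minimal sketch with quadratic disagreement weights and a hypothetical function name:

```python
import numpy as np

def weighted_kappa(r1, r2, n_cat, weights="quadratic"):
    """Cohen's weighted kappa for two raters on an ordinal scale, the
    agreement measure that the proposed GEE approach models as a
    function of covariates."""
    O = np.zeros((n_cat, n_cat))
    for i, j in zip(r1, r2):
        O[i, j] += 1
    O /= O.sum()
    marg1, marg2 = O.sum(axis=1), O.sum(axis=0)
    E = np.outer(marg1, marg2)                 # expected under independence
    k = np.arange(n_cat)
    d = np.abs(k[:, None] - k[None, :])
    W = d**2 if weights == "quadratic" else d  # disagreement weights
    return 1 - np.sum(W * O) / np.sum(W * E)

r1 = [0, 1, 2, 2, 3, 1, 0, 2]                  # e.g. severity ratings
r2 = [0, 1, 2, 3, 3, 1, 1, 2]
print(weighted_kappa(r1, r2, n_cat=4))
```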

12.
Graphs and networks are common ways of depicting information. In biology, many different biological processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. This kind of a priori use of graphs is a useful supplement to standard numerical data such as microarray gene expression data. In this paper, we consider the problem of regression analysis and variable selection when the covariates are linked on a graph. We study a graph-constrained regularization procedure, and its theoretical properties, that incorporates the neighborhood information of variables measured on a graph: a smoothness penalty on the coefficients is defined as a quadratic form in the Laplacian matrix associated with the graph. We establish estimation and model selection consistency results and provide estimation bounds for both fixed and diverging numbers of parameters in the regression model. We demonstrate by simulations and a real dataset that the proposed procedure can lead to better variable selection and prediction than existing methods that ignore the graph information associated with the covariates.
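A common way to implement such a Laplacian-penalized regression is to absorb the quadratic penalty by augmenting the design matrix and then running an ordinary lasso. The sketch below takes this route under stated assumptions; the function name, tuning values and toy graph are illustrative, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso

def graph_constrained_lasso(X, y, L, lam1, lam2):
    """Minimise ||y - Xb||^2 + lam1*||b||_1 + lam2*b'Lb, with L the graph
    Laplacian, by stacking sqrt(lam2)*L^{1/2} under X so the quadratic
    penalty becomes part of the least-squares term."""
    n, p = X.shape
    evals, evecs = np.linalg.eigh(L)                 # L is PSD
    L_half = evecs @ np.diag(np.sqrt(np.clip(evals, 0, None))) @ evecs.T
    X_aug = np.vstack([X, np.sqrt(lam2) * L_half])
    y_aug = np.concatenate([y, np.zeros(p)])
    # sklearn's Lasso minimises (1/2m)||y - Xb||^2 + alpha*||b||_1
    m = X_aug.shape[0]
    fit = Lasso(alpha=lam1 / (2 * m), fit_intercept=False).fit(X_aug, y_aug)
    return fit.coef_

# toy graph: covariates 0-1-2 form a path, 3 and 4 are isolated
A = np.zeros((5, 5))
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1
L = np.diag(A.sum(axis=1)) - A                       # combinatorial Laplacian
rng = np.random.default_rng(4)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, 1.0, 1.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(100)
print(graph_constrained_lasso(X, y, L, lam1=5.0, lam2=2.0))
```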

13.
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, each consisting of a logistic regression. The covariates may be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive closed-form expressions for the E- and M-steps for obtaining the maximum likelihood estimates (MLEs); for continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism is not itself testable from the data, so the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model: although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial illustrates the proposed methods.
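The key structural point, namely that under non-ignorability the missingness model itself enters the E-step, can be seen in a toy version with a single binary missing covariate. The sketch below computes the E-step posterior weights over the missing value; all parameter names are hypothetical and the full EM loop is omitted.

```python
import numpy as np
from scipy.special import expit

def e_step_weights(y, beta, gamma, psi):
    """E-step weights for one subject whose binary covariate x is missing
    (r = 1 indicates x missing), under a non-ignorable mechanism:
    w(x) proportional to f(y | x; beta) f(x; gamma) f(r = 1 | x, y; psi)."""
    w = np.empty(2)
    for x in (0, 1):
        p_y = expit(beta[0] + beta[1] * x)             # logistic outcome model
        f_y = p_y if y == 1 else 1 - p_y
        f_x = gamma if x == 1 else 1 - gamma           # Bernoulli covariate model
        f_r = expit(psi[0] + psi[1] * x + psi[2] * y)  # logistic missingness model
        w[x] = f_y * f_x * f_r
    return w / w.sum()

# posterior P(x = 0), P(x = 1) for a subject with y = 1 and x missing
print(e_step_weights(y=1, beta=(-0.5, 1.0), gamma=0.4, psi=(0.2, -1.0, 0.5)))
```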

14.
In some problems in survival analysis there may be more than one plausible measure of time for each individual; for example, mileage may be a better indicator of a car's age than calendar months. This paper considers the possibility of combining two (or more) time scales measured on each individual into a single scale. A collapsibility condition is proposed under which the combined scale can be regarded as fully informative about survival. The resulting model may be regarded as a generalization of the usual accelerated life model that allows time-dependent covariates. Parametric methods for the choice of time scale, for testing the validity of the collapsibility assumption, and for inference about the failure distribution along the new scale are discussed. Two examples are used to illustrate the methods: Hyde's (1980) Channing House data and a large cohort mortality study of asbestos workers in Quebec.

15.
Conditional power calculations are frequently used to guide the decision whether to stop a trial for futility or to modify the planned sample size. These calculations ignore the information in short-term endpoints and baseline covariates, and thereby do not make fully efficient use of the data. We therefore propose an interim decision procedure based on the conditional power approach which exploits the information contained in baseline covariates and short-term endpoints. We achieve this by treating the estimation of the treatment effect at the interim analysis as a missing data problem, which we address with specific prediction models for the long-term endpoint that incorporate baseline covariates and multiple short-term endpoints. We show that the proposed procedure leads to an efficiency gain and a reduced sample size without compromising the Type I error rate, even when the adopted prediction models are misspecified. In particular, implementing our proposal in the conditional power approach enables earlier decisions relative to standard approaches, whilst controlling the probability of an incorrect decision. This time gain results in a lower expected number of recruited patients in case of stopping for futility, so that fewer patients receive the futile regimen. We explain how these methods can be used in adaptive designs with unblinded sample size re-assessment based on the inverse normal P-value combination method to control the Type I error, and we support the proposal with Monte Carlo simulations based on data from a real clinical trial.
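For orientation, the standard conditional power calculation that the proposal builds on is a one-line Brownian-motion formula; the sketch below uses the current-trend drift as one common choice. The paper's contribution, sharpening the interim estimate that feeds into z_t via prediction models, is not shown here, and the function name is illustrative.

```python
from scipy.stats import norm

def conditional_power(z_t, t, theta, alpha=0.025):
    """Conditional power at information fraction t, given interim Z-statistic
    z_t and an assumed standardised drift theta (expected Z at full
    information): CP = 1 - Phi((z_{1-alpha} - z_t*sqrt(t) - theta*(1-t))
    / sqrt(1-t))."""
    z_a = norm.ppf(1 - alpha)
    return 1 - norm.cdf((z_a - z_t * t**0.5 - theta * (1 - t)) / (1 - t)**0.5)

# halfway through the trial, interim Z = 1.2, drift set to the current trend
z_t, t = 1.2, 0.5
print(conditional_power(z_t, t, theta=z_t / t**0.5))
```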

16.
In this article, we highlight some interesting facts about Bayesian variable selection methods for linear regression models in settings where the design matrix exhibits strong collinearity. We first demonstrate via real data analysis and simulation studies that summaries of the posterior distribution based on marginal and joint distributions may give conflicting results for assessing the importance of strongly correlated covariates. The natural question is which one should be used in practice. The simulation studies suggest that posterior inclusion probabilities and Bayes factors that evaluate the importance of correlated covariates jointly are more appropriate, and that some priors are more adversely affected in such a setting. To better understand this phenomenon, we study some toy examples with Zellner's g-prior. The results show that strong collinearity may lead to a multimodal posterior distribution over models, in which joint summaries are more appropriate than marginal summaries. Thus, we recommend a routine examination of the correlation matrix and the calculation of joint inclusion probabilities for correlated covariates, in addition to marginal inclusion probabilities, when assessing the importance of covariates in Bayesian variable selection.
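A toy version of the phenomenon is easy to reproduce. The sketch below enumerates all models for three covariates under Zellner's g-prior, using the closed-form null-based Bayes factor in terms of R-squared, with two covariates nearly collinear: each marginal inclusion probability is diluted toward one half while the joint inclusion probability stays near one. Function names and the choice g = n are illustrative assumptions.

```python
import numpy as np
from itertools import product

def gprior_log_marginal(X, y, model, g):
    """Log marginal likelihood (up to a model-free constant) of a linear
    model with intercept under Zellner's g-prior:
    (n-1-p)/2 * log(1+g) - (n-1)/2 * log(1 + g*(1 - R^2))."""
    n = len(y)
    yc = y - y.mean()
    p = len(model)
    if p == 0:
        return 0.0                                  # null model reference
    Xm = X[:, model] - X[:, model].mean(axis=0)
    bhat, *_ = np.linalg.lstsq(Xm, yc, rcond=None)
    R2 = 1 - np.sum((yc - Xm @ bhat) ** 2) / np.sum(yc ** 2)
    return 0.5 * ((n - 1 - p) * np.log1p(g) - (n - 1) * np.log1p(g * (1 - R2)))

rng = np.random.default_rng(5)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)             # nearly collinear with x1
x3 = rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
y = x1 + 0.5 * x3 + rng.standard_normal(n)

models = [tuple(j for j in range(3) if s[j]) for s in product([0, 1], repeat=3)]
logm = np.array([gprior_log_marginal(X, y, m, g=n) for m in models])
post = np.exp(logm - logm.max())
post /= post.sum()                                   # uniform prior over models

marg1 = sum(p for m, p in zip(models, post) if 0 in m)
marg2 = sum(p for m, p in zip(models, post) if 1 in m)
joint = sum(p for m, p in zip(models, post) if 0 in m or 1 in m)
print(f"P(x1 in M) = {marg1:.2f}, P(x2 in M) = {marg2:.2f}, "
      f"P(x1 or x2 in M) = {joint:.2f}")
```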

17.
Two-phase study designs can reduce the cost and other practical burdens associated with large-scale epidemiologic studies by limiting ascertainment of expensive covariates to a smaller but informative sub-sample (phase II) of the main study (phase I). During the analysis of such studies, however, subjects who are selected at phase I but not at phase II remain informative, as they may have partial covariate information. A variety of semi-parametric methods now exist for incorporating such phase-I data when the covariate information can be summarized into a finite number of strata. In this article, we extend the pseudo-score approach proposed by Chatterjee et al. (J Am Stat Assoc 98:158-168, 2003) using kernel smoothing to incorporate information on continuous phase-I covariates. Practical issues and algorithms for implementing the methods using existing software are discussed, and a sandwich-type variance estimator based on the influence function representation of the pseudo-score function is proposed. The finite sample performance of the methods is studied using simulated data. The advantage of the proposed smoothing approach over alternative methods that use discretized phase-I covariate information is illustrated using two-phase data simulated within the National Wilms Tumor Study (NWTS).

18.
In survival analysis, time-dependent covariates are usually present as longitudinal data collected periodically and measured with error. The longitudinal data can be assumed to follow a linear mixed effects model, and Cox regression models may be used for modelling the survival events. The hazard rate of the survival times then depends on the underlying error-contaminated time-dependent covariate, which may be described by random effects. Most existing methods for such models assume a parametric distribution for the random effects and specify a normally distributed error term for the linear mixed effects model; these assumptions may not always be valid in practice. In this article, we propose a new likelihood method for Cox regression models with error-contaminated time-dependent covariates. The proposed method does not require any parametric distributional assumption on the random effects or the random errors. Asymptotic properties of the parameter estimators are provided. Simulation results show that in certain situations the proposed method is more efficient than existing methods.

19.
We present a new class of methods for high dimensional non-parametric regression and classification called sparse additive models. Our methods combine ideas from sparse linear modelling and additive non-parametric regression. We derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size. Sparse additive models are essentially a functional version of the grouped lasso of Yuan and Lin. They are also closely related to the COSSO model of Lin and Zhang, but decouple smoothing and sparsity, enabling the use of arbitrary non-parametric smoothers. We give an analysis of the theoretical properties of sparse additive models and present empirical results on synthetic and real data, showing that they can be effective in fitting sparse non-parametric models in high dimensional data.
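The decoupling of smoothing and sparsity is visible in the sparse-additive-model backfitting loop: each component is estimated by an arbitrary smoother applied to a partial residual and then soft-thresholded by its empirical norm, which zeroes out irrelevant covariates exactly. The sketch below is a minimal version with a Nadaraya-Watson smoother; names and tuning values are illustrative.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.3):
    """Nadaraya-Watson smoother of residuals r on covariate x."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (K @ r) / K.sum(axis=1)

def spam_backfit(X, y, lam, n_iter=20):
    """Sparse additive model backfitting: smooth the partial residual on
    each covariate, then soft-threshold the whole component function by
    its empirical norm."""
    n, p = X.shape
    F = np.zeros((n, p))                             # fitted components
    for _ in range(n_iter):
        for j in range(p):
            r = y - y.mean() - F.sum(axis=1) + F[:, j]   # partial residual
            pj = kernel_smooth(X[:, j], r)
            norm_j = np.sqrt(np.mean(pj ** 2))
            F[:, j] = max(0.0, 1 - lam / norm_j) * pj if norm_j > 0 else 0.0
            F[:, j] -= F[:, j].mean()                # centre for identifiability
    return F

rng = np.random.default_rng(6)
n, p = 300, 10                                       # only covariates 0, 1 matter
X = rng.uniform(-2, 2, (n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.standard_normal(n)
F = spam_backfit(X, y, lam=0.1)
print("component norms:", np.round(np.sqrt(np.mean(F ** 2, axis=0)), 2))
```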

20.
The benefits of adjusting for baseline covariates are not as straightforward with repeated binary responses as with continuous response variables. We therefore used simulations to compare different methods for analyzing repeated binary data when the outcome at the study endpoint is of interest. The methods compared were the chi-square test, Fisher's exact test, covariate-adjusted/unadjusted logistic regression (Adj.logit/Unadj.logit), covariate-adjusted/unadjusted generalized estimating equations (Adj.GEE/Unadj.GEE), and covariate-adjusted/unadjusted generalized linear mixed models (Adj.GLMM/Unadj.GLMM). All of these methods kept the type I error close to the nominal level. Covariate-adjusted methods improved power relative to the unadjusted methods because of larger treatment effect estimates, especially when the correlation between the baseline and the outcome was strong, even though the standard errors also increased. Results of the chi-square test were identical to those of the unadjusted logistic regression. Fisher's exact test was the most conservative with respect to the type I error rate and also had the lowest power. Without missing data, there was no gain in using a repeated measures approach over a simple logistic regression at the final time point. Analysis of results from five phase III diabetes trials of the same compound was consistent with the simulation findings. Covariate-adjusted analysis is therefore recommended for repeated binary data when the study endpoint is of interest.
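A minimal simulation in the spirit of this comparison (hypothetical function names and parameter values; only unadjusted versus adjusted logistic regression at the endpoint is shown) illustrates the power gain from covariate adjustment:

```python
import numpy as np
import statsmodels.api as sm

def one_trial(n=200, effect=0.8, rho=0.6, rng=None):
    """Simulate one trial with a baseline covariate correlated with the
    binary endpoint; return treatment Wald z-statistics from unadjusted
    and baseline-adjusted logistic regressions at the endpoint."""
    if rng is None:
        rng = np.random.default_rng()
    trt = rng.integers(0, 2, n)
    base = rng.standard_normal(n)
    eta = -0.2 + effect * trt + 1.5 * rho * base    # baseline drives outcome
    y = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(float)
    z = []
    for cols in ([trt], [trt, base]):
        Xd = sm.add_constant(np.column_stack(cols))
        fit = sm.Logit(y, Xd).fit(disp=0)
        z.append(fit.params[1] / fit.bse[1])        # treatment z-statistic
    return z                                        # [unadjusted, adjusted]

rng = np.random.default_rng(7)
Z = np.array([one_trial(rng=rng) for _ in range(500)])
print("power (|z| > 1.96) unadjusted/adjusted:", (np.abs(Z) > 1.96).mean(axis=0))
```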
