Similar documents
20 similar documents found (search time: 15 ms)
1.
Interaction is very common in reality but has received little attention in the logistic regression literature, especially for higher-order interactions. In conventional logistic regression, interactions are typically ignored. We propose a model selection procedure that implements an association rules analysis. We do this by (1) exploring the combinations of input variables which have significant impacts on the response (via association rules analysis); (2) selecting the potential (low- and high-order) interactions; (3) converting these potential interactions into new dummy variables; and (4) performing variable selection among all the input variables and the newly created dummy variables (interactions) to build the optimal logistic regression model. Our model selection procedure establishes the optimal combination of main effects and potential interactions. Comparisons are made through thorough simulations, which show that the proposed method outperforms the existing methods in all cases. A real-life example is discussed in detail to demonstrate the proposed method.
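The dummy-variable conversion in steps (3)–(4) can be sketched as follows; the binary inputs, the order-2/order-3 candidate set, and the simple support threshold are illustrative assumptions standing in for the paper's full association-rules analysis.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy binary design matrix: 200 observations, 4 binary input variables.
X = rng.integers(0, 2, size=(200, 4))

# Enumerate candidate low- and high-order interactions (orders 2 and 3).
candidates = [c for k in (2, 3) for c in itertools.combinations(range(4), k)]

# Convert each candidate interaction into a dummy variable: the product
# of the component columns (1 only when all components are 1).
dummies = {c: X[:, c].prod(axis=1) for c in candidates}

# Illustrative screen: keep interactions that occur often enough to be
# estimable (a support threshold, as in association-rules mining).
support = {c: d.mean() for c, d in dummies.items()}
kept = [c for c, s in support.items() if s >= 0.10]
print(kept)
```

The kept dummies would then enter an ordinary logistic regression alongside the main effects, with any standard variable selection method choosing the final model.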

2.
In the context of genetics and genomic medicine, gene-environment (G×E) interactions have a great impact on the risk of human diseases. Some existing methods for identifying G×E interactions are limited, since they analyze only one or a few G factors at a time, assume linear effects of E factors, and use inefficient selection methods. In this paper, we propose a new method to identify significant main effects and G×E interactions. It is based on a semivarying coefficient least-squares support vector regression (LS-SVR) technique, devised by utilizing a flexible semiparametric LS-SVR approach for censored survival data. The semivarying coefficient model handles the nonlinear effects of E factors. We also derive a generalized cross validation (GCV) function for determining the optimal values of the hyperparameters of the proposed method; this GCV function is also used to identify significant main effects and G×E interactions. The proposed method is evaluated through numerical studies.

3.
The increasing amount of data stored in the form of dynamic interactions between actors necessitates methodologies that automatically extract relevant information. The interactions can be represented by dynamic networks, in which most existing methods look for clusters of vertices to summarize the data. In this paper, a new framework is proposed to cluster the vertices while detecting change points in the intensities of the interactions. These change points are key to understanding the temporal interactions. The model involves non-homogeneous Poisson point processes with cluster-dependent piecewise constant intensity functions and common discontinuity points. A variational expectation maximization algorithm is derived for inference. We show that the pruned exact linear time (PELT) method, originally developed for change point detection in univariate time series, can be used for the maximization step. This allows the detection of both the number of change points and their locations. Experiments on artificial and real datasets are carried out, and the proposed approach is compared with related methods.
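As a hedged sketch of the pruned exact linear time idea referenced above, here is a minimal implementation for a univariate series with a squared-error (piecewise-constant mean) cost; the paper's cost function for Poisson intensities would take the place of `seg_cost`.

```python
import numpy as np

def pelt(y, beta):
    """Pruned exact linear time (PELT) change point detection for a
    piecewise-constant mean, squared-error cost, and penalty beta."""
    n = len(y)
    # Cumulative sums give O(1) segment costs:
    # cost(s, t) = sum of squares - (sum)^2 / length on y[s:t].
    cs, cs2 = np.zeros(n + 1), np.zeros(n + 1)
    cs[1:], cs2[1:] = np.cumsum(y), np.cumsum(y ** 2)

    def seg_cost(s, t):
        return cs2[t] - cs2[s] - (cs[t] - cs[s]) ** 2 / (t - s)

    F = np.full(n + 1, np.inf)   # optimal penalized cost up to time t
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    cands = [0]                  # pruned set of candidate change points
    for t in range(1, n + 1):
        vals = [F[s] + seg_cost(s, t) + beta for s in cands]
        best = int(np.argmin(vals))
        F[t], last[t] = vals[best], cands[best]
        # Pruning step: drop candidates that can never become optimal.
        cands = [s for s, v in zip(cands, vals) if v - beta <= F[t]]
        cands.append(t)
    cps, t = [], n               # backtrack the change points
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 0.5, 60), rng.normal(4.0, 0.5, 60)])
print(pelt(y, beta=3 * np.log(len(y))))  # a single change point near index 60
```

Both the number of change points and their locations come out of the one dynamic-programming pass, which is what makes PELT convenient inside a maximization step.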

4.
Multivariate adaptive regression spline fitting, or MARS (Friedman 1991), provides a useful methodology for flexible adaptive regression with many predictors. The MARS methodology produces an estimate of the mean response that is a linear combination of adaptively chosen basis functions. Recently, a Bayesian version of MARS has been proposed (Denison, Mallick and Smith 1998a; Holmes and Denison 2002), combining the MARS methodology with the benefits of Bayesian methods for accounting for model uncertainty to achieve improvements in predictive performance. In implementations of the Bayesian MARS approach, Markov chain Monte Carlo methods are used for computation; at each iteration of the algorithm it is proposed to change the current model by either (a) adding a basis function (birth step), (b) deleting a basis function (death step), or (c) altering an existing basis function (change step). In the algorithm of Denison, Mallick and Smith (1998a), when a birth step is proposed, the type of basis function is determined by simulation from the prior. This works well in problems with a small number of predictors, is simple to program, and leads to a simple form for the Metropolis-Hastings acceptance probabilities. However, in problems with very large numbers of predictors, many of them useless, it may be difficult to find interesting interactions with such an approach. The original MARS algorithm of Friedman (1991) uses a heuristic of building up higher-order interactions from lower-order ones, which greatly reduces the complexity of the search for good basis functions to add to the model. While we do not exactly follow the intuition of the original MARS algorithm in this paper, we suggest a similar idea in which the Metropolis-Hastings proposals of Denison, Mallick and Smith (1998a) are altered to allow dependence on the current model. Our modification allows more rapid identification and exploration of important interactions, especially in problems with very large numbers of predictor variables and many useless predictors. Performance of the algorithms is compared in simulation studies.

5.
Selecting treatments to fit the specific needs of a given patient is a major challenge in modern medicine. Personalized treatments rely on established patient–treatment interactions. In recent years, various statistical methods for the identification and estimation of interactions between relevant covariates and treatment have been proposed. In this article, several available methods for detecting and estimating a covariate–treatment interaction for a time-to-event outcome, namely the standard Cox regression model assuming a linear interaction, the fractional polynomials approach for interaction, the modified outcome approach, the local partial-likelihood approach, and STEPP (Subpopulation Treatment Effect Pattern Plots), were applied to data from the SPACE trial, a randomized clinical trial comparing stent-protected angioplasty (CAS) with carotid endarterectomy (CEA) in patients with symptomatic stenosis, with the aim of analysing the interaction between age and treatment. Time from primary intervention to the first relevant event (any stroke or death) was the outcome parameter. The analyses suggest a qualitative interaction between patient age and treatment, indicating a lower risk after treatment with CAS compared with CEA for younger patients, while for elderly patients a lower risk after CEA was observed. Differences among the statistical methods regarding the observed results, applicability, and interpretation are discussed.

6.
In biomedical research, profiling is now commonly conducted, generating high-dimensional genomic measurements (without loss of generality, say genes). An important analysis objective is to rank genes according to their marginal associations with a disease outcome/phenotype. Clinical covariates, including for example clinical risk factors and environmental exposures, usually exist and need to be properly accounted for. In this study, we propose conducting marginal ranking of genes using a receiver operating characteristic (ROC) based method. This method can accommodate categorical, censored survival, and continuous outcome variables in a very similar manner. Unlike logistic-model-based methods, it does not make very specific model assumptions, making it robust. In ranking genes, we account for both the main effects of clinical covariates and their interactions with genes, and develop multiple measures of diagnostic accuracy improvement. Using simulation studies, we show that the proposed method is effective in that genes associated with the outcome, or whose gene–covariate interactions are associated with the outcome, receive high rankings. In data analysis, we observe some differences between the rankings from the proposed method and from the logistic-model-based method.
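The core of an ROC-based marginal ranking can be sketched with plain AUC as the diagnostic accuracy measure; the covariate main effects and gene–covariate interactions of the proposed method are omitted here, so this is only the simplest special case, and the simulated data are an illustrative assumption.

```python
import numpy as np

def auc(score, label):
    """Empirical AUC: probability that a random case outscores a random
    control, counting ties as 1/2 (the Mann-Whitney statistic)."""
    pos, neg = score[label == 1], score[label == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

rng = np.random.default_rng(2)
n, p = 300, 20
y = rng.integers(0, 2, n)              # binary disease outcome
G = rng.normal(size=(n, p))            # gene expression measurements
G[:, 3] += 1.5 * y                     # gene 3 is truly associated

# Marginal ranking: score each gene by |AUC - 0.5| so that both
# positively and negatively associated genes rank highly.
scores = np.array([abs(auc(G[:, j], y) - 0.5) for j in range(p)])
ranking = np.argsort(scores)[::-1]
print("top-ranked gene:", ranking[0])
```

Because AUC is rank-based, this marginal screen needs no parametric model for the gene–outcome relationship, which is the robustness argument made in the abstract.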

7.

We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene–gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion, which requires a single cycle (or a few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. On simulated and benchmark data sets the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes, and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene–gene interactions extracted from the online search tool “Zodiac.”


8.
Latin hypercube designs (LHDs) are widely used in computer experiments because of their one-dimensional uniformity and other properties. Recently, a number of methods have been proposed to construct LHDs with the property that all linear effects are mutually orthogonal and orthogonal to all second-order effects, i.e., quadratic effects and bilinear interactions. This paper focuses on the construction of LHDs with the above desirable properties under the Fourier-polynomial model. A convenient and flexible algorithm for constructing such orthogonal LHDs is provided. Most of the resulting designs have run sizes different from those of Butler (2001), and are thus new and very suitable for factor screening and for building Fourier-polynomial models in computer experiments as discussed in Butler (2001).
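For illustration, a plain (non-orthogonal) random Latin hypercube already exhibits the one-dimensional uniformity mentioned above; the orthogonality constructions discussed in the paper require more specialized algorithms, so this sketch covers only the basic LHD property.

```python
import numpy as np

def latin_hypercube(n, k, rng):
    """Random n-run, k-factor Latin hypercube design on [0, 1)^k:
    each column is a random permutation of the n equal strata, jittered
    within its stratum, which guarantees one-dimensional uniformity."""
    perms = np.column_stack([rng.permutation(n) for _ in range(k)])
    return (perms + rng.random((n, k))) / n

rng = np.random.default_rng(3)
D = latin_hypercube(8, 3, rng)

# One-dimensional uniformity check: every column has exactly one point
# in each of the 8 equally spaced strata [i/8, (i+1)/8).
strata = np.floor(D * 8).astype(int)
print(np.all(np.sort(strata, axis=0) == np.arange(8)[:, None]))  # True
```

Projecting any single factor onto its axis hits each stratum exactly once, which is why LHDs are favored for factor screening even before orthogonality constraints are imposed.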

9.
Analysis of Variance by Randomization when Variances are Unequal
If there are significant factor and interaction effects in an analysis of variance using randomization inference, they can be detected by tests that compare the F-statistics for the real data with the distributions of these statistics obtained by randomly allocating either the original observations or the residuals to the various factor combinations. Such tests assume that the effect of factors or interactions is to shift the observations for a factor combination by a fixed amount, without changing the amount of variation at that combination. In reality the expected amount of variation at each factor combination, as measured by the variance, may not be constant, which may upset the properties of the tests for the effects of factors and interactions. This paper discusses several possible methods for adjusting the randomization procedure to allow for this problem, including generalizations of methods that have been proposed for comparing the means of several samples when there is unequal variance but no factor structure. A simulation study shows that the best of the methods examined is one in which the randomized sets of data are designed to approximate the distributions of the F-statistics when unequal variance is present.
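A minimal version of the randomization test described above, for the one-way case with no factor structure and none of the unequal-variance adjustments that are the paper's focus, might look like this (the simulated groups are an illustrative assumption):

```python
import numpy as np

def f_stat(groups):
    """One-way ANOVA F-statistic for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    ssb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

def randomization_test(groups, n_perm, rng):
    """Compare the observed F with its distribution under random
    reallocation of the pooled observations to the groups."""
    obs = f_stat(groups)
    pooled = np.concatenate(groups)
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += f_stat(np.split(perm, np.cumsum(sizes)[:-1])) >= obs
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(4)
a = rng.normal(0.0, 1.0, 20)
b = rng.normal(2.0, 1.0, 20)   # shifted mean: a real group effect
c = rng.normal(0.0, 1.0, 20)
print(randomization_test([a, b, c], n_perm=999, rng=rng))
```

Randomizing residuals rather than raw observations, and adjusting for unequal within-group variances, are the refinements the paper studies on top of this basic scheme.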

10.
This article presents a continuous-time Bayesian model for analyzing durations of behavior displays in social interactions. Duration data from social interactions are often complex because of repeated behaviors (events) at the individual or group (e.g. dyad) level, multiple behaviors (multistates), and several possible exits from a current event (competing risks). A multilevel, multistate model is proposed to adequately characterize the behavioral processes. The model incorporates dyad-specific and transition-specific random effects to account for heterogeneity among dyads and interdependence among competing risks. The proposed method is applied to child–parent observational data from the School Transitions Project to assess the relation of emotional expression in child–parent interaction to the risk of early and persisting child conduct problems.

11.
Interactions among multiple genes across the genome may contribute to the risks of many complex human diseases. Whole-genome single nucleotide polymorphism (SNP) data collected for many thousands of SNP markers from thousands of individuals under the case-control design promise to shed light on our understanding of such interactions. However, nearby SNPs are highly correlated due to linkage disequilibrium (LD), and the number of possible interactions is too large for exhaustive evaluation. We propose a novel Bayesian method for simultaneously partitioning SNPs into LD blocks and selecting SNPs within blocks that are associated with the disease, either individually or interactively with other SNPs. When applied to homogeneous population data, the method gives posterior probabilities for LD-block boundaries, which not only result in accurate block partitions of SNPs but also provide measures of partition uncertainty. When applied to case-control data for association mapping, the method implicitly filters out SNP associations created merely by LD with disease loci within the same blocks. A simulation study showed that this approach is more powerful in detecting multi-locus associations than the other methods we tested, including one of ours. When applied to the WTCCC type 1 diabetes data, the method identified many previously known T1D-associated genes, including PTPN22, CTLA4, MHC, and IL2RA. The method also revealed some interesting two-way associations that were undetected by single-SNP methods. Most of the significant associations are located within the MHC region. Our analysis showed that the MHC SNPs form long-distance joint associations over several known recombination hotspots. By controlling for the haplotypes of the MHC class II region, we identified additional associations in both the MHC class I (HLA-A, HLA-B) and class III (BAT1) regions. We also observed significant interactions between the genes PRSS16 and ZNF184 in the extended MHC region and the MHC class II genes.
The proposed method can be applied broadly to classification problems with correlated discrete covariates.

12.
Kane discussed a simple method for identifying the confounded interactions in 2^n factorial experiments when a replication consists of (1) two blocks and (2) more than two blocks. It should be noted that Kane's method holds only for (1) regular designs and (2) the case where a single interaction is confounded. In the present investigation, we propose a new way of identifying the confounded designs and the confounded interactions in 2^n factorial experiments. The same method is extended to 3^n and s^n factorial experiments.

13.
Post-marketing data offer rich information and cost-effective resources for physicians and policy-makers to address critical scientific questions in clinical practice. However, the complex confounding structures (e.g., nonlinear and nonadditive interactions) embedded in these observational data often pose major analytical challenges for drawing valid conclusions. Furthermore, these data are often made available as electronic health records (EHRs) and are usually massive, with hundreds of thousands of observational records, which introduces additional computational challenges. In this paper, for comparative effectiveness analysis, we propose a statistically robust yet computationally efficient propensity score (PS) approach to adjust for the complex confounding structures. Specifically, we propose a kernel-based machine learning method for flexible and robust PS modeling to obtain valid PS estimates from observational data with complex confounding structures. The estimated propensity score is then used in a second-stage analysis to obtain a consistent average treatment effect estimate. An empirical variance estimator based on the bootstrap is adopted. A split-and-merge algorithm is further developed to reduce the computational workload of the proposed method for big data, and it yields a valid variance estimator of the average treatment effect estimate as a by-product. As shown by extensive numerical studies and an application to a comparative effectiveness analysis of postoperative pain EHR data, the proposed approach consistently outperforms competing methods, demonstrating its practical utility.
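The two-stage PS idea can be sketched with an ordinary logistic propensity model standing in for the paper's kernel-based estimator; the data-generating process and the Hajek-type weighting below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Propensity score by logistic regression, fit with Newton-Raphson
    (a simple stand-in for the paper's kernel-based PS model)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xd @ beta))
        H = Xd.T @ ((p * (1 - p))[:, None] * Xd) + 1e-8 * np.eye(Xd.shape[1])
        beta += np.linalg.solve(H, Xd.T @ (t - p))
    return 1 / (1 + np.exp(-Xd @ beta))

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=(n, 2))
logit = 0.8 * x[:, 0] - 0.5 * x[:, 1]
t = rng.binomial(1, 1 / (1 + np.exp(-logit)))          # confounded treatment
y = 2.0 * t + x[:, 0] + x[:, 1] + rng.normal(size=n)   # true ATE = 2

# Stage 2: inverse-probability-weighted (Hajek) average treatment effect.
ps = fit_logistic(x, t)
ate = (np.sum(t * y / ps) / np.sum(t / ps)
       - np.sum((1 - t) * y / (1 - ps)) / np.sum((1 - t) / (1 - ps)))
print(ate)  # close to the true ATE of 2
```

In the paper's setting, the PS model is kernel-based to capture nonlinear and nonadditive confounding, and a bootstrap supplies the variance estimate; the weighting logic of the second stage is unchanged.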

14.
Multiplicative-interaction (M-I) logit models are proposed for three-way I×J×2 contingency tables in which the third variable constitutes a binary response. Models are derived by assigning unknown scores to the categories and forming product interactions from them. Asymptotic results under special sampling constraints are derived for the maximum likelihood estimates and the goodness-of-fit statistics. The class of models proposed in this paper is useful when no obvious scores are available. An example is included.

15.
We propose third-order likelihood-based methods to derive highly accurate p-value approximations for testing autocorrelated disturbances in nonlinear regression models. The proposed methods are particularly accurate for small- and medium-sized samples whereas commonly used first-order methods like the signed log-likelihood ratio test, the Kobayashi (1991) test, and the standardized test can be seriously misleading in these cases. Two Monte Carlo simulations are provided to show how the proposed methods outperform the above first-order methods. An empirical example applied to US population census data is also provided to illustrate the implementation of the proposed method and its usefulness in practice.

16.
As biological knowledge accumulates rapidly, gene networks encoding genome-wide gene–gene interactions have been constructed. As an improvement over the standard mixture model, which a priori treats all genes as identically and independently distributed, Wei and co-workers have proposed modelling a gene network as a discrete or Gaussian Markov random field (MRF) in a mixture model to analyse genomic data. However, how these methods compare in practical applications is not well understood, and that is the aim here. We also propose two novel constraints in the prior specification for the Gaussian MRF model and a fully Bayesian approach to the discrete MRF model. We assess the accuracy of estimating the false discovery rate by posterior probabilities in the context of MRF models. Applications to a chromatin immunoprecipitation (ChIP)–chip data set and simulated data show that the modified Gaussian MRF models have superior performance compared with the other models, and both MRF-based mixture models, which are reasonably robust to misspecified gene networks, outperform the standard mixture model.

17.
The bootstrap particle filter (BPF) is the cornerstone of many algorithms used for solving generally intractable inference problems with hidden Markov models. The long-term stability of the BPF arises from particle interactions that typically make parallel implementations of the BPF nontrivial. We propose a method whereby particle interaction is done in several stages. With the proposed method, full interaction can be accomplished even if we allow only pairwise communications between processing elements at each stage. We show that our method preserves the consistency and the long-term stability of the BPF, although our analysis suggests that the constraints on the stagewise interactions introduce errors leading to a lower convergence rate than standard Monte Carlo. The proposed method also suggests a new, more flexible, adaptive resampling scheme, which, according to our numerical experiments, is the method of choice, displaying a notable gain in efficiency in certain parallel computing scenarios.
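As context, a minimal bootstrap particle filter for a scalar linear-Gaussian model shows where the particle interaction (the resampling step) occurs; the staged pairwise-communication scheme proposed in the paper is not reproduced here, and the model below is an illustrative assumption.

```python
import numpy as np

def bpf(y, n_part, rng):
    """Minimal bootstrap particle filter for the scalar model
    x_t = 0.9 x_{t-1} + v_t,  y_t = x_t + w_t,  v_t, w_t ~ N(0, 1).
    Particle interaction happens in the multinomial resampling step."""
    x = rng.normal(size=n_part)
    means = []
    for yt in y:
        x = 0.9 * x + rng.normal(size=n_part)       # propagate
        logw = -0.5 * (yt - x) ** 2                 # weight by likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                 # filtering mean
        idx = rng.choice(n_part, size=n_part, p=w)  # resample: the global
        x = x[idx]                                  # interaction step
    return np.array(means)

rng = np.random.default_rng(5)
T = 100
xs, ys = np.zeros(T), np.zeros(T)
x = 0.0
for t in range(T):                      # simulate a trajectory to filter
    x = 0.9 * x + rng.normal()
    xs[t], ys[t] = x, x + rng.normal()
est = bpf(ys, n_part=500, rng=rng)
print(np.mean((est - xs) ** 2))         # filtering MSE well below Var(x_t)
```

The `rng.choice` line is the global operation that frustrates parallelization: every processing element needs the full normalized weight vector, which is exactly what the paper's stagewise pairwise communications are designed to avoid.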

18.
When there is an outlier in the data set, the efficiency of traditional methods decreases. To address this problem, Kadilar et al. (2007) adapted the Huber-M method, one of several robust regression methods, to ratio-type estimators, reducing the effect of outliers. In this study, new ratio-type estimators are proposed, building on Kadilar et al. (2007), using the Tukey-M, Hampel-M, Huber-MM, LTS, LMS, and LAD robust methods. We derive the mean square error (MSE) of these estimators theoretically and compare their MSE values with those of the estimators based on the Huber-M and OLS methods. These comparisons show that our proposed estimators are more efficient than both the Huber-M approach of Kadilar et al. (2007) and the OLS approach. Moreover, under all conditions, all of the proposed estimators except the LAD method are more efficient than the robust estimators proposed by Kadilar et al. (2007). These theoretical results are supported by a numerical example and a simulation based on data that include an outlier.

19.
Recently, Zhang [Simultaneous confidence intervals for several inverse Gaussian populations. Stat Probab Lett. 2014;92:125–131] proposed simultaneous pairwise confidence intervals (SPCIs) based on the fiducial generalized pivotal quantity concept to make inferences about inverse Gaussian means under heteroscedasticity. In this paper, we propose three new methods for constructing SPCIs to make inferences on the means of several inverse Gaussian distributions when scale parameters and sample sizes are unequal. One of the methods results in a set of classic SPCIs (in the sense that the inference is not simulation-based), and the other two are based on a parametric bootstrap approach. The advantages of our proposed methods over Zhang's (2014) method are: (i) simulation results show that the coverage probability of the proposed parametric bootstrap approaches is fairly close to the nominal confidence coefficient, while the coverage probability of Zhang's method falls below the nominal confidence coefficient when the number of groups and the variances of the groups are large; and (ii) the proposed set of classic SPCIs is conservative, in contrast to Zhang's method.

20.
Feature screening and variable selection are fundamental in the analysis of ultrahigh-dimensional data, which are being collected in diverse scientific fields at relatively low cost. Distance correlation-based sure independence screening (DC-SIS) has been proposed to perform feature screening for ultrahigh-dimensional data. DC-SIS possesses the sure screening property and filters out unimportant predictors in a model-free manner. Like all independence screening methods, however, it fails to detect truly important predictors that are marginally independent of the response variable due to correlations among predictors. When there are many irrelevant predictors that are highly correlated with some strongly active predictors, independence screening may miss other active predictors with relatively weak marginal signals. To improve the performance of DC-SIS, we introduce an effective iterative procedure based on distance correlation to detect all truly important predictors, and potential interactions, in both linear and nonlinear models. The proposed iterative method thus retains the favourable model-free and robust properties. We further illustrate its excellent finite-sample performance through comprehensive simulation studies and an empirical analysis of the rat eye expression data set.
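The distance correlation statistic underlying DC-SIS can be computed directly; the sketch below only ranks predictors marginally (the paper's iterative procedure is not reproduced), and the quadratic-signal setup is an illustrative assumption.

```python
import numpy as np

def dcor(x, y):
    """Empirical distance correlation between two univariate samples
    (Szekely et al.); zero (asymptotically) iff x and y are independent."""
    def centered(a):
        d = np.abs(a[:, None] - a[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()                       # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)   # nonlinear, nearly uncorrelated signal
z = rng.normal(size=n)                  # pure noise

# Pearson correlation misses the quadratic dependence; dcor does not,
# which is the model-free screening property exploited by DC-SIS.
print(dcor(x, y) > dcor(x, z))
```

In DC-SIS, each predictor is scored by its distance correlation with the response and the top-ranked predictors are retained; the iterative refinement then conditions on those to recover predictors with weak marginal signals.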
