Similar Articles
20 similar articles found (search time: 0 ms)
1.
In segmentation problems, inference on change-point position and model selection are two difficult issues due to the discrete nature of change-points. In a Bayesian context, we derive exact, explicit and tractable formulae for the posterior distribution of variables such as the number of change-points or their positions. We also demonstrate that several classical Bayesian model selection criteria can be computed exactly. All these results are based on an efficient strategy to explore the whole segmentation space, which is very large. We illustrate our methodology on both simulated data and a comparative genomic hybridization profile.

2.
In survival analysis, we may encounter the following three problems: nonlinear covariate effect, variable selection and measurement error. Existing studies only address one or two of these problems. The goal of this study is to fill the knowledge gap and develop a novel approach to simultaneously address all three problems. Specifically, a partially time-varying coefficient proportional hazards model is proposed to more flexibly describe covariate effects. Corrected score and conditional score approaches are employed to accommodate potential measurement error. For the selection of relevant variables and regularised estimation, a penalisation approach is adopted. It is shown that the proposed approach has satisfactory asymptotic properties. It can be effectively realised using an iterative algorithm. The performance of the proposed approach is assessed via simulation studies and further illustrated by application to data from an AIDS clinical trial.

3.
4.
5.
Summary.  We propose covariance-regularized regression, a family of methods for prediction in high dimensional settings that uses a shrunken estimate of the inverse covariance matrix of the features to achieve superior prediction. An estimate of the inverse covariance matrix is obtained by maximizing the log-likelihood of the data, under a multivariate normal model, subject to a penalty; it is then used to estimate coefficients for the regression of the response onto the features. We show that ridge regression, the lasso and the elastic net are special cases of covariance-regularized regression, and we demonstrate that certain previously unexplored forms of covariance-regularized regression can outperform existing methods in a range of situations. The covariance-regularized regression framework is extended to generalized linear models and linear discriminant analysis, and is used to analyse gene expression data sets with multiple class and survival outcomes.
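The ridge special case mentioned in this abstract can be checked directly: shrinking the empirical inverse covariance toward a ridge form and plugging it into the normal equations reproduces ridge coefficients exactly. A minimal numpy sketch on simulated data (all names and the penalty value are illustrative, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.0, 0.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

lam = 0.1
S = X.T @ X / n                              # empirical covariance of the features
theta = np.linalg.inv(S + lam * np.eye(p))   # shrunken inverse-covariance estimate

# plug the shrunken estimate into the normal-equations solution
beta_hat = theta @ (X.T @ y) / n

# this particular penalty recovers ridge regression exactly
beta_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)
assert np.allclose(beta_hat, beta_ridge)
```

Sparser penalties on the off-diagonal entries (as in the graphical lasso) give the previously unexplored family members the abstract refers to.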

6.
A maximum estimability (maxest) criterion is proposed for design classification and selection. It is an extension and refinement of Webb's resolution criterion for general factorial designs. By using the estimability vector associated with the maxest criterion, projective properties of nonregular designs are studied from the estimability perspective. Comparisons with other criteria are also discussed.

7.
In this paper we develop and test experimental methodologies for selection of the best alternative among a discrete number of available treatments. We consider a scenario where a researcher sequentially decides which treatments are assigned to experimental units. This problem is particularly challenging if a single measurement of the response to a treatment is time-consuming and there is a limited time for experimentation. This time can be decreased if it is possible to perform measurements in parallel. In this work we propose and discuss asynchronous extensions of two well-known Ranking & Selection policies, namely, Optimal Computing Budget Allocation (OCBA) and the Knowledge Gradient (KG) policy. Our extensions (Asynchronous Optimal Computing Budget Allocation (AOCBA) and Asynchronous Knowledge Gradient (AKG), respectively) allow for parallel asynchronous allocation of measurements. Additionally, since the standard KG method is sequential (it can only allocate one experiment at a time) we propose a parallel synchronous extension of the KG policy – Synchronous Knowledge Gradient (SKG). Computer simulations of our algorithms indicate that our parallel KG-based policies (AKG, SKG) outperform the standard OCBA method as well as AOCBA, if the number of evaluated alternatives is small or the computing/experimental budget is limited. For experiments with large budgets and large sets of alternatives, both the OCBA and AOCBA policies are more efficient.
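The sequential KG step that the parallel policies extend can be sketched for independent normal beliefs. A minimal sketch, not the paper's AKG/SKG code: `mu` holds current posterior means, `sigma_tilde` the predictive uncertainty reduction from one more measurement, and both vectors are illustrative numbers.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def kg_factor(mu, sigma_tilde):
    # KG value of measuring each alternative once, under independent normal
    # beliefs: sigma_tilde * (z * Phi(z) + phi(z)) with z = -|gap| / sigma_tilde
    mu = np.asarray(mu, float)
    kg = np.empty_like(mu)
    for i in range(len(mu)):
        best_other = np.delete(mu, i).max()
        z = -abs(mu[i] - best_other) / sigma_tilde[i]
        Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF
        phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal density
        kg[i] = sigma_tilde[i] * (z * Phi + phi)
    return kg

mu = [1.0, 1.2, 0.9]            # current posterior means of the alternatives
sigma_tilde = [0.5, 0.1, 0.5]   # how much one more sample would teach us
measure_next = int(np.argmax(kg_factor(mu, sigma_tilde)))
```

The policy measures the alternative with the largest KG value; here the near-best but still-uncertain alternative 0 is chosen over the current leader, whose value is already well estimated.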

8.
In this paper we investigate the application of stochastic complexity theory to classification problems. In particular, we define the notion of admissible models as a function of problem complexity, the number of data points N, and prior belief. This allows us to derive general bounds relating classifier complexity with data-dependent parameters such as sample size, class entropy and the optimal Bayes error rate. We discuss the application of these results to a variety of problems, including decision tree classifiers, Markov models for image segmentation, and feedforward multilayer neural network classifiers.

9.
ABSTRACT

Joint models are statistical tools for estimating the association between time-to-event and longitudinal outcomes. One challenge to the application of joint models is its computational complexity. Common estimation methods for joint models include a two-stage method, Bayesian and maximum-likelihood methods. In this work, we consider joint models of a time-to-event outcome and multiple longitudinal processes and develop a maximum-likelihood estimation method using the expectation–maximization algorithm. We assess the performance of the proposed method via simulations and apply the methodology to a data set to determine the association between longitudinal systolic and diastolic blood pressure measures and time to coronary artery disease.

10.
11.
Abstract

The presence of a detection limit (DL) in covariates inflates the bias and distorts the mean squared error of the estimators of the regression parameters. This paper suggests a response-driven multiple imputation method to correct the deleterious impact introduced by the covariate DL on the estimators of the parameters of a simple logistic regression model. The performance of the method has been thoroughly investigated, and found to outperform the existing competing methods. The proposed method is computationally simple and easily implementable using three existing R libraries. The method is robust to the violation of the distributional assumption for the covariate of interest.

12.
Multiple biomarkers are frequently observed or collected for detecting or understanding a disease. The research interest of this article is to extend tools of receiver operating characteristic (ROC) analysis from the univariate marker setting to the multivariate marker setting for evaluating predictive accuracy of biomarkers using a tree-based classification rule. Using an arbitrarily combined and-or classifier, an ROC function together with a weighted ROC function (WROC) and their conjugate counterparts are introduced for examining the performance of multivariate markers. Specific features of the ROC and WROC functions and other related statistics are discussed in comparison with those familiar properties for a univariate marker. Nonparametric methods are developed for estimating the ROC and WROC functions, the area under the curve, and the concordance probability. With emphasis on population average performance of markers, the proposed procedures and inferential results are useful for evaluating marker predictability based on multivariate marker measurements with different choices of markers, and for evaluating different and-or combinations in classifiers.
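The empirical operating points of an and-rule classifier can be sketched directly. A toy numpy illustration on simulated markers (illustrative thresholds and distributions, not the authors' estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
healthy = rng.normal(0.0, 1.0, (n, 2))    # two markers per subject
diseased = rng.normal(1.0, 1.0, (n, 2))   # disease shifts both markers upward

def and_rule_points(healthy, diseased, grid):
    # (FPR, TPR) of every classifier "X1 > c1 and X2 > c2" on a threshold grid
    pts = []
    for c1 in grid:
        for c2 in grid:
            fpr = ((healthy[:, 0] > c1) & (healthy[:, 1] > c2)).mean()
            tpr = ((diseased[:, 0] > c1) & (diseased[:, 1] > c2)).mean()
            pts.append((fpr, tpr))
    return pts

pts = and_rule_points(healthy, diseased, np.linspace(-2.0, 2.0, 21))
# empirical ROC function at t = 0.1: best TPR achievable with FPR <= 0.1
roc_at_01 = max(tpr for fpr, tpr in pts if fpr <= 0.1)
```

Combining two mildly informative markers with an and-rule achieves a higher true-positive rate at a fixed false-positive rate than either marker alone would in this toy setting.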

13.
Evaluating and comparing process capabilities are important tasks of production management. Manufacturers should apply the process with the highest capability among competing processes. A process group selection method is developed to solve the process selection problem based on overall yields. The goal is to select the processes with the highest overall yield among I processes under multiple quality characteristics, I > 2. The proposed method uses Bonferroni adjustment to control the overall error rate of comparing multiple processes. The critical values and the required sample sizes for designated powers are provided for practical use.
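The flavour of the Bonferroni-adjusted comparison can be sketched with pairwise yield tests. A simplified sketch using a normal-approximation two-proportion test on made-up conforming counts; the paper instead provides exact critical values and sample sizes, so everything below is illustrative:

```python
import numpy as np
from math import erf, sqrt

def two_prop_pvalue(k1, n1, k2, n2):
    # one-sided z-test that yield 1 exceeds yield 2 (normal approximation)
    p1, p2 = k1 / n1, k2 / n2
    pool = (k1 + k2) / (n1 + n2)
    se = sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))   # P(Z > z)

# observed conforming counts for I = 3 competing processes (illustrative)
counts, n = [940, 905, 880], 1000
I = len(counts)
best = int(np.argmax(counts))
alpha = 0.05 / (I - 1)   # Bonferroni: split the overall error over I-1 comparisons
beats_all = all(two_prop_pvalue(counts[best], n, counts[j], n) < alpha
                for j in range(I) if j != best)
```

Only when the apparent best process beats every competitor at the adjusted level is it declared the highest-yield process.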

14.
In longitudinal studies, nonlinear mixed-effects models have been widely applied to describe the intra- and the inter-subject variations in data. The inter-subject variation usually receives great attention and it may be partially explained by time-dependent covariates. However, some covariates may be measured with substantial errors and may contain missing values. We propose a multiple imputation method, implemented via Markov chain Monte Carlo with a Gibbs sampler, to address the covariate measurement errors and missing data in nonlinear mixed-effects models. The multiple imputation method is illustrated in a real data example. Simulation studies show that the multiple imputation method outperforms the commonly used naive methods.
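Whatever model is fitted to each imputed data set, the completed-data fits are combined in the standard multiple-imputation way. A minimal sketch of Rubin's pooling rules with illustrative numbers (not the paper's mixed-effects model):

```python
import numpy as np

def rubin_pool(estimates, variances):
    # combine M completed-data estimates and their variances via Rubin's rules
    estimates = np.asarray(estimates, float)
    variances = np.asarray(variances, float)
    M = len(estimates)
    qbar = estimates.mean()                 # pooled point estimate
    within = variances.mean()               # average within-imputation variance
    between = estimates.var(ddof=1)         # between-imputation variance
    total = within + (1.0 + 1.0 / M) * between
    return qbar, total

# M = 5 illustrative estimates of one parameter, each with variance 0.04
est, var = rubin_pool([1.0, 1.2, 0.9, 1.1, 1.05], [0.04] * 5)
```

The total variance inflates the within-imputation variance by the between-imputation spread, which is what makes multiple imputation honest about the uncertainty the missing or mismeasured covariates introduce.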

15.
We demonstrate how to perform direct simulation from the posterior distribution of a class of multiple changepoint models where the number of changepoints is unknown. The class of models assumes independence between the posterior distribution of the parameters associated with segments of data between successive changepoints. This approach is based on the use of recursions, and is related to work on product partition models. The computational complexity of the approach is quadratic in the number of observations, but an approximate version, which introduces negligible error, and whose computational cost is roughly linear in the number of observations, is also possible. Our approach can be useful, for example within an MCMC algorithm, even when the independence assumptions do not hold. We demonstrate our approach on coal-mining disaster data and on well-log data. Our method can cope with a range of models, and exact simulation from the posterior distribution is possible in a matter of minutes.
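The quadratic-time recursion can be sketched for a toy Gaussian model. This is a minimal numpy illustration, not the authors' implementation: it assumes unit-variance segments, a normal prior on each segment mean, and an unnormalized prior weight p per changepoint.

```python
import numpy as np

def seg_loglik(y, tau2=10.0):
    # marginal log-likelihood of one segment: y_i ~ N(mu, 1), mu ~ N(0, tau2)
    n, s, ss = len(y), y.sum(), (y ** 2).sum()
    v = 1.0 / (n + 1.0 / tau2)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(tau2)
            + 0.5 * np.log(v) - 0.5 * ss + 0.5 * v * s ** 2)

def logsumexp(a):
    m = max(a)
    return m + np.log(sum(np.exp(x - m) for x in a))

def backward_evidence(y, p=0.05):
    # Q[t] = log evidence of y[t:], summing over all segmentations of the tail;
    # each interior changepoint carries prior weight p (O(n^2) overall)
    n = len(y)
    Q = np.zeros(n + 1)
    for t in range(n - 1, -1, -1):
        Q[t] = logsumexp([seg_loglik(y[t:s + 1])
                          + (np.log(p) + Q[s + 1] if s < n - 1 else 0.0)
                          for s in range(t, n)])
    return Q

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 20), rng.normal(8, 1, 20)])
Q = backward_evidence(y)
# exact posterior (up to a constant) over where the first segment ends
post = [seg_loglik(y[:s + 1]) + np.log(0.05) + Q[s + 1]
        for s in range(len(y) - 1)]
```

Because Q[t] sums over every segmentation of the tail, quantities such as the posterior of the first changepoint position come out exactly; the approximate version mentioned in the abstract prunes terms whose contribution to the log-sum is negligible, bringing the cost down to roughly linear.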

16.
Variable selection in multiple linear regression models is considered. It is shown that for the special case of orthogonal predictor variables, an adaptive pre-test-type procedure proposed by Venter and Steel [Simultaneous selection and estimation for the some zeros family of normal models, J. Statist. Comput. Simul. 45 (1993), pp. 129–146] is almost equivalent to least angle regression, proposed by Efron et al. [Least angle regression, Ann. Stat. 32 (2004), pp. 407–499]. A new adaptive pre-test-type procedure is proposed, which extends the procedure of Venter and Steel to the general non-orthogonal case in a multiple linear regression analysis. This new procedure is based on a likelihood ratio test where the critical value is determined data-dependently. A practical illustration and results from a simulation study are presented.

17.
ABSTRACT

Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical problems. Here we consider a versatile U-statistics-based approach for non-parametric clustering that allows for an unconventional way of solving these problems. In this paper we propose a statistical test to assess group homogeneity taking into account multiple testing issues and a clustering algorithm based on dissimilarities within and between groups that highly speeds up the homogeneity test. We also propose a test to verify classification significance of a sample in one of two groups. We present Monte Carlo simulations that evaluate size and power of the proposed tests under different scenarios. Finally, the methodology is applied to three different genetic data sets: global human genetic diversity, breast tumour gene expression and Dengue virus serotypes. These applications showcase this statistical framework's ability to answer diverse biological questions in the high dimension low sample size scenario while adapting to the specificities of the different data types.
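The within/between-group dissimilarity idea behind the homogeneity test can be sketched with a permutation version. A toy numpy illustration using Euclidean dissimilarities on simulated data, not the authors' U-statistic machinery or genetic distances:

```python
import numpy as np

def bn_statistic(D, groups):
    # difference between mean between-group and mean within-group dissimilarity
    groups = np.asarray(groups)
    same = groups[:, None] == groups[None, :]
    iu = np.triu_indices(len(groups), k=1)    # each pair counted once
    within = D[iu][same[iu]].mean()
    between = D[iu][~same[iu]].mean()
    return between - within

def homogeneity_test(D, groups, B=200, seed=0):
    # permutation p-value: shuffle group labels to build the null distribution
    rng = np.random.default_rng(seed)
    obs = bn_statistic(D, groups)
    null = [bn_statistic(D, rng.permutation(groups)) for _ in range(B)]
    return obs, np.mean([s >= obs for s in null])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1, (20, 10)), rng.normal(1.5, 1, (20, 10))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
groups = np.repeat([0, 1], 20)
obs, pval = homogeneity_test(D, groups)
```

Under homogeneity the statistic is near zero; a large positive value, rare under label permutation, signals genuinely distinct groups.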

18.
In what follows, we introduce two Bayesian models for feature selection in high-dimensional data, specifically designed for the purpose of classification. We use two approaches to the problem: one which discards the components which have "almost constant" values (Model 1) and another which retains the components for which variations in-between the groups are larger than those within the groups (Model 2). We assume that p ≫ n, i.e. the number of components p is much larger than the number of samples n, and that only few of those p components are useful for subsequent classification. We show that particular cases of the above two models recover familiar variance or ANOVA-based component selection. When one has only two classes and features are a priori independent, Model 2 reduces to the Feature Annealed Independence Rule (FAIR) introduced by Fan and Fan (2008) and can be viewed as a natural generalization of FAIR to the case of L > 2 classes. The performance of the methodology is studied via simulations and using a biological dataset of animal communication signals comprising 43 groups of electric signals recorded from tropical South American electric knife fishes.
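The ANOVA-type screening that Model 2 reduces to can be sketched per feature. A minimal numpy sketch on simulated data with L = 3 classes (all names and the shift size are illustrative):

```python
import numpy as np

def anova_scores(X, labels):
    # per-feature ratio of between-group to within-group variation,
    # the classical screening statistic the Bayesian model recovers
    labels = np.asarray(labels)
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for g in np.unique(labels):
        Xg = X[labels == g]
        between += len(Xg) * (Xg.mean(axis=0) - grand) ** 2
        within += ((Xg - Xg.mean(axis=0)) ** 2).sum(axis=0)
    return between / within

rng = np.random.default_rng(0)
n_per, p = 30, 100
labels = np.repeat([0, 1, 2], n_per)
X = rng.normal(size=(3 * n_per, p))
X[labels == 1, 0] += 3.0   # only feature 0 actually separates the classes
scores = anova_scores(X, labels)
selected = int(np.argmax(scores))
```

Features whose between-group variation dwarfs their within-group variation are retained; the rest, like the 99 noise features here, score near the null level.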

19.
Handling dependence or not in feature selection is still an open question in supervised classification issues where the number of covariates exceeds the number of observations. Some recent papers surprisingly show the superiority of naive Bayes approaches based on an obviously erroneous assumption of independence, whereas others recommend inferring the dependence structure in order to decorrelate the selection statistics. In the classical linear discriminant analysis (LDA) framework, the present paper first highlights the impact of dependence in terms of instability of feature selection. A second objective is to revisit the above issue using a flexible factor modeling for the covariance. This framework introduces latent components of dependence, conditionally on which a new Bayes consistency is defined. A procedure is then proposed for the joint estimation of the expectation and variance parameters of the model. The present method is compared to recent regularized diagonal discriminant analysis approaches, assuming independence among features, and regularized LDA procedures, both in terms of classification performance and stability of feature selection. The proposed method is implemented in the R package FADA, freely available from the R repository CRAN.

20.
Consistency of propensity score matching estimators hinges on the propensity score's ability to balance the distributions of covariates in the pools of treated and non-treated units. Conventional balance tests merely check for differences in covariates' means, but cannot account for differences in higher moments. For this reason, this paper proposes balance tests which test for differences in the entire distributions of continuous covariates based on quantile regression (to derive Kolmogorov–Smirnov and Cramer–von-Mises–Smirnov-type test statistics) and resampling methods (for inference). Simulations suggest that these methods are very powerful and capture imbalances related to higher moments when conventional balance tests fail to do so.
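The spirit of the distributional balance test can be sketched with a Kolmogorov–Smirnov statistic and permutation resampling. The paper derives its statistics from quantile regression; this simplified numpy version only illustrates why looking beyond means matters, using two groups with equal means but different variances:

```python
import numpy as np

def ks_stat(x, z):
    # Kolmogorov-Smirnov distance between the covariate distributions
    # of treated (z == 1) and non-treated (z == 0) units
    grid = np.sort(x)
    F1 = np.searchsorted(np.sort(x[z == 1]), grid, side="right") / (z == 1).sum()
    F0 = np.searchsorted(np.sort(x[z == 0]), grid, side="right") / (z == 0).sum()
    return np.abs(F1 - F0).max()

def balance_test(x, z, B=200, seed=0):
    # resampling p-value: permute treatment labels for the null distribution
    rng = np.random.default_rng(seed)
    obs = ks_stat(x, z)
    null = [ks_stat(x, rng.permutation(z)) for _ in range(B)]
    return obs, np.mean([s >= obs for s in null])

rng = np.random.default_rng(0)
z = np.repeat([0, 1], 500)
# identical means, different spreads: a mean-comparison test misses this
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 2, 500)])
obs, pval = balance_test(x, z)
```

A difference-in-means check would see two groups centered at zero and declare balance; the distributional statistic flags the variance imbalance clearly.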


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号