Similar articles
 Found 20 similar articles (search time: 437 ms)
1.
Biclustering is the simultaneous clustering of two related dimensions, for example, of individuals and features, or genes and experimental conditions. Very few statistical models for biclustering have been proposed in the literature. Instead, most of the research has focused on algorithms to find biclusters. The models underlying them have not received much attention. Hence, very little is known about the adequacy and limitations of the models and the efficiency of the algorithms. In this work, we shed light on associated statistical models behind the algorithms. This allows us to generalize most of the known popular biclustering techniques, and to justify, and many times improve on, the algorithms used to find the biclusters. It turns out that most of the known techniques have a hidden Bayesian flavor. Therefore, we adopt a Bayesian framework to model biclustering. We propose a measure of biclustering complexity (number of biclusters and overlapping) through a penalized plaid model, and present a suitable version of the deviance information criterion to choose the number of biclusters, a problem that has not been adequately addressed yet. Our ideas are motivated by the analysis of gene expression data.

2.
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage that it is substantially less time-consuming on data sets of high dimensions than the other multiple imputation methods.

3.
The number of parameters mushrooms in a linear mixed effects (LME) model in the case of multivariate repeated measures data. Computation of these parameters is a real problem with the increase in the number of response variables or with the increase in the number of time points. The problem becomes more intricate and involved with the addition of further random effects. A multivariate analysis is not possible in a small sample setting. We propose a method to estimate these many parameters in bits and pieces from baby models, by taking a subset of response variables at a time, and finally using these bits and pieces at the end to get the parameter estimates for the mother model, with all variables taken together. Applying this method one can calculate the fixed effects, the best linear unbiased predictions (BLUPs) for the random effects in the model, and also the BLUPs at each time of observation for each response variable, to monitor the effectiveness of the treatment for each subject. The proposed method is illustrated with an example of multiple response variables measured over multiple time points arising from a clinical trial in osteoporosis.

4.
In some physical systems, where the goal is to describe behaviour over an entire field using scattered observations, a multiple regression model can be derived from the discretization of a continuous process. These models often have more parameters than observations. We propose a technique for constructing smoothed estimators in this situation. Our method assumes the model has random explanatory and response variables, and imposes a smoothness penalty based on the signal-to-noise ratio of the model. Results are presented using a known value for the ratio, and a method for estimating the ratio is discussed. The procedure is applied to modelling temperature measurements taken in the California Current.

5.
In this paper, we propose a new generalized multiple frequency model to analyze non-stationary signals. Under the assumption of additive stationary errors, the model can be used quite effectively to analyze different signals. We propose the usual least-squares estimators to estimate the unknown parameters and show that the estimators are strongly consistent. We also obtain their asymptotic distributions. The performance of the proposed model is compared with the multiple frequency model using Monte Carlo simulations. Finally, several real data sets are analyzed using both the proposed model and the multiple frequency model.

6.
In this paper Bayesian methods are applied to a stochastic volatility model using both the prices of the asset and the prices of options written on the asset. Posterior densities for all model parameters, latent volatilities and the market price of volatility risk are produced via a Markov Chain Monte Carlo (MCMC) sampling algorithm. Candidate draws for the unobserved volatilities are obtained in blocks by applying the Kalman filter and simulation smoother to a linearization of a nonlinear state space representation of the model. Crucially, information from both the spot and option prices affects the draws via the specification of a bivariate measurement equation, with implied Black–Scholes volatilities used to proxy observed option prices in the candidate model. Alternative models nested within the Heston (1993) framework are ranked via posterior odds ratios, as well as via fit, predictive and hedging performance. The method is illustrated using Australian News Corporation spot and option price data.

7.
Summary. A Bayesian method for segmenting weed and crop textures is described and implemented. The work forms part of a project to identify weeds and crops in images so that selective crop spraying can be carried out. An image is subdivided into blocks and each block is modelled as a single texture. The number of different textures in the image is assumed unknown. A hierarchical Bayesian procedure is used where the texture labels have a Potts model (colour Ising Markov random field) prior and the pixels within a block are distributed according to a Gaussian Markov random field, with the parameters dependent on the type of texture. We simulate from the posterior distribution by using a reversible jump Metropolis–Hastings algorithm, where the number of different texture components is allowed to vary. The methodology is applied to a simulated image and then we carry out texture segmentation on the weed and crop images that motivated the work.

8.
The presence of a detection limit (DL) in covariates causes inflated bias and inaccurate mean squared error in the estimators of the regression parameters. This paper suggests a response-driven multiple imputation method to correct the deleterious impact of the covariate DL on the estimators of the parameters of a simple logistic regression model. The performance of the method has been thoroughly investigated and is found to outperform the existing competing methods. The proposed method is computationally simple and easily implementable using three existing R libraries. The method is robust to violations of the distributional assumption for the covariate of interest.

9.
In this paper Bayesian methods are applied to a stochastic volatility model using both the prices of the asset and the prices of options written on the asset. Posterior densities for all model parameters, latent volatilities and the market price of volatility risk are produced via a Markov Chain Monte Carlo (MCMC) sampling algorithm. Candidate draws for the unobserved volatilities are obtained in blocks by applying the Kalman filter and simulation smoother to a linearization of a nonlinear state space representation of the model. Crucially, information from both the spot and option prices affects the draws via the specification of a bivariate measurement equation, with implied Black-Scholes volatilities used to proxy observed option prices in the candidate model. Alternative models nested within the Heston (1993) framework are ranked via posterior odds ratios, as well as via fit, predictive and hedging performance. The method is illustrated using Australian News Corporation spot and option price data.

10.
This paper considers a statistical model for the detection mechanism of qualitative microbiological test methods with a parameter for the detection proportion (the probability of detecting a single organism) and a parameter for the false positive rate. It is demonstrated that the detection proportion and the bacterial density cannot be estimated separately, not even in a multiple dilution experiment. Only the product can be estimated, changing the interpretation of the most probable number estimator. The asymptotic power of the likelihood ratio statistic for comparing an alternative method with the compendial method is optimal for a single dilution experiment. The bacterial density should either be close to two CFUs per test unit or equal to zero, depending on differences in the model parameters between the two test methods. The proposed strategy for method validation is to use these two dilutions and test for differences in the two model parameters, addressing the validation parameters specificity and accuracy. Robustness of these two parameters might still be required, but all other validation parameters can be omitted. A confidence interval-based approach for the ratio of the detection proportions for the two methods is recommended, since it is most informative and close to the power of the likelihood ratio test. Copyright © 2014 John Wiley & Sons, Ltd.
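
The non-identifiability claim above can be illustrated with a small sketch. It is not taken from the paper: it assumes, for illustration only, that the number of organisms in a test unit is Poisson distributed and that a unit tests positive when at least one organism is detected or a false positive occurs. The names `prob_positive` and `dilution_series` are hypothetical.

```python
import math

def prob_positive(density, detection_prop, false_pos_rate=0.0):
    """P(test unit is positive) under an assumed Poisson-count model.

    A unit is negative only if no false positive occurs and none of the
    Poisson(density) organisms is detected, so the detected count is
    Poisson(detection_prop * density).
    """
    return 1.0 - (1.0 - false_pos_rate) * math.exp(-detection_prop * density)

def dilution_series(density, detection_prop, false_pos_rate, dilutions):
    """Positive-result probabilities across a series of dilution factors."""
    return [prob_positive(density * d, detection_prop, false_pos_rate)
            for d in dilutions]

dilutions = [1.0, 0.1, 0.01]
# Two different (density, detection proportion) pairs with the same product.
series_a = dilution_series(4.0, 0.5, 0.02, dilutions)
series_b = dilution_series(2.0, 1.0, 0.02, dilutions)
```

Under these assumptions the two parameter settings yield the same probabilities at every dilution, so data from any number of dilutions can only inform the product of density and detection proportion.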

11.
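Fan Xinyan (范新妍) et al., 《统计研究》 (Statistical Research), 2021, 38(2): 99-113
Traditional credit scoring methods mainly rely on statistical classification and can only predict whether a borrower will default, not when the default will occur. The cure rate model is a mixture of binary classification and survival analysis: it predicts not only whether a default will occur but also when, and therefore provides more information than traditional binary classification methods. In addition, with the development of big data, data sources are increasingly numerous, and multiple data sets can be collected for the same or similar tasks. This paper proposes an integrative cure rate model that fuses multi-source data. It models multiple data sets and estimates their parameters simultaneously, performs bi-level (between-group and within-group) variable selection through a composite penalty function, and improves interpretability by encouraging the regression coefficients of the two sub-models to share the same sign. Numerical simulations show that the proposed method has clear advantages in both variable selection and parameter estimation. Finally, the proposed method is applied to predicting the time of default on credit loans, where the model performs well.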

12.
In this paper, we define a multiple cases deletion model (MCDM) in linear measurement error models (LMEMs). Then, using the corrected score method of Nakamura (1990), parameter estimates are obtained. Furthermore, based on the MCDM, we provide computationally inexpensive deletion diagnostic tools for LMEMs. An example illustrates that our method is useful for diagnosing influential subsets of observations.

13.
We consider the semiparametric proportional hazards model for the cause-specific hazard function in the analysis of competing risks data with missing cause of failure. The inverse probability weighted equation and augmented inverse probability weighted equation are proposed for estimating the regression parameters in the model, and their theoretical properties are established for inference. Simulation studies demonstrate that the augmented inverse probability weighted estimator is doubly robust and the proposed method is appropriate for practical use. The simulations also compare the proposed estimators with the multiple imputation estimator of Lu and Tsiatis (2001). The application of the proposed method is illustrated using data from a bone marrow transplant study.

14.
Statistical models are sometimes incorporated into computer software for making predictions about future observations. When the computer model consists of a single statistical model, this corresponds to estimation of a function of the model parameters. This paper is concerned with the case in which the computer model implements multiple, individually estimated statistical sub-models. This case frequently arises, for example, in models for medical decision making that derive parameter information from multiple clinical studies. We develop a method for calculating the posterior mean of a function of the parameter vectors of multiple statistical models that is easy to implement in computer software, has high asymptotic accuracy, and has a computational cost linear in the total number of model parameters. The formula is then used to derive a general result about posterior estimation across multiple models. The utility of the results is illustrated by application to clinical software that estimates the risk of fatal coronary disease in people with diabetes.

15.
Resolvable block designs for v varieties in blocks of size k require v to be a multiple of k so that all blocks are of the same size. If a factorization of v is not possible then a resolvable design with blocks of unequal size is necessary. Patterson & Williams (1976) suggested the use of designs derived from α-designs and conjectured that such designs are likely to be very efficient in the class of resolvable designs with block sizes k and k – 1. This paper examines these derived designs and compares them with designs generated directly using an interchange algorithm. It concludes that the derived designs should be used when v is large, but that for small v they can be relatively inefficient.

16.
The topic of heterogeneity in the analysis of recurrent event data has received considerable attention in recent times. Frailty models are widely employed in such situations as they allow us to model the heterogeneity through a common random effect. In this paper, we introduce a shared frailty model for gap time distributions of recurrent events with multiple causes. The parameters of the model are estimated using the EM algorithm. An extensive simulation study is used to assess the performance of the method. Finally, we apply the proposed model to a real-life data set.

17.
Multiple measurable bio-markers for a disease are used as indicators for studying the response variable of interest in order to monitor and model disease progression. However, it is common for subjects to drop out of the studies prematurely, resulting in unbalanced data and hence complicating the inferences involving such data. In this paper we consider a case where data are unbalanced among subjects and also within a subject because, for some reason, only a subset of the multiple outcomes of the response variable is observed at any one occasion. We propose a nonlinear mixed-effects model for the multivariate response variable data and derive a joint likelihood function that takes into account the partial dropout of the outcomes of the response variable. We further show how the methodology can be used in the estimation of the parameters that characterise HIV disease dynamics. An approximation technique for the parameters is also given and illustrated using a routine observational HIV dataset.

18.
In modern scientific research, multiblock missing data emerges when synthesizing information across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or heavily rely on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.
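
For orientation, the conditional-mean step described above has a standard closed form when the variables are treated as jointly Gaussian. That joint-Gaussian assumption, and the notation below (subscript $m$ for the missing block, $o$ for the observed blocks, $\mu$ for the mean, $\Sigma$ for the covariance and $\Omega = \Sigma^{-1}$ for the precision matrix), are illustrative choices and not taken from the paper.

```latex
\[
\mathbb{E}\left[x_m \mid x_o\right]
  = \mu_m + \Sigma_{mo}\,\Sigma_{oo}^{-1}\,(x_o - \mu_o)
  = \mu_m - \Omega_{mm}^{-1}\,\Omega_{mo}\,(x_o - \mu_o).
\]
```

The second form shows why an estimate of the precision matrix is enough for this kind of imputation: only the blocks $\Omega_{mm}$ and $\Omega_{mo}$ enter the regression of the missing block on the observed ones.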

19.
A well-known problem in multiple regression is that it is possible to reject the hypothesis that all slope parameters are equal to zero, yet when applying the usual Student's T-test to the individual parameters, no significant differences are found. An alternative strategy is to estimate prediction error via the 0.632 bootstrap method for all models of interest and declare the parameters associated with the model that yields the smallest prediction error to differ from zero. The main results in this paper are that this latter strategy can have practical value versus Student's T; replacing squared error with absolute error can be beneficial in some situations and replacing least squares with an extension of the Theil-Sen estimator can substantially increase the probability of identifying the correct model under circumstances that are described.
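
As a rough illustration of the 0.632 bootstrap estimate of prediction error mentioned above, the sketch below combines the apparent (training) error with the out-of-bag bootstrap error using weights 0.368 and 0.632. It assumes a single-predictor least-squares fit with squared error, not the absolute-error or Theil-Sen variants studied in the paper, and the function names (`fit_line`, `bootstrap_632`) are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # degenerate resample: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def squared_error(model, x, y):
    a, b = model
    return (y - (a + b * x)) ** 2

def bootstrap_632(xs, ys, n_boot=200, seed=0):
    """Return (err_632, apparent_error, out_of_bag_error)."""
    rng = random.Random(seed)
    n = len(xs)
    full = fit_line(xs, ys)
    apparent = sum(squared_error(full, x, y) for x, y in zip(xs, ys)) / n
    oob_sum = [0.0] * n
    oob_cnt = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        # Score only the observations left out of this resample.
        for i in range(n):
            if i not in chosen:
                oob_sum[i] += squared_error(model, xs[i], ys[i])
                oob_cnt[i] += 1
    per_obs = [s / c for s, c in zip(oob_sum, oob_cnt) if c > 0]
    oob = sum(per_obs) / len(per_obs)
    return 0.368 * apparent + 0.632 * oob, apparent, oob

# Toy data: a linear trend plus a deterministic alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]
err_632, apparent, oob = bootstrap_632(xs, ys)
```

To compare candidate models in the spirit of the abstract, one would compute `err_632` for each model and prefer the one with the smallest value.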

20.
Various diagnostic statistics have been proposed to help identify cases that markedly affect, or influence, the features of a fitted linear regression model. Once influential cases are found, decisions can be made regarding their worth in the model building process. Since a subject data set may contain both singly influential cases and influential multiple case subsets, the capability to assess the joint influence of cases is needed for a complete analysis. The aim of this work is to briefly review Cook’s distance measure for multiple cases, an effective diagnostic for this purpose, and present a method using it to search for influential multiple case subsets. The method is applied in two example analyses by way of a MINITAB Statistical Software macro.
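
For reference, the usual textbook form of Cook's distance for a subset $I$ of cases is given below. The notation is an assumption made here for illustration: $X$ is the $n \times p$ design matrix, $\hat\beta$ the full-data least-squares estimate, $\hat\beta_{(I)}$ the estimate with the cases in $I$ deleted, $s^2$ the residual mean square, $e_I$ the residuals of the cases in $I$, and $H_{II}$ the corresponding block of the hat matrix.

```latex
\[
D_I
  = \frac{(\hat\beta - \hat\beta_{(I)})^{\top} X^{\top} X \,(\hat\beta - \hat\beta_{(I)})}{p\, s^{2}}
  = \frac{e_I^{\top} (I - H_{II})^{-1} H_{II}\, (I - H_{II})^{-1} e_I}{p\, s^{2}}.
\]
```

The second expression avoids refitting the model for each subset, which is what makes a search over candidate subsets computationally practical.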
