Similar Articles
 20 similar articles found (search time: 46 ms)
1.
An efficient optimization algorithm for identifying the best least squares regression model under the condition of non-negative coefficients is proposed. The algorithm builds on an innovative solution via unrestricted least squares and uses regression-tree and branch-and-bound techniques for computing the best subset regression. The aim is to fill a gap in computationally tractable solutions to the non-negative least squares and model selection problem. The proposed method is illustrated with a real dataset. Experimental results on real and artificial random datasets confirm the computational efficacy of the new strategy and demonstrate its ability to solve large model selection problems subject to non-negativity constraints.
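As a rough illustration of the problem this abstract addresses (not the authors' branch-and-bound algorithm), the sketch below enumerates all subsets and fits each by non-negative least squares with scipy.optimize.nnls; the data-generating setup is purely hypothetical.

```python
# Brute-force best-subset regression under non-negativity constraints.
# Illustrates the problem only; the paper's branch-and-bound strategy
# avoids visiting every subset.
from itertools import combinations

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, 0.0, 2.0, 0.0, 0.5, 0.0])   # non-negative truth
y = X @ beta_true + rng.normal(scale=0.5, size=n)

best = {}
for k in range(1, p + 1):
    best_norm, best_subset, best_coef = np.inf, None, None
    for subset in combinations(range(p), k):
        cols = list(subset)
        coef, resid_norm = nnls(X[:, cols], y)   # non-negative least squares
        if resid_norm < best_norm:
            best_norm, best_subset, best_coef = resid_norm, cols, coef
    best[k] = (best_subset, best_norm)

for k, (subset, resid_norm) in best.items():
    print(f"size {k}: variables {subset}, residual norm {resid_norm:.3f}")
```

The exhaustive loop scales as 2^p, which is exactly why a branch-and-bound search such as the one proposed here is needed for larger problems.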

2.
Model-based clustering is a method that clusters data under the assumption of a statistical model structure. In this paper, we propose a novel model-based hierarchical clustering method for a finite statistical mixture model based on the Fisher distribution. The main foci of the proposed method are to: (a) provide an efficient solution to estimate the parameters of a Fisher mixture model (FMM); (b) generate a hierarchy of FMMs; and (c) select the optimal model. To this aim, we develop a Bregman soft clustering method for the FMM. Our model estimation strategy exploits Bregman divergence and hierarchical agglomerative clustering, while our model selection strategy comprises a parsimony-based approach and an evaluation-graph-based approach. We empirically validate the proposed method on simulated data and then apply it to real data for depth image analysis, demonstrating that the proposed clustering method can serve as a potential tool for unsupervised depth image analysis.
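A minimal sketch of generic Bregman soft clustering, assuming the squared-Euclidean divergence (equivalent to soft k-means); the paper's method instead uses the Fisher distribution, which swaps in a different divergence and centroid update. All data below are simulated placeholders.

```python
# Bregman soft clustering with the squared-Euclidean divergence (soft k-means).
# The E-step computes responsibilities from exp(-divergence); the M-step sets
# each centroid to the responsibility-weighted mean (the Bregman centroid).
import numpy as np

def bregman_soft_cluster(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centroids = X[rng.choice(n, k, replace=False)]
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        div = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        resp = weights * np.exp(-div)                 # E-step
        resp /= resp.sum(axis=1, keepdims=True)
        weights = resp.mean(axis=0)                   # M-step
        centroids = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return centroids, resp

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 2.0, 4.0)])
centroids, resp = bregman_soft_cluster(X, k=3)
print(np.round(centroids, 2))
```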

3.
In this article, a new two-phase algorithm for computationally expensive simulation problems is presented. In the first phase, as in a model-based algorithm, the simulation output is used directly in the optimization stage; in the second phase, the simulation model is replaced by a valid metamodel. In addition, a new optimization algorithm is presented. To evaluate its performance, the proposed algorithm is applied to the (s,S) inventory problem as well as to five test functions. Numerical results show that it reaches better solutions in less computational time than the corresponding metamodel-based algorithm.
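A small sketch of the metamodel idea, assuming a Gaussian-process surrogate from scikit-learn; the function expensive_simulation is a hypothetical stand-in for a costly simulator such as an (s,S) inventory system, and this is not the article's specific algorithm.

```python
# Phase 1: run the expensive simulator on a small design.
# Phase 2: fit a cheap surrogate (metamodel) and optimize it instead.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):              # placeholder for a costly simulation run
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2 + np.random.normal(scale=0.1)

rng = np.random.default_rng(0)
X_design = rng.uniform(-5, 5, size=(30, 2))           # small designed sample
y_design = np.array([expensive_simulation(x) for x in X_design])

surrogate = GaussianProcessRegressor(normalize_y=True).fit(X_design, y_design)

res = minimize(lambda x: surrogate.predict(x.reshape(1, -1))[0],
               x0=np.zeros(2), method="Nelder-Mead")
print("surrogate optimum near", np.round(res.x, 2))
```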

4.
5.
In observational studies, observed covariates that are unbalanced between treatment groups often bias the estimation of treatment effects. The generalized propensity score (GPS) has recently been proposed to overcome this problem; however, a practical technique for applying the GPS is lacking. This study demonstrates how clustering algorithms can be used to group similar subjects based on the transformed GPS. We compare four popular clustering algorithms: k-means clustering (KMC), model-based clustering, fuzzy c-means clustering and partitioning around medoids, according to three criteria: average dissimilarity between subjects within clusters, average Dunn index and average silhouette width, under four different covariate scenarios. Simulation studies show that the KMC algorithm has better overall performance than the other three clustering algorithms. We therefore recommend the KMC algorithm for grouping similar subjects based on the transformed GPS.
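One plausible reading of this pipeline, sketched with simulated data: estimate the GPS for a multi-level treatment by multinomial logistic regression, apply a transformation (a logit transform here, chosen for illustration; the paper defines its own transformed GPS), and group subjects with k-means, reporting the silhouette width.

```python
# Estimate generalized propensity scores, transform them, and cluster subjects.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 4))                       # observed covariates (simulated)
treatment = rng.integers(0, 3, size=n)            # placeholder 3-arm treatment

gps = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)
eps = 1e-6
transformed = np.log((gps + eps) / (1 - gps + eps))   # illustrative logit transform

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(transformed)
print("average silhouette width:", round(silhouette_score(transformed, km.labels_), 3))
```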

6.
In this article, to reduce the computational load of Bayesian variable selection, we use a variant of reversible jump Markov chain Monte Carlo methods together with the Holmes and Held (HH) algorithm to sample model index variables in logistic mixed models involving a large number of explanatory variables. Furthermore, we propose a simple proposal distribution for the model index variables, and use a simulation study and a real example to compare the performance of the HH algorithm under our proposed and existing proposal distributions. The results show that the HH algorithm with the proposed proposal distribution is a computationally efficient and reliable selection method.
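To fix ideas only: the sketch below samples model-index (inclusion) variables for a plain logistic model with a flip-one-variable proposal, scoring each visited model by -BIC/2 as a crude stand-in for the marginal likelihood. This is not the Holmes-Held algorithm, ignores random effects, and uses a hypothetical simulated dataset.

```python
# Generic Metropolis-Hastings over inclusion indicators for logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 300, 8
X = rng.normal(size=(n, p))
logits = 1.2 * X[:, 0] - 0.8 * X[:, 3]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

def log_score(gamma):
    cols = np.flatnonzero(gamma)
    design = sm.add_constant(X[:, cols]) if cols.size else np.ones((n, 1))
    fit = sm.Logit(y, design).fit(disp=0)
    return -fit.bic / 2.0                 # rough stand-in for the log marginal likelihood

gamma = np.zeros(p, dtype=int)
score = log_score(gamma)
visits = np.zeros(p)
n_iter = 2000
for _ in range(n_iter):
    prop = gamma.copy()
    j = rng.integers(p)
    prop[j] = 1 - prop[j]                 # proposal: flip one model-index variable
    prop_score = log_score(prop)
    if np.log(rng.uniform()) < prop_score - score:
        gamma, score = prop, prop_score
    visits += gamma
print("approximate inclusion frequencies:", np.round(visits / n_iter, 2))
```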

7.
When a genetic algorithm (GA) is employed in a statistical problem, the result is affected both by sampling variability and by the stochastic elements of the algorithm, and both components should be controlled in order to obtain reliable results. In the present work we analyze parametric estimation problems tackled by GAs and pursue two objectives. The first is a formal variability analysis of the final estimates, showing that their variability can easily be decomposed into the two sources. The second introduces a framework for GA estimation with fixed computational resources, a form of the statistical and computational trade-off question that is crucial in recent problems; in this situation the result should be optimal from both the statistical and the computational point of view, taking into account the two sources of variability and the constraints on resources. Simulation studies are presented to illustrate the proposed method and the statistical and computational trade-off question.
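A minimal sketch of the variance decomposition described here, assuming a toy estimation problem: each of B simulated datasets is re-fitted with R different optimizer seeds, and the within-dataset spread (algorithmic variability) is separated from the between-dataset spread (sampling variability). differential_evolution is used as a convenient stand-in for a GA.

```python
# Decompose estimate variability into sampling and algorithmic components.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
true_mu = 2.0
B, R, n = 20, 5, 200                      # B datasets, R stochastic runs per dataset

estimates = np.empty((B, R))
for b in range(B):
    data = rng.normal(true_mu, 1.0, size=n)
    neg_loglik = lambda theta: 0.5 * np.sum((data - theta[0]) ** 2)
    for r in range(R):
        res = differential_evolution(neg_loglik, bounds=[(-10, 10)],
                                     seed=r, maxiter=50, tol=1e-8)
        estimates[b, r] = res.x[0]

within = estimates.var(axis=1, ddof=1).mean()    # algorithmic variability
between = estimates.mean(axis=1).var(ddof=1)     # sampling variability
print(f"algorithmic variance {within:.2e}, sampling variance {between:.2e}")
```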

8.
Bayesian hierarchical modeling with Gaussian process random effects provides a popular approach for analyzing point-referenced spatial data. For large spatial data sets, however, generic posterior sampling is infeasible due to the extremely high computational burden of decomposing the spatial correlation matrix. In this paper, we propose an efficient algorithm, the adaptive griddy Gibbs (AGG) algorithm, to address the computational issues with large spatial data sets. The proposed algorithm dramatically reduces the computational complexity, and we show theoretically that it approximates the true posterior distribution accurately. The number of grid points sufficient for a required accuracy is also derived. We compare the performance of AGG with that of state-of-the-art methods in simulation studies and, finally, apply AGG to spatially indexed data on building energy consumption.
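For context, a single griddy Gibbs step is sketched below: the unnormalized full conditional of one parameter is evaluated on a grid, normalized, and sampled from the resulting discrete approximation. The conditional used here is a hypothetical normal density; the adaptive grid selection that gives AGG its name is not shown.

```python
# One griddy Gibbs update on a fixed grid.
import numpy as np

def griddy_gibbs_step(log_conditional, grid, rng):
    logp = log_conditional(grid)
    p = np.exp(logp - logp.max())           # stabilize before normalizing
    p /= p.sum()
    return rng.choice(grid, p=p)

rng = np.random.default_rng(0)
grid = np.linspace(-5, 5, 401)
log_cond = lambda x: -0.5 * (x - 1.3) ** 2 / 0.4   # placeholder unnormalized log conditional

draws = np.array([griddy_gibbs_step(log_cond, grid, rng) for _ in range(5000)])
print(round(draws.mean(), 2), round(draws.std(), 2))  # roughly 1.3 and sqrt(0.4)
```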

9.
The main statistical problem in many epidemiological studies involving repeated measurements of surrogate markers is the frequent occurrence of missing data. Standard likelihood-based approaches such as the linear random-effects model fail to give unbiased estimates when data are non-ignorably missing. In human immunodeficiency virus (HIV) type 1 infection, two markers that have been widely used to track disease progression are CD4 cell counts and HIV–ribonucleic acid (RNA) viral load levels. Repeated measurements of these markers tend to be informatively censored, a special case of non-ignorable missingness, so methods are needed that jointly model the observed data and the missingness process. Despite their high correlation, longitudinal data on these markers have mostly been analysed independently using random-effects models. Touloumi and co-workers proposed the joint multivariate random-effects model, which combines a linear random-effects model for the underlying pattern of the marker with a log-normal survival model for the drop-out process. We extend the joint multivariate random-effects model to model the CD4 cell and viral load data simultaneously while adjusting for informative drop-out due to disease progression or death. Estimates of all model parameters are obtained by the restricted iterative generalized least squares method, or by a modified version that uses the EM algorithm as a nested algorithm in the case of censored survival data and also accounts for non-linearity in the HIV–RNA trend. The proposed method is evaluated and compared with simpler approaches in a simulation study, and is finally applied to a subset of the data from the 'Concerted action on seroconversion to AIDS and death in Europe' study.

10.
In this paper, we consider a partially linear transformation model for data subject to length bias and right censoring, which frequently arise together in biometrics and other fields. The partially linear transformation model can account for nonlinear covariate effects in addition to linear effects on survival time, and thus overcomes a major disadvantage of the popular semiparametric linear transformation model. We adopt a local linear fitting technique and develop an unbiased global and local estimating equations approach for estimating the unknown covariate effects. We provide an asymptotic justification for the proposed procedure, develop an iterative computational algorithm for its practical implementation, and propose a bootstrap resampling procedure for estimating the standard errors of the estimator. A simulation study shows that the proposed method performs well in finite samples, and the proposed estimator is applied to analyse the Oscar data.
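The bootstrap piece of the procedure can be sketched generically: resample subjects with replacement, re-estimate, and take the standard deviation of the replicates. The estimator below is a hypothetical least-squares slope standing in for the paper's estimating-equation estimator; only the resampling logic is illustrated.

```python
# Generic nonparametric bootstrap standard error.
import numpy as np

rng = np.random.default_rng(0)
n = 150
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)

def estimator(x, y):                      # placeholder for the paper's estimator
    return np.polyfit(x, y, 1)[0]

boot = np.empty(500)
for b in range(boot.size):
    idx = rng.integers(0, n, size=n)      # resample subjects with replacement
    boot[b] = estimator(x[idx], y[idx])

print("estimate", round(estimator(x, y), 3),
      "bootstrap SE", round(boot.std(ddof=1), 3))
```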

11.
Multivariable optimization in a large-data environment is concerned with how to reliably obtain a set of optimal results from a mass of data that influence the objective function in a complex way. This is an important issue in statistical computation because the complex relationships among the variable parameters lead to repeated statistical analyses and considerable wasted data. A statistical multivariable optimization method using an improved orthogonal algorithm for large data is proposed: considering the optimization problem with multiple parameters in a large-data environment, a multi-parameter optimization model suited to the improved orthogonal algorithm is established. Furthermore, an extensive simulation study on the temperature field distribution of an anti-/de-icing component was conducted to verify the validity of the optimization method. The optimized temperature field distribution meets the anti-/de-icing requirements in numerical simulation, and the results show that the optimization obtained with the proposed method is clearly more accurate than the non-optimized temperature distribution, verifying the effectiveness of the method.

12.
For many stochastic models, it is difficult to make inference about the model parameters because it is impossible to write down a tractable likelihood given the observed data. A common solution is data augmentation in a Markov chain Monte Carlo (MCMC) framework. However, there are statistical problems where this approach has proved infeasible but where simulation from the model is straightforward, which has led to the popularity of approximate Bayesian computation. We introduce a forward simulation MCMC (fsMCMC) algorithm, which is primarily based upon simulation from the model. The fsMCMC algorithm formulates the simulation of the process explicitly as a data augmentation problem. By exploiting non-centred parameterizations, an efficient MCMC updating scheme for the parameters and augmented data is introduced, whilst maintaining straightforward simulation from the model. The fsMCMC algorithm is successfully applied to two distinct epidemic models, including a birth–death–mutation model that has previously been analysed only using approximate Bayesian computation methods.
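A toy illustration of a non-centred parameterization inside a Gibbs sampler, not the fsMCMC algorithm or its epidemic models: group effects are written as theta_i = mu + tau*eta_i with standard-normal eta_i, and the sampler updates the standardized eta_i. The data, and the choice to hold tau fixed, are assumptions made purely for brevity.

```python
# Non-centred parameterization in a toy hierarchical-normal Gibbs sampler.
import numpy as np

rng = np.random.default_rng(0)
groups, per_group, tau = 8, 20, 1.5        # tau held fixed for simplicity
mu_true = 3.0
eta_true = rng.normal(size=groups)
y = mu_true + tau * eta_true[:, None] + rng.normal(size=(groups, per_group))

mu, eta = 0.0, np.zeros(groups)
mu_draws = []
for _ in range(3000):
    # eta_i | rest: N(0,1) prior combined with y_ij ~ N(mu + tau*eta_i, 1)
    prec = 1.0 + per_group * tau ** 2
    mean = tau * (y - mu).sum(axis=1) / prec
    eta = mean + rng.normal(size=groups) / np.sqrt(prec)
    # mu | rest: flat prior, residuals y_ij - tau*eta_i ~ N(mu, 1)
    resid = y - tau * eta[:, None]
    mu = rng.normal(resid.mean(), 1.0 / np.sqrt(y.size))
    mu_draws.append(mu)

print("posterior mean of mu approximately", round(np.mean(mu_draws[500:]), 2))
```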

13.
Variable selection is an important task in regression analysis, and the performance of a statistical model depends heavily on which subset of predictors is chosen. Several methods exist for selecting the most relevant variables to construct a good model. In practice, however, the dependent variable may take positive continuous values and need not be normally distributed; in such situations the gamma distribution is more suitable than the normal for building a regression model. This paper introduces a heuristic approach to variable selection for gamma regression models using artificial bee colony optimization. We evaluate the proposed method against classical selection methods such as backward and stepwise selection. Both simulation studies and real-data examples demonstrate the accuracy of the proposed selection procedure.
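A much-simplified, bee-colony-style subset search for a gamma regression (log link) is sketched below on simulated data. Food sources are binary inclusion vectors scored by AIC; the employed, onlooker and scout phases are reduced to bare essentials and do not reproduce the authors' operators or scoring.

```python
# Simplified bee-colony-style variable selection for a gamma GLM.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 400, 8
X = rng.normal(size=(n, p))
mu = np.exp(0.3 + 0.8 * X[:, 1] - 0.6 * X[:, 4])
y = rng.gamma(shape=2.0, scale=mu / 2.0)          # positive gamma-distributed response

def aic_score(mask):
    design = sm.add_constant(X[:, np.flatnonzero(mask)]) if mask.any() else np.ones((n, 1))
    return sm.GLM(y, design, family=sm.families.Gamma(sm.families.links.Log())).fit().aic

def neighbor(mask):
    out = mask.copy()
    out[rng.integers(p)] ^= True                  # flip one inclusion indicator
    return out

n_sources, limit = 10, 5
sources = rng.integers(0, 2, size=(n_sources, p)).astype(bool)
scores = np.array([aic_score(s) for s in sources])
stall = np.zeros(n_sources, dtype=int)

for _ in range(30):
    probs = 1.0 / (1.0 + scores)                  # onlookers favour better sources
    probs /= probs.sum()
    for i in list(range(n_sources)) + list(rng.choice(n_sources, n_sources, p=probs)):
        cand = neighbor(sources[i])
        cand_score = aic_score(cand)
        if cand_score < scores[i]:
            sources[i], scores[i], stall[i] = cand, cand_score, 0
        else:
            stall[i] += 1
    for i in np.flatnonzero(stall > limit):       # scout phase: abandon stalled sources
        sources[i] = rng.integers(0, 2, size=p).astype(bool)
        scores[i], stall[i] = aic_score(sources[i]), 0

print("selected variables:", np.flatnonzero(sources[scores.argmin()]))
```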

14.
We consider the Arnason-Schwarz model, which is usually used to estimate survival and movement probabilities from capture-recapture data. A missing-data structure for this model is constructed that allows a clear separation of the information relating to capture from that relating to movement. Extensions of the Arnason-Schwarz model are considered; for example, we consider a model that takes into account both the individual migration history and the individual reproduction history. The biological assumptions of these extensions are summarized via a directed graph. Owing to the missing data, the posterior distribution of the parameters is numerically intractable. To overcome these computational difficulties we advocate a Gibbs sampling algorithm that takes advantage of the missing-data structure inherent in capture-recapture models. Prior information on survival, capture and movement probabilities typically consists of a prior mean and a prior 95% credible interval, and Dirichlet distributions are used to incorporate prior information on the capture, survival and movement probabilities. Finally, the influence of the prior on the Bayesian estimates of the movement probabilities is examined.
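One ingredient of such a sampler can be sketched simply: with a Dirichlet prior on each row of the movement (transition) matrix and multinomial counts of observed or imputed moves, each row's full conditional is again Dirichlet. The counts below are hypothetical, and the imputation of missing capture histories that the full Gibbs sampler performs is not shown.

```python
# Dirichlet updates for rows of a movement-probability matrix.
import numpy as np

rng = np.random.default_rng(0)
sites = 3
prior_alpha = np.full((sites, sites), 1.0)          # uniform Dirichlet prior per row
transition_counts = np.array([[30, 5, 2],           # observed/imputed moves (placeholder)
                              [4, 25, 6],
                              [1, 3, 20]])

draws = np.stack([
    np.vstack([rng.dirichlet(prior_alpha[i] + transition_counts[i])
               for i in range(sites)])
    for _ in range(2000)
])
print("posterior mean movement matrix:\n", draws.mean(axis=0).round(2))
```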

15.

16.
Owing to the significant increase in communication between individuals via social media (Facebook, Twitter, LinkedIn) or electronic formats (email, web, e-publication) over the past two decades, network analysis has become an unavoidable discipline. Many random graph models have been proposed to extract information from networks based on person-to-person links alone, without taking the content into account. This paper introduces the stochastic topic block model, a probabilistic model for networks with textual edges. We address the problem of discovering meaningful clusters of vertices that are coherent with respect to both the network interactions and the text contents. A classification variational expectation-maximization algorithm is proposed to perform inference. Simulated datasets are considered in order to assess the proposed approach and to highlight its main features. Finally, we demonstrate the effectiveness of our methodology on two real-world datasets: a directed communication network and an undirected co-authorship network.

17.
In the analysis of censored survival data, the Cox proportional hazards model (Cox, 1972) is extremely popular among practitioners. However, in many real-life situations the proportionality of the hazard ratios is not an appropriate assumption. To overcome this problem, we consider a class of non-proportional hazards models known as the generalized odds-rate class of regression models. The class is general enough to include several commonly used models, such as the proportional hazards model, the proportional odds model, and the accelerated life time model. The theoretical and computational properties of these models are re-examined, and the propriety of the posterior is established under some mild conditions. A simulation study is conducted, and a detailed analysis of data from a prostate cancer study is presented to further illustrate the proposed methodology.

18.
We revisit the problem of testing homoscedasticity (equality of variances) of several normal populations, which has applications in many statistical analyses, including the design of experiments. Standard textbooks and widely used statistical packages propose a few popular tests, including Bartlett's test, Levene's test and a few adjustments of the latter. Apparently, the popularity of these tests has been based on limited simulation studies carried out a few decades ago. The traditional tests, including the classical likelihood ratio test (LRT), are asymptotic in nature and hence do not perform well for small sample sizes. In this paper we propose a simple parametric bootstrap (PB) modification of the LRT and compare it against the other popular tests, as well as their PB versions, in terms of size and power. Our comprehensive simulation study, carried out under the normal model as well as several non-normal models, debunks some popularly held myths about the commonly used tests and sheds new light on this important problem. Although most popular statistical software packages suggest Bartlett's test, Levene's test, or the modified Levene's test among a few others, the simulations clearly show that a PB version of the modified Levene's test (which does not use the F-distribution cut-off point as its critical value) and Loh's exact test are the "best" performers in terms of overall size as well as power.
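A sketch of the parametric-bootstrap idea for the median-based (Brown-Forsythe) Levene statistic, assuming simulated data: the null distribution of the statistic is generated from normal samples sharing a pooled variance, so no F cut-off is needed. The paper's exact PB construction may differ in detail.

```python
# Parametric bootstrap version of the median-based Levene (Brown-Forsythe) test.
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(0)
groups = [rng.normal(0, 1.0, 25), rng.normal(0, 1.4, 30), rng.normal(0, 1.0, 20)]

obs_stat = levene(*groups, center='median').statistic
sizes = [len(g) for g in groups]
pooled_var = np.concatenate([g - g.mean() for g in groups]).var(ddof=len(groups))

boot_stats = np.empty(2000)
for b in range(boot_stats.size):
    # simulate under H0: normal groups with a common (pooled) variance
    null_groups = [rng.normal(0, np.sqrt(pooled_var), m) for m in sizes]
    boot_stats[b] = levene(*null_groups, center='median').statistic

print("PB p-value:", np.mean(boot_stats >= obs_stat))
```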

19.
Approximate Bayesian computation (ABC) is a popular approach to inference problems where the likelihood function is intractable or expensive to calculate. To improve over Markov chain Monte Carlo (MCMC) implementations of ABC, the use of sequential Monte Carlo (SMC) methods has recently been suggested. Most effective SMC algorithms currently available for ABC have a computational complexity that is quadratic in the number of Monte Carlo samples (Beaumont et al., Biometrika 96:983–990, 2009; Peters et al., Technical report, 2008; Toni et al., J. Roy. Soc. Interface 6:187–202, 2009) and require a careful choice of simulation parameters. In this article an adaptive SMC algorithm is proposed which admits a computational complexity that is linear in the number of samples and adaptively determines the simulation parameters. We demonstrate our algorithm on a toy example and on a birth–death–mutation model arising in epidemiology.
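To fix ideas, a plain ABC rejection sampler for a toy problem (estimating a Poisson rate from its sample mean) is sketched below; the paper's contribution, an adaptive SMC-ABC algorithm with a sequence of tolerances and linear cost per sample, is not implemented here.

```python
# Basic ABC rejection sampling on a toy Poisson-rate problem.
import numpy as np

rng = np.random.default_rng(0)
true_rate, n = 4.0, 50
observed = rng.poisson(true_rate, n)
obs_summary = observed.mean()

accepted = []
tolerance = 0.1
while len(accepted) < 1000:
    theta = rng.uniform(0, 10)                     # draw from the prior
    sim = rng.poisson(theta, n)                    # simulate from the model
    if abs(sim.mean() - obs_summary) < tolerance:  # keep if summaries are close
        accepted.append(theta)

print("ABC posterior mean:", round(np.mean(accepted), 2))
```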

20.
Detecting local spatial clusters in count data is an important task in spatial epidemiology. Two broad approaches, moving window methods and disease mapping methods, have been suggested in the literature for finding clusters. However, the existing methods employ somewhat arbitrarily chosen tuning parameters, and the local clustering results are sensitive to these choices. In this paper, we propose a penalized likelihood method to overcome the limitations of existing local spatial clustering approaches for count data. We start with a Poisson regression model to accommodate any type of covariates, and formulate the clustering problem as a penalized likelihood estimation problem that finds change points of the intercepts in two-dimensional space. The cost of developing a new algorithm is minimized by modifying an existing least absolute shrinkage and selection operator (lasso) algorithm. The computational details of the modifications are given, and the proposed method is illustrated with Seoul tuberculosis data.
