Similar Literature
20 similar documents found.
1.
The paper considers the problem of phylogenetic tree construction. Our approach is based on a non-parametric paradigm, seeking a model-free construction and symmetry between Type I and Type II errors. Trees are constructed through sequential tests using Hamming-distance dissimilarity measures, working from the internal nodes out to the tips. The method presents several novelties. The first, an advantage over traditional methods, is that it is very fast, computationally efficient, and feasible for very large data sets. Two further novelties are its capacity to deal directly with multiple sequences per group (building its statistical properties upon this richer information) and the fact that the best tree does not have a predetermined number of tips; that is, the resulting number of tips is statistically meaningful. We apply the method to two data sets of DNA sequences, illustrating that it can perform quite well even on very unbalanced designs. Computational complexities are also addressed. Supplemental materials are available online.
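As a small illustration of the dissimilarity measure underlying such constructions, the sketch below computes Hamming distances between aligned sequences and a mean pairwise distance between two groups of sequences, reflecting the use of multiple sequences per group. The data and function names are hypothetical, and the sequential testing step itself is not reproduced.

```python
# Sketch of the Hamming-distance dissimilarity used in distance-based
# tree construction; illustrative only, not the paper's implementation.

def hamming(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def mean_pairwise_distance(group_a, group_b):
    """Average Hamming distance between two groups of sequences,
    using all sequences per group rather than a single representative."""
    total = sum(hamming(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))

# Toy example with two groups of aligned DNA sequences
print(mean_pairwise_distance(["ACGT", "ACGA"], ["ACTT", "GCTT"]))  # 2.0
```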

2.
There is a growing literature on Bayesian computational methods for problems with intractable likelihoods. One approach is the family of algorithms known as approximate Bayesian computation (ABC). A drawback of these algorithms is that their performance depends on an appropriate choice of summary statistics, distance measure, and tolerance level. To circumvent this problem, an alternative method based on the empirical likelihood has been introduced; it is easily implemented once a set of constraints, related to the moments of the distribution, is specified. However, the choice of constraints is sometimes challenging. To overcome this difficulty, we propose an alternative method based on a bootstrap likelihood approach. The method is easy to implement and in some cases is actually faster than the other approaches considered. We illustrate the performance of our algorithm with examples from population genetics, time series, and stochastic differential equations, and we also test the method on a real data set.
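For context, here is a minimal rejection-ABC sketch that makes the three tuning choices explicit (summary statistic, distance, tolerance); the toy model and all names are hypothetical, and the bootstrap-likelihood method itself is not shown.

```python
# Minimal rejection-ABC sketch; its three tuning choices (summary
# statistic, distance, tolerance) are exactly what the alternatives
# discussed above try to avoid. Toy model, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=100):
    return rng.normal(theta, 1.0, size=n)      # toy simulator

def summary(x):
    return x.mean()                            # choice of summary statistic

def abc_rejection(observed, n_draws=20_000, tol=0.05):
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5, 5)             # draw from the prior
        if abs(summary(simulate(theta)) - s_obs) < tol:  # distance + tolerance
            accepted.append(theta)
    return np.array(accepted)

posterior = abc_rejection(simulate(1.5))
print(posterior.mean(), posterior.size)
```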

3.
This paper addresses the problem of co-clustering binary data in the latent block model framework with diagonal constraints on the resulting data partitions. We consider the Bernoulli generative mixture model and present three new methods differing in the assumptions made about the degree of homogeneity of the diagonal blocks. The proposed models are parsimonious and make it possible to take the structure of a data matrix into account when reorganizing it into homogeneous diagonal blocks. We derive algorithms for each of the presented models based on the classification expectation-maximization algorithm, which maximizes the complete-data likelihood. We show that our contribution can outperform other state-of-the-art (co-)clustering methods on synthetic sparse and non-sparse data. We also demonstrate the effectiveness of our approach in the context of document clustering, using real-world benchmark data sets.
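To convey the classification EM idea in its simplest form, the sketch below runs a hard-assignment (CEM) loop for a one-sided Bernoulli mixture over the rows of a binary matrix; the full latent block model co-clusters rows and columns jointly under diagonal constraints, which this hypothetical sketch does not attempt.

```python
# Classification-EM (CEM) sketch for a Bernoulli mixture over the rows
# of a binary matrix. Illustrates the hard-assignment E/C/M cycle the
# paper's co-clustering algorithms build on; not the paper's method.
import numpy as np

rng = np.random.default_rng(6)
# Synthetic binary data: 100 "dense" rows, then 100 "sparse" rows
X = (rng.random((200, 30)) < np.repeat([[0.8], [0.2]], 100, axis=0)).astype(float)

K, eps = 2, 1e-6
pi = np.full(K, 1 / K)
theta = rng.uniform(0.3, 0.7, size=(K, X.shape[1]))  # Bernoulli parameters

for _ in range(50):
    # E-step: log posterior of each cluster for each row
    log_post = np.log(pi) + X @ np.log(theta + eps).T \
               + (1 - X) @ np.log(1 - theta + eps).T
    # C-step: hard assignment, maximizing the complete-data likelihood
    z = log_post.argmax(axis=1)
    # M-step: update proportions and Bernoulli parameters per cluster
    for k in range(K):
        members = X[z == k]
        if len(members):
            pi[k] = len(members) / len(X)
            theta[k] = members.mean(axis=0)

print(np.bincount(z, minlength=K), theta.mean(axis=1).round(2))
```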

4.
A transformation is proposed to convert the nonlinear constraints on the parameters of the mixture transition distribution (MTD) model into box constraints. The proposed transformation removes the difficulties associated with maximum likelihood estimation (MLE) in MTD modeling, so that the MLEs of the parameters can be easily obtained via a hybrid of evolutionary and/or quasi-Newton algorithms for global optimization. Simulation studies are conducted to demonstrate MTD modeling by the proposed approach through a global search algorithm in the R environment. Finally, the proposed approach is applied to MTD modeling of three real data sets.

5.
We consider the assessment of deoxyribonucleic acid (DNA) profiles from biological samples containing a mixture of DNA from more than one person. The problem has been investigated in the context of likelihood ratios by Weir and co-workers under the assumption of independent alleles in DNA profiles. However, uncertainty about independence may arise from factors such as population substructure and relatedness, an issue that has received considerable attention in recent years. Ignoring this uncertainty may seriously overstate the strength of the evidence and therefore disadvantage innocent suspects. Taking this uncertainty into account, we develop a general formula for calculating the match probabilities of DNA profiles. We thus extend the result derived by Weir and co-workers to the dependence situation, which is often more favourable to the defendant than the simple product-rule result based on an independence assumption. The effect of allele dependence on likelihood ratio estimates is illustrated in the analysis of two real data sets.
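As one widely used example of such a dependence correction, the sketch below computes the Balding–Nichols single-locus match probabilities with coancestry coefficient θ; this standard adjustment is offered purely for illustration and is not claimed to be the authors' general formula.

```python
# Balding-Nichols single-locus match probabilities: a standard
# correction for allele dependence due to population substructure
# (coancestry theta). Illustrative of dependence adjustments in
# general; not the general formula developed in the paper.

def match_prob_homozygote(p, theta):
    """P(suspect AA | offender AA) under coancestry theta."""
    return ((2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)) \
           / ((1 + theta) * (1 + 2 * theta))

def match_prob_heterozygote(p, q, theta):
    """P(suspect AB | offender AB) under coancestry theta."""
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) \
           / ((1 + theta) * (1 + 2 * theta))

# With theta = 0 these reduce to the product rule (p**2 and 2*p*q);
# positive theta gives larger, more defendant-favourable probabilities.
print(match_prob_homozygote(0.1, 0.0), match_prob_homozygote(0.1, 0.03))
print(match_prob_heterozygote(0.1, 0.2, 0.0), match_prob_heterozygote(0.1, 0.2, 0.03))
```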

6.
7.
The approximate Bayesian computation (ABC) algorithm is used to estimate parameters of complicated phenomena for which the likelihood is intractable. Here, we report the development of an algorithm to choose the tolerance level for ABC. We illustrate the performance of the proposed method by simulating the estimation of scaled mutation and recombination rates. The results show that the proposed algorithm performs well.
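A common baseline for the tolerance choice, shown below for illustration, fixes it at a small quantile of the distances between simulated and observed summaries; this heuristic is not necessarily the algorithm proposed in the paper.

```python
# Illustrative baseline for choosing the ABC tolerance: set it to a
# small quantile of simulated-vs-observed summary distances, so a fixed
# fraction of prior draws is accepted. Not necessarily the paper's rule.
import numpy as np

rng = np.random.default_rng(1)

simulate = lambda theta: rng.poisson(theta, size=50)   # toy count model
summary = lambda x: x.mean()

s_obs = summary(simulate(3.0))                         # "observed" summary
thetas = rng.uniform(0, 10, size=5_000)                # prior draws
dists = np.array([abs(summary(simulate(t)) - s_obs) for t in thetas])

eps = np.quantile(dists, 0.01)                         # accept the best 1%
accepted = thetas[dists < eps]
print(f"tolerance = {eps:.3f}, posterior mean = {accepted.mean():.3f}")
```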

8.
We propose a two-stage algorithm for computing maximum likelihood estimates for a class of spatial models. The algorithm combines Markov chain Monte Carlo methods, such as the Metropolis–Hastings–Green algorithm and the Gibbs sampler, with stochastic approximation methods, such as the off-line average and adaptive search direction. A new criterion is built into the algorithm so that stopping is automatic once the desired precision has been set. Simulation studies and applications to real data sets have been conducted with three spatial models. We compared the proposed algorithm with a direct application of the classical Robbins–Monro algorithm on Wiebe's wheat data and found that our procedure is at least 15 times faster.
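For reference, the classical Robbins–Monro iteration used as the comparison baseline can be sketched as follows on a toy root-finding problem; this is the textbook scheme, not the paper's two-stage algorithm.

```python
# Classical Robbins-Monro stochastic approximation on a toy problem:
# find the root of g(theta) = theta - 4 from noisy observations. This
# is the baseline scheme the two-stage algorithm is compared against.
import numpy as np

rng = np.random.default_rng(2)

def noisy_gradient(theta):
    # Unbiased noisy observation of g(theta) = theta - 4
    return (theta - 4.0) + rng.normal(0, 1)

def robbins_monro(theta0=0.0, n_iter=5_000):
    theta = theta0
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n            # steps with sum a_n = inf, sum a_n**2 < inf
        theta -= a_n * noisy_gradient(theta)
    return theta

print(robbins_monro())  # converges toward the root theta = 4
```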

9.
Although devised by Fisher in 1936, discriminant analysis is still rapidly evolving as the complexity of contemporary data sets continues to grow. Our classification rules address these complexities by modeling various correlations in higher-order data. Moreover, they are suitable for data sets where the number of response variables is comparable to, or larger than, the number of observations. We assume that the higher-order observations have a separable variance-covariance matrix and two different Kronecker product structures on the mean vector. In this article, we develop quadratic classification rules among g different populations where each individual has κth-order (κ ≥ 2) measurements. We also provide computational algorithms for the maximum likelihood estimates of the model parameters and, in turn, the sample classification rules.

10.
We study the problem of classification with multiple q-variate observations, with and without a time effect on each individual. We develop new classification rules for populations with certain structured and unstructured mean vectors and under certain covariance structures. The new rules are effective when the number of observations is not large enough to estimate the variance–covariance matrix. Computational schemes for maximum likelihood estimates of the required population parameters are given. We apply our findings to two real data sets as well as to a simulated data set.

11.
Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting of a linear mixed model within steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program.

12.
We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees corresponding to the two sources are finite and share many of their contexts. In order to understand the differences between the two sources, it is important to identify which contexts and which transition probabilities are specific to each source. We consider a class of probabilistic context tree models with three types of contexts: those which appear in one, the other, or both sources. We use a BIC-penalized maximum likelihood procedure that jointly estimates the two sources. We propose a new algorithm which efficiently computes the estimated context trees. We prove that the procedure is strongly consistent. We also present a simulation study showing the practical advantage of our procedure over a procedure that works separately on each data set.

13.
Two equivalent methods (gene counting and maximum likelihood) are derived for estimating gene frequencies in a general genetic marker system based on observed phenotype data. Under the maximum likelihood approach, an expression is given for the estimated covariance matrix, from which estimated standard errors of the estimators can be found. In addition, consideration is given to the problem of estimating gene frequencies when several independent population data sets are available.
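Gene counting is an EM-type iteration; the sketch below shows it for the familiar ABO blood-group system with made-up phenotype counts. The ABO special case is an illustration of the technique only, since the paper treats general marker systems.

```python
# Gene-counting (EM) sketch for allele frequencies in the classic ABO
# system, a standard textbook instance of the method. Phenotype counts
# are invented example data, not from the paper.

def abo_gene_counting(nA, nB, nAB, nO, n_iter=100):
    n = nA + nB + nAB + nO
    p, q, r = 1 / 3, 1 / 3, 1 / 3        # freq(A), freq(B), freq(O)
    for _ in range(n_iter):
        # E-step: split ambiguous phenotypes A (AA or AO) and B (BB or BO)
        nAA = nA * p**2 / (p**2 + 2 * p * r)
        nAO = nA - nAA
        nBB = nB * q**2 / (q**2 + 2 * q * r)
        nBO = nB - nBB
        # M-step: count alleles among the 2n genes
        p = (2 * nAA + nAO + nAB) / (2 * n)
        q = (2 * nBB + nBO + nAB) / (2 * n)
        r = (2 * nO + nAO + nBO) / (2 * n)
    return p, q, r

print(abo_gene_counting(nA=186, nB=38, nAB=13, nO=284))
```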

14.
Mixture separation for mixed-mode data
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.

15.
Assessing the selective influence of amino acid properties is important in understanding evolution at the molecular level. A collection of methods and models has been developed in recent years to determine whether amino acid sites in a given DNA sequence alignment display substitutions that alter or conserve a prespecified set of amino acid properties. Residues showing an elevated number of substitutions that favorably alter a physicochemical property are considered targets of positive natural selection. Such approaches usually perform independent analyses for each amino acid property under consideration, without taking into account the fact that some of the properties may be highly correlated. We propose a Bayesian hierarchical regression model with latent factor structure that allows us to determine which sites display substitutions that conserve or radically change a set of amino acid properties, while accounting for the correlation structure that may be present across such properties. We illustrate our approach by analyzing simulated data sets and an alignment of sperm lysin DNA.

16.
Mini-batch algorithms have become increasingly popular due to the need to solve optimization problems based on large-scale data sets. Using an existing online expectation–maximization (EM) algorithm framework, we demonstrate how mini-batch (MB) algorithms may be constructed, and we propose a scheme for the stochastic stabilization of the constructed mini-batch algorithms. Theoretical results regarding the convergence of the mini-batch EM algorithms are presented. We then demonstrate how the mini-batch framework may be applied to maximum likelihood (ML) estimation of mixtures of exponential family distributions, with emphasis on ML estimation for mixtures of normal distributions. Via a simulation study, we demonstrate that the mini-batch algorithm for mixtures of normal distributions can outperform the standard EM algorithm. Further evidence of the performance of the mini-batch framework is provided via an application to the famous MNIST data set.
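A minimal sketch of a mini-batch EM pass for a two-component normal mixture, in the online-EM spirit the paper builds on: sufficient statistics are updated as step-size-weighted running averages over each batch. This illustrates the general idea only, not the paper's stochastically stabilized algorithm.

```python
# Mini-batch EM sketch for a univariate two-component normal mixture:
# running sufficient statistics are blended with each batch's E-step
# via a decreasing step size. Illustration of the online-EM idea only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(3, 1, 5000)])
rng.shuffle(data)

w = np.array([0.5, 0.5]); mu = np.array([-1.0, 1.0]); sd = np.array([1.0, 1.0])
s0, s1, s2 = w.copy(), w * mu, w * (mu**2 + sd**2)   # running sufficient stats

for t, batch in enumerate(data.reshape(-1, 100), start=1):
    gamma = t ** -0.6                                # step size exponent in (1/2, 1]
    dens = w * norm.pdf(batch[:, None], mu, sd)      # E-step on the mini-batch
    resp = dens / dens.sum(axis=1, keepdims=True)
    s0 = (1 - gamma) * s0 + gamma * resp.mean(axis=0)
    s1 = (1 - gamma) * s1 + gamma * (resp * batch[:, None]).mean(axis=0)
    s2 = (1 - gamma) * s2 + gamma * (resp * batch[:, None] ** 2).mean(axis=0)
    w, mu = s0 / s0.sum(), s1 / s0                   # M-step from running stats
    sd = np.sqrt(np.maximum(s2 / s0 - mu ** 2, 1e-6))

print(w.round(2), mu.round(2), sd.round(2))  # approx [0.5 0.5], [-2 3], [1 1]
```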

17.
The problem of estimating the parameters of the two-parameter inverse Weibull distribution is considered. We establish the existence and uniqueness of the maximum likelihood estimators of the scale and shape parameters. We derive Bayes estimators of the parameters under the entropy loss function. A hierarchical Bayes estimator, an equivariant estimator, and a class of minimax estimators are derived when the shape parameter is known. Ordered Bayes estimators using information about a second population are also derived. We investigate the reliability of a multi-component stress-strength model using classical and Bayesian approaches. Risk comparison of the classical and Bayes estimators is carried out using Monte Carlo simulations. Applications of the proposed estimators are illustrated using real data sets.
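A minimal numerical companion to the MLE result: fitting the two-parameter inverse Weibull (Fréchet) distribution with scipy on simulated data. The parameter values and data are invented for illustration.

```python
# Maximum likelihood fit of the two-parameter inverse Weibull (Frechet)
# distribution via scipy, on simulated data. The existence/uniqueness of
# these MLEs is what the paper establishes; this sketch just computes them.
import numpy as np
from scipy.stats import invweibull

rng = np.random.default_rng(4)
true_shape, true_scale = 2.5, 3.0
data = invweibull.rvs(true_shape, scale=true_scale, size=500, random_state=rng)

# Fix loc=0 so only the shape and scale parameters are estimated
shape_hat, loc0, scale_hat = invweibull.fit(data, floc=0)
print(f"shape MLE = {shape_hat:.3f}, scale MLE = {scale_hat:.3f}")
```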

18.
The authors propose a class of procedures for local likelihood estimation from data that are either interval-censored or that have been aggregated into bins. One such procedure relies on an algorithm that generalizes existing self-consistency algorithms by introducing kernel smoothing at each step of the iteration. The entire class of procedures yields estimates that are obtained as solutions of fixed point equations. By discretizing and applying numerical integration, the authors use fixed point theory to study convergence of algorithms for the class. Rapid convergence is effected by the implementation of a local EM algorithm as a global Newton iteration. The latter requires an explicit solution of the local likelihood equations which can be found by using the symbolic Newton-Raphson algorithm, if necessary.

19.
In this paper, we discuss a model for pseudo-panel data when some, but not all, of the individuals stay in the sample for more than one period. We use data on the labor market of the Basque Country from 1993 to 1999, processed with FORTRAN 77 programs. We construct economically reasonable age cohorts for the active population and use gender, qualification, and social status as explanatory variables in our model. Given the class of data we use, we analyze the properties of the random error and estimate the model by maximum likelihood, finding significant results from an applied point of view.

20.
Natural selection and genetic drift are the most common phenomena in the evolutionary process. In this article, we propose a probabilistic method to calculate the mean and variance of the time to random genetic drift equilibrium, measured in number of generations, based on a Markov process and a detailed probabilistic model. We study the case of a constant-size, panmictic population of diploid organisms with no mutation, selection, or migration at a given autosomal locus with two possible alleles, H and h. The calculations, based on a Markov process, describe how allele and genotype frequencies change across generations and how the heterozygote is eventually lost after many generations. The calculation could be used in further evolutionary applications. Finally, some simulations are presented to illustrate the theoretical results under different baseline scenarios.
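The standard Wright–Fisher binomial-sampling model of pure drift can be simulated in a few lines; the sketch below estimates the mean and variance of the time until allele H is fixed or lost. Parameters are arbitrary, and the simulation is an illustration complementing the article's analytic Markov-chain treatment.

```python
# Wright-Fisher sketch: pure genetic drift for a two-allele (H/h) locus
# in a constant-size diploid population, estimating the mean and variance
# of the number of generations to fixation or loss. Standard model,
# offered as illustration; parameters are arbitrary, not from the article.
import numpy as np

rng = np.random.default_rng(5)

def generations_to_absorption(N=100, p0=0.5):
    """Simulate drift until allele H is fixed (freq 1) or lost (freq 0)."""
    two_n, count, t = 2 * N, int(2 * N * p0), 0
    while 0 < count < two_n:
        count = rng.binomial(two_n, count / two_n)  # resample 2N gametes
        t += 1
    return t

times = [generations_to_absorption() for _ in range(2000)]
print(np.mean(times), np.var(times))  # mean and variance of absorption time
```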
