Mini-batch algorithms have become increasingly popular due to the need to solve optimization problems based on large-scale data sets. Using an existing online expectation–maximization (EM) algorithm framework, we demonstrate how mini-batch (MB) algorithms may be constructed, and propose a scheme for the stochastic stabilization of the constructed mini-batch algorithms. Theoretical results regarding the convergence of the mini-batch EM algorithms are presented. We then demonstrate how the mini-batch framework may be applied to conduct maximum likelihood (ML) estimation of mixtures of exponential family distributions, with emphasis on ML estimation for mixtures of normal distributions. Via a simulation study, we demonstrate that the mini-batch algorithm for mixtures of normal distributions can outperform the standard EM algorithm. Further evidence of the performance of the mini-batch framework is provided via an application to the famous MNIST data set.
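As a rough illustration of the idea in the abstract above, the following is a minimal sketch of a mini-batch EM iteration for a univariate normal mixture, written in the spirit of online EM (running averages of sufficient statistics followed by a closed-form M-step). It is not the authors' algorithm: the function name `minibatch_em`, the step-size schedule `(it + 1) ** -0.6`, the batch size, and the variance floor are all assumptions chosen for illustration, and the stochastic stabilization scheme mentioned in the abstract is not reproduced.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def minibatch_em(data, weights, means, variances,
                 batch_size=50, n_iter=400, seed=0):
    """Illustrative mini-batch EM for a univariate normal mixture (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(weights)
    # Running averages of the sufficient statistics E[z], E[z*x], E[z*x^2],
    # initialised so that they reproduce the starting parameters.
    s0 = list(weights)
    s1 = [w * m for w, m in zip(weights, means)]
    s2 = [w * (v + m * m) for w, m, v in zip(weights, means, variances)]
    for it in range(1, n_iter + 1):
        batch = rng.sample(data, batch_size)
        b0, b1, b2 = [0.0] * k, [0.0] * k, [0.0] * k
        # E-step on the mini-batch only: posterior responsibilities.
        for x in batch:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = max(sum(dens), 1e-300)  # guard against numerical underflow
            for j in range(k):
                r = dens[j] / total
                b0[j] += r / batch_size
                b1[j] += r * x / batch_size
                b2[j] += r * x * x / batch_size
        # Stochastic-approximation update; the decaying step size is an assumption.
        gamma = (it + 1) ** -0.6
        s0 = [(1 - gamma) * s + gamma * b for s, b in zip(s0, b0)]
        s1 = [(1 - gamma) * s + gamma * b for s, b in zip(s1, b1)]
        s2 = [(1 - gamma) * s + gamma * b for s, b in zip(s2, b2)]
        # M-step: closed-form parameter update from the averaged statistics.
        total_weight = sum(s0)
        weights = [s / total_weight for s in s0]
        means = [s1[j] / s0[j] for j in range(k)]
        variances = [max(s2[j] / s0[j] - means[j] ** 2, 1e-6) for j in range(k)]
    return weights, means, variances
```

Each iteration touches only `batch_size` observations, so the per-iteration cost does not grow with the size of the data set; the price is noise in the updates, which the decaying step size averages out.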
Likelihood-free methods such as approximate Bayesian computation (ABC) have extended the reach of statistical inference to problems with computationally intractable likelihoods. Such approaches perform well for small-to-moderate dimensional problems, but suffer a curse of dimensionality in the number of model parameters. We introduce a likelihood-free approximate Gibbs sampler that naturally circumvents the dimensionality issue by focusing on lower-dimensional conditional distributions. These distributions are estimated by flexible regression models either before the sampler is run, or adaptively during sampler implementation. As a result, and in comparison to Metropolis-Hastings-based approaches, we are able to fit substantially more challenging statistical models than would otherwise be possible. We demonstrate the sampler’s performance via two simulated examples, and a real analysis of Airbnb rental prices using an intractable high-dimensional multivariate nonlinear state-space model with a 36-dimensional latent state observed on 365 time points, which presents a real challenge to standard ABC techniques.
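For readers unfamiliar with ABC, the sketch below shows the basic rejection form of the method that the abstract above takes as its starting point. It is not the likelihood-free Gibbs sampler the authors propose. The toy model (a normal mean with unit variance), the uniform prior on [-5, 5], the sample-mean summary statistic, the tolerance, and the function name `abc_rejection` are all assumptions made for illustration.

```python
import random


def sample_mean(values):
    """Summary statistic used in this toy example."""
    return sum(values) / len(values)


def abc_rejection(observed, n_draws=20000, tol=0.1, seed=0):
    """Basic ABC rejection sampler for a normal mean (illustrative toy model)."""
    rng = random.Random(seed)
    s_obs = sample_mean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        # 1. Draw a candidate parameter from the (assumed) uniform prior.
        theta = rng.uniform(-5.0, 5.0)
        # 2. Simulate a data set from the model; no likelihood is evaluated.
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]
        # 3. Keep the candidate if the simulated summary is close to the observed one.
        if abs(sample_mean(simulated) - s_obs) < tol:
            accepted.append(theta)
    return accepted
```

The accepted draws approximate the posterior of the parameter. With a single parameter the acceptance rate is tolerable, but matching a vector of summaries within a fixed tolerance becomes rapidly less likely as the number of parameters grows, which is the dimensionality problem the abstract refers to.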
Motivated by a breast cancer research program, this paper is concerned with the joint survivor function of multiple event times when their observations are subject to informative censoring caused by a terminating event. We formulate the correlation of the multiple event times together with the time to the terminating event by an Archimedean copula to account for the informative censoring. Adapting the widely used two-stage procedure under a copula model, we propose an easy-to-implement pseudo-likelihood based procedure for estimating the model parameters. The approach yields a new estimator for the marginal distribution of a single event time with semicompeting-risks data. We conduct both asymptotic analysis and simulation studies to examine the consistency, efficiency, and robustness of the proposed approach. Data from the breast cancer program are employed to illustrate this research.
As important members of research teams, statisticians bear an ethical responsibility to analyze, interpret, and report data honestly and objectively. One way of reinforcing ethical responsibilities is through required courses covering a variety of ethics-related topics at the graduate level. We assessed ethics requirements for graduate-level statistics training programs in the United States for the 2013–2014 academic year using the websites of 88 universities, examining 103 biostatistics programs and 136 statistics degree programs. We categorized programs’ ethics training requirements as required or not required. Thirty-one (35.1%) universities required an ethics course for at least some degree students. Sixty-two (25.5%) degree programs required an ethics course for at least some students. The majority (77.4%) of required courses were worth 0 or 1 credit. Of the 177 programs without an ethics requirement, 19 (10.7%) listed an ethics elective. Although a single ethics course is insufficient for instilling an ethical approach to science, degree programs that model expectations through coursework point to the value of ethics in science. More training programs should prepare statisticians to consider the ethical dimensions of their work through required coursework. Supplementary materials for this article are available online.
In computational sciences, including computational statistics, machine learning, and bioinformatics, articles presenting new supervised learning methods often claim that the new method performs better than existing methods on real data, for instance in terms of error rate. However, these claims are often not based on proper statistical tests and, even if such tests are performed, the tested hypothesis is not clearly defined and little attention is paid to the Type I and Type II errors. In the present article, we aim to fill this gap by providing a proper statistical framework for hypothesis tests that compare the performances of supervised learning methods based on several real datasets with unknown underlying distributions. After giving a statistical interpretation of ad hoc tests commonly performed by computational researchers, we devote special attention to power issues and outline a simple method of determining the number of datasets to be included in a comparison study to reach an adequate power. These methods are illustrated through three comparison studies from the literature and an exemplary benchmarking study using gene expression microarray data. All our results can be reproduced using R code and datasets available from the companion website http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/compstud2013.
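To make the power question in the abstract above concrete, here is a minimal sketch of the textbook normal-approximation sample-size calculation for a two-sided paired test, where each dataset contributes one difference in error rate between two methods. It is not necessarily the method the authors outline. The function name `datasets_needed` and the inputs `delta` (the true mean difference one wants to detect) and `sigma` (the standard deviation of the per-dataset differences) are assumptions introduced for illustration.

```python
import math
from statistics import NormalDist


def datasets_needed(delta, sigma, alpha=0.05, power=0.8):
    """Approximate number of datasets for a two-sided paired z-test (illustrative).

    delta: mean difference in performance to be detected (assumed known).
    sigma: standard deviation of per-dataset differences (assumed known).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)  # critical value for Type I error
    z_beta = standard_normal.inv_cdf(power)               # quantile for the target power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)
```

Under this approximation the required number of datasets scales with `(sigma / delta) ** 2`, so halving the difference to be detected roughly quadruples the number of datasets needed.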
A multivariate normal mean–variance mixture based on a Birnbaum–Saunders (NMVMBS) distribution is introduced and several properties of this new distribution are discussed. A new robust non-Gaussian ARCH-type model is proposed in which there exists a relation between the variance of the observations, and the marginal distributions are NMVMBS. A simple EM-based maximum likelihood estimation procedure to estimate the parameters of this normal mean–variance mixture distribution is given. A simulation study and some real data are used to demonstrate the modelling strength of this new model.
In this paper, we extend the censored linear regression model with normal errors to Student-t errors. A simple EM-type algorithm for iteratively computing maximum-likelihood estimates of the parameters is presented. To examine the performance of the proposed model, case-deletion and local influence techniques are developed to show its robustness against outlying and influential observations. This is done by analysing the sensitivity of the EM estimates under some usual perturbation schemes in the model or data and by inspecting some proposed diagnostic graphics. The efficacy of the method is verified through the analysis of simulated data sets and by modelling a real data set first analysed under normal errors. The proposed algorithm and methods are implemented in the R package CensRegMod.