Similar Articles
20 similar articles found (search time: 31 ms)
1.
We introduce a robust procedure for parsimonious model-based clustering. The classical mclust framework is robustified through impartial trimming and eigenvalue-ratio constraints, in the spirit of the tclust framework, which is robust but not affine invariant. An advantage of the resulting mtclust approach is that eigenvalue-ratio constraints are not needed for certain model formulations, yielding affine-invariant robust parsimonious clustering. We illustrate the approach via simulations and a benchmark real-data example. R code for the proposed method is available at https://github.com/afarcome/mtclust.
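For reference, the eigenvalue-ratio constraint used in the tclust literature (our rendering; the abstract does not reproduce the formula) restricts the component scatter matrices \(\Sigma_1, \ldots, \Sigma_K\) via

\[ \frac{\max_{k,j} \lambda_j(\Sigma_k)}{\min_{k,j} \lambda_j(\Sigma_k)} \le c, \]

where \(\lambda_j(\Sigma_k)\) denotes the j-th eigenvalue of \(\Sigma_k\) and \(c \ge 1\) is a user-chosen bound; it is this constraint that certain parsimonious model formulations render unnecessary.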

2.
Parametric and semiparametric mixture models have been widely used in applications across many areas, and it is often of interest to test homogeneity in these models. However, hypothesis testing is nonstandard because several regularity conditions fail under the null hypothesis. We consider a semiparametric mixture case–control model in which the density ratio of two distributions is assumed to be of an exponential form, while the baseline density is left unspecified. This model was first considered by Qin and Liang (2011, Biometrics 67(1):182–198), who proposed a modified score statistic for testing homogeneity. In this article, we consider alternative testing procedures based on supremum statistics, which can improve power against certain types of alternatives. We demonstrate the connections among, and compare, the proposed and existing approaches. In addition, we provide a unified theoretical justification of the supremum test and other existing test statistics from an empirical likelihood perspective. The finite-sample performance of the supremum test statistics is evaluated in simulation studies.
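In the notation standard for such two-sample density-ratio models (a reconstruction on our part; the abstract states only that the ratio is of an exponential form), the assumption is

\[ \log\frac{g(x)}{h(x)} = \alpha + \beta^{\top} x, \]

where \(h\) is the unspecified baseline density, so homogeneity corresponds to the degenerate case in which the two distributions coincide.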

3.
In this paper we consider an acceptance-rejection (AR) sampler based on deterministic driver sequences. We prove that the discrepancy of an N-element sample set generated in this way is bounded by \(\mathcal{O}(N^{-2/3}\log N)\), provided that the target density is twice continuously differentiable with non-vanishing curvature and the AR sampler uses the driver sequence \(\mathcal{K}_M = \{(j\alpha, j\beta) \bmod 1 \mid j = 1, \ldots, M\}\), where \(\alpha, \beta\) are real algebraic numbers such that \(1, \alpha, \beta\) is a basis of a number field of degree 3 over \(\mathbb{Q}\). For the driver sequence \(\mathcal{F}_k = \{(j/F_k, \{jF_{k-1}/F_k\}) \mid j = 1, \ldots, F_k\}\), where \(F_k\) is the k-th Fibonacci number and \(\{x\} = x - \lfloor x \rfloor\) is the fractional part of a non-negative real number x, we can remove the \(\log\) factor and improve the convergence rate to \(\mathcal{O}(N^{-2/3})\), where again N is the number of accepted samples. We also introduce a criterion for measuring the quality of driver sequences. The proposed approach is tested numerically by calculating the star discrepancy of samples generated for several target densities using \(\mathcal{K}_M\) and \(\mathcal{F}_k\) as driver sequences. These results confirm that a convergence rate beyond \(N^{-1/2}\) is achievable in practice using \(\mathcal{K}_M\) and \(\mathcal{F}_k\) as driver sequences in the acceptance-rejection sampler.
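The following R sketch, ours rather than the authors' code, illustrates how the Fibonacci driver sequence \(\mathcal{F}_k\) defined above can drive the acceptance-rejection step; the target density psi is an arbitrary illustrative choice scaled so that its supremum is at most 1.

# Acceptance-rejection sampling on [0,1] driven by the Fibonacci lattice F_k.
# Assumes the (unnormalized) target psi maps [0,1] into [0,1].
fibonacci <- function(k) {
  stopifnot(k >= 3)
  f <- c(1, 1)
  for (i in 3:k) f[i] <- f[i - 1] + f[i - 2]
  f
}

ar_fibonacci <- function(psi, k) {
  f   <- fibonacci(k)
  Fk  <- f[k]
  Fk1 <- f[k - 1]
  j   <- seq_len(Fk)
  x   <- j / Fk                # first coordinate of the driver point
  y   <- (j * Fk1 / Fk) %% 1   # fractional part {j * F_{k-1} / F_k}
  x[y <= psi(x)]               # accept x_j when the driver point falls under psi
}

psi <- function(x) exp(-4 * (x - 0.5)^2)  # illustrative target, sup psi = 1
samples <- ar_fibonacci(psi, k = 20)      # N = length(samples) accepted points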

4.
In this article two methods are proposed for making inferences about the parameters of a finite mixture of distributions in the context of partially identifiable censored data. The first method focuses on a mixture of location–scale models and relies on an asymptotic approximation to a suitably constructed augmented likelihood; the second provides a full Bayesian analysis of the mixture based on a Gibbs sampler. Both methods make explicit use of latent variables and are computationally efficient compared to methods that deal directly with the likelihood of the mixture, which can be crucial if the number of components in the mixture is not small. Our proposals are illustrated on a classical example on failure times for communication devices, first studied by Mendenhall and Hader (1958, Biometrika 45:504–520). In addition, we study the coverage of the confidence intervals obtained from each method by means of a small simulation exercise.
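Both methods rest on the usual latent-allocation representation of a finite mixture (a generic statement, not the authors' specific augmentation for censored data): with component indicators \(z_i\),

\[ p(x_i, z_i = j \mid \theta) = \pi_j f_j(x_i \mid \theta_j), \qquad p(x_i \mid \theta) = \sum_{j=1}^{k} \pi_j f_j(x_i \mid \theta_j), \]

and conditioning on the \(z_i\) makes the augmented likelihood factorize over components, which is what keeps both procedures computationally tractable.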

5.
In this article, we revisit the problem of fitting a mixture model under the assumption that the mixture components are symmetric and log-concave. To this end, we first study the nonparametric maximum likelihood estimation (MLE) of a monotone log-concave probability density. To fit the mixture model, we propose a semiparametric EM (SEM) algorithm, which can be adapted to other semiparametric mixture models. In our numerical experiments, we compare our algorithm to that of Balabdaoui and Doss (2018, Bernoulli 24(2):1053–71) and to other mixture-model approaches on both simulated and real-world datasets.
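For context, the E-step of EM-type algorithms for a two-component mixture computes the posterior probability that observation \(x_i\) belongs to the first component (a generic formula, not the paper's specific SEM update):

\[ \gamma_i = \frac{\pi f_1(x_i)}{\pi f_1(x_i) + (1 - \pi) f_2(x_i)}, \]

after which the M-step re-estimates \(\pi\) and, here, the symmetric log-concave component densities from the weighted data.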

6.
This article addresses the density estimation problem using a nonparametric Bayesian approach. We consider hierarchical mixture models in which the uncertainty about the mixing measure is modeled using the Dirichlet process. The main goal is to build more flexible models for density estimation. We extend the normal mixture model via the Dirichlet process, as previously introduced in the literature, in two ways. First, we consider Dirichlet mixtures of skew-normal distributions; that is, in the first stage of the hierarchical model, the normal distribution is replaced by the skew-normal. Second, we assume a skew-normal distribution as the center measure in the Dirichlet mixture of normal distributions. Some important results related to Bayesian inference in the location-scale skew-normal family are introduced. In particular, we obtain stochastic representations for the full conditional distributions of the location and skewness parameters. The algorithm of MacEachern and Müller (1998, J. Comput. Graph. Statist. 7(2):223–238) is used to sample from the posterior distributions. The models are compared on simulated data sets. Finally, the well-known Old Faithful Geyser data set is analyzed using the proposed models and the Dirichlet mixture of normal distributions. The model based on the Dirichlet mixture of skew-normal distributions captures the bimodality and skewness shown in the empirical distribution.
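Schematically (our rendering), the hierarchical model described above is

\[ y_i \mid \theta_i \sim \mathrm{SN}(\theta_i), \qquad \theta_i \mid G \overset{\mathrm{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha, G_0), \]

where SN denotes the skew-normal family; the two extensions correspond to placing the skew-normal in the first stage and in the center measure \(G_0\), respectively.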

7.
Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine-learning tree-ensemble methods, where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another popular method for high-dimensional data is random forests, a machine-learning algorithm that grows trees using a greedy search for the best split points; its default implementation, however, does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART, called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm that can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small n, large p" scenario common in many areas of bioinformatics. We showcase the method using simulated data and data from two real proteomic experiments: one distinguishing patients with cardiovascular disease from controls, and another classifying aggressive versus non-aggressive prostate cancer. We compare our results with those of the main competing methods. Open-source code written in R and Rcpp to run BART-BMA is available at https://github.com/BelindaHernandez/BART-BMA.git.
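The sum-of-trees model underlying BART has the standard form

\[ y_i = \sum_{j=1}^{m} g(x_i; T_j, M_j) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \]

where \(T_j\) is the structure of the j-th tree and \(M_j\) its terminal-node parameters; BART-BMA replaces full posterior sampling over this space with Bayesian model averaging over a greedily selected set of sums of trees.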

8.
This paper applies a Bayesian model to a clinical trial study aimed at determining a more effective treatment to lower mortality rates, and consequently increase survival times, among patients with lung cancer. In the original study, Qian et al. (1996, in Bayesian Biostatistics, Marcel Dekker, pp. 187–205) sought to determine whether a Weibull survival model can be used to decide when to stop a clinical trial, using the traditional Gibbs sampler to estimate the model parameters. This paper proposes instead the independent steady-state Gibbs sampling (ISSGS) approach, introduced by Dunbar et al. (2013, J. Stat. Simul. Comput., doi:10.1080/00949655.2013.770857), to improve the original Gibbs sampler in multidimensional problems. It is demonstrated that ISSGS provides accurate, unbiased estimation and improves the performance and convergence of the Gibbs sampler in this application.
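For orientation (a generic description, not the ISSGS construction itself), a Gibbs sampler for parameters \((\theta_1, \ldots, \theta_d)\) cycles through the full conditional distributions

\[ \theta_j^{(t+1)} \sim p\bigl(\theta_j \mid \theta_1^{(t+1)}, \ldots, \theta_{j-1}^{(t+1)}, \theta_{j+1}^{(t)}, \ldots, \theta_d^{(t)}, \text{data}\bigr), \qquad j = 1, \ldots, d, \]

and it is the convergence of this cycle in multidimensional problems that ISSGS is designed to improve.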

9.
The purpose of mixture experiments is to explore the optimum blends of mixture components that will provide desirable response characteristics in finished products. D-optimal minimal designs have been considered for a variety of mixture models, including Scheffé's linear, quadratic, and cubic models. Usually, these D-optimal designs are minimally supported, having exactly as many design points as parameters; thus, they lack the degrees of freedom to perform lack-of-fit (LOF) tests. Moreover, the majority of the design points in D-optimal minimal designs lie on the boundary of the design simplex: vertices, edges, or faces. In this article, extensions of D-optimal minimal designs are developed for a general mixture model to allow additional interior points in the design space, enabling prediction of the entire response surface. A new strategy for adding multiple interior points for symmetric mixture models is also proposed. We compare the proposed designs with the two 10-point designs of Cornell (1986, J. Qual. Technol. 18(1):1–15) for the LOF test via simulations.
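Scheffé's quadratic mixture model, one of the models referred to above, is

\[ E(y) = \sum_{i=1}^{q} \beta_i x_i + \sum_{i<j} \beta_{ij} x_i x_j, \qquad x_i \ge 0, \quad \sum_{i=1}^{q} x_i = 1, \]

whose \(q(q+1)/2\) parameters explain why a minimally supported D-optimal design leaves no degrees of freedom for a lack-of-fit test.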

10.
Communications in Statistics – Theory and Methods, 2012, 41(16–17):2879–2895
In statistical surveys, people are often asked to express evaluations on several topics or to arrange a list of objects (items, services, sentences, etc.) in order; thus, the analysis of ratings and rankings is receiving growing interest in many fields. In this framework, we develop a testing procedure for a class of mixture models with covariates, known as CUB models, proposed by Piccolo (2003, Quaderni di Statistica 5:85–104) and D'Elia and Piccolo (2005, Comput. Statist. Data Anal. 49:917–934) and generally developed in a parametric context. We instead propose a nonparametric solution for inference on CUB models, specifically on the coefficients of the covariates. A simulation study shows that this approach is more appropriate in some specific data settings, mostly for small sample sizes.
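For reference, the CUB model for a rating \(R\) on \(\{1, \ldots, m\}\) is the two-component mixture (up to the sign convention for \(\xi\))

\[ P(R = r) = \pi \binom{m-1}{r-1} (1-\xi)^{r-1} \xi^{m-r} + (1-\pi)\frac{1}{m}, \qquad r = 1, \ldots, m, \]

a shifted binomial component plus a discrete uniform component; covariates enter through \(\pi\) and \(\xi\), and it is the coefficients of these covariates that the nonparametric procedure tests.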

11.
In this article, we study, in the small-sample setting, the moment-based test procedure for a mixture distribution in the natural exponential family with quadratic variance functions (NEF-QVF) proposed by Ning et al. (2009b, Statist. Probab. Lett. 79:828–834). We derive an approximation to the null distribution of the test statistic via an Edgeworth expansion. Simulations are conducted for a binomial mixture distribution, which covers the detection of linkage in genetic analysis, with different sample sizes and family sizes at various significance levels. The simulation results show that the test performs reasonably well. We also apply the proposed method to real clinical data to verify a significant difference between two drug treatments. Critical values for the binomial mixture distribution are also provided.
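In the binomial case simulated above, the homogeneity hypothesis takes the schematic form (our statement)

\[ H_0: X \sim \mathrm{Bin}(n, p) \quad \text{versus} \quad H_1: X \sim \pi\,\mathrm{Bin}(n, p_1) + (1-\pi)\,\mathrm{Bin}(n, p_2), \]

the mixture alternative arising, for instance, when a genetic marker is linked in only a fraction of the families.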

12.
In a mixture experiment the measured response is assumed to depend only on the relative proportions of the ingredients or components present in the mixture. Scheffé (1958, J. R. Statist. Soc. B 20:344–360; 1963, J. R. Statist. Soc. B 25:235–263) first systematically considered this problem and introduced models and designs suitable for such situations. Optimum designs for estimating the parameters of various mixture models are available in the literature. The problem of estimating the optimum proportions of the mixture components is of great practical importance. Pal and Mandal (2006, Statist. Probab. Lett. 76(13):1369–1379; 2007, JCISS 32:107–126) attempted to solve this problem by adopting a pseudo-Bayesian approach and using the trace criterion. Subsequently, Pal and Mandal (2008, Statist. Probab. Lett.) solved the problem using a minimax criterion. In this article, the deficiency criterion of Chatterjee and Mandal (1981, Calcutta Statist. Assoc. Bull. 30:145–169) is used as a measure for comparing the performance of competing designs.

13.
This article investigates diagnostic procedures for finite mixture models. The problem is to decide whether given data stem from an exponential distribution or from a finite mixture of such distributions. Three test approaches have recently been proposed: the modified likelihood ratio test (MLRT) of Chen et al. (2001, J. R. Statist. Soc. B 63:19–29), the ADDS test of Mosler and Seidel (2001, Aust. N. Z. J. Statist. 43:231–247), and the D-test of Charnigo and Sun (2004, J. Amer. Statist. Assoc. 99:488–498), the last based on the \(L_2\) distance between competing models. The size and power of these tests are determined by Monte Carlo simulation and their relative merits are evaluated. We conclude that the ADDS test never shows much less power than its competitors and, under some alternatives (in particular, lower contamination levels), considerably more. New tables for the ADDS test are also provided.
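All three tests address the same pair of hypotheses (our schematic statement):

\[ H_0: f(x) = \lambda e^{-\lambda x} \quad \text{versus} \quad H_1: f(x) = \sum_{j=1}^{m} \pi_j \lambda_j e^{-\lambda_j x} \ \text{with } m \ge 2. \]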

14.
This study considers efficient mixture designs for approximating the response surface of a quantile regression model, which is a second-degree polynomial, by a first-degree polynomial in the proportions of q components. Instead of the least squares estimation of traditional regression analysis, the objective function in quantile regression is a weighted sum of absolute deviations, and the least absolute deviations (LAD) estimation technique should be used (Koenker and Bassett, 1978, Econometrica 46(1):33–50; Bassett and Koenker, 1982, J. Amer. Statist. Assoc. 77:407–415). Therefore, the standard optimal mixture designs for least squares estimation, such as D-optimal or A-optimal designs, are not appropriate. This study explores mixture designs that minimize the bias between the approximating first-degree polynomial and the second-degree polynomial response surface under LAD estimation. In contrast to the standard optimal mixture designs for least squares estimation, the efficient designs may contain elementary centroid design points of degree higher than two. An example of a portfolio with five assets illustrates the use of the proposed efficient mixture designs in determining the marginal risk contributions of individual assets in the portfolio.
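The LAD objective mentioned above is the \(\tau = 1/2\) case of the quantile-regression check loss of Koenker and Bassett (1978):

\[ \min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - x_i^{\top}\beta\bigr), \qquad \rho_\tau(u) = u\bigl(\tau - \mathbf{1}\{u < 0\}\bigr), \]

which at \(\tau = 1/2\) reduces, up to a factor of two, to minimizing \(\sum_i |y_i - x_i^{\top}\beta|\).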

15.
In a mixture experiment, the response depends on the mixing proportions of the components present in the mixture. Optimum designs are available for estimating the parameters of the models proposed for such situations. However, these designs are found to include the vertex points of the simplex Ξ defining the experimental region, which are not mixtures in the true sense. Recently, Mandal et al. (2015, Stat. Pap. 56(1):105–119) derived optimum designs when the experiment is confined to an ellipsoidal region within Ξ that does not include the vertices of Ξ. In this paper, an attempt is made to find optimum designs when the experimental region is a simplex or cuboid inside Ξ that does not contain the extreme points.

16.
We investigate bandwidth estimation in a functional nonparametric regression model with function-valued, continuous real-valued, and discrete-valued regressors under unknown error density. Extending the recent work of Shang (2013, Computational Statistics & Data Analysis 67:185–198), we approximate the unknown error density by a kernel density estimator of the residuals, where the regression function is estimated by a functional Nadaraya–Watson estimator that admits mixed types of regressors. We derive a likelihood and posterior density for the bandwidth parameters under the kernel-form error density, and put forward a Bayesian approach that estimates all bandwidths simultaneously. Simulation studies demonstrate the estimation accuracy of the proposed Bayesian approach for both the regression function and the error density. We apply the proposed approach to select the optimal bandwidths in a functional nonparametric regression model with mixed types of regressors, illustrated with a spectroscopy data set from food quality control.
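The functional Nadaraya–Watson estimator referred to above has the form (our rendering, written for a semi-metric \(d\) on the function space)

\[ \hat{r}(x) = \frac{\sum_{i=1}^{n} K\bigl(d(x, X_i)/h\bigr)\, Y_i}{\sum_{i=1}^{n} K\bigl(d(x, X_i)/h\bigr)}, \]

with further kernels, each carrying its own bandwidth, multiplying in for the continuous and discrete regressors; it is these bandwidths that the Bayesian procedure estimates jointly.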

17.
A blocked Gibbs sampler for NGG-mixture models via a priori truncation
We define a new class of random probability measures approximating the well-known normalized generalized gamma (NGG) process. Our new process is defined from the representation of NGG processes as discrete measures, where the weights are obtained by normalizing the jumps of a Poisson process and the support consists of independent identically distributed location points, but considering only jumps larger than a threshold \(\varepsilon\). The number of jumps of the new process, called the \(\varepsilon\)-NGG process, is therefore a.s. finite, and a prior distribution for \(\varepsilon\) can be elicited. We use this process as the mixing measure in a mixture model for density and cluster estimation, and build an efficient Gibbs sampler scheme to simulate from the posterior. Finally, we discuss applications and the performance of the model on two popular datasets, as well as comparisons with competing algorithms: the slice sampler and a posteriori truncation.
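Schematically (our notation), the \(\varepsilon\)-NGG process keeps only the large jumps of the underlying Poisson process:

\[ P_\varepsilon = \sum_{i\,:\, J_i > \varepsilon} \frac{J_i}{\sum_{l\,:\, J_l > \varepsilon} J_l}\, \delta_{X_i}, \qquad X_i \overset{\mathrm{iid}}{\sim} P_0, \]

so the number of atoms is almost surely finite and the blocked Gibbs sampler requires no a posteriori truncation.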

18.
Johns (1988, J. Amer. Statist. Assoc. 83:709–714), Davison (1988, J. R. Statist. Soc. B 50:356–357), and Do and Hall (1991, Biometrika 78:161–167) used importance sampling for calculating bootstrap distributions of one-dimensional statistics. Realizing that these methods cannot easily be extended to multi-dimensional statistics, Fuh and Hu (2004, Biometrika 91:471–490) proposed an exponential tilting formula for multi-dimensional statistics, optimal in the sense that the asymptotic variance is minimized for estimating tail probabilities of asymptotically normal statistics. For one-dimensional statistics, Hu and Su (2008, J. Statist. Plann. Inference 138(6):1763–1777) proposed a multi-step variance minimization approach that generalizes the two-step approach of Do and Hall (1991). In this article, we extend the approach of Hu and Su (2008) to multi-dimensional statistics; the extension applies to general statistics and does not resort to asymptotics. Empirical results on a real survival data set show that the proposed algorithm provides significant gains in computational efficiency.
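All of these methods rest on the basic importance-sampling identity: for any density \(g\) dominating \(f\),

\[ \mathbb{E}_f[h(X)] = \mathbb{E}_g\!\left[h(X)\,\frac{f(X)}{g(X)}\right], \]

and the approaches above (exponential tilting, multi-step variance minimization) differ in how they choose \(g\) to minimize the variance of the resulting estimator of a bootstrap tail probability.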

19.
What population does the sample represent? The answer to this question is of crucial importance when estimating a survivor function in duration studies. As is well known, in a stationary population, survival data obtained from a cross-sectional sample taken from the population at time $t_0$ represent not the target density $f(t)$ but its length-biased version, proportional to $tf(t)$ for $t>0$. The problem of estimating the survivor function from such length-biased samples becomes more complex, and more interesting, in the presence of competing risks and censoring. This paper lays out a sampling scheme related to a mixed Poisson process and develops nonparametric estimators of the survivor function of the target population, assuming that the two independent competing risks have proportional hazards. Two cases are considered: with and without independent censoring before length-biased sampling. In each case, the weak convergence of the process generated by the proposed estimator is proved. A well-known study of the duration in power of political leaders illustrates our results. Finally, a simulation study assesses the finite-sample behaviour of our estimators.
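Normalized, the length-biased density mentioned above is (a standard fact)

\[ f_{\mathrm{LB}}(t) = \frac{t f(t)}{\mu}, \qquad \mu = \int_0^\infty t f(t)\, dt, \]

so unweighted estimates from a cross-sectional sample systematically over-represent long durations.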

20.
To better understand the power shift and the U.S. role relative to China and other regional actors, the Chicago Council on Global Affairs and the East Asia Institute (EAI) surveyed people in six countries (China, Japan, South Korea, Vietnam, Indonesia, and the United States) in the first half of 2008 about regional security and economic integration in Asia and about how these nations perceive each other (Bouton et al., 2010, Soft Power in Asia Survey, 2008, ICPSR25342-v1, doi:10.3886/ICPSR25342.v1). The data contain latent variance that cannot be adequately explained by parametric models, in large part because of hidden structures that form in unexpected ways. Therefore, a new Gibbs sampler is developed here to reveal previously unseen structures and latent variance in the survey dataset of Bouton et al. The sampler is based on semiparametric regression, a well-known tool for capturing the functional dependence between variables with parametric fixed effects and nonlinear regression. This is extended to a generalized semiparametric regression for binary responses with logit and probit link functions, and then to a generalized linear mixed model with a nonparametric random effect, expressed as a nonparametric regression with a multinomial-Dirichlet distribution for the number and positions of knots.
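Schematically (our rendering), the generalized semiparametric regression described above is

\[ g\bigl(P(y_i = 1)\bigr) = x_i^{\top}\beta + f(z_i), \]

with \(g\) the logit or probit link, \(x_i^{\top}\beta\) the parametric fixed effects, and \(f\) an unknown smooth function whose number and positions of knots receive the multinomial-Dirichlet prior mentioned above.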
