Found 20 similar documents (search time: 31 ms)
1.
Karlis and Santourian [14] proposed a model-based clustering algorithm, the expectation–maximization (EM) algorithm, for fitting mixtures of multivariate normal-inverse Gaussian (NIG) distributions. However, the EM algorithm for the multivariate NIG mixture requires a set of initial values to begin the iterative process, and the number of components must be given a priori. In this paper, we present a learning-based EM algorithm whose aim is to overcome these weaknesses of Karlis and Santourian's EM algorithm [14]. The proposed learning-based EM algorithm was inspired by Yang et al. [24], whose self-clustering process it simulates. Numerical experiments showed promising results compared to Karlis and Santourian's EM algorithm. Moreover, the methodology is applicable to the analysis of extrasolar planets. Our analysis provides an understanding of the clustering results in the ln P–ln M and ln P–e spaces, where M is the planetary mass, P is the orbital period and e is the orbital eccentricity. Our identified groups suggest two phenomena: (1) the characteristics of the two clusters in ln P–ln M space might be related to tidal and disc interactions (see [9]); and (2) there are two clusters in ln P–e space.
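The EM iteration named in this abstract alternates a responsibility-computing E-step with a weighted-update M-step. A minimal, hedged sketch of that structure for a two-component one-dimensional Gaussian mixture (a stand-in for the paper's multivariate NIG mixture; all function and variable names here are illustrative, not the authors'):

```python
import numpy as np

def em_gmm_1d(x, n_iter=200):
    # Deterministic initialisation: one component at each extreme of the data.
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = np.stack([
            pi[k] / (sigma[k] * np.sqrt(2 * np.pi))
            * np.exp(-(x - mu[k]) ** 2 / (2 * sigma[k] ** 2))
            for k in range(2)
        ])
        r = dens / dens.sum(axis=0)
        # M-step: responsibility-weighted updates of weights, means, variances.
        nk = r.sum(axis=1)
        pi = nk / len(x)
        mu = (r * x).sum(axis=1) / nk
        sigma = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(8, 1, 300)])
pi, mu, sigma = em_gmm_1d(x)   # means should land near 0 and 8
```

The dependence on initial values visible here (the `mu` starting points) is exactly the weakness the learning-based variant is meant to remove.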
2.
3.
Communications in Statistics - Theory and Methods, 2012, 41(1): 78-87
Abstract: In this article, we revisit the problem of fitting a mixture model under the assumption that the mixture components are symmetric and log-concave. To this end, we first study the nonparametric maximum likelihood estimation (MLE) of a monotone log-concave probability density. To fit the mixture model, we propose a semiparametric EM (SEM) algorithm, which can be adapted to other semiparametric mixture models. In our numerical experiments, we compare our algorithm to that of Balabdaoui and Doss (2018, Inference for a two-component mixture of symmetric distributions under log-concavity, Bernoulli 24(2): 1053-71) and to other mixture models, on both simulated and real-world datasets.
4.
Wen-Liang Hung, Shou-Jen Chang-Chien, Miin-Shen Yang, Journal of Applied Statistics, 2015, 42(10): 2220-2232
This paper proposes an intuitive clustering algorithm capable of automatically self-organizing data groups based on the original data structure. Comparisons between the proposed algorithm and the EM [1] and spherical k-means [7] algorithms are given. These numerical results show the effectiveness of the proposed algorithm, using the correct classification rate and the adjusted Rand index as evaluation criteria [5,6]. In 1995, Mayor and Queloz announced the detection of the first extrasolar planet (exoplanet) around a Sun-like star. Since then, observational efforts of astronomers have led to the detection of more than 1000 exoplanets. These discoveries may provide important information for understanding the formation and evolution of planetary systems. The proposed clustering algorithm is therefore used to study the data gathered on exoplanets. Two main implications are also suggested: (1) there are three major clusters, which correspond to the exoplanets in the regimes of disc, ongoing tidal and tidal interactions, respectively, and (2) the stellar metallicity does not play a key role in exoplanet migration.
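One of the evaluation criteria named above, the adjusted Rand index, can be computed directly from pair counts. A small self-contained sketch (illustrative only, not the authors' code):

```python
from itertools import combinations

def adjusted_rand_index(labels_a, labels_b):
    """ARI from pair counts: 1.0 for identical partitions, about 0 at chance."""
    pairs = list(combinations(range(len(labels_a)), 2))
    same_a = [labels_a[i] == labels_a[j] for i, j in pairs]
    same_b = [labels_b[i] == labels_b[j] for i, j in pairs]
    together_both = sum(a and b for a, b in zip(same_a, same_b))
    a_pairs, b_pairs = sum(same_a), sum(same_b)
    expected = a_pairs * b_pairs / len(pairs)      # chance-level agreement
    max_index = (a_pairs + b_pairs) / 2
    return (together_both - expected) / (max_index - expected)

truth = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]   # same partition, labels swapped
print(adjusted_rand_index(truth, relabelled))  # 1.0
```

Because the index depends only on whether pairs are grouped together, relabelling the clusters does not change the score, which is why it suits comparing clusterings to reference classes.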
5.
Feng-shou Ko, Communications in Statistics - Theory and Methods, 2013, 42(21): 3824-3838
Quality of life (QOL) is regarded as a multidimensional entity comprising physical, psychological, social, and medical parameters, and is a good prognostic factor for cancer patients. In this article, we investigate whether QOL is a good biomarker, serving as a surrogate for the survival time of gastric cancer patients. We conducted a single-institution trial that examines the QOL of gastric cancer patients receiving different types of surgery; in this trial, QOL is a longitudinal measurement. The accelerated failure time model can be used to handle survival data when the proportionality assumption fails to capture the relationship between survival time and covariates. Similar to Henderson et al. (2000, 2002), a joint likelihood function combines the likelihood functions of the longitudinal biomarkers and of the survival times under the accelerated failure time assumption. We introduce a method employing a frailty model to identify longitudinal biomarkers or surrogates for a time-to-event outcome, allowing random effects in both the longitudinal biomarker and the underlying survival function. The random effects in the biomarker are introduced via an explicit term, while the random effect in the underlying survival function is introduced through a frailty in the model.
6.
Analysis of discrete lifetime data under middle-censoring and in the presence of covariates
S. Rao Jammalamadaka, Journal of Applied Statistics, 2015, 42(4): 905-913
‘Middle censoring’ is a very general censoring scheme in which the actual value of an observation becomes unobservable if it falls inside a random interval (L, R); it includes both left and right censoring as special cases. In this paper, we consider discrete lifetime data following a geometric distribution that is subject to middle censoring. Two major innovations in this paper, compared with the earlier work of Davarzani and Parsian [3], are (i) an extension and generalization to the case where covariates are present along with the data, and (ii) an alternative approach and proofs that exploit the simple relationship between the geometric and exponential distributions, so that the theory is more in line with the work of Iyer et al. [6]. It is also demonstrated that this kind of discretization of lifetimes gives results close to those for the original data involving exponential lifetimes. Maximum likelihood estimation of the parameters is studied for this middle-censoring scheme with covariates, and the large-sample distributions of the estimators are discussed. Simulation results indicate how well the proposed estimation methods work, and an illustrative example using time-to-pregnancy data from Baird and Wilcox [1] is included.
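For intuition about the likelihood involved: with a geometric lifetime T on {1, 2, ...}, an exactly observed t contributes p(1-p)^(t-1), while a middle-censored observation with interval (L, R) contributes P(L < T < R) = (1-p)^L - (1-p)^(R-1), a difference of survival-function values. A toy maximum-likelihood sketch without covariates (function and variable names are ours, not the paper's, and grid search replaces the paper's estimation machinery):

```python
import numpy as np

def geom_middle_censored_mle(exact, intervals,
                             grid=np.linspace(0.001, 0.999, 999)):
    """Grid-search MLE of the geometric success probability p under
    middle censoring.  `exact` holds fully observed lifetimes; each
    (L, R) in `intervals` means only L < T < R is known."""
    exact = np.asarray(exact)
    ll = np.zeros_like(grid)
    for i, p in enumerate(grid):
        # exact observations: density p * (1-p)**(t-1)
        ll[i] = np.sum(np.log(p) + (exact - 1) * np.log(1 - p))
        # censored observations: survival-function difference
        for L, R in intervals:
            ll[i] += np.log((1 - p) ** L - (1 - p) ** (R - 1))
    return grid[np.argmax(ll)]

rng = np.random.default_rng(0)
t = rng.geometric(0.3, size=500)                 # true p = 0.3
# treat values falling strictly inside (3, 8) as middle censored
exact = t[(t <= 3) | (t >= 8)]
intervals = [(3, 8)] * int(((t > 3) & (t < 8)).sum())
p_hat = geom_middle_censored_mle(exact, intervals)
```

Even with a sizeable censored fraction, the interval contributions retain enough information for the MLE to land near the true p.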
7.
Abstract: In this paper, we discuss how to model the mean and covariance structures in linear mixed models (LMMs) simultaneously. We propose a data-driven method to model the covariance structures of the random effects and random errors in LMMs. Parameter estimation for the mean and covariances is carried out using the EM algorithm, and standard errors of the parameter estimates are calculated through Louis' (1982) information principle. Kenward's (1987) cattle data sets are analyzed for illustration, and comparison with existing work is made through simulation studies. Our numerical analysis confirms the superiority of the proposed method over existing approaches in terms of the Akaike information criterion.
8.
9.
This article studies some properties of a mixture periodically correlated n-variate vector autoregressive (MPVAR) time series model, which extends the mixture time-invariant-parameter n-variate autoregressive (MVAR) model recently studied by Fong et al. (2007). Our main contributions are, on the one hand, obtaining the second-moment periodic stationarity condition for the n-variate MPVARS(n; K; 2, …, 2) model, together with a closed form for the second moment, and, on the other hand, estimating the coefficient matrices and the error variance matrix via the expectation–maximization (EM) algorithm.
10.
In this article, we consider a parametric survival model appropriate when the population of interest contains long-term survivors, or 'immunes'. The model, referred to as the cure rate model, was introduced by Boag [1] as a mixture model that includes a component representing the proportion of immunes and a distribution representing the lifetimes of the susceptible population. We propose a cure rate model based on the generalized exponential distribution that incorporates the effects of risk factors, or covariates, on the probability of an individual being a long-term survivor. Maximum likelihood estimates of the model parameters are obtained using the expectation–maximization (EM) algorithm. A graphical method is also provided for assessing the goodness of fit of the model. We present an example that illustrates the fit of this model to data examining the effects of different risk factors on relapse time for drug addicts.
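The mixture structure described above writes the population survival function as S(t) = pi + (1 - pi) * S0(t), where pi is the immune fraction and S0 the survival function of the susceptibles; the generalized exponential distribution has CDF F0(t) = (1 - exp(-lam*t))**alpha. A hedged sketch of these two pieces (covariate effects on pi, which the paper models, are omitted):

```python
import math

def ge_cdf(t, alpha, lam):
    """Generalized exponential CDF: F(t) = (1 - exp(-lam * t)) ** alpha, t >= 0."""
    return (1.0 - math.exp(-lam * t)) ** alpha

def cure_survival(t, pi_cure, alpha, lam):
    """Population survival under the mixture cure model:
    S(t) = pi + (1 - pi) * S0(t), plateauing at the cured fraction pi."""
    return pi_cure + (1.0 - pi_cure) * (1.0 - ge_cdf(t, alpha, lam))
```

The signature of a cured subpopulation is that S(t) starts at 1 but levels off at pi instead of decaying to zero, which is what the graphical goodness-of-fit check exploits.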
11.
This paper presents a new variable-weight method, called the singular value decomposition (SVD) approach, for Kohonen competitive learning (KCL) algorithms, based on the concept of Varshavsky et al. [18]. Integrating the weighted fuzzy c-means (FCM) algorithm with KCL, we propose a weighted fuzzy KCL (WFKCL) algorithm. The goal of the proposed WFKCL algorithm is to reduce the clustering error rate when the data contain noise variables. Compared with k-means, FCM and KCL with existing variable-weight methods, the proposed WFKCL algorithm with the proposed SVD weight method provides better clustering performance under the error-rate criterion. Furthermore, the complexity of the proposed SVD approach is less than that of Pal et al. [17], Wang et al. [19] and Hung et al. [9].
12.
Recently, the topic of extreme values under random censoring has received attention, and different estimators for the extreme value index have been proposed (see Beirlant et al., 2007). All of them are constructed as the classical (uncensored) estimators divided by the proportion of non-censored observations above a certain threshold; their asymptotic normality was established by Einmahl et al. (2008). An alternative approach consists of using the Peaks-Over-Threshold method (Balkema and de Haan, 1974; Smith, 1987) and adapting the likelihood to the censored setting. This leads to ML-estimators whose asymptotic properties are still unknown. The aim of this article is to propose one-step approximations based on the Newton-Raphson algorithm. A small simulation study shows that the one-step estimators are close approximations to the ML-estimators. We also establish the asymptotic normality of the one-step estimators, whereas for the ML-estimators this remains an open problem. The proof of our result, whose approach is new in the Peaks-Over-Threshold context, is in the spirit of Lehmann's theory (1991).
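The one-step idea is generic: starting from any consistent initial estimate, a single Newton-Raphson update on the score function already shares the ML estimator's first-order behavior. A toy sketch on an exponential-rate log-likelihood (the article applies this to the censored Peaks-Over-Threshold likelihood, which this does not reproduce; all names are illustrative):

```python
import numpy as np

def one_step_newton(x, lam0):
    """One Newton-Raphson step toward the ML estimate of an exponential
    rate, starting from a consistent initial estimate lam0."""
    n, s = len(x), x.sum()
    score = n / lam0 - s            # first derivative of the log-likelihood
    hessian = -n / lam0 ** 2        # second derivative
    return lam0 - score / hessian

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=2000)   # true rate 2
lam0 = np.log(2) / np.median(x)                 # crude median-based start
lam1 = one_step_newton(x, lam0)                 # one step
mle = len(x) / x.sum()                          # closed-form MLE for comparison
```

For this model a short calculation gives lam1 - mle = -(lam0 - mle)**2 / mle, so the error of the start is squared: the one-step estimate is already very close to the MLE, mirroring the article's finding in the censored setting.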
13.
In this article, several methods for making inferences about the parameters of a finite mixture of distributions in the context of centrally censored data with partial identification are revisited. These methods adapt the work of Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (2003) on right censoring. The first method focuses on an asymptotic approximation to a suitably simplified likelihood using some latent quantities; the second is based on the expectation–maximization (EM) algorithm. Both methods make explicit use of latent variables and provide computationally efficient procedures compared with non-Bayesian methods that deal directly with the full likelihood of the mixture via its asymptotic approximation. The third method, from a Bayesian perspective, uses data augmentation to work with an uncensored sample; it is related to the Bayesian method recently proposed by Baker, Mengersen, and Davis (2005). The three adapted methods are shown to provide similar inferential answers, thus offering alternative analyses.
14.
S. Shams Harandi, Communications in Statistics - Theory and Methods, 2013, 42(8): 2204-2227
Abstract: In this article, we introduce a new class of lifetime distributions, which includes several previously known distributions such as those of Chahkandi and Ganjali (2009), Mahmoudi and Jafari (2012), and Nadarajah et al. (2012). This new class of four-parameter distributions allows for flexible failure-rate behavior: the failure rate function can be increasing, decreasing, bathtub-shaped or upside-down bathtub-shaped. Several distributional properties of the new class, including moments, quantiles and order statistics, are studied. An EM algorithm for computing the parameter estimates is proposed, and some maximum entropy characterizations are discussed. Finally, to show the flexibility and potential of the new class of distributions, applications to two real data sets are provided.
15.
Classification and regression trees have been useful in medical research for constructing algorithms for disease diagnosis or prognostic prediction. Jin et al. [7] developed a robust and cost-saving tree (RACT) algorithm, with application to classifying hip fracture risk after 5-year follow-up based on data from the Study of Osteoporotic Fractures (SOF). Although conventional recursive partitioning algorithms are well developed, they still have limitations: binary splits may generate a big tree with many layers, while trinary splits may produce too many nodes. In this paper, we propose a classification approach combining trinary and binary splits to generate a trinary-binary tree; a new non-inferiority test of entropy is used to select the binary or trinary splits. We apply the modified method to the SOF data to construct a trinary-binary classification rule for predicting the risk of osteoporotic hip fracture. The new classification tree has good statistical utility: it is statistically non-inferior to the optimal binary tree and to the RACT on the testing sample, and it is also cost-saving. It may be useful in clinical applications: femoral neck bone mineral density, age, height loss and weight gain since age 25 can identify subjects with elevated 5-year hip fracture risk without loss of statistical efficiency.
16.
The t-distribution (univariate and multivariate) has many useful applications in robust statistical analysis. Parameter estimation for the t-distribution is typically carried out by maximum likelihood (ML), with the ML estimates obtained via the expectation–maximization (EM) algorithm. In this article, we use the maximum Lq-likelihood (MLq) estimation method introduced by Ferrari and Yang (2010) to estimate all the parameters of the multivariate t-distribution, modifying the EM algorithm to obtain the MLq estimates. A simulation study and a real data example illustrate the performance of the MLq estimators relative to the ML estimators.
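The Lq-likelihood of Ferrari and Yang replaces log u with L_q(u) = (u^(1-q) - 1)/(1-q), which recovers log u as q tends to 1 and, for q < 1, downweights low-density (outlying) observations. A univariate toy sketch of the idea via grid search for a normal mean (the article's method is an EM variant for the multivariate t, which this does not reproduce; names are illustrative):

```python
import numpy as np

def lq(u, q):
    """Ferrari-Yang Lq function: (u**(1-q) - 1)/(1-q), tending to log(u) as q -> 1."""
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def mlq_normal_mean(x, q=0.9, sigma=1.0, grid=None):
    """Maximum Lq-likelihood estimate of a normal mean with known sigma,
    found by maximizing sum_i Lq(f(x_i; mu, sigma)) over a grid of mu."""
    if grid is None:
        grid = np.linspace(x.min(), x.max(), 2001)
    dens = np.exp(-(x[None, :] - grid[:, None]) ** 2 / (2 * sigma ** 2)) \
           / (sigma * np.sqrt(2 * np.pi))
    return grid[np.argmax(lq(dens, q).sum(axis=1))]

rng = np.random.default_rng(0)
# 200 clean points near 0 plus two gross outliers that drag the mean up
x = np.concatenate([rng.normal(0, 1, 200), np.array([25.0, 30.0])])
mu_q = mlq_normal_mean(x, q=0.9)   # stays near 0; sample mean does not
```

The robustness comes from the implicit weights f(x_i)^(1-q) in the estimating equation: the outliers, having negligible density near the bulk, contribute essentially nothing.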
17.
This article extends the correlation methodology developed by Chinchilli et al. (2005) for the 2 × 2 crossover design to more complex crossover designs for clinical trials. We describe how the methodology can be adapted to a general type of two-treatment crossover design which includes either at least two sequences or at least two treatment periods or both. We then derive the asymptotic theory for the corresponding correlation statistics, investigate the statistical accuracy of the estimators via bootstrap analyses, and demonstrate their use with two real data examples.
18.
Considering the Wald, score, and likelihood ratio asymptotic test statistics, we analyze a multivariate null-intercept errors-in-variables regression model, in which both the explanatory and response variables are subject to measurement error, and a possible dependency structure among measurements taken on the same individual is incorporated, representing a longitudinal structure. This model was proposed by Aoki et al. (2003b) and analyzed under the Bayesian approach. In this article, taking the classical approach, we analyze the asymptotic test statistics and present a simulation study comparing the behavior of the three test statistics for different sample sizes, parameter values and nominal levels of the test. Closed-form expressions for the score function and the Fisher information matrix are also presented. We consider two real numerical illustrations: the odontological data set of Hadgu and Koch (1999), and a quality-control data set.
19.
Liew (1976a) introduced the generalized inequality constrained least squares (GICLS) estimator and the inequality constrained two-stage and three-stage least squares estimators by reducing the primal-dual relation to the problem of Dantzig and Cottle (1967) and Cottle and Dantzig (1974) and solving it with Lemke's (1962) algorithm. The purpose of this article is to present an inequality constrained ridge regression (ICRR) estimator with correlated errors, together with inequality constrained two-stage and three-stage ridge regression estimators in the presence of multicollinearity. The untruncated variance-covariance matrix and mean square error are derived for the ICRR estimator with correlated errors, and its superiority over the GICLS estimator is examined via Monte Carlo simulation.
20.
Abstract: The present article explores rotation patterns using exponential ratio-type estimators for estimating the finite population median on the current occasion in two-occasion rotation sampling. Properties of the proposed estimators, including the optimum replacement strategies, have been elaborated. The proposed estimators have been compared with the sample median estimator, in which there is no matching from the previous occasion, as well as with the ratio-type estimator proposed by Singh et al. (2007) for the second quantile. The behavior of the proposed estimators is justified by empirical interpretation and validated by a simulation study using several natural populations.