Similar documents (20 results)
1.
Karlis and Santourian [D. Karlis and A. Santourian, Model-based clustering with non-elliptically contoured distributions, Stat. Comput. 19 (2009), pp. 73–83] proposed a model-based clustering algorithm, the expectation–maximization (EM) algorithm, to fit mixtures of multivariate normal-inverse Gaussian (NIG) distributions. However, the EM algorithm for the multivariate NIG mixture requires a set of initial values to begin the iterative process, and the number of components has to be given a priori. In this paper, we present a learning-based EM algorithm that aims to overcome these weaknesses of Karlis and Santourian's EM algorithm. The proposed learning-based EM algorithm was inspired by Yang et al. [M.-S. Yang, C.-Y. Lai, and C.-Y. Lin, A robust EM clustering algorithm for Gaussian mixture models, Pattern Recognit. 45 (2012), pp. 3950–3961], whose self-clustering process it emulates. Numerical experiments showed promising results compared to Karlis and Santourian's EM algorithm. Moreover, the methodology is applicable to the analysis of extrasolar planets. Our analysis provides an understanding of the clustering results in the ln P–ln M and ln P–e spaces, where M is the planetary mass, P is the orbital period and e is the orbital eccentricity. Our identified groups suggest two phenomena: (1) the characteristics of the two clusters in ln P–ln M space might be related to tidal and disc interactions (see [I.G. Jiang, W.H. Ip, and L.C. Yeh, On the fate of close-in extrasolar planets, Astrophys. J. 582 (2003), pp. 449–454]); and (2) there are two clusters in ln P–e space.
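To make the E- and M-steps concrete, here is a minimal sketch of EM for a two-component one-dimensional Gaussian mixture with fixed equal variances. This is an illustration of the iterative structure only, not the paper's algorithm: the paper fits multivariate NIG mixtures and learns the number of components, whereas this toy version fixes both and, as the abstract notes, depends on its initial values.

```python
import math

def em_gmm_1d(x, means, var=1.0, iters=50):
    """Minimal EM for a K-component 1D Gaussian mixture with known, equal
    variances. Illustrative only: the paper's algorithm handles multivariate
    NIG components and selects the number of components itself."""
    k = len(means)
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for xi in x:
            dens = [w[j] * math.exp(-(xi - means[j]) ** 2 / (2 * var)) for j in range(k)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: update mixing weights and means from the responsibilities
        n_j = [sum(r[j] for r in resp) for j in range(k)]
        w = [n / len(x) for n in n_j]
        means = [sum(r[j] * xi for r, xi in zip(resp, x)) / n_j[j] for j in range(k)]
    return means, w

# two well-separated groups; EM recovers their centers from rough initial values
data = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0]
means, weights = em_gmm_1d(data, means=[-1.0, 6.0])
```

Starting the same routine from poor initial means can converge to a different local optimum, which is exactly the sensitivity the learning-based variant is meant to address.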

2.
3.
Abstract

In this article, we revisit the problem of fitting a mixture model under the assumption that the mixture components are symmetric and log-concave. To this end, we first study the nonparametric maximum likelihood estimation (MLE) of a monotone log-concave probability density. To fit the mixture model, we propose a semiparametric EM (SEM) algorithm, which can be adapted to other semiparametric mixture models. In our numerical experiments, we compare our algorithm to that of Balabdaoui and Doss (Inference for a two-component mixture of symmetric distributions under log-concavity, Bernoulli 24 (2018), pp. 1053–71) and to other mixture models on both simulated and real-world datasets.

4.
This paper proposes an intuitive clustering algorithm capable of automatically self-organizing data groups based on the original data structure. Comparisons between the proposed algorithm and the EM [A. Banerjee, I.S. Dhillon, J. Ghosh, and S. Sra, Clustering on the unit hypersphere using von Mises–Fisher distributions, J. Mach. Learn. Res. 6 (2005), pp. 1–39] and spherical k-means [I.S. Dhillon and D.S. Modha, Concept decompositions for large sparse text data using clustering, Mach. Learn. 42 (2001), pp. 143–175] algorithms are given. These numerical results show the effectiveness of the proposed algorithm, using the correct classification rate and the adjusted Rand index as evaluation criteria [J.-M. Chiou and P.-L. Li, Functional clustering and identifying substructures of longitudinal data, J. R. Statist. Soc. Ser. B 69 (2007), pp. 679–699; J.-M. Chiou and P.-L. Li, Correlation-based functional clustering via subspace projection, J. Am. Statist. Assoc. 103 (2008), pp. 1684–1692]. In 1995, Mayor and Queloz announced the detection of the first extrasolar planet (exoplanet) around a Sun-like star. Since then, observational efforts of astronomers have led to the detection of more than 1000 exoplanets. These discoveries may provide important information for understanding the formation and evolution of planetary systems. The proposed clustering algorithm is therefore used to study the data gathered on exoplanets. Two main implications are suggested: (1) there are three major clusters, which correspond to exoplanets in the regimes of disc, ongoing tidal and tidal interactions, respectively; and (2) stellar metallicity does not play a key role in exoplanet migration.
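As a point of reference for the comparison above, spherical k-means assigns unit vectors to the centroid of largest cosine similarity and renormalizes the cluster mean after each update. A minimal sketch of that baseline (not the paper's proposed algorithm):

```python
import math

def spherical_kmeans(x, centers, iters=20):
    """Sketch of spherical k-means in the spirit of Dhillon and Modha: assign
    each unit vector to the most cosine-similar centroid, then replace each
    centroid by the normalized mean direction of its members."""
    def norm(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]
    x = [norm(v) for v in x]
    centers = [norm(c) for c in centers]
    labels = []
    for _ in range(iters):
        # assignment step: cosine similarity reduces to a dot product on the sphere
        labels = [max(range(len(centers)),
                      key=lambda j: sum(a * b for a, b in zip(v, centers[j])))
                  for v in x]
        # update step: mean direction of each cluster, projected back to the sphere
        for j in range(len(centers)):
            members = [v for v, lab in zip(x, labels) if lab == j]
            if members:
                centers[j] = norm([sum(c) for c in zip(*members)])
    return labels, centers

# two directional clusters in the plane
pts = [[1, 0.1], [1, -0.1], [0.1, 1], [-0.1, 1]]
labels, centers = spherical_kmeans(pts, centers=[[1, 0], [0, 1]])
```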

5.
Quality of life (QOL) is regarded as a multidimensional entity comprising physical, psychological, social, and medical parameters, and it is a good prognostic factor for cancer patients. In this article, we want to determine whether QOL is a good biomarker, as a surrogate, for the survival time of gastric cancer patients. We conducted a single-institution trial that examines the QOL of gastric cancer patients receiving different surgeries; in this trial, QOL is a longitudinal measurement. The accelerated failure time model can be used to deal with survival data when the proportionality assumption fails to capture the relationship between the survival time and covariates. In this article, similar to Henderson et al. (Joint modelling of longitudinal measurements and event time data, Biostatistics 1 (2000), pp. 465–480; Identification and efficacy of longitudinal markers for survival, Biostatistics 3 (2002), pp. 33–50), a joint likelihood function combines the likelihood functions of the longitudinal biomarkers and the survival times under the accelerated failure time assumption. We introduce a method employing a frailty model to identify longitudinal biomarkers or surrogates for a time-to-event outcome, allowing random effects to be present in both the longitudinal biomarker and the underlying survival function. The random effects in the biomarker are introduced via an explicit term, while the random effect in the underlying survival function is introduced by the inclusion of frailty into the model.
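The accelerated failure time (AFT) model mentioned above is log-linear in the survival time: log T = a + b·x + error, so a covariate acts by stretching or compressing the time scale. A toy fit for fully observed (uncensored) times shows the idea; the paper's joint model additionally handles censoring, frailty, and the longitudinal biomarker, none of which appear in this sketch.

```python
import math

def aft_fit_uncensored(times, x):
    """Least-squares fit of the log-linear AFT model log T = a + b*x + error
    for fully observed survival times with a single covariate. A toy version:
    real AFT fitting must account for censoring."""
    y = [math.log(t) for t in times]
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return a, b

# covariate x=1 roughly doubles survival time, so the slope is about log 2
times = [1.0, 1.1, 0.9, 2.0, 2.2, 1.8]
x = [0, 0, 0, 1, 1, 1]
a, b = aft_fit_uncensored(times, x)
```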

6.
‘Middle censoring’ is a very general censoring scheme in which the actual value of an observation becomes unobservable if it falls inside a random interval (L, R); it includes both left and right censoring. In this paper, we consider discrete lifetime data that follow a geometric distribution subject to middle censoring. Two major innovations in this paper, compared to the earlier work of Davarzani and Parsian [N. Davarzani and A. Parsian, Statistical inference for discrete middle-censored data, J. Statist. Plan. Inference 141 (2011), pp. 1455–1462], are (i) an extension and generalization to the case where covariates are present along with the data, and (ii) an alternate approach and proofs which exploit the simple relationship between the geometric and exponential distributions, so that the theory is more in line with the work of Iyer et al. [S.K. Iyer, S.R. Jammalamadaka, and D. Kundu, Analysis of middle censored data with exponential lifetime distributions, J. Statist. Plan. Inference 138 (2008), pp. 3550–3560]. It is also demonstrated that this kind of discretization of lifetimes gives results that are close to those for the original data involving exponential lifetimes. Maximum likelihood estimation of the parameters is studied for this middle-censoring scheme with covariates, and the large-sample distributions of the estimators are discussed. Simulation results indicate how well the proposed estimation methods work, and an illustrative example using time-to-pregnancy data from Baird and Wilcox [D.D. Baird and A.J. Wilcox, Cigarette smoking associated with delayed conception, J. Am. Med. Assoc. 253 (1985), pp. 2979–2983] is included.
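Under middle censoring, an exact geometric observation k contributes p(1−p)^(k−1) to the likelihood, while an observation censored into an open interval (L, R) contributes P(L < X < R) = (1−p)^L − (1−p)^(R−1) for integer endpoints. A sketch of that likelihood and a grid-search MLE, without the covariate structure the paper develops:

```python
import math

def geometric_middle_censored_loglik(p, exact, intervals):
    """Log-likelihood for geometric(p) lifetimes (support 1, 2, ...) under
    middle censoring. Exact observations k contribute p*(1-p)**(k-1); an
    observation censored into the open interval (L, R) contributes
    P(L < X < R) = (1-p)**L - (1-p)**(R-1). No covariates in this sketch."""
    ll = sum(math.log(p) + (k - 1) * math.log(1 - p) for k in exact)
    for lo, hi in intervals:
        ll += math.log((1 - p) ** lo - (1 - p) ** (hi - 1))
    return ll

def mle_grid(exact, intervals, grid=1000):
    """Crude grid search over p in (0, 1); adequate for a one-parameter demo."""
    candidates = [i / grid for i in range(1, grid)]
    return max(candidates,
               key=lambda p: geometric_middle_censored_loglik(p, exact, intervals))

# with no censored intervals the geometric MLE is 1/sample mean
p_hat = mle_grid(exact=[1, 2, 3, 2], intervals=[])
```

With the sample mean equal to 2, the grid search lands on p ≈ 0.5, matching the closed-form uncensored MLE.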

7.
Abstract

In this paper, we discuss how to model the mean and covariance structures in linear mixed models (LMMs) simultaneously. We propose a data-driven method to model the covariance structures of the random effects and random errors in LMMs. Parameter estimation for the mean and covariances is carried out using the EM algorithm, and standard errors of the parameter estimates are calculated through Louis' (Finding observed information using the EM algorithm, J. Royal Stat. Soc. B 44 (1982), pp. 98–130) information principle. Kenward's (A method for comparing profiles of repeated measurements, Appl. Stat. 36 (1987), pp. 296–308) cattle data sets are analyzed for illustration, and comparison to work in the literature is made through simulation studies. Our numerical analysis confirms the superiority of the proposed method to existing approaches in terms of the Akaike information criterion.

8.
9.
This article studies some properties of a mixture periodically correlated n-variate vector autoregressive (MPVAR) time series model, which extends the mixture time-invariant parameter n-variate vector autoregressive (MVAR) model recently studied by Fong et al. (On a mixture vector autoregressive model, The Canadian Journal of Statistics 35 (2007), pp. 135–150). Our main contributions are, on the one hand, obtaining the second-moment periodically stationary condition for an n-variate MPVARS(n; K; 2, …, 2) model, together with the closed form of the second moment, and, on the other hand, the estimation, via the Expectation-Maximization (EM) algorithm, of the coefficient matrices and the error variance matrix.

10.
In this article, we consider a parametric survival model that is appropriate when the population of interest contains long-term survivors or immunes. The model, referred to as the cure rate model, was introduced by Boag (Maximum likelihood estimates of the proportion of patients cured by cancer therapy, J. R. Stat. Soc. Ser. B 11 (1949), pp. 15–53) in terms of a mixture model that includes a component representing the proportion of immunes and a distribution representing the lifetimes of the susceptible population. We propose a cure rate model based on the generalized exponential distribution that incorporates the effects of risk factors, or covariates, on the probability of an individual being a long-term survivor. Maximum likelihood estimators of the model parameters are obtained using the expectation-maximization (EM) algorithm. A graphical method is also provided for assessing the goodness of fit of the model. We present an example to illustrate the fit of this model to data examining the effects of different risk factors on relapse time for drug addicts.
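The mixture structure of the cure rate model gives a population survival function S(t) = π + (1−π)·S0(t), where π is the cured fraction and S0 is the survival function of the susceptibles. A minimal sketch with a generalized exponential susceptible distribution, F0(t) = (1 − exp(−λt))^α; the parameter values below are illustrative, and the paper's version additionally links π to covariates:

```python
import math

def cure_rate_survival(t, pi, alpha, lam):
    """Population survival under a mixture cure rate model with a generalized
    exponential susceptible distribution F0(t) = (1 - exp(-lam*t))**alpha:
    S(t) = pi + (1 - pi) * (1 - F0(t)). Parameter values are illustrative."""
    f0 = (1.0 - math.exp(-lam * t)) ** alpha
    return pi + (1.0 - pi) * (1.0 - f0)

# the survival curve starts at 1 and plateaus at the cure fraction pi
s_early = cure_rate_survival(0.0, pi=0.3, alpha=2.0, lam=1.0)
s_late = cure_rate_survival(50.0, pi=0.3, alpha=2.0, lam=1.0)
```

The plateau at π is the model's signature: unlike an ordinary survival curve, S(t) does not tend to zero, reflecting the immune subpopulation.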

11.
This paper presents a new variable-weight method, the singular value decomposition (SVD) approach, for Kohonen competitive learning (KCL) algorithms based on the concept of Varshavsky et al. [R. Varshavsky, A. Gottlieb, M. Linial, and D. Horn, Novel unsupervised feature filtering of biological data, Bioinformatics 22 (2006), pp. 507–513]. Integrating the weighted fuzzy c-means (FCM) algorithm with KCL, we propose a weighted fuzzy KCL (WFKCL) algorithm. The goal of the proposed WFKCL algorithm is to reduce the clustering error rate when the data contain noise variables. Compared with k-means, FCM and KCL with existing variable-weight methods, the proposed WFKCL algorithm with the proposed SVD weight method provides better clustering performance in terms of the error-rate criterion. Furthermore, the complexity of the proposed SVD approach is less than that of Pal et al. [S.K. Pal, R.K. De, and J. Basak, Unsupervised feature evaluation: a neuro-fuzzy approach, IEEE Trans. Neural Netw. 11 (2000), pp. 366–376], Wang et al. [X.Z. Wang, Y.D. Wang, and L.J. Wang, Improving fuzzy c-means clustering based on feature-weight learning, Pattern Recognit. Lett. 25 (2004), pp. 1123–1132] and Hung et al. [W.-L. Hung, M.-S. Yang, and D.-H. Chen, Bootstrapping approach to feature-weight selection in fuzzy c-means algorithms with an application in color image segmentation, Pattern Recognit. Lett. 29 (2008), pp. 1317–1325].
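One simple way to derive variable weights from an SVD is to take the squared loadings of the leading right singular vector, computed here by power iteration on XᵀX. This is a simplified stand-in for illustration only: the paper builds on Varshavsky et al.'s SVD-entropy idea, which weights variables by their contribution to the spectrum and differs in detail from this sketch.

```python
def svd_feature_weights(X, iters=200):
    """Variable weights from the leading right singular vector of the data
    matrix, via power iteration on A = X^T X. A simplified illustration of
    SVD-based weighting, not the paper's exact SVD-entropy scheme."""
    n, d = len(X), len(X[0])
    # form the d x d Gram matrix A = X^T X
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(d)]
         for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(A[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    # squared loadings of the dominant direction sum to 1
    return [c * c for c in v]

# the high-variance first variable dominates the leading singular direction
X = [[10, 0.1], [-9, -0.2], [11, 0.0], [-10, 0.1]]
w = svd_feature_weights(X)
```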

12.
Recently, the topic of extreme values under random censoring has received attention. Different estimators for the extreme value index have been proposed (see Beirlant et al., Estimation of the extreme value index and extreme quantiles under random censoring, Extremes 10 (2007), pp. 151–174). All of them are constructed as the classical estimators (without censoring) divided by the proportion of non-censored observations above a certain threshold; their asymptotic normality was established by Einmahl et al. (Statistics of extremes under random censoring, Bernoulli 14 (2008), pp. 207–227). An alternative approach consists of using the Peaks-Over-Threshold method (Balkema and de Haan, Residual life at great age, Ann. Probab. 2 (1974), pp. 792–804; Smith, Estimating tails of probability distributions, Ann. Statist. 15 (1987), pp. 1174–1207) and adapting the likelihood to the censoring context. This leads to ML-estimators whose asymptotic properties are still unknown. The aim of this article is to propose one-step approximations based on the Newton-Raphson algorithm. In a small simulation study, the one-step estimators are shown to be close approximations to the ML-estimators. The asymptotic normality of the one-step estimators is also established, whereas for the ML-estimators it remains an open problem. The proof of our result, whose approach is new in the Peaks-Over-Threshold context, is in the spirit of Lehmann's theory (Theory of Point Estimation, Wadsworth & Brooks/Cole, Pacific Grove, CA, 1991).
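The one-step idea is generic: start from a consistent initial estimate and apply a single Newton-Raphson update of the log-likelihood, θ₁ = θ₀ + score/information. A sketch in a much simpler setting than the article's censored Peaks-Over-Threshold likelihood, namely one-step estimation of a Cauchy location parameter starting from the sample median:

```python
def cauchy_one_step(x):
    """One Newton-Raphson step from the sample median toward the ML estimate
    of a Cauchy location parameter. Illustrates the one-step construction in
    a toy setting, not the article's censored POT likelihood."""
    xs = sorted(x)
    n = len(xs)
    theta0 = xs[n // 2] if n % 2 else 0.5 * (xs[n // 2 - 1] + xs[n // 2])
    # score (first derivative) of the Cauchy log-likelihood at theta0
    score = sum(2 * (xi - theta0) / (1 + (xi - theta0) ** 2) for xi in x)
    # negative second derivative (observed information) at theta0
    info = sum((2 - 2 * (xi - theta0) ** 2) / (1 + (xi - theta0) ** 2) ** 2
               for xi in x)
    return theta0 + score / info if info != 0 else theta0

# for data symmetric about 0 the median already solves the score equation,
# so the single Newton step leaves the estimate at 0
theta1 = cauchy_one_step([-2.0, -1.0, 0.0, 1.0, 2.0])
```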

13.
In this article, several methods for making inferences about the parameters of a finite mixture of distributions, in the context of centrally censored data with partial identification, are revised. These methods adapt the work of Contreras-Cristán, Gutiérrez-Peña, and O'Reilly (Inferences using latent variables for mixtures of distributions for censored data with partial identification, Comm. Stat. Theor. Meth. 32 (2003), pp. 749–774), developed for the case of right censoring. The first method focuses on an asymptotic approximation to a suitably simplified likelihood using some latent quantities; the second is based on the expectation-maximization (EM) algorithm. Both methods make explicit use of latent variables and provide computationally efficient procedures compared to non-Bayesian methods that deal directly with the full likelihood of the mixture through its asymptotic approximation. The third method, from a Bayesian perspective, uses data augmentation to work with an uncensored sample; it is related to the Bayesian method recently proposed in Baker, Mengersen, and Davis (A Bayesian solution to reconstructing centrally censored distributions, J. Agr. Biol. Environ. Stat. 1 (2005), pp. 61–84). The three adapted methods are shown to provide similar inferential answers, thus offering alternative analyses.

14.
Abstract

In this article, we introduce a new class of lifetime distributions. It includes several previously known distributions, such as those of Chahkandi and Ganjali (On some lifetime distributions with decreasing failure rate, Comput. Statist. Data Anal. 53 (2009), pp. 4433–4440), Mahmoudi and Jafari (Generalized exponential power series distributions, Comput. Statist. Data Anal. 56 (2012), pp. 4047–4066), and Nadarajah et al. (A new four-parameter lifetime distribution, J. Statist. Comput. Simul. (2012), iFirst, pp. 1–16). This new class of four-parameter distributions allows for flexible failure-rate behavior: the failure rate function can be increasing, decreasing, bathtub-shaped or upside-down bathtub-shaped. Several distributional properties of the new class, including moments, quantiles and order statistics, are studied. An EM algorithm for computing the parameter estimates is proposed, and some maximum entropy characterizations are discussed. Finally, to show the flexibility and potential of the new class of distributions, applications to two real data sets are provided.
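The failure (hazard) rate is h(t) = f(t)/S(t), the instantaneous risk of failure given survival to t. The Weibull family, used here purely to illustrate increasing versus decreasing shapes (the article's four-parameter class also covers bathtub and upside-down bathtub shapes, which the Weibull cannot produce), has the closed form h(t) = (shape/scale)·(t/scale)^(shape−1):

```python
def weibull_hazard(t, shape, scale=1.0):
    """Hazard function h(t) = f(t)/S(t) of a Weibull distribution. Used only
    to illustrate monotone failure rates; richer families such as the
    article's new class allow bathtub-shaped hazards as well."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# shape > 1: increasing failure rate; shape < 1: decreasing; shape = 1: constant
inc = [weibull_hazard(t, shape=2.0) for t in (0.5, 1.0, 2.0)]
dec = [weibull_hazard(t, shape=0.5) for t in (0.5, 1.0, 2.0)]
```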

15.
Classification and regression trees have been useful in medical research for constructing algorithms for disease diagnosis or prognostic prediction. Jin et al. (Classification algorithms for hip fracture prediction based on recursive partitioning methods, Med. Decis. Mak. 24 (2004), pp. 386–398) developed a robust and cost-saving tree (RACT) algorithm with application to the classification of hip fracture risk after 5-year follow-up, based on data from the Study of Osteoporotic Fractures (SOF). Although conventional recursive partitioning algorithms are well developed, they still have limitations: binary splits may generate a big tree with many layers, while trinary splits may produce too many nodes. In this paper, we propose a classification approach combining trinary and binary splits to generate a trinary–binary tree, using a new non-inferiority test of entropy to select between binary and trinary splits. We apply the modified method to the SOF data to construct a trinary–binary classification rule for predicting the risk of osteoporotic hip fracture. The new classification tree has good statistical utility: it is statistically non-inferior to the optimal binary tree and to the RACT on the testing sample, and it is also cost-saving. It may be useful in clinical applications: femoral neck bone mineral density, age, height loss and weight gain since age 25 can identify subjects with elevated 5-year hip fracture risk without loss of statistical efficiency.
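The quantity a binary-versus-trinary comparison rests on is the weighted child-node entropy after a candidate split. A short sketch of that computation (the non-inferiority test itself is the paper's contribution and is not reproduced here):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        ent -= p * math.log2(p)
    return ent

def split_entropy(groups):
    """Weighted average entropy of the child nodes produced by a split --
    the basic impurity measure a split-selection rule compares."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * entropy(g) for g in groups)

# with three classes, a trinary split can isolate each class perfectly,
# while the best binary split must leave one child node mixed
parent = ['a', 'a', 'b', 'b', 'c', 'c']
trinary = [['a', 'a'], ['b', 'b'], ['c', 'c']]
binary = [['a', 'a', 'b', 'b'], ['c', 'c']]
```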

16.
The t-distribution (univariate and multivariate) has many useful applications in robust statistical analysis. Parameter estimation for the t-distribution is usually carried out by the maximum likelihood (ML) method, with the ML estimates obtained via the Expectation-Maximization (EM) algorithm. In this article, we use the maximum Lq-likelihood (MLq) estimation method introduced by Ferrari and Yang (Maximum Lq-likelihood estimation, The Annals of Statistics 38 (2010), pp. 753–83) to estimate all the parameters of the multivariate t-distribution, modifying the EM algorithm to obtain the MLq estimates. We provide a simulation study and a real data example to illustrate the performance of the MLq estimators relative to the ML estimators.
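MLq estimation replaces the log in the log-likelihood with the Lq function, Lq(u) = (u^(1−q) − 1)/(1−q) for q ≠ 1, which tends to log u as q → 1, so ordinary ML is recovered as a limiting case. A minimal sketch of the function and that limit:

```python
import math

def lq(u, q):
    """The Lq function of Ferrari and Yang: Lq(u) = (u**(1-q) - 1)/(1-q) for
    q != 1 and log(u) at q = 1. MLq estimation maximizes the sum of
    Lq(f(x_i)) over the sample instead of the ordinary log-likelihood."""
    if q == 1:
        return math.log(u)
    return (u ** (1 - q) - 1) / (1 - q)

# Lq(u, q) approaches log(u) as q -> 1, recovering ordinary ML
vals = [lq(2.0, q) for q in (0.9, 0.99, 0.999)]
```

For q < 1 the Lq objective downweights low-density observations, which is the source of the robustness the abstract exploits.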

17.
This article extends the correlation methodology developed by Chinchilli et al. (A general class of correlation coefficients for the 2 × 2 crossover design, Biometr. J. 47 (2005), pp. 1–10) for the 2 × 2 crossover design to more complex crossover designs for clinical trials. We describe how the methodology can be adapted to a general type of two-treatment crossover design that includes at least two sequences, at least two treatment periods, or both. We then derive the asymptotic theory for the corresponding correlation statistics, investigate the statistical accuracy of the estimators via bootstrap analyses, and demonstrate their use with two real data examples.

18.
Considering the Wald, score, and likelihood ratio asymptotic test statistics, we analyze a multivariate null-intercept errors-in-variables regression model in which both the explanatory and response variables are subject to measurement error, and a possible dependency structure between measurements taken on the same individual is incorporated, representing a longitudinal structure. This model was proposed by Aoki et al. (Bayesian analysis of a multivariate null intercept errors-in-variables regression model, Journal of Biopharmaceutical Statistics 13 (2003), pp. 767–775) and analyzed under the Bayesian approach. In this article, taking the classical approach, we analyze the asymptotic test statistics and present a simulation study comparing the behavior of the three test statistics for different sample sizes, parameter values and nominal levels of the test. Closed-form expressions for the score function and the Fisher information matrix are also presented. We consider two real numerical illustrations: the odontological data set from Hadgu and Koch (Application of generalized estimating equations to a dental randomized clinical trial, Journal of Biopharmaceutical Statistics 9 (1999), pp. 161–178), and a quality-control data set.

19.
Liew (A two-stage least-squares estimation with inequality restrictions on parameters, Rev. Econ. Stat. LVIII (1976), pp. 234–238) introduced the generalized inequality constrained least squares (GICLS) estimator, together with inequality constrained two-stage and three-stage least squares estimators, by reducing the primal–dual relation to the problem of Dantzig and Cottle (Positive (semi-)definite matrices and mathematical programming, in Nonlinear Programming, North Holland, Amsterdam, 1967, pp. 55–73) and Cottle and Dantzig (Complementary pivot theory of mathematical programming, in Studies in Optimization, Vol. 10, Mathematical Association of America, Washington, 1974) and solving it with Lemke's (A method of solution for quadratic programs, Manage. Sci. 8 (1962), pp. 442–453) algorithm. The purpose of this article is to present an inequality constrained ridge regression (ICRR) estimator with correlated errors, along with inequality constrained two-stage and three-stage ridge regression estimators, for use in the presence of multicollinearity. The untruncated variance–covariance matrix and mean square error are derived for the ICRR estimator with correlated errors, and its superiority over the GICLS estimator is examined via Monte Carlo simulation.
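The unconstrained ridge estimator underlying the ICRR construction is the closed form β̂(k) = (XᵀX + kI)⁻¹Xᵀy, which reduces to OLS at k = 0 and shrinks the coefficient vector's norm as k grows. A sketch for two regressors with a hand-rolled 2×2 inverse; the article's ICRR estimator adds inequality constraints and correlated errors on top of this and is not shown here:

```python
def ridge_coefficients(X, y, k):
    """Ordinary ridge estimator (X'X + kI)^(-1) X'y for a two-column design
    matrix, using an explicit 2x2 matrix inverse. A sketch of the unconstrained
    baseline, not the inequality-constrained ICRR estimator."""
    d = len(X[0])
    assert d == 2, "sketch handles exactly two regressors"
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(d)] for a in range(d)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(d)]
    # invert the 2x2 matrix (X'X + kI) explicitly
    a, b = xtx[0][0] + k, xtx[0][1]
    c, e = xtx[1][0], xtx[1][1] + k
    det = a * e - b * c
    return [(e * xty[0] - b * xty[1]) / det,
            (a * xty[1] - c * xty[0]) / det]

# y = 1*x1 + 2*x2 exactly: k = 0 recovers OLS = (1, 2);
# k > 0 shrinks the coefficient vector in squared norm
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 2, 3, 4]
beta_ols = ridge_coefficients(X, y, k=0.0)
beta_ridge = ridge_coefficients(X, y, k=1.0)
```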

20.
ABSTRACT

The present article explores rotation patterns using exponential ratio-type estimators for the estimation of the finite population median at the current occasion in two-occasion rotation sampling. Properties of the proposed estimators, including the optimum replacement strategies, are elaborated. The proposed estimators are compared with the sample median estimator, used when there is no matching from the previous occasion, as well as with the ratio-type estimator proposed by Singh et al. (Quantile estimation in successive sampling, J. Kor. Stat. Soc. 36 (2007), pp. 543–556) for the second quantile. The behavior of the proposed estimators is justified by empirical interpretations and validated by means of a simulation study using some natural populations.
