Similar Literature
20 similar articles found.
1.
In this article, we propose a new criterion for evaluating the similarity of probability density functions (pdfs). We call it the similar coefficient of cluster (SCC) criterion and use it as a tool for handling overlap coefficients of pdfs, which take standardized values in [0, 1]. With the support of a self-updating algorithm for determining a suitable number of clusters, the SCC then becomes a criterion for assigning pdfs to the corresponding clusters. Moreover, results on the determination of the SCC for two and for more than two pdfs, as well as relations between different SCCs and other measures, are presented. Numerical examples on both synthetic and real data are given not only to illustrate the suitability of the proposed theory and algorithms but also to demonstrate their applicability and novelty.
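A minimal sketch of the building block behind such criteria, not the paper's SCC itself: the overlap coefficient between two univariate normal pdfs, computed by numerical integration of min(f1, f2). It always lies in [0, 1], with 1 meaning identical densities; the function name and integration bounds are illustrative choices.

```python
# Overlap coefficient OVL between two normal pdfs (a sketch, not the SCC).
import numpy as np
from scipy import stats
from scipy.integrate import quad

def overlap_coefficient(mu1, sd1, mu2, sd2):
    """OVL = integral of min(f1, f2) over the real line; lies in [0, 1]."""
    f1 = stats.norm(mu1, sd1).pdf
    f2 = stats.norm(mu2, sd2).pdf
    lo = min(mu1 - 8 * sd1, mu2 - 8 * sd2)   # truncate the integration range
    hi = max(mu1 + 8 * sd1, mu2 + 8 * sd2)
    val, _ = quad(lambda x: min(f1(x), f2(x)), lo, hi, limit=200)
    return val

print(overlap_coefficient(0.0, 1.0, 0.0, 1.0))   # ~1.0: identical pdfs
print(overlap_coefficient(0.0, 1.0, 3.0, 1.0))   # well below 1: well separated
```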

2.
We propose an intuitive and computationally simple algorithm for clustering probability density functions (pdfs). A data-driven learning mechanism is incorporated into the algorithm to determine suitable widths for the clusters. The clustering results show that the proposed algorithm is able to group the pdfs automatically and to provide the optimal number of clusters without any a priori information. A performance study also shows that the proposed algorithm is more efficient than existing ones. In addition, the clustering can serve as an intermediate compression tool in content-based multimedia retrieval; we apply the proposed algorithm to categorize a subset of the COREL image database, and the results indicate that it performs well in colour image categorization.

3.
The mixed-Weibull distribution has been used to model a wide range of failure data sets, and in many practical situations the number of components in the mixture model is unknown. Thus, parameter estimation for a mixed-Weibull distribution is considered, and the important issue of how to determine the number of components is discussed. Two approaches are proposed to solve this problem: one is the method of moments and the other is a regularization-type fuzzy clustering algorithm. Finally, numerical examples and two real data sets are given to illustrate the features of the proposed approaches.

4.
Under the assumption of multivariate normality, the likelihood ratio test is derived for testing the hypothesis of a Kronecker product structure on a covariance matrix in the context of multivariate repeated measures data. Although the proposed hypothesis test can be performed computationally through indirect use of Proc Mixed in SAS, the Proc Mixed algorithm often fails to converge. We provide an alternative algorithm, which is illustrated with two real data sets. A simulation study is also conducted for the purpose of sample size consideration.
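A minimal sketch, assuming the classical "flip-flop" iteration for the Kronecker-structured MLE rather than the authors' specific algorithm: for n centred p x q observation matrices it alternates closed-form updates of the row covariance A and column covariance B, and a likelihood-ratio statistic against an unstructured covariance follows from the maximized log-determinants. The degrees of freedom use the usual parameter-counting argument with one scale redundancy removed.

```python
import numpy as np

def flip_flop(X, n_iter=100, tol=1e-8):
    """X has shape (n, p, q); returns the Kronecker-factor MLEs (A, B)."""
    n, p, q = X.shape
    A, B = np.eye(p), np.eye(q)
    for _ in range(n_iter):
        A_new = sum(Xi @ np.linalg.solve(B, Xi.T) for Xi in X) / (n * q)
        B_new = sum(Xi.T @ np.linalg.solve(A_new, Xi) for Xi in X) / (n * p)
        done = np.abs(A_new - A).max() + np.abs(B_new - B).max() < tol
        A, B = A_new, B_new
        if done:
            break
    return A, B

rng = np.random.default_rng(0)
n, p, q = 200, 3, 4
X = rng.standard_normal((n, p, q))            # toy data, true covariance = I
X = X - X.mean(axis=0)                        # centre

A, B = flip_flop(X)
V = np.stack([Xi.ravel(order="F") for Xi in X])   # column-stacked vec(X_i)
S = V.T @ V / n                                   # unstructured MLE
# -2 log LR = n * (log det(B kron A) - log det(S)); approximate chi-square.
lr = n * (q * np.linalg.slogdet(A)[1] + p * np.linalg.slogdet(B)[1]
          - np.linalg.slogdet(S)[1])
# usual counting: unstructured vs Kronecker parameters (one scale redundancy removed)
df = p * q * (p * q + 1) // 2 - (p * (p + 1) // 2 + q * (q + 1) // 2 - 1)
print(lr, df)
```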

5.
The EM algorithm is the standard method for estimating the parameters of finite mixture models. Yang and Pan [25] proposed a generalized classification maximum-likelihood procedure, called the fuzzy c-directions (FCD) clustering algorithm, for estimating the parameters of mixtures of von Mises distributions. Two main drawbacks of the EM algorithm are its slow convergence and the dependence of the solution on the initial values used. The choice of initial values is of great importance in the algorithm-based literature, as it can heavily influence the speed of convergence and the ability to locate the global maximum. Because the algorithmic frameworks of EM and FCD are closely related, FCD shares the same drawbacks as the EM algorithm. To resolve these problems, this paper proposes another clustering algorithm, which can self-organize local optimal cluster numbers without using cluster validity functions. The numerical results clearly indicate that the proposed algorithm outperforms the EM and FCD algorithms. Finally, we apply the proposed algorithm to two real data sets.

6.
This paper addresses the problem of identifying groups that satisfy specific conditions on the means of feature variables; we refer to the identified groups as “target clusters” (TCs). To identify TCs, we propose a method based on a normal mixture model (NMM) restricted by a linear combination of means. We provide an expectation–maximization (EM) algorithm to fit the restricted NMM by maximum likelihood. The convergence property of the EM algorithm and a reasonable set of initial estimates are presented. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method provides several types of useful clusters that would be difficult to obtain with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target-clustering approach shows that the proposed method is promising for identifying TCs.

7.
The Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC) are effective statistical tools for evaluating the accuracy of diagnostic tests for binary-class medical data. However, many real-world biomedical problems involve more than two categories. The Volume Under the ROC Surface (VUS) and Hypervolume Under the ROC Manifold (HUM) measures are extensions of the AUC for three-class and multiple-class models. Inference methods for such measures have been proposed recently. We develop a method of constructing a linear combination of markers for which the VUS or HUM of the combined markers is maximized. Asymptotic validity of the estimator is justified by extending the results for maximum rank correlation estimation that are well known in econometrics. A bootstrap resampling method is then applied to estimate the sampling variability. Simulations and examples are provided to demonstrate our methods.
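A minimal sketch of the empirical VUS for a three-class marker, the quantity such a linear combination is chosen to maximize: the proportion of triples, one observation from each class, that appear in the correct monotone order. The class ordering and the handling of ties (ignored here) are simplifying assumptions.

```python
import numpy as np

def empirical_vus(x1, x2, x3):
    """x1, x2, x3: 1-D arrays of marker values from classes 1, 2, 3."""
    a = np.asarray(x1)[:, None, None]
    b = np.asarray(x2)[None, :, None]
    c = np.asarray(x3)[None, None, :]
    return np.mean((a < b) & (b < c))    # fraction of correctly ordered triples

rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, 50)
x2 = rng.normal(1.0, 1.0, 60)
x3 = rng.normal(2.0, 1.0, 70)
print(empirical_vus(x1, x2, x3))   # well above the chance level of 1/6
```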

8.
In this paper, we propose a new Bayesian inference approach to classification based on the traditional hinge loss used in classical support vector machines, which we call the Bayesian Additive Machine (BAM). Unlike existing approaches, the new model has a semiparametric discriminant function in which some feature effects are nonlinear and others are linear. This separation of features is achieved automatically during model fitting, without user pre-specification. Following the literature on sparse regression for high-dimensional models, we can also identify the irrelevant features. By introducing spike-and-slab priors through two sets of indicator variables, these multiple goals are achieved simultaneously and automatically, without any parameter tuning such as cross-validation. An efficient partially collapsed Markov chain Monte Carlo algorithm is developed for posterior exploration, based on a data augmentation scheme for the hinge loss. Our simulations and three real data examples demonstrate that the new approach is a strong competitor to several recently proposed approaches for challenging high-dimensional classification problems.

9.
Mixture distribution models are more useful than pure distributions for modeling heterogeneous data sets. The aim of this paper is to propose, for the first time, a mixture of Weibull–Poisson (WP) distributions for modeling heterogeneous data sets, thereby creating a powerful alternative mixture distribution. Many features of the proposed mixture of WP distributions are examined. The expectation-maximization (EM) algorithm is used to obtain the maximum-likelihood estimates of the parameters, and a simulation study is conducted to evaluate the performance of the proposed EM scheme. Applications to two real heterogeneous data sets are given to show the flexibility and potential of the new mixture distribution.

10.
In this paper, a new hybrid model combining vector autoregressive moving average (VARMA) models and Bayesian networks is proposed to improve the forecasting performance of multivariate time series. In the proposed model, the VARMA model, a popular linear model in time series forecasting, is specified to capture the linear characteristics. The errors of the VARMA model are then clustered into trends by the K-means algorithm, with the Krzanowski–Lai cluster validity index determining the number of trends, and a Bayesian network is built to learn the relationship between the data and the trend of the corresponding VARMA error. Finally, the estimated values of the VARMA model are adjusted by the probabilities, obtained from the Bayesian network, of their corresponding errors belonging to each trend. Experimental results from a simulation study and two multivariate real-world data sets indicate that the proposed model effectively improves prediction performance compared with VARMA models.
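A minimal sketch of the first two stages only, with stand-in choices that are not the authors': a VAR(1) fitted with statsmodels in place of a full VARMA, and the number of residual trends chosen by the silhouette score because the Krzanowski–Lai index is not available in scikit-learn.

```python
import numpy as np
from statsmodels.tsa.api import VAR
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal((300, 2)), axis=0)   # toy bivariate series

res = VAR(y).fit(1)                 # stage 1: linear model (VAR(1) stand-in)
errors = res.resid                  # residuals, shape (T - 1, 2)

best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 6):               # stage 2: cluster residuals into trends
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(errors)
    score = silhouette_score(errors, km.labels_)
    if score > best_score:
        best_k, best_score, best_labels = k, score, km.labels_

print(best_k, np.bincount(best_labels))
# Stage 3 in the paper then learns a Bayesian network linking the data to the
# trend label of each residual and uses it to correct the VARMA forecasts.
```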

11.
Based on progressively Type-II censored data, the maximum-likelihood estimators (MLEs) of the Lomax parameters are derived using the expectation–maximization (EM) algorithm. Moreover, the expected Fisher information matrix based on the missing-value principle is computed. Using extensive simulation and three criteria, namely bias, root mean squared error and Pitman closeness, we compare the performance of the MLEs obtained via the EM algorithm and the Newton–Raphson (NR) method. It is concluded that the EM algorithm outperforms the NR method in all cases. Two real data examples are used to illustrate the proposed estimators.
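A minimal sketch, not the authors' EM scheme: direct numerical maximization of the progressively Type-II censored Lomax log-likelihood, i.e. the kind of Newton-type fit the EM algorithm is compared against. The data and removal scheme below are purely illustrative.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def neg_loglik(theta, x, R):
    """Progressively censored log-likelihood: prod f(x_i) * S(x_i)^R_i."""
    alpha, lam = np.exp(theta)                  # optimize on the log scale
    ll = stats.lomax.logpdf(x, alpha, scale=lam).sum()
    ll += (R * stats.lomax.logsf(x, alpha, scale=lam)).sum()
    return -ll

rng = np.random.default_rng(3)
# toy data: 15 simulated failure times treated as the observed order statistics
x = np.sort(stats.lomax.rvs(2.0, scale=1.5, size=15, random_state=rng))
R = np.array([1, 0, 2, 0, 0, 1, 0, 0, 3, 0, 0, 0, 1, 0, 2])   # toy removal scheme

fit = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(x, R), method="Nelder-Mead")
alpha_hat, lam_hat = np.exp(fit.x)
print(alpha_hat, lam_hat)
```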

12.
Combining data from several tests or markers to classify patients according to their health status, and thus assign better treatments, is a major issue in the study of diseases such as cancer. Several approaches have been proposed in the literature to tackle this problem. In this paper, a step-by-step algorithm for estimating the parameters of a linear classifier that combines several measures is considered. The optimization criterion is to maximize the area under the receiver operating characteristic curve. The algorithm is applied to different simulated data sets and its performance is evaluated. Finally, the method is illustrated with a prostate cancer staging database.
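A minimal sketch of the optimization criterion only, not the paper's step-by-step algorithm: for two markers, a unit-norm linear combination is parameterized by an angle, and the angle with the largest empirical AUC is kept.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n0, n1 = 200, 150
markers = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(n0, 2)),   # healthy
                     rng.normal([0.8, 0.5], 1.0, size=(n1, 2))])  # diseased
labels = np.r_[np.zeros(n0), np.ones(n1)]

best_theta, best_auc = None, 0.0
for theta in np.linspace(0.0, np.pi, 361):      # coefficients matter only up to scale
    score = np.cos(theta) * markers[:, 0] + np.sin(theta) * markers[:, 1]
    auc = roc_auc_score(labels, score)
    auc = max(auc, 1.0 - auc)                    # orientation of the score is arbitrary
    if auc > best_auc:
        best_theta, best_auc = theta, auc

print(np.cos(best_theta), np.sin(best_theta), best_auc)
```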

13.
This paper contrasts two approaches to estimating quantile regression models: traditional semi-parametric methods and partially adaptive estimators based on flexible probability density functions (pdfs). While more general pdfs could have been used, the skewed Laplace was selected for pedagogical purposes. Monte Carlo simulations are used to compare the behavior of the semi-parametric and partially adaptive quantile estimators in the presence of possibly skewed and heteroskedastic data. Both approaches accommodate skewness and heteroskedasticity that are consistent with linear quantiles; however, the partially adaptive estimator considered also allows for non-linear quantiles and provides simple tests for symmetry and heteroskedasticity. The methods are applied to the problem of estimating conditional quantile functions for wages corresponding to different levels of education.
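A minimal sketch of the traditional semi-parametric benchmark only: check-loss quantile regression fitted with statsmodels' QuantReg on toy wage data. The paper's partially adaptive estimator instead maximizes a skewed-Laplace likelihood; the variable names and data below are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
educ = rng.uniform(8, 20, n)                       # toy "years of education"
wage = 1.0 + 0.6 * educ + (0.2 * educ) * rng.standard_normal(n)  # heteroskedastic

X = sm.add_constant(educ)
for tau in (0.25, 0.5, 0.75):
    res = sm.QuantReg(wage, X).fit(q=tau)          # minimizes the check loss at tau
    print(tau, res.params)                         # slopes spread out as tau varies
```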

14.
To address the low simulation and prediction accuracy of the traditional MGM(1,m) model, this article presents an improved MGM(1,m) model with optimized initial values and background values. For the initial value, the vector X(1)(i) that minimizes the mean relative error of the simulated values is selected; for the construction of the background value, a dynamic-sequence model combined with Simpson's 3/8 rule is proposed to compute the background values. Finally, the traditional MGM(1,2) model and the improved model are built on two exponential data series, and simulation and prediction are carried out. The results show that both the simulation accuracy and the prediction accuracy of the improved MGM(1,m) model are significantly higher, verifying the effectiveness and feasibility of the model.
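A minimal sketch of the classical univariate GM(1,1) grey model with the standard trapezoidal background value, shown only as the building block of the MGM(1,m) family; the improvements described above (optimized initial value and a Simpson 3/8 background value) are not implemented here, and the input series is a toy example.

```python
import numpy as np

def gm11(x0, n_forecast=3):
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                                  # accumulated series (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])                       # standard background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]    # development / control coefficients
    k = np.arange(n + n_forecast)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time-response function
    x0_hat = np.r_[x1_hat[0], np.diff(x1_hat)]          # inverse AGO
    return x0_hat

x0 = np.array([2.67, 3.13, 3.25, 3.36, 3.56, 3.72])     # toy near-exponential series
print(gm11(x0))        # fitted values followed by three forecasts
```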

15.
Communications in Statistics: Theory and Methods, 2012, 41(13-14): 2342-2355
We propose a distance-based method to relate two data sets. We define and study several measures of multivariate association based on distances between observations. The proposed approach can handle general data sets (e.g., observations on continuous, categorical or mixed variables). An application using the Hellinger distance reveals the relationships between two regions of hyperspectral images.
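A minimal sketch of the Hellinger distance mentioned in the application, here between two empirical discrete distributions (normalized histograms); the paper's distance-based association measures themselves are not reproduced, and the toy counts are illustrative.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two probability vectors on the same support."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

counts_a = [30, 45, 15, 10]     # toy band-intensity histograms
counts_b = [25, 40, 20, 15]
print(hellinger(counts_a, counts_b))   # 0 = identical, 1 = disjoint support
```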

16.
In this article, we present a new efficient iterative estimation approach based on local modal regression for single-index varying-coefficient models. The resulting estimators are shown to be robust to outliers and to the error distribution. The asymptotic properties of the estimators are established under some regularity conditions, and a practical modified EM algorithm is proposed for the new method. Moreover, to achieve sparse estimators when irrelevant variables are present among the index parameters, a variable selection procedure based on the SCAD penalty is developed to select significant parametric covariates, and the well-known oracle properties are derived. Finally, numerical examples with various error distributions and a real data analysis are conducted to illustrate the validity and feasibility of the proposed method.

17.
For data from multivariate t distributions, influence analysis based directly on the probability density function is very hard because its expression is intractable. In this paper, we present a technique for influence analysis based on the mixture representation and the EM algorithm. In fact, the multivariate t distribution can be regarded as a particular Gaussian mixture obtained by introducing weights that follow a Gamma distribution. We treat the weights as missing data and develop influence analysis for data from multivariate t distributions based on the conditional expectation of the complete-data log-likelihood function in the EM algorithm. Several case-deletion measures are proposed for detecting influential observations from multivariate t distributions. Two numerical examples are given to illustrate the methodology.
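A minimal sketch of the Gaussian scale-mixture view the paper exploits: the standard EM iteration for a multivariate t with known degrees of freedom nu, in which the E-step weights (nu + p)/(nu + delta_i) stand in for the missing Gamma weights. The case-deletion influence measures built on the complete-data log-likelihood are not shown.

```python
import numpy as np

def fit_mvt(X, nu=5.0, n_iter=200, tol=1e-8):
    """EM for a multivariate t location/scatter with known degrees of freedom."""
    n, p = X.shape
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - mu
        delta = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
        w = (nu + p) / (nu + delta)                      # E-step weights
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()  # M-step: location
        diff = X - mu_new
        Sigma_new = (w[:, None] * diff).T @ diff / n     # M-step: scatter
        if np.abs(mu_new - mu).max() < tol:
            return mu_new, Sigma_new, w
        mu, Sigma = mu_new, Sigma_new
    return mu, Sigma, w

rng = np.random.default_rng(6)
# toy heavy-tailed, correlated data (not an exact multivariate t sample)
X = rng.standard_t(5.0, size=(300, 2)) @ np.array([[1.0, 0.3], [0.0, 1.0]])
mu_hat, Sigma_hat, weights = fit_mvt(X, nu=5.0)
print(mu_hat)
print(weights.min())    # the most outlying points receive the smallest weights
```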

18.
Eunju Hwang, Statistics, 2017, 51(4): 904-920
A sequential test is developed for detecting structural mean breaks in long-memory data sets such as the realized volatilities of financial assets. The long memory, if any, is adjusted for by fitting an HAR (heterogeneous autoregressive) model to the data and taking the residuals. Our test consists of applying the sequential test of Bai and Perron [Estimating and testing linear models with multiple structural changes. Econometrica. 1998;66:47–78] to the residuals. The large-sample validity of the proposed test is investigated in terms of the consistency of the estimated number of breaks and the asymptotic null distribution of the test statistic. A finite-sample Monte Carlo experiment reveals that the proposed test tends to produce unbiased break-time estimates, whereas the usual sequential test of Bai and Perron tends to produce biased break times in the presence of long memory. The experiment also reveals that the proposed test has a more stable size than the Bai and Perron test. The proposed test is applied to two realized volatility data sets, the S&P index and the Korean won-US dollar exchange rate, over the past 7 years, and finds 2 or 3 breaks, while the Bai and Perron test finds 8 or more.
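A minimal sketch of the long-memory adjustment step only: the HAR regressors (previous-day value, 5-day and 22-day averages of realized volatility) are built, fitted by OLS, and the residuals are extracted; the sequential Bai and Perron break test applied to them is not reproduced here, and the volatility series is simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T = 1000
rv = np.abs(np.cumsum(rng.standard_normal(T)) * 0.01) + 0.1   # toy RV series

def har_residuals(rv):
    d = rv[21:-1]                                              # RV_{t-1}
    w = np.array([rv[t - 5:t].mean() for t in range(22, len(rv))])   # weekly average
    m = np.array([rv[t - 22:t].mean() for t in range(22, len(rv))])  # monthly average
    y = rv[22:]
    X = sm.add_constant(np.column_stack([d, w, m]))
    fit = sm.OLS(y, X).fit()
    return fit.resid, fit.params

resid, params = har_residuals(rv)
print(params)           # constant, daily, weekly, monthly HAR coefficients
print(resid[:5])        # residuals that would feed the sequential break test
```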

19.
Abstract

A new parametric hypothesis test of the mean interval for interval-valued data sets is proposed, which can handle the massive information contained in today's "Big data" sets. An approach based on an orthogonal transformation is introduced to obtain an equivalent hypothesis test of the mean interval in terms of the mid-point and mid-range of the interval-valued variable. The new test is very efficient in small interval-valued sample scenarios. Simulation studies are conducted to investigate the sample size and the power of the test. The performance of the proposed test is illustrated with two real-life examples.
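A minimal sketch of the mid-point/mid-range reformulation idea, not the paper's exact test: each interval [l, u] is mapped to ((l + u)/2, (u - l)/2), and a one-sample Hotelling T^2 test, used here as a stand-in for the proposed statistic, is applied to the pair of means.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(Z, mu0):
    """One-sample Hotelling T^2 test of E[Z] = mu0, with its F-based p-value."""
    n, p = Z.shape
    diff = Z.mean(axis=0) - mu0
    S = np.cov(Z, rowvar=False)
    t2 = n * diff @ np.linalg.solve(S, diff)
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, stats.f.sf(f_stat, p, n - p)

rng = np.random.default_rng(8)
centers = rng.normal(10.0, 1.0, 40)                 # toy interval-valued sample
half_ranges = np.abs(rng.normal(2.0, 0.3, 40))
lower, upper = centers - half_ranges, centers + half_ranges

Z = np.column_stack([(lower + upper) / 2, (upper - lower) / 2])
t2, p_value = hotelling_one_sample(Z, mu0=np.array([10.0, 2.0]))
print(t2, p_value)   # typically fails to reject: the hypothesized mean interval matches
```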

20.
ABSTRACT

In this paper, we propose a parameter estimation method for the three-parameter lognormal distribution based on Type-II right-censored data. Under mild conditions, the proposed estimates always exist uniquely in the entire parameter space, and the estimators are consistent over the entire parameter space. Through Monte Carlo simulations, we further show that the proposed method performs very well in small-sample situations compared with a prominent method of estimation, in terms of bias and root mean squared error (RMSE). Finally, two examples based on real data sets are presented to illustrate the proposed method.
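A minimal sketch of a naive baseline, not the proposed method: direct maximization of the Type-II right-censored log-likelihood of the three-parameter lognormal (threshold gamma, log-mean mu, log-sd sigma). This direct fit can become unstable when the threshold approaches the smallest observation, which is the kind of behaviour the proposed estimator is designed to avoid; the simulated sample below is illustrative.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def neg_loglik(theta, x_obs, n):
    """Type-II censoring: observe the r smallest of n; the rest exceed x_(r)."""
    gamma, mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    r = len(x_obs)
    if gamma >= x_obs.min():                    # threshold must lie below the data
        return np.inf
    ll = stats.lognorm.logpdf(x_obs, sigma, loc=gamma, scale=np.exp(mu)).sum()
    ll += (n - r) * stats.lognorm.logsf(x_obs.max(), sigma, loc=gamma, scale=np.exp(mu))
    return -ll

rng = np.random.default_rng(9)
n, r = 50, 40                                   # observe the 40 smallest of 50
full = stats.lognorm.rvs(0.5, loc=2.0, scale=np.exp(1.0), size=n, random_state=rng)
x_obs = np.sort(full)[:r]

fit = minimize(neg_loglik, x0=np.array([0.0, 0.5, 0.0]), args=(x_obs, n),
               method="Nelder-Mead")
gamma_hat, mu_hat, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(gamma_hat, mu_hat, sigma_hat)
```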
