Similar Literature
20 similar documents found.
1.
Ma Shaopei et al., Statistical Research (《统计研究》), 2021, 38(2): 114-134
In the big-data era, fields such as finance, genomics, and image processing generate large amounts of tensor data. Zhong et al. (2015) proposed a tensor sufficient dimension reduction method together with a sequential iterative algorithm for second-order tensors. Given the wide practical use of higher-order tensors, this paper extends the algorithm of Zhong et al. (2015) to higher orders and, taking third-order tensors as an example, proposes two algorithms: a structure-transformation algorithm and a structure-preserving algorithm. Both algorithms preserve the original structural information of the tensor to varying degrees while effectively reducing variable dimension and computational complexity and avoiding singular covariance matrices. The two algorithms are applied to the classification of colour portrait images, with the classification results visualised as two- and three-dimensional scatter plots. The structure-preserving algorithm is compared with five alternatives: K-means clustering, t-SNE nonlinear dimension reduction, multidimensional principal component analysis, multidimensional discriminant analysis, and tensor sliced inverse regression. The results show a clear advantage in classification accuracy, suggesting broad prospects for image recognition and related applications.

2.
The Andersson–Madigan–Perlman (AMP) Markov property is a recently proposed alternative Markov property for chain graphs. In the case of continuous variables with a joint multivariate Gaussian distribution, it is the AMP rather than the earlier introduced Lauritzen–Wermuth–Frydenberg Markov property that is coherent with data-generation by natural block-recursive regressions. In this paper, we show that maximum likelihood estimates in Gaussian AMP chain graph models can be obtained by combining generalized least squares and iterative proportional fitting into an iterative algorithm. In an appendix, we give useful convergence results for iterative partial maximization algorithms that apply in particular to the described algorithm.

3.
Boltzmann machines (BM), a type of neural networking algorithm, have been proven to be useful in pattern recognition. Patterns on quality control charts have long been recognized as providing useful information for correcting process performance problems. In computer-integrated manufacturing environments, where the control charts are monitored by computer algorithms, the potential for using pattern-recognition algorithms is considerable. The main purpose of this paper is to formulate a Boltzmann machine pattern recognizer (BMPR) and demonstrate its utility in control chart pattern recognition. It is not the intent of this paper to make comparisons between existing related algorithms. A factorial design of experiments was conducted to study the effects of numerous factors on the convergence behavior and performance of these BMPRs. These factors include the number of hidden nodes used in the network and the annealing schedule. Simulations indicate that the temperature level of the annealing schedule significantly affects the convergence behavior of the training process and that, to achieve a balanced performance of these BMPRs, a medium to high level of annealing temperatures is recommended. Numerical results for cyclical and stratification patterns illustrate that the classification capability of these BMPRs is quite powerful.
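The annealing schedule discussed in entry 3 can be illustrated with a generic simulated-annealing sweep of the kind used in Boltzmann machine training: a single-unit flip is proposed and accepted with the Boltzmann probability at the current temperature. This is a minimal sketch with a toy energy function, not the paper's BMPR; the target pattern and geometric cooling schedule are illustrative assumptions.

```python
import math
import random

random.seed(0)

def anneal(energy, state, temps):
    """Boltzmann-style stochastic update: propose a one-unit flip and
    accept it with probability exp(-dE/T) at temperature T."""
    for T in temps:
        i = random.randrange(len(state))
        cand = state.copy()
        cand[i] = 1 - cand[i]                 # flip one binary unit
        dE = energy(cand) - energy(state)
        if dE <= 0 or random.random() < math.exp(-dE / T):
            state = cand
    return state

# toy energy (hypothetical): number of units disagreeing with a target
target = [1, 0, 1, 1, 0, 1]
energy = lambda s: sum(a != b for a, b in zip(s, target))
schedule = [2.0 * 0.95 ** k for k in range(400)]   # geometric cooling
final = anneal(energy, [0] * 6, schedule)
```

As the abstract notes, the temperature level matters: a schedule that cools too quickly behaves greedily and can freeze in a poor configuration.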

4.
Currently, extreme large-scale genetic data present significant challenges for cluster analysis. Most of the existing clustering methods are typically built on the Euclidean distance and geared toward analyzing continuous response. They work well for clustering, e.g. microarray gene expression data, but often perform poorly for clustering, e.g. large-scale single nucleotide polymorphism (SNP) data. In this paper, we study the penalized latent class model for clustering extremely large-scale discrete data. The penalized latent class model takes into account the discrete nature of the response using appropriate generalized linear models and adopts the lasso penalized likelihood approach for simultaneous model estimation and selection of important covariates. We develop very efficient numerical algorithms for model estimation based on the iterative coordinate descent approach and further develop the expectation–maximization algorithm to incorporate and model missing values. We use simulation studies and applications to the international HapMap SNP data to illustrate the competitive performance of the penalized latent class model.
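Entry 4 couples a lasso penalty with a latent class likelihood, solved by iterative coordinate descent. As a simpler stand-in for that machinery, here is a sketch of cyclic coordinate descent for the plain lasso, whose per-coordinate update is a closed-form soft-thresholding step; the data and penalty value below are illustrative.

```python
import numpy as np

def soft_threshold(z, g):
    """Soft-thresholding operator, the closed-form coordinate update."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the lasso,
    minimising (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            z = X[:, j] @ r / n
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j] / n)
    return beta

X = np.eye(4)                                # toy orthonormal design
y = np.array([3.0, 0.1, -2.0, 0.0])
beta = lasso_cd(X, y, lam=0.3)               # small coefficients shrink to 0
```

The sparsity this produces is what the paper exploits for simultaneous estimation and covariate selection.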

5.
Dissemination of information derived from large contingency tables formed from confidential data is a major responsibility of statistical agencies. In this paper we present solutions to several computational and algorithmic problems that arise in the dissemination of cross-tabulations (marginal sub-tables) from a single underlying table. These include data structures that exploit sparsity to support efficient computation of marginals and algorithms such as iterative proportional fitting, as well as a generalized form of the shuttle algorithm that computes sharp bounds on (small, confidentiality threatening) cells in the full table from arbitrary sets of released marginals. We give examples illustrating the techniques.
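The iterative proportional fitting step mentioned in entry 5 alternately rescales a table to match released row and column marginals. A minimal two-dimensional sketch (the seed table and marginals are illustrative; this is not the paper's shuttle algorithm):

```python
import numpy as np

def ipf(table, row_margin, col_margin, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting: rescale a 2-D table so its
    row and column sums match the given marginals."""
    t = table.astype(float).copy()
    for _ in range(max_iter):
        t *= (row_margin / t.sum(axis=1))[:, None]   # match row sums
        t *= (col_margin / t.sum(axis=0))[None, :]   # match column sums
        if np.allclose(t.sum(axis=1), row_margin, atol=tol):
            break
    return t

fitted = ipf(np.ones((2, 2)), np.array([30.0, 70.0]), np.array([40.0, 60.0]))
```

With a uniform seed, the fitted table is the independence table implied by the two marginals.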

6.
The problem is to classify an individual into one of two populations, based on an observation on the individual that follows a stationary Gaussian process, where the two populations correspond to two distinct time points. Plug-in likelihood ratio rules are considered using samples from the process, and the distributions of the associated classification statistics are derived. For the special case of equal misclassification probabilities, the effect of the dependence between the population distributions on the probability of correct classification is studied. Lower bounds and an iterative method for evaluating the optimal correlation between the populations are obtained.
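A plug-in likelihood ratio rule of the kind described in entry 6 estimates the unknown population parameters from training samples and then classifies by the larger estimated likelihood. A minimal univariate sketch assuming two normal populations with a common variance (the Gaussian-process and time-point structure of the paper is not reproduced):

```python
import numpy as np

def plugin_lr_rule(train1, train2, x):
    """Plug-in likelihood-ratio rule for two normal populations with a
    pooled variance estimate: classify x by the larger log-likelihood."""
    m1, m2 = train1.mean(), train2.mean()
    s2 = (train1.var(ddof=1) + train2.var(ddof=1)) / 2   # pooled variance
    ll1 = -(x - m1) ** 2 / (2 * s2)
    ll2 = -(x - m2) ** 2 / (2 * s2)
    return 1 if ll1 >= ll2 else 2

t1 = np.array([0.0, 0.5, -0.5])    # illustrative samples
t2 = np.array([5.0, 5.5, 4.5])
```

With equal variances this reduces to classifying by the nearer estimated mean.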

7.
This paper introduces a new Laplace transform inversion method designed specifically for the case in which the target function is a probability distribution function. In particular, we use fixed point theory and Mann-type iterative algorithms to provide a means of estimating and sampling from the target probability distribution.
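The Mann-type iteration referenced in entry 7 averages the current iterate with the image of the mapping, x_{k+1} = (1 - alpha) x_k + alpha T(x_k), and converges to a fixed point of T under standard conditions. A generic sketch with a toy mapping (the paper's Laplace-inversion operator is not reproduced):

```python
import math

def mann_iterate(T, x0, alpha=0.5, n_iter=200):
    """Mann-type iteration: averaged fixed-point updates of the map T."""
    x = x0
    for _ in range(n_iter):
        x = (1 - alpha) * x + alpha * T(x)
    return x

# toy target: the unique fixed point of cos, x = cos(x)
root = mann_iterate(math.cos, 1.0)
```

The averaging step is what makes the scheme stable even when T itself is not a strict contraction.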

8.
The KL-optimality criterion has been recently proposed to discriminate between any two statistical models. However, designs which are optimal for model discrimination may be inadequate for parameter estimation. In this paper, the DKL-optimality criterion is proposed which is useful for the dual problem of model discrimination and parameter estimation. An equivalence theorem and a stopping rule for the corresponding iterative algorithms are provided. A pharmacokinetics application and a bioassay example are given to show the good properties of a DKL-optimum design.

9.
Parameters of a finite mixture model are often estimated by the expectation–maximization (EM) algorithm, which maximizes the observed-data log-likelihood. This paper proposes an alternative approach for fitting finite mixture models. Our method, called iterative Monte Carlo classification (IMCC), is also an iterative fitting procedure. Within each iteration, it first estimates the membership probabilities for each data point, namely the conditional probability that the point belongs to a particular mixing component given its observed value. It then classifies each data point into a component distribution using the estimated conditional probabilities and the Monte Carlo method, and finally updates the parameters of each component distribution based on the classified data. Simulation studies were conducted to compare IMCC with some other algorithms for fitting mixtures of normal and t densities.
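The three steps of entry 9 (membership probabilities, Monte Carlo classification, parameter update) can be sketched for a two-component normal mixture. This is an illustrative simplification, not the paper's full IMCC: unit variances and equal mixing weights are assumed, and only the means are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

def imcc_sketch(x, n_iter=50):
    """Monte Carlo classification sketch for a two-component normal
    mixture (unit variances and equal weights assumed for brevity)."""
    mu = np.array([x.min(), x.max()])              # crude starting values
    for _ in range(n_iter):
        # step 1: membership probabilities given the current means
        d = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
        p = d / d.sum(axis=1, keepdims=True)
        # step 2: Monte Carlo classification, one sampled label per point
        z = (rng.random(len(x)) < p[:, 1]).astype(int)
        # step 3: update each component mean from its classified points
        for k in (0, 1):
            if np.any(z == k):
                mu[k] = x[z == k].mean()
    return np.sort(mu)

x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])
mu_hat = imcc_sketch(x)
```

Replacing the sampling in step 2 by expected memberships would recover an EM-style update; the sampling is what distinguishes the Monte Carlo classification idea.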

10.
In this paper, we present an algorithm for clustering based on univariate kernel density estimation, named ClusterKDE. It is an iterative procedure in which each step obtains a new cluster by minimizing a smooth kernel function. Although our applications use the univariate Gaussian kernel, any smooth kernel function can be used. The proposed algorithm has the advantage of not requiring the number of clusters a priori. Furthermore, the ClusterKDE algorithm is simple, easy to implement, well defined, and stops in a finite number of steps; that is, it always converges, independently of the initial point. We illustrate our findings with numerical experiments obtained by implementing the algorithm in Matlab and applying it to practical problems. The results indicate that ClusterKDE is competitive and fast compared with the well-known clusterdata and K-means algorithms used by Matlab for clustering data.
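The smooth object at the heart of entry 10 is a univariate Gaussian kernel density estimate; ClusterKDE repeatedly minimizes a smooth kernel function built from it to extract one cluster per step. A sketch of the density estimate itself, on illustrative data:

```python
import numpy as np

def gaussian_kde(x, grid, h):
    """Univariate Gaussian kernel density estimate evaluated on a grid
    with bandwidth h."""
    u = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

pts = np.array([0.0, 0.2, -0.1, 5.0, 5.1])     # two visible clusters
grid = np.linspace(-5.0, 10.0, 3001)
dens = gaussian_kde(pts, grid, h=0.5)
```

Modes of this surface sit near cluster centres, which is what makes minimizing its negative a cluster-extraction step.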

11.
The sparsity and infinite-dimensional nature of functional data render traditional cluster analysis ineffective. To address this, the paper defines the concept and scope of functional data and proposes an adaptive, iteratively updated cluster analysis. First, the parameter information of the data is used to pass from the infinite-dimensional function space to a finite-dimensional multivariate space. On this basis, an adaptively weighted clustering statistic is constructed from differences in the information content of the variables and used as the similarity measure for an initial partition of the functional data. Then, under a given threshold, the initial class memberships of all functions are adaptively and iteratively updated, and the converged optimum is taken as the final partition. Simulations and an empirical test show that, compared with existing functional clustering methods, the proposed method markedly improves classification accuracy, demonstrating its relative merit and its effectiveness in practical applications.

12.
We discuss Bayesian analyses of traditional normal-mixture models for classification and discrimination. The development involves application of an iterative resampling approach to Monte Carlo inference, commonly called Gibbs sampling, and demonstrates routine application. We stress the benefits of exact analyses over traditional classification and discrimination techniques, including the ease with which such analyses may be performed in a quite general setting, with possibly several normal-mixture components having different covariance matrices, the computation of exact posterior classification probabilities for observed data and for future cases to be classified, and posterior distributions for these probabilities that allow for assessment of second-level uncertainties in classification.
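Gibbs sampling, the iterative resampling scheme named in entry 12, draws each coordinate from its full conditional distribution given the current values of the others. A toy sketch for a standard bivariate normal with correlation rho (the mixture-model conditionals of the paper are more involved):

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_bivariate_normal(rho, n_draws=5000, burn=500):
    """Toy Gibbs sampler: alternate draws from the two full
    conditionals of a standard bivariate normal with correlation rho."""
    x = y = 0.0
    s = np.sqrt(1 - rho ** 2)
    draws = []
    for i in range(n_draws + burn):
        x = rng.normal(rho * y, s)   # x | y ~ N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, s)   # y | x ~ N(rho*x, 1 - rho^2)
        if i >= burn:
            draws.append((x, y))
    return np.array(draws)

sample = gibbs_bivariate_normal(0.8)
```

In the mixture setting, the same alternation cycles through component labels and component parameters instead of two coordinates.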

13.
Real-world applications of association rule mining face the well-known problem of discovering a large number of rules, many of which are not interesting or useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of rules discovered and the complexity of the task, but the implications of their use, and the important differences in generalization power, precision and recall when they are used for classification, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments are performed on a number of real-world datasets representing diverse data/item characteristics, and rule sets are evaluated in detail both as a whole and w.r.t. individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant and contradictive rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, frequent, closed and maximal itemsets for the classification task, and the effect of incorporating statistical/heuristic measures when optimizing such rule sets. With closed itemset mining already a preferred choice for complexity and redundancy reduction during rule generation, this study further confirms that closed itemset based association rules are also of better quality in terms of overall classification precision and recall, and precision and recall on individual classes. Maximal itemset based association rules, a subset of the closed itemset based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also show the drawback of applying the confidence measure at the start of rule generation, as is typically done within the association rule framework: removing rules below a confidence threshold also removes the evidence of contradictions to the relatively higher-confidence rules, so precision can be increased by discarding contradictive rules prior to applying the confidence constraint.
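The support and confidence measures that entry 13 builds on can be computed directly from transaction data. A minimal sketch over hypothetical market-basket transactions (the item names are illustrative):

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent,
    computed from a list of transactions given as sets of items."""
    a_count, both_count = 0, 0
    for t in transactions:
        if antecedent <= t:                 # antecedent present
            a_count += 1
            if consequent <= t:             # rule fires
                both_count += 1
    support = both_count / len(transactions)
    confidence = both_count / a_count if a_count else 0.0
    return support, confidence

tx = [{"bread", "milk"}, {"bread", "butter"},
      {"bread", "milk", "butter"}, {"milk"}]
s, c = rule_stats(tx, {"bread"}, {"milk"})
```

Filtering on confidence alone, as the abstract warns, would silently discard low-confidence rules that contradict the retained ones.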

14.
A quasi-Newton acceleration for high-dimensional optimization algorithms
In many statistical problems, maximum likelihood estimation by an EM or MM algorithm suffers from excruciatingly slow convergence. This tendency limits the application of these algorithms to modern high-dimensional problems in data mining, genomics, and imaging. Unfortunately, most existing acceleration techniques are ill-suited to complicated models involving large numbers of parameters. The squared iterative methods (SQUAREM) recently proposed by Varadhan and Roland constitute one notable exception. This paper presents a new quasi-Newton acceleration scheme that requires only modest increments in computation per iteration and overall storage and rivals or surpasses the performance of SQUAREM on several representative test problems.
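SQUAREM, the benchmark named in entry 14, accelerates a slow fixed-point map F (such as an EM update) by taking two map steps and extrapolating with a data-driven steplength instead of applying F directly. A sketch of one common SQUAREM steplength, demonstrated on a toy linear map (the quasi-Newton scheme of the paper itself is not reproduced):

```python
import numpy as np

def squarem(F, x0, n_iter=20):
    """SQUAREM-style acceleration: extrapolate two steps of the
    fixed-point map F using the steplength alpha = -||r|| / ||v||."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x1 = F(x)
        x2 = F(x1)
        r = x1 - x                      # first difference
        v = (x2 - x1) - r               # second difference
        nv = np.linalg.norm(v)
        if nv == 0:                     # already at a fixed point
            return x2
        alpha = -np.linalg.norm(r) / nv
        x = F(x - 2 * alpha * r + alpha ** 2 * v)   # stabilising map step
    return x

# toy slow map with fixed point 10: F(x) = 0.9 x + 1
res = squarem(lambda v: 0.9 * v + 1.0, np.array([0.0]))
```

For a linear map the extrapolation lands on the fixed point in a single accelerated step, which is exactly the behaviour that makes such schemes attractive for nearly linear EM iterations.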

15.
The self-updating process (SUP) is a clustering algorithm that takes the viewpoint of the data points and simulates how they move and perform self-clustering. It is an iterative process on the sample space and allows both time-varying and time-invariant operators. Through simulations and comparisons, this paper shows that SUP is particularly competitive for clustering (i) data with noise, (ii) data with a large number of clusters, and (iii) unbalanced data. When noise is present, SUP isolates the noise points while performing clustering simultaneously. The local-updating property enables SUP to handle data with a large number of clusters and data of various structures. We also show that blurring mean-shift is a static SUP; therefore, our discussion of the strengths of SUP applies to blurring mean-shift as well.
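Blurring mean-shift, identified in entry 15 as a static SUP, replaces every point by a kernel-weighted mean of all current points at each iteration, so well-separated groups collapse onto their own centres. A univariate sketch with a Gaussian kernel on illustrative data:

```python
import numpy as np

def blurring_mean_shift(x, h, n_iter=30):
    """Blurring mean-shift: each point moves to the Gaussian-kernel
    weighted mean of all current points, and the dataset itself blurs."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(n_iter):
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        x = (w * x[None, :]).sum(axis=1) / w.sum(axis=1)
    return x

pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1])     # two tight groups
collapsed = blurring_mean_shift(pts, h=0.5)
```

A general SUP replaces the fixed kernel by a possibly time-varying operator, which is the sense in which this scheme is the "static" special case.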

16.
In this article, maximum likelihood estimates of an exchangeable multinomial distribution are derived using a parametric form that models the parameters as functions of covariates. The nonlinearity of the exchangeable multinomial distribution and the parametric model make direct application of the Newton–Raphson and Fisher scoring algorithms computationally infeasible. Instead, parameter estimates are obtained as solutions of an iterative weighted least-squares algorithm. A completely monotonic parametric form is proposed for defining the marginal probabilities, which yields a valid probability model.

17.
In this paper, we reconsider the mixture vector autoregressive model, which was proposed in the literature for modelling non‐linear time series. We complete and extend the stationarity conditions, derive a matrix formula in closed form for the autocovariance function of the process and prove a result on stable vector autoregressive moving‐average representations of mixture vector autoregressive models. For these results, we apply techniques related to a Markovian representation of vector autoregressive moving‐average processes. Furthermore, we analyse maximum likelihood estimation of model parameters by using the expectation–maximization algorithm and propose a new iterative algorithm for getting the maximum likelihood estimates. Finally, we study the model selection problem and testing procedures. Several examples, simulation experiments and an empirical application based on monthly financial returns illustrate the proposed procedures.

18.
In this paper, we present an auxiliary function approach to solving the overlapping group Lasso problem. Our goal is to handle a more general structure with overlapping groups, suitable for use with cellular automata (CA). The CA are applied to algorithmic composition based on their development and classification; a concrete algorithm and a mapping from CA states to a musical series are given. Experimental simulations show the effectiveness of our algorithms, and the auxiliary function approach to the Lasso combined with CA is a potentially useful algorithm for automatic music generation.
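The cellular automata in entry 18 evolve a row of cells by a local update rule; the resulting state sequence is what gets mapped to a musical series. A sketch of one synchronous step of an elementary (two-state, radius-1) CA with wrap-around boundaries; the rule number and initial row below are illustrative, not taken from the paper.

```python
def ca_step(cells, rule):
    """One synchronous update of an elementary cellular automaton:
    each cell's next state is the bit of `rule` indexed by its
    3-cell neighbourhood (left, centre, right), with wrap-around."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# rule 90 computes XOR of the two neighbours
row = ca_step([0, 0, 1, 0, 0], 90)
```

Iterating `ca_step` yields the time series of states that an algorithmic-composition mapping can translate into pitches or durations.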

19.
The ever-expanding volume of stock-market information far exceeds human processing capacity, making stock prices increasingly hard to predict. Neural network methods can emulate artificial intelligence in processing massive information and improve prediction of the stock market. Using Chinese stock-market data from 1998-2005, a BP neural network model is fitted by gradient descent, with particular attention to the classification criteria, oversampling and overtraining issues that arise in prediction. The paper concludes that, properly applied, neural network methods can improve predictive performance, and that neural network models may be used, with caution, as one method of stock-investment analysis.
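The BP (backpropagation) network fitted by gradient descent in entry 19 can be sketched at its smallest: one hidden layer, sigmoid activations, and batch gradient descent on squared error. The toy training data below (logical OR) are purely illustrative; the paper's stock-return features and 1998-2005 data are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, y, hidden=4, lr=1.0, epochs=5000):
    """Minimal one-hidden-layer BP network trained by batch gradient
    descent on squared error; returns a prediction function."""
    n, p = X.shape
    W1 = rng.normal(0.0, 1.0, (p, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, hidden); b2 = 0.0
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                # forward pass
        out = sigmoid(h @ W2 + b2)
        d2 = (out - y) * out * (1.0 - out)      # output-layer delta
        d1 = np.outer(d2, W2) * h * (1.0 - h)   # backpropagated deltas
        W2 -= lr * h.T @ d2; b2 -= lr * d2.sum()
        W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(axis=0)
    return lambda Z: sigmoid(sigmoid(Z @ W1 + b1) @ W2 + b2)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])              # logical OR target
net = train_bp(X, y)
```

Overtraining, one of the issues the paper flags, would show up here as the network fitting noise in `y` if the data were not noiseless.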

20.
Usual fitting methods for the nested error linear regression model are known to be very sensitive to the effect of even a single outlier. Robust approaches for the unbalanced nested error model with proved robustness and efficiency properties, such as M-estimators, are typically obtained through iterative algorithms. These algorithms are often computationally intensive and require robust estimates of the same parameters to start the algorithms, but so far no robust starting values have been proposed for this model. This paper proposes computationally fast robust estimators for the variance components under an unbalanced nested error model, based on a simple robustification of the fitting-of-constants method or Henderson method III. These estimators can be used as starting values for other iterative methods. Our simulations show that they are highly robust to various types of contamination of different magnitude.
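The fitting-of-constants (Henderson method III) idea that entry 20 robustifies equates observed mean squares to their expectations to solve for the variance components. A sketch of the balanced one-way special case (the paper's unbalanced, robustified version is more involved):

```python
import numpy as np

def fitting_of_constants(groups):
    """Moment-based variance components for a balanced one-way
    random-effects model: sigma_e^2 from the within-group mean square,
    sigma_u^2 from the between-group mean square."""
    k = len(groups)
    n = len(groups[0])                       # balanced: equal group sizes
    means = np.array([g.mean() for g in groups])
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    sigma_e2 = sse / (k * (n - 1))           # within-group mean square
    msb = n * ((means - means.mean()) ** 2).sum() / (k - 1)
    sigma_u2 = max((msb - sigma_e2) / n, 0.0)
    return sigma_u2, sigma_e2

g = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]),
     np.array([7.0, 8.0, 9.0])]
su, se = fitting_of_constants(g)
```

The robust version replaces these non-robust sums of squares with bounded counterparts, giving outlier-resistant starting values for iterative M-estimation.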

