Found 20 similar documents (search time: 0 ms)
1.
In this paper, a new hybrid model of vector autoregressive moving average (VARMA) models and Bayesian networks is proposed to improve the forecasting performance of multivariate time series. In the proposed model, the VARMA model, which is a popular linear model in time series forecasting, is specified to capture the linear characteristics. Then the errors of the VARMA model are clustered into a number of trends by the K-means algorithm, with the Krzanowski–Lai cluster validity index determining the number of trends, and a Bayesian network is built to learn the relationship between the data and the trend of its corresponding VARMA error. Finally, the estimated values of the VARMA model are compensated by the probabilities of their corresponding VARMA errors belonging to each trend, which are obtained from the Bayesian network. Experimental results from a simulation study and two multivariate real-world data sets indicate that the proposed model can effectively improve prediction performance compared with VARMA models.
2.
Pulak Ghosh, Paramjit Gill, Saman Muthukumarana, Tim Swartz 《Australian & New Zealand Journal of Statistics》2010,52(3):289-302
This paper considers the use of Dirichlet process prior distributions in the statistical analysis of network data. Dirichlet process prior distributions have the advantages of avoiding the parametric specifications for distributions, which are rarely known, and of facilitating a clustering effect, which is often applicable to network nodes. The approach is highlighted for two network models and is conveniently implemented using WinBUGS software.
3.
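Evaluation of Provincial Logistics Development Based on Fuzzy Clustering (total citations: 1, self-citations: 1, citations by others: 1)
This study uses fuzzy clustering to evaluate the overall strength of provincial logistics development. Using an evaluation index system established for provincial logistics development, it applies fuzzy clustering to logistics development within Heilongjiang Province and proposes that the province can be divided into three logistics regions: Harbin-Daqing-Qiqihar, Harbin-Mudanjiang-Suifenhe, and the Jiamusi city cluster. Development strategies are then formulated according to the distinct characteristics of each region.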
4.
The number of connected devices is steadily increasing, and this trend is expected to continue in the near future. Connected devices continuously generate data streams, which are often high dimensional and may contain concept drift. Clustering is one of the most suitable methods for real-time data stream processing, since it can be applied with less prior information about the data. In addition, data embedding makes the visualization of high dimensional data possible and may simplify the clustering process. Several data stream clustering algorithms exist in the literature; however, no data stream embedding method exists. Uniform Manifold Approximation and Projection (UMAP) is a data embedding algorithm that is suitable for stationary (stable) data streams, though it cannot adapt to concept drift. In this study, we describe a novel method, EmCStream, that applies UMAP to evolving (nonstationary) data streams, detects and adapts to concept drift, and clusters embedded data instances using a distance- or partitioning-based clustering algorithm. We have evaluated EmCStream against state-of-the-art stream clustering algorithms using both synthetic and real data streams containing concept drift. EmCStream outperforms DenStream and CluStream, in terms of clustering quality, on both synthetic and real evolving data streams. Datasets and code of this study are available online at https://gitlab.com/alaettinzubaroglu/emcstream .
5.
6.
Ali S. Gargoum 《Communications in Statistics - Theory and Methods》2014,43(19):4179-4186
The purpose of this article is to provide validation for the approximate algebraic propagation algorithms to accommodate non-Gaussian dynamic processes. These algorithms have been developed to carry out Bayesian analysis based on conjugate forms and presented with detailed examples of response distributions such as Poisson and Lognormal. The validity of the approximation algorithms can be checked by introducing a metric (the Hellinger divergence measure) over the distribution of the states (parameters) and using it to judge the approximation. Theoretical bounds for the efficacy of such a procedure are discussed.
7.
A variety of artificial neural networks are reviewed, including feed‐forward networks, recurrent networks, associative memories such as the Hopfield network, and the self‐organizing map. Their architectures are described, as are methods that have been developed for training them. Particular emphasis is placed on links with statistical activities, especially regression, classification, and clustering, in contexts such as graphical models and latent structure models. In terms of the training procedures, attention is drawn to the implicit or explicit implementation of statistical methodological approaches such as likelihood and Bayesian methods. Copyright © 2009 John Wiley & Sons, Inc. This article is categorized under:
- Statistical Learning and Exploratory Methods of the Data Sciences > Neural Networks
8.
K. Fernández-Aguirre, M. I. Landaluce-Calvo, A. Martín-Arroyuelos, J. I. Modroño-Herrán 《Journal of applied statistics》2011,38(11):2661-2679
For a relatively young public higher education institution that competes locally with a long-established and reputable private one, it is of great importance to become a reference university, to be better known and identified with by the society it belongs to, and ultimately to reach a good position within the European Higher Education Area. These considerations have led the university governors to set the objective of adequately managing the university's institutional brand, focusing on its logo and on image promotion, which in turn led to the establishment of a university shop, considered a highly suitable instrument for such promotion. In this context, an on-line survey was launched among three different kinds of members of the institution, resulting in a large data sample. Different kinds of variables are analysed through appropriate exploratory multivariate techniques (symmetrical methods) and regression-related techniques (non-symmetrical methods). The conclusion argues for this combination. The application of statistical techniques of data and text mining provides empirical insights into the institution members’ perceptions and helps to extract facts valuable for establishing policies that would improve the corporate identity and the success of the corporate shop.
9.
The Box–Jenkins methodology for modeling and forecasting from univariate time series models has long been considered a standard to which other forecasting techniques have been compared. To a Bayesian statistician, however, the method lacks an important facet—a provision for modeling uncertainty about parameter estimates. We present a technique called sampling the future for including this feature in both the estimation and forecasting stages. Although it is relatively easy to use Bayesian methods to estimate the parameters in an autoregressive integrated moving average (ARIMA) model, there are severe difficulties in producing forecasts from such a model. The multiperiod predictive density does not have a convenient closed form, so approximations are needed. In this article, exact Bayesian forecasting is approximated by simulating the joint predictive distribution. First, parameter sets are randomly generated from the joint posterior distribution. These are then used to simulate future paths of the time series. This bundle of many possible realizations is used to project the future in several ways. Highest probability forecast regions are formed and portrayed with computer graphics. The predictive density's shape is explored. Finally, we discuss a method that allows the analyst to subjectively modify the posterior distribution on the parameters and produce alternate forecasts.
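The two-step simulation described in this abstract (draw parameters from the posterior, then simulate future paths) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a zero-mean AR(1) model instead of a general ARIMA model, a conditional likelihood with a noninformative prior, and it summarizes the paths with equal-tailed percentile intervals rather than highest probability regions. The function names `sample_future_paths` and `forecast_interval` are hypothetical.

```python
import random


def sample_future_paths(series, horizon, n_paths=1000, seed=0):
    """Simulate future paths of a zero-mean AR(1) series under parameter uncertainty.

    Assumption: x[t] = phi * x[t-1] + e[t], e[t] ~ N(0, sigma2), with the
    prior p(phi, sigma2) proportional to 1 / sigma2 (stationarity is not enforced).
    """
    rng = random.Random(seed)
    x, y = series[:-1], series[1:]
    n = len(x)
    sxx = sum(v * v for v in x)
    phi_hat = sum(a * b for a, b in zip(x, y)) / sxx
    s2 = sum((b - phi_hat * a) ** 2 for a, b in zip(x, y)) / (n - 1)

    paths = []
    for _ in range(n_paths):
        # Step 1: draw one parameter set from the joint posterior.
        # sigma2 | data is scaled inverse chi-square with n - 1 degrees of freedom.
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # phi | sigma2, data is Normal(phi_hat, sigma2 / sxx).
        phi = rng.gauss(phi_hat, (sigma2 / sxx) ** 0.5)

        # Step 2: simulate one future path with those parameters.
        last = series[-1]
        path = []
        for _ in range(horizon):
            last = phi * last + rng.gauss(0.0, sigma2 ** 0.5)
            path.append(last)
        paths.append(path)
    return paths


def forecast_interval(paths, step, level=0.9):
    """Equal-tailed interval at a given forecast step (0-based) from simulated paths."""
    values = sorted(p[step] for p in paths)
    lo = values[int((1 - level) / 2 * (len(values) - 1))]
    hi = values[int((1 + level) / 2 * (len(values) - 1))]
    return lo, hi
```

Because each path uses its own parameter draw, the spread of the bundle reflects both parameter uncertainty and future noise, which is the feature the abstract says plug-in Box–Jenkins forecasts lack.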
10.
William M. Bolstad 《Communications in Statistics - Theory and Methods》2013,42(12):4179-4204
The dynamic generalized linear model and the dynamic discount Bayesian model have been used to describe processes involving time-varying parameters. This paper develops an estimation algorithm for the multiprocess extension of these models. These algorithms have the same characteristics as Harrison–Stevens forecasting, namely insensitivity to outliers and quick reaction to real change in the parameters.
11.
12.
Sow farm management requires appropriate methods to forecast the evolution of the sow population structure. We describe two models for this purpose. The first is a semi-Markov process model, used for long-term predictions and strategic management. The second is a state-space model for continuous proportions, used for short-term predictions and operational management.
13.
《Communications in Statistics - Theory and Methods》2012,41(16-17):3179-3197
Text clustering is an unsupervised process of classifying texts and words into different groups. In the literature, many algorithms use a bag of words model to represent texts and classify contents. The bag of words model assumes that word order has no significance. The aim of this article is to propose a new method of text clustering that considers links between terms and documents. We use centrality measures to assess word/text importance in a corpus and to sequentially classify documents.
14.
Ufuk Yolcu, Erol Egrioglu, Vedide R. Uslu 《Journal of Statistical Computation and Simulation》2013,83(4):599-612
Artificial intelligence procedures such as artificial neural networks (ANNs), genetic algorithms and particle swarm optimization and other procedures such as fuzzy clustering have been successfully used in the various stages of different fuzzy time-series forecasting approaches. Fuzzy clustering, genetic algorithm and particle swarm optimization are generally used in the fuzzification stage, and this simplifies the applicability of this stage and makes the fuzzy time-series approach more systematic. ANNs have also been applied successfully in the fuzzy relationship determination stage. In this study, we propose a new hybrid fuzzy time-series approach in which fuzzy c-means clustering procedure is employed in the fuzzification stage and feed-forward neural networks are used in the fuzzy relationship determination stage. This study also includes an empirical analysis pertaining to the forecasting of Index 100 for the stocks and bonds exchange market of Istanbul.
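As a rough illustration of the fuzzification stage mentioned in this abstract, the sketch below runs standard fuzzy c-means on a one-dimensional series. It is an assumption-laden example rather than the authors' procedure: the function names, the random initialization, the fuzzifier `m = 2`, and the stopping rule are all illustrative choices, and the neural network stage is not shown.

```python
import random


def _memberships(values, centers, m):
    """Fuzzy c-means membership degrees of each value in each cluster."""
    rows = []
    for v in values:
        dists = [abs(v - c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The value coincides with one or more centers: share full membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            # u_ik is proportional to d_ik ** (-2 / (m - 1)), normalized over clusters.
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            rows.append([w / total for w in inv])
    return rows


def fuzzy_c_means_1d(values, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a 1-D series; m > 1 is the fuzzifier."""
    rng = random.Random(seed)
    centers = rng.sample(list(values), n_clusters)  # illustrative initialization
    for _ in range(n_iter):
        u = _memberships(values, centers, m)
        new_centers = []
        for k in range(n_clusters):
            # Each center is the mean of the values weighted by u_ik ** m.
            weights = [row[k] ** m for row in u]
            new_centers.append(
                sum(w * v for w, v in zip(weights, values)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the returned centers.
    return centers, _memberships(values, centers, m)
```

In a fuzzy time-series setting, each observation could then be labeled with the fuzzy set in which its membership is highest, for example `max(range(len(row)), key=row.__getitem__)` for a membership row `row`.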
15.
Chronic kidney disease is a progressive loss of renal function which results in the inability of the kidneys to properly filter waste from the blood. Renal function is usually estimated by the glomerular filtration rate (eGFR), which decreases with the worsening of the disease. Bayesian longitudinal models with covariates, random effects, serial correlation and measurement error are discussed to analyse the progression of eGFR in first transplanted children taken from a study in València, Spain.
16.
George Leckie 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2009,172(3):537-554
Summary. Traditional studies of school differences in educational achievement use multilevel modelling techniques to take into account the nesting of pupils within schools. However, educational data are known to have more complex non-hierarchical structures. The potential importance of such structures is apparent when considering the effect of pupil mobility during secondary schooling on educational achievement. Movements of pupils between schools suggest that we should model pupils as belonging to the series of schools that are attended and not just their final school. Since these school moves are strongly linked to residential moves, it is important to explore additionally whether achievement is also affected by the history of neighbourhoods that are lived in. Using the national pupil database, this paper combines multiple membership and cross-classified multilevel models to explore simultaneously the relationships between secondary school, primary school, neighbourhood and educational achievement. The results show a negative relationship between pupil mobility and achievement, the strength of which depends greatly on the nature and timing of these moves. Accounting for pupil mobility also reveals that schools and neighbourhoods are more important than shown by previous analysis. A strong primary school effect appears to last long after a child has left that phase of schooling. The additional effect of neighbourhoods, in contrast, is small. Crucially, the rank order of school effects across all types of pupil is sensitive to whether we account for the complexity of the multilevel data structure.
17.
Coppi et al. [7] applied Yang and Wu's [20] idea to propose a possibilistic k-means (PkM) clustering algorithm for LR-type fuzzy numbers. The memberships in the objective function of PkM no longer need to satisfy the fuzzy k-means constraint that the memberships of a data point across classes sum to one. However, the clustering performance of PkM depends on the initializations and the weighting exponent. In this paper, we propose a robust clustering method based on a self-updating procedure. The proposed algorithm not only solves the initialization problems but also obtains a good clustering result. Several numerical examples demonstrate the effectiveness and accuracy of the proposed clustering method, especially its robustness to initial values and noise. Finally, three real fuzzy data sets are used to illustrate the superiority of the proposed algorithm.
18.
Steven B. Caudill 《Journal of applied statistics》2016,43(7):1253-1261
Hedonic price models are commonly used in the study of markets for various goods, most notably those for wine, art, and jewelry. These models were developed to estimate implicit prices of product attributes within a given product class, where in the case of some goods, such as wine, substantial product differentiation exists. To address this issue, recent research on wine prices employs local polynomial regression clustering (LPRC) for estimating regression models under class uncertainty. This study demonstrates that a superior empirical approach – estimation of a mixture model – is applicable to a hedonic model of wine prices, provided only that the dependent variable in the model is rescaled. The present study also catalogues several of the advantages over LPRC modeling of estimating mixture models.
19.
The present investigation was undertaken to study the gillnet catch efficiency of sardines in the coastal waters of Sri Lanka using commercial catch and effort data. Commercial catch and effort data of small mesh gillnet fishery were collected in five fisheries districts during the period May 1999–August 2002. Gillnet catch efficiency of sardines was investigated by developing catch rates predictive models using data on commercial fisheries and environmental variables. Three statistical techniques [multiple linear regression, generalized additive model and regression tree model (RTM)] were employed to predict the catch rates of trenched sardine Amblygaster sirm (key target species of small mesh gillnet fishery) and other sardines (Sardinella longiceps, S. gibbosa, S. albella and S. sindensis). The data collection programme was conducted for another six months and the models were tested on new data. RTMs were found to be the strongest in terms of reliability and accuracy of the predictions. The two operational characteristics used here for model formulation (i.e. depth of fishing and number of gillnet pieces used per fishing operation) were more useful as predictor variables than the environmental variables. The study revealed a rapid tendency of increasing the catch rates of A. sirm with increased sea depth up to around 32 m.
20.
We derive forecasts for Markov switching models that are optimal in the mean square forecast error (MSFE) sense by means of weighting observations. We provide analytic expressions of the weights conditional on the Markov states and conditional on state probabilities. This allows us to study the effect of uncertainty around states on forecasts. It emerges that, even in large samples, forecasting performance increases substantially when the construction of optimal weights takes uncertainty around states into account. Performance of the optimal weights is shown through simulations and an application to U.S. GNP, where using optimal weights leads to significant reductions in MSFE. Supplementary materials for this article are available online.