Similar literature
1.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have recently been used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study based on the cancer-neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods, the Peter-Clark algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket of the variable to be imputed; and finally, our new proposed method, which applies the whole available knowledge of all variables of interest, including the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new proposed method, especially in the presence of high missingness percentages and more connected networks. The new method is also shown to be more efficient than the MICE method for small sample sizes with high missing rates.

2.

This paper is motivated by our collaborative research, and the aim is to model clinical assessments of upper limb function after stroke using 3D-position and 4D-orientation movement data. We present a new nonlinear mixed-effects scalar-on-function regression model with a Gaussian process prior, focusing on variable selection from a large number of candidates that include both scalar and functional variables. A novel variable selection algorithm, functional least angle regression, has been developed. Because it is essential for this algorithm, we studied the representation of functional variables with different methods and the correlation between a scalar variable and a group of mixed scalar and functional variables. We also propose a new stopping rule for practical use. The algorithm is efficient and accurate for both variable selection and parameter estimation, even when the number of functional variables is very large and the variables are correlated, and the resulting predictions are therefore accurate. Our comprehensive simulation study showed that the method is superior to other existing variable selection methods. When the algorithm was applied to the analysis of the movement data, the use of the nonlinear random-effects model and the functional variables significantly improved the prediction accuracy for the clinical assessment.


3.
Neural networks are a popular machine learning tool, particularly in applications such as protein structure prediction; however, overfitting can pose an obstacle to their effective use. Due to the large number of parameters in a typical neural network, one may obtain a network fit that perfectly predicts the learning data, yet fails to generalize to other data sets. One way of reducing the size of the parameter space is to alter the network topology so that some edges are removed; however, it is often not immediately apparent which edges should be eliminated. We propose a data-adaptive method of selecting an optimal network architecture using a deletion/substitution/addition algorithm. Results of this approach to classification are presented on simulated data and the breast cancer data of Wolberg and Mangasarian [1990. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Nat. Acad. Sci. 87, 9193–9196].

4.
It is essential to reduce data latency and guarantee quality of service in modern computer networks. The emerging networking protocol, Multipath Transmission Control Protocol, can reduce data latency by transmitting data through multiple minimal paths (MPs) and ensure data integrity through its packet retransmission mechanism. The bandwidth of each edge can be considered multi-state in computer networks because different situations, such as failures, partial failures and maintenance, exist. We evaluate network reliability for a multi-state retransmission flow network through which the data can be successfully transmitted by means of multiple MPs under the time constraint. By generating all minimal bandwidth patterns, the proposed algorithm can satisfy these requirements to calculate network reliability. An example and a practical case of the Pan-European Research and Education Network are used to demonstrate the proposed algorithm.

5.
Two statistical applications for estimation and prediction of flows in traffic networks are presented. In the first, the numbers of route users are assumed to be independent α-shifted gamma Γ(θ, λ0) random variables denoted H(α, θ, λ0), with common λ0. As a consequence, the link, OD (origin-destination) and node flows are also H(α, θ, λ0) variables. We assume that the main source of information is plate scanning, which permits us to identify, totally or partially, the vehicle route, OD and link flows by scanning the corresponding plate numbers at an adequately selected subset of links. A Bayesian approach using conjugate families is proposed that allows us to estimate different traffic flows. In the second application, a stochastic demand dynamic traffic model to predict some traffic variables and their time evolution in real networks is presented. The Bayesian network model considers that the variables are generalized Beta variables such that, when marginally transformed to standard normal, they become multivariate normal. The model is able to provide a point estimate, a confidence interval or the density of the variable being predicted. Finally, the models are illustrated by their application to the Nguyen Dupuis network and the Vermont-State example. The resulting traffic predictions seem promising for real traffic networks and can be produced in real time.

6.
The stochastic block model (SBM) is widely used for modelling network data by assigning individuals (nodes) to communities (blocks) with the probability of an edge existing between individuals depending upon community membership. In this paper, we introduce an autoregressive extension of the SBM, based on continuous-time Markovian edge dynamics. The model is appropriate for networks evolving over time and allows for edges to turn on and off. Moreover, we allow for the movement of individuals between communities. An effective reversible-jump Markov chain Monte Carlo algorithm is introduced for sampling jointly from the posterior distribution of the community parameters and the number and location of changes in community membership. The algorithm is successfully applied to a network of mice.
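As a minimal sketch of the static SBM that this abstract starts from (not the autoregressive, continuous-time extension the paper proposes), the following samples one undirected graph in which the edge probability depends only on the two endpoints' communities. The function name `sample_sbm`, the block assignment and the probability matrix are illustrative assumptions, not taken from the paper.

```python
import random

def sample_sbm(block_of, edge_prob, seed=0):
    """Sample an undirected SBM graph as a set of edges (i, j) with i < j.

    block_of[i] is the community of node i (hypothetical input);
    edge_prob[a][b] is the probability of an edge between a node in
    community a and a node in community b.
    """
    rng = random.Random(seed)
    n = len(block_of)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability depends only on the communities of i and j.
            p = edge_prob[block_of[i]][block_of[j]]
            if rng.random() < p:
                edges.add((i, j))
    return edges

# Two communities of 10 nodes: dense within, sparse between (assumed values).
block_of = [0] * 10 + [1] * 10
edge_prob = [[0.8, 0.05], [0.05, 0.8]]
edges = sample_sbm(block_of, edge_prob)
within = sum(block_of[i] == block_of[j] for i, j in edges)
print(len(edges), "edges,", within, "within communities")
```

With these assumed probabilities most sampled edges fall within a community, which is the structure that community-detection methods for the SBM try to recover.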

7.
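贺建风 (He Jianfeng), 李宏煜 (Li Hongyu). 《统计研究》 (Statistical Research), 2021, 38(4): 131-144
In the era of the digital economy, social networks, as an important vehicle of the digital platform economy, have attracted wide attention from scholars in China and abroad. In the big-data setting, social networks have great commercial value, but their unprecedented scale makes traditional network analysis methods inapplicable because of excessive computational cost. Obtaining a sample network through a network sampling algorithm and then making inferences about the whole network saves computing resources, so the quality of the sampling algorithm directly affects the accuracy of conclusions drawn from social network analysis. Existing social network sampling algorithms have several shortcomings: they ignore the internal topological structure of the network, easily become trapped in a local part of the network, and have low sampling efficiency. To remedy these shortcomings, this paper draws on the community structure of big-data social networks and proposes a cluster random walk sampling algorithm. The method first uses a community clustering algorithm to partition the nodes of the original network into communities, yielding multiple community networks, and then performs random walk sampling within each community to obtain the sample network. Numerical simulations and a case application both show that the cluster random walk sampling algorithm overcomes the shortcomings of traditional network sampling algorithms and preserves the structural features of the original network well while reducing network size. In addition, the algorithm can be run in parallel, which effectively improves sampling efficiency, and it is of great practical significance for sampling large-scale social networks in the big-data setting.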
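The two-stage idea described above (partition into communities, then random-walk within each) can be sketched as follows. This is only an illustration under stated assumptions: the community partition is taken as given rather than computed by a clustering algorithm, and the restart rule, the `fraction` parameter and the function names are hypothetical choices, not the authors' specification.

```python
import random

def random_walk_sample(adj, nodes, n_target, rng, max_steps=10_000):
    """Random walk restricted to `nodes`; returns up to n_target visited nodes."""
    nodes = set(nodes)
    current = rng.choice(sorted(nodes))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= n_target:
            break
        # Only follow edges that stay inside the current community.
        nbrs = [v for v in adj[current] if v in nodes]
        # Assumed restart rule: jump to a random community node if stuck.
        current = rng.choice(nbrs) if nbrs else rng.choice(sorted(nodes))
        visited.add(current)
    return visited

def cluster_random_walk_sample(adj, communities, fraction, seed=0):
    """Sample roughly `fraction` of each community by a within-community walk."""
    rng = random.Random(seed)
    sample = set()
    for members in communities:
        n_target = max(1, round(fraction * len(members)))
        sample |= random_walk_sample(adj, members, n_target, rng)
    return sample

# Toy graph: two 4-cliques joined by the edge 3-4 (adjacency lists).
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
communities = [[0, 1, 2, 3], [4, 5, 6, 7]]  # assumed to come from a clustering step
sample = cluster_random_walk_sample(adj, communities, fraction=0.5)
print(sorted(sample))
```

Because each community is walked independently, the loop over `communities` could be distributed across workers, which is consistent with the abstract's remark that the algorithm can run in parallel.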

8.
Bayesian neural networks for nonlinear time series forecasting (cited 3 times in total: 0 self-citations, 3 by others)
In this article, we apply Bayesian neural networks (BNNs) to time series analysis, and propose a Monte Carlo algorithm for BNN training. In addition, we go a step further in BNN model selection by putting a prior on network connections instead of hidden units as done by other authors. This allows us to treat the selection of hidden units and the selection of input variables uniformly. The BNN model is compared to a number of competitors, such as the Box-Jenkins model, bilinear model, threshold autoregressive model, and traditional neural network model, on a number of popular and challenging data sets. Numerical results show that the BNN model achieves a consistent improvement over the competitors in forecasting future values. Insights on how to improve the generalization ability of BNNs are revealed in many aspects of our implementation, such as the selection of input variables, the specification of prior distributions, and the treatment of outliers.

9.
Network exploration via the adaptive LASSO and SCAD penalties (cited 1 time in total: 0 self-citations, 1 by others)
Graphical models are frequently used to explore networks, such as genetic networks, among a set of variables. This is usually carried out by exploring the sparsity of the precision matrix of the variables under consideration. Penalized likelihood methods are often used in such explorations. Yet the positive-definiteness constraints on precision matrices make the optimization problem challenging. We introduce non-concave penalties and the adaptive LASSO penalty to attenuate the bias problem in network estimation. Through the local linear approximation to the non-concave penalty functions, the problem of precision matrix estimation is recast as a sequence of penalized likelihood problems with a weighted L1 penalty and solved using the efficient algorithm of Friedman et al. (2008). Our estimation schemes are applied to two real datasets. Simulation experiments and asymptotic theory are used to justify our proposed methods.
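To make the "sequence of weighted L1 problems" concrete, the display below writes out the usual SCAD derivative and one local linear approximation (LLA) step. The notation (sample covariance S, precision matrix Ω with entries ω_ij, tuning parameters λ and a > 2, iteration index k) is assumed here for illustration, and whether the diagonal entries are penalized is a modelling choice the abstract does not state.

```latex
% SCAD penalty derivative for \theta > 0, with tuning parameters \lambda > 0 and a > 2:
p'_{\lambda}(\theta) = \lambda \left\{ \mathbf{1}(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_{+}}{(a-1)\lambda}\, \mathbf{1}(\theta > \lambda) \right\}

% One LLA step: the non-concave penalty is replaced by its tangent at the
% current estimate \hat{\Omega}^{(k)} = (\hat{\omega}^{(k)}_{ij}), giving a weighted L1 problem:
\hat{\Omega}^{(k+1)} = \arg\max_{\Omega \succ 0}
  \left\{ \log\det\Omega - \operatorname{tr}(S\Omega)
  - \sum_{i,j} w^{(k)}_{ij}\, |\omega_{ij}| \right\},
\qquad
w^{(k)}_{ij} = p'_{\lambda}\!\left( |\hat{\omega}^{(k)}_{ij}| \right)
```

Since the weight vanishes once an entry exceeds aλ in absolute value, large entries are not shrunk further, which is how the non-concave penalty reduces the bias of the plain L1 penalty.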

10.
Recently, considerable interest has been devoted to the use of Bayesian methods for deriving parameter estimates in stochastic frontier analysis. Bayesian stochastic frontier analysis (BSFA) seems to be a useful method for assessing efficiency in the energy sector. However, BSFA results do not expose the multiple relationships between input and output variables and energy efficiency. This study proposes a framework for making inferences about BSFA efficiencies that recognizes the underlying relationships between variables and efficiency, using a Bayesian network (BN) approach. BN classifiers are proposed as a method for analyzing the results obtained from BSFA.

11.
A Bayesian network (BN) is a probabilistic graphical model that represents a set of variables and their probabilistic dependencies. Formally, BNs are directed acyclic graphs whose nodes represent variables, and whose arcs encode the conditional dependencies among the variables. Nodes can represent any kind of variable, be it a measured parameter, a latent variable, or a hypothesis. They are not restricted to representing random variables, which form the “Bayesian” aspect of a BN. Efficient algorithms exist that perform inference and learning in BNs. BNs that model sequences of variables are called dynamic BNs. In this context, [A. Harel, R. Kenett, and F. Ruggeri, Modeling web usability diagnostics on the basis of usage statistics, in Statistical Methods in eCommerce Research, W. Jank and G. Shmueli, eds., Wiley, 2008] provide a comparison between Markov chains and BNs in the analysis of web usability from e-commerce data. A comparison of regression models, structural equation models, and BNs is presented in Anderson et al. [R.D. Anderson, R.D. Mackoy, V.B. Thompson, and G. Harrell, A Bayesian network estimation of the service–profit chain for transport service satisfaction, Decision Sciences 35(4), (2004), pp. 665–689]. In this article we apply BNs to the analysis of customer satisfaction surveys and demonstrate the potential of the approach. In particular, BNs offer advantages in implementing models of cause and effect over other statistical techniques designed primarily for testing hypotheses. Other advantages include the ability to conduct probabilistic inference for prediction and diagnostic purposes with an output that can be intuitively understood by managers.

12.
A nonparametric inference algorithm developed by Davis and Geman (1983) is extended and applied to a medical prediction problem. The algorithm employs an estimation procedure for acquiring pairwise statistics among variables of a binary data set, allows for the data-driven creation of interaction terms among the variables, and employs a decision rule which asymptotically gives the minimum expected error. The inference procedure was designed for large data sets but has been extended via the method of cross-validation to encompass smaller data sets.

13.
Analyzing incomplete data to infer the structure of gene regulatory networks (GRNs) is a challenging task in bioinformatics. Bayesian networks can be successfully used in this field. k-nearest neighbor, singular value decomposition (SVD)-based imputation and multiple imputation by chained equations are three fundamental imputation methods for dealing with missing values. The path consistency (PC) algorithm based on conditional mutual information (PCA–CMI) is a well-known algorithm for inferring GRNs. This algorithm needs the data set to be complete. However, PCA–CMI is not a stable algorithm: when it is applied to permuted gene orders, different networks are obtained. We propose an order-independent algorithm, PCA–CMI–OI, for inferring GRNs. After imputation of missing data, the performances of PCA–CMI and PCA–CMI–OI are compared. Results show that networks constructed from data imputed by the SVD-based method and the PCA–CMI–OI algorithm outperform other imputation methods and PCA–CMI. PC-based algorithms produce an undirected or partially directed network. The mutual information test (MIT) score, which can deal with discrete data, is one of the well-known methods for directing the edges of the resulting networks. We also propose a new score, ConMIT, which is appropriate for analyzing continuous data. Results show that the precision of directing the edges of the skeleton is improved by applying the ConMIT score.

14.
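Listed companies often have an incentive to window-dress their financial data to make their operating condition look better, which reduces the predictive accuracy of financial risk early-warning models. Using two data quality assessment methods, Benford's law and the Myer index, the article constructs Benford and Myer quality factors, introduces a BP (back-propagation) neural network model, and builds a BM-BP neural network financial risk early-warning model. It then uses data on Chinese A-share listed companies from 2000 to 2019 to evaluate the effect of the data quality factors on the predictive accuracy of the early-warning model and to analyze the stability of the new model's predictive accuracy. The empirical results show that the Benford and Myer quality factors improve the predictive accuracy of the BP neural network financial risk early-warning model; that, in the comparison across quality factors, the BP neural network early-warning model that includes the selected indicators and the Benford and Myer quality factors has a higher prediction accuracy, a lower type II misclassification rate, and good stability; and that selecting indicators with a decision tree algorithm effectively improves the predictive accuracy of the new model.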
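The abstract does not say how the Benford quality factor is computed. As a hedged illustration only, the sketch below measures how far the first-digit frequencies of a set of figures depart from Benford's law, P(d) = log10(1 + 1/d), using the mean absolute deviation; both the deviation measure and the function names are assumptions made for this example.

```python
import math
from collections import Counter

# Benford's law: expected frequency of each leading digit d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading nonzero digit of a nonzero number."""
    # Scientific notation with 17 significant digits, e.g. '3.14...e+02'.
    return int(f"{abs(x):.16e}"[0])

def benford_deviation(values):
    """Mean absolute deviation of first-digit frequencies from Benford's law.

    Illustrative quality score (an assumption, not the paper's factor):
    smaller values mean the data look more Benford-like.
    """
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9

# Powers of 2 follow Benford's law closely; 100..999 has uniform first digits.
benford_like = [2 ** k for k in range(1, 200)]
uniform_like = list(range(100, 1000))
print(benford_deviation(benford_like) < benford_deviation(uniform_like))
```

A score of this kind could be computed per company from its reported figures and supplied to a classifier as one extra input feature, which is the general role the abstract assigns to its quality factors.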

15.
We develop our previous work on identifying the collection of significant factors that determine a random response variable which is, in general, nonbinary. Such identification is important, e.g., in biological and medical studies. Our approach is to examine the quality of response variable prediction by functions of (a certain subset of) the factors. Estimating the prediction error requires a cross-validation procedure, a prediction algorithm, and estimation of the penalty function. Using simulated data, we demonstrate the efficiency of our method. We prove a new central limit theorem for the introduced regularized estimates under some natural conditions for arrays of exchangeable random variables.

16.
Bayesian networks are not well-formulated for continuous variables. The majority of recent works dealing with Bayesian inference are restricted to special types of continuous variables, such as the conditional linear Gaussian model for Gaussian variables. In this context, an exact Bayesian inference algorithm is proposed for clusters of continuous variables which may be approximated by independent component analysis models. The complexity in memory space is linear and the overfitting problem is attenuated, while the inference time is still exponential. Experiments on multibiometric score fusion with quality estimates are conducted, and the performance is observed to be satisfactory compared to some known fusion techniques.

17.
A method is given for choosing k-way partitions (where 2 ≤ k ≤ (number of categories of predictor variable)) in classification or decision tree analyses. The method, like that proposed by Kass, chooses the best partition on the basis of statistical significance and uses the Bonferroni inequality to calculate the significance. Unlike Kass's algorithm, the algorithm does not favour simple partitions (low values of k), nor does it discriminate against free-type (no restriction on order of values) predictor variables with many categories. A method of adjusting the significance for the number of predictor variables and of using multiple comparisons to put an upper bound on the significance is given. Monte Carlo tests show that the algorithm gives slightly conservative tests of significance for both small and large samples and does not favour one type of predictor variable over another. The algorithm is incorporated in a PC software program, Knowledgeseeker, which is briefly described.

18.
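黄丹阳 (Huang Danyang), 张力文 (Zhang Liwen). 《统计研究》 (Statistical Research), 2021, 38(12): 131-144
With the rapid growth of the internet industry, two-mode signed networks have become a common type of complex network, yet few analyses address them. Building on local community theory for traditional unsigned networks and structural balance theory for signed networks, this paper proposes, for the first time, a local community theory for two-mode signed networks. The theory considers not only the information carried by common neighbors in a signed network but also the links that exist among those common neighbors. The paper further derives a weighted balanced-cycle gain index for signed networks based on local community information; this index can represent the sign relationship between user nodes and product nodes in a two-mode signed network. To apply the index more effectively to link prediction in two-mode signed networks, the paper proposes a weighted balanced-cycle gain classifier algorithm. Experimental results show that the new algorithm has better predictive ability than other classical link prediction algorithms.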

19.
In this paper, a new hybrid model of vector autoregressive moving average (VARMA) models and Bayesian networks is proposed to improve the forecasting performance of multivariate time series. In the proposed model, the VARMA model, which is a popular linear model in time series forecasting, is specified to capture the linear characteristics. Then the errors of the VARMA model are clustered into trends by the K-means algorithm, with the Krzanowski–Lai cluster validity index determining the number of trends, and a Bayesian network is built to learn the relationship between the data and the trend of its corresponding VARMA error. Finally, the estimated values of the VARMA model are compensated by the probabilities of their corresponding VARMA errors belonging to each trend, which are obtained from the Bayesian network. Experimental results from a simulation study and two multivariate real-world data sets indicate that, compared with VARMA models, the proposed model can effectively improve prediction performance.

20.
Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy when there are many irrelevant variables and the number of predictors exceeds the number of observations. We propose the multistep regression tree with adaptive variable selection to handle this problem. The variable selection step and the fitting step comprise the multistep method.

The multistep generalized unbiased interaction detection and estimation (GUIDE) with adaptive forward selection (fg) algorithm, as a variable selection tool, performs better than some of the well-known variable selection algorithms such as efficacy adaptive regression tube hunting (EARTH), FSR (false selection rate), LSCV (least squares cross-validation), and LASSO (least absolute shrinkage and selection operator) for the regression problem. Simulation results show that fg outperforms the other algorithms in terms of selection result and computation time. It generally selects the important variables correctly with relatively few irrelevant variables, which gives good prediction accuracy with less computation time.
