
1.

Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation

Johan H. Koskinen Garry L. Robins Philippa E. Pattison 《Statistical Methodology》2010,7(3):366-384

Missing data are often problematic in social network analysis, since what is missing may alter the conclusions about what we have observed: tie-variables need to be interpreted in relation to their local neighbourhood and the global structure. Some ad hoc methods for dealing with missing data in social networks have been proposed, but here we consider a model-based approach. We discuss various aspects of fitting exponential family random graph (or p-star) models (ERGMs) to networks with missing data and present a Bayesian data augmentation algorithm for the purpose of estimation. This involves drawing from the full conditional posterior distribution of the parameters, something which is made possible by recently developed algorithms. With ERGMs already having complicated interdependencies, it is particularly important to provide inference that adequately describes the uncertainty, something that the Bayesian approach provides. To the extent that we wish to explore the missing parts of the network, the posterior predictive distributions, immediately available at the termination of the algorithm, are at our disposal, which allows us to explore the distribution of what is missing unconditionally on any particular parameter values. Some important features of treating missing data and of the implementation of the algorithm are illustrated using a well-known collaboration network and a variety of missing data scenarios.

2.

The Landscape of Causal Inference: Perspective From Citation Network Analysis

Weihua An Ying Ding 《The American Statistician》2018,72(3):265-277

3.

Spatial autoregression with repeated measurements for social networks

Danyang Huang Hansheng Wang 《Communications in Statistics - Theory and Methods》2018,47(15):3715-3727

The spatial autoregressive (SAR) model has recently been found useful for estimating social autocorrelation in social networks. However, the rapid development of information technology now enables researchers to collect repeated measurements for a given social network. The SAR model for social networks is designed for cross-sectional data and is thus not feasible in this setting. In this article, we propose a new model, referred to as SAR with random effects (SARRE), for social networks. It can be considered a natural combination of two types of models: the SAR model for social networks and a particular type of mixed model. To address the high computational complexity in large social networks, a pseudo-maximum likelihood estimate (PMLE) is proposed. The asymptotic properties of the estimate are established. We demonstrate the performance of the proposed method through extensive numerical studies and a real data example.

4.

Identification and Efficient Estimation of Simultaneous Equations Network Models

Xiaodong Liu 《Journal of Business & Economic Statistics》2014,32(4):516-536

This article considers identification and estimation of social network models in a system of simultaneous equations. We show that, with or without row-normalization of the social adjacency matrix, the network model has different equilibrium implications, needs different identification conditions, and requires different estimation strategies. When the adjacency matrix is not row-normalized, the variation in the Bonacich centrality across nodes in a network can be used as an IV to identify social interaction effects and improve estimation efficiency. The number of such IVs depends on the number of networks. When there are many networks in the data, the proposed estimators may have an asymptotic bias due to the presence of many IVs. We propose a bias-correction procedure for the many-instrument bias. Simulation experiments show that the bias-corrected estimators perform well in finite samples. We also provide an empirical example to illustrate the proposed estimation procedure.

5.

A clustering random walk sampling algorithm based on social networks in the big data context

Jianfeng He (贺建风) Hongyu Li (李宏煜) 《Statistical Research》(统计研究) 2021,38(4):131-144

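In the era of the digital economy, social networks, as an important carrier of the digital platform economy, have attracted wide attention from scholars at home and abroad. In the big data context, social networks have enormous commercial value, but because of their unprecedented scale, traditional network analysis methods are no longer applicable owing to their excessive computational cost. Obtaining a sample network through a network sampling algorithm and then making inferences about the whole network saves computational resources, so the quality of the sampling algorithm directly affects the accuracy of the conclusions of social network analysis. Existing social network sampling algorithms suffer from shortcomings such as ignoring the internal topological structure of the network, easily becoming trapped in local regions of the network, and low sampling efficiency. To remedy these shortcomings, this paper draws on the community characteristics of big data social networks and proposes a clustering random walk sampling algorithm. The method first uses a community clustering algorithm to partition the nodes of the original network into communities, yielding multiple community networks, and then performs random walk sampling within each community separately to obtain the sample network. Results from both numerical simulation and a case application show that the clustering random walk sampling algorithm overcomes the drawbacks of traditional network sampling algorithms and preserves the structural features of the original network well while reducing the network's size. In addition, the sampling algorithm can be run in parallel, which effectively improves sampling efficiency, and it is of great practical significance for sampling large-scale social networks in the big data context.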
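The two-stage procedure described in this abstract (partition into communities, then random walk within each community) can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the community partition is assumed to be given by some earlier clustering step, and the function names, the `fraction` parameter, and the small restart probability used to escape dead ends are all hypothetical choices.

```python
import random

def random_walk_sample(adj, nodes, n_target, rng, restart_prob=0.1):
    """Random walk restricted to `nodes`; returns the set of visited nodes.

    restart_prob is an assumed safeguard: the walk occasionally jumps to a
    random node of the community so it cannot get stuck in one region.
    """
    nodes = set(nodes)
    pool = sorted(nodes)
    n_target = min(n_target, len(pool))
    current = rng.choice(pool)
    visited = {current}
    max_steps = 100 * len(pool) + 100  # hard cap so the loop always ends
    for _ in range(max_steps):
        if len(visited) >= n_target:
            break
        # Only follow edges that stay inside this community.
        nbrs = [v for v in adj[current] if v in nodes]
        if not nbrs or rng.random() < restart_prob:
            current = rng.choice(pool)
        else:
            current = rng.choice(nbrs)
        visited.add(current)
    return visited

def clustered_random_walk_sample(adj, communities, fraction, seed=0):
    """Sample roughly `fraction` of each community by a separate random walk."""
    rng = random.Random(seed)
    sample = set()
    for members in communities:
        k = max(1, round(fraction * len(members)))
        sample |= random_walk_sample(adj, members, k, rng)
    return sample
```

Because each community is walked independently, the per-community calls could in principle be dispatched in parallel, which is the property the abstract highlights; the sample network would then be the subgraph induced by the returned nodes.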

6.

Affiliation discrete weighted networks with an increasing degree sequence

Jing Luo Shan Duan 《Communications in Statistics - Theory and Methods》2018,47(24):6079-6094

An affiliation network is a kind of two-mode social network with two different sets of nodes (namely, a set of actors and a set of social events) and edges representing the affiliation of the actors with the social events. In many affiliation networks the connections between actors and social events are only binary weighted and so cannot reveal the strength of affiliation. Although a number of statistical models have been proposed to analyze binary weighted affiliation networks, the asymptotic behavior of the maximum likelihood estimator (MLE) is still unknown or has not been properly explored in weighted affiliation networks. In this paper, we study an affiliation model with the degree sequence as the exclusively natural sufficient statistic in the exponential family distributions. We derive the consistency and asymptotic normality of the maximum likelihood estimator in affiliation networks with finite discrete weights when the numbers of actors and events both go to infinity. Simulation studies and a real data example demonstrate our theoretical results.

7.

A statistical social network model for consumption data in trophic food webs

《Statistical Methodology》2014

We adapt existing statistical modeling techniques for social networks to study consumption data observed in trophic food webs. These data describe the (non-negative) feeding volume among organisms grouped into nodes, called trophic species, that form the food web. Model complexity arises from the large number of zeros in the data, as each node in the web is predator or prey to only a small number of other trophic species. Many of the zeros are regarded as structural (non-random) in the context of feeding behavior. The presence of basal prey and top predator nodes (those that never consume and those that are never consumed, with probability 1) adds further complexity to the statistical modeling. We develop a special statistical social network model to account for such network features. The model is applied to two empirical food webs; the focus is on the web for which the population size of seals is of concern to various commercial fisheries.

8.

Affiliation networks with an increasing degree sequence

Yong Zhang Xiaodi Qian Hong Qin Ting Yan 《Communications in Statistics - Theory and Methods》2017,46(22):11163-11180

An affiliation network is a kind of two-mode social network with two different sets of nodes (namely, a set of actors and a set of social events) and edges representing the affiliation of the actors with the social events. Although a number of statistical models have been proposed to analyze affiliation networks, the asymptotic behavior of the estimator is still unknown or has not been properly explored. In this article, we study an affiliation model with the degree sequence as the exclusively natural sufficient statistic in the exponential family distributions. We establish the uniform consistency and asymptotic normality of the maximum likelihood estimator when the numbers of actors and events both go to infinity. Simulation studies and a real data example demonstrate our theoretical results.

9.

A mixture model for random graphs (total citations: 1; self-citations: 0; citations by others: 1)

J.-J. Daudin F. Picard S. Robin 《Statistics and Computing》2008,18(2):173-183

The Erdős–Rényi model of a network is simple and possesses many explicit expressions for average and asymptotic properties, but it does not fit real-world networks well. The vertices of those networks are often structured in unknown classes (functionally related proteins or social communities) with different connectivity properties. The stochastic block structures model was proposed for this purpose in the context of social sciences, using a Bayesian approach. We consider the same model in a frequentist statistical framework. We give the degree distribution and the clustering coefficient associated with this model, a variational method to estimate its parameters and a model selection criterion to select the number of classes. This estimation procedure allows us to deal with large networks containing thousands of vertices. The method is used to uncover the modular structure of a network of enzymatic reactions.
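To make the generative side of such a mixture (block) model concrete, here is a small simulation sketch. It only generates a graph; it does not implement the variational estimation or the model selection criterion mentioned in the abstract. The names `alpha` (class proportions) and `pi` (matrix of between-class connection probabilities) are illustrative assumptions, not notation confirmed by the listing.

```python
import random

def simulate_mixture_graph(n, alpha, pi, seed=0):
    """Simulate an undirected graph from a latent-class (block) model.

    Each vertex gets a hidden class drawn with probabilities `alpha`;
    vertices i and j are then linked independently with probability
    pi[class_i][class_j]. With a single class this reduces to the
    Erdős–Rényi model.
    """
    rng = random.Random(seed)
    classes = rng.choices(range(len(alpha)), weights=alpha, k=n)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < pi[classes[i]][classes[j]]
    ]
    return classes, edges

def degrees(n, edges):
    """Degree of every vertex, for inspecting the degree distribution."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg
```

For example, `simulate_mixture_graph(200, [0.7, 0.3], [[0.02, 0.01], [0.01, 0.3]])` would give a graph in which the smaller class is much more densely connected internally, so its vertices tend to have higher degree.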

10.

Predicting profitability using advice branch bank networks

Avranil Sarkar Stephen E. Fienberg David Krackhardt 《Statistical Methodology》2010,7(3):429-444

The literature on social networks and their analysis has undergone explosive growth in the past decade. Network models have been used to study structures as diverse as the interaction of monks in a monastery, the links across the World Wide Web, and the structure of organizations. In much of this literature the network itself is viewed as the object of interest, and models are used to elucidate its structure. In this paper, we adopt a different perspective and we explore the role of network structure of organizations for prediction purposes. In particular, we work with data gathered on the advice-seeking habits of employees in 52 branches of a major North American bank corporation. We then use the network structure within each branch discovered via various exploratory analyses to predict the profitability of the individual branches.

11.

Bias–variance and breadth–depth tradeoffs in respondent-driven sampling

Sergiy Nesterko Joseph Blitzstein 《Journal of Statistical Computation and Simulation》2015,85(1):89-102

Respondent-driven sampling (RDS) is a link-tracing network sampling strategy for collecting data from hard-to-reach populations, such as injection drug users or individuals at high risk of being infected with HIV. The mechanism is to find initial participants (seeds), and give each of them a fixed number of coupons allowing them to recruit people they know from the population of interest, with a mutual financial incentive. The new participants are again given coupons and the process repeats. Currently, the standard RDS estimator used in practice is known as the Volz–Heckathorn (VH) estimator. It relies on strong assumptions about the underlying social network and the RDS process. Via simulation, we study the relative performance of the plain mean and VH estimators when assumptions of the latter are not satisfied, under different network types (including homophily and rich-get-richer networks), participant referral patterns, and varying numbers of coupons. The analysis demonstrates that the plain mean outperforms the VH estimator in many but not all of the simulated settings, including homophily networks. Also, we highlight the implications of multiple recruitment and varying referral patterns on the depth of the RDS process. We develop interactive visualizations of the findings and the RDS process to further build insight into the various factors contributing to the performance of current RDS estimation techniques.
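The two estimators compared in this abstract can be written in a few lines. The sketch below uses the usual form of the Volz–Heckathorn estimator, an inverse-degree weighted mean, which rests on the assumption that a respondent's chance of being recruited is proportional to their reported network degree. The function names and the toy data are illustrative only.

```python
def plain_mean(y):
    """Unweighted sample mean of the outcome."""
    return sum(y) / len(y)

def volz_heckathorn(y, degrees):
    """Inverse-degree weighted mean of the outcome.

    Respondents with many contacts are assumed to be over-represented in
    an RDS sample, so each observation is down-weighted by 1/degree.
    Degrees must be positive.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

# Toy data (made up for illustration): 1 = has the trait, 0 = does not.
y = [1, 0, 0, 1]
reported_degrees = [10, 2, 2, 5]
estimate_plain = plain_mean(y)
estimate_vh = volz_heckathorn(y, reported_degrees)
```

When every respondent reports the same degree the two estimators coincide; they diverge when the trait is correlated with degree, which is one reason their relative performance depends on the network type.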

12.

Bayesian inference for dynamic social network data

Johan H. Koskinen Tom A.B. Snijders 《Journal of Statistical Planning and Inference》2007

We consider a continuous-time model for the evolution of social networks. A social network is here conceived as a (di)graph on a set of vertices, representing actors, and the changes of interest are the creation and disappearance over time of edges (arcs) in the graph. Hence we model a collection of random edge indicators that are not, in general, independent. We explicitly model the interdependencies between edge indicators that arise from interaction between social entities. A Markov chain is defined in terms of an embedded chain with holding times and transition probabilities. Data are observed at fixed points in time and hence we are not able to observe the embedded chain directly. By introducing a prior distribution for the parameters, we can implement an MCMC algorithm that explores the posterior distribution of the parameters by simulating the evolution of the embedded process between observations.

13.

Frailty effects in networks: comparison and identification of individual heterogeneity versus preferential attachment in evolving networks

de Blasio BF Seierstad TG Aalen OO 《Journal of the Royal Statistical Society. Series C, Applied statistics》2011,60(2):239-259

Preferential attachment is a proportionate growth process in networks, where nodes receive new links in proportion to their current degree. Preferential attachment is a popular generative mechanism to explain the widespread observation of power-law-distributed networks. An alternative explanation for the phenomenon is a randomly grown network with large individual variation in growth rates among the nodes (frailty). We derive analytically the distribution of individual rates, which will reproduce the connectivity distribution that is obtained from a general preferential attachment process (Yule process), and the structural differences between the two types of graphs are examined by simulations. We present a statistical test to distinguish the two generative mechanisms from each other and we apply the test to both simulated data and two real data sets of scientific citation and sexual partner networks. The findings from the latter analyses argue for frailty effects as an important mechanism underlying the dynamics of complex networks.
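The contrast between the two growth mechanisms can be illustrated with a toy simulation. In the sketch below each new node brings one link; under preferential attachment the target is chosen in proportion to current degree, while under the frailty variant it is chosen in proportion to a fixed individual rate. The exponential rate distribution is purely an assumption for illustration and is not the distribution derived in the paper; the function names are hypothetical.

```python
import random

def grow_preferential(n_nodes, seed=0):
    """Each new node links to an existing node chosen proportionally to degree."""
    rng = random.Random(seed)
    degree = [1, 1]      # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]   # node i appears degree[i] times in this list
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)  # P(target) is proportional to degree
        degree[target] += 1
        degree.append(1)
        endpoints.extend([target, new])
    return degree

def grow_frailty(n_nodes, seed=0):
    """Each new node links to an existing node chosen proportionally to a
    fixed individual rate (frailty), regardless of current degree."""
    rng = random.Random(seed)
    rates = [rng.expovariate(1.0), rng.expovariate(1.0)]  # assumed distribution
    degree = [1, 1]
    for new in range(2, n_nodes):
        target = rng.choices(range(new), weights=rates)[0]
        degree[target] += 1
        degree.append(1)
        rates.append(rng.expovariate(1.0))
    return degree
```

Comparing, say, `max(grow_preferential(5000))` with `max(grow_frailty(5000))` across seeds gives a feel for how differently the two mechanisms concentrate links, although a formal comparison would need the kind of test the abstract describes.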

14.

Urban agglomeration economic networks and economic growth: a study based on big data and network analysis methods

Zhaohui Chong (种照辉) Chenglin Qin (覃成林) Xinyue Ye (叶信岳) 《Statistical Research》(统计研究) 2018,35(1):13-21

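Drawing on the massive data available in the big data era, this paper analyzes the characteristics of the economic networks of the Yangtze River Delta urban agglomeration, focusing on the growth effects of these networks. First, we construct the population mobility network, the enterprise organization network, and the e-commerce network of the Yangtze River Delta urban agglomeration and compare their respective structural features. Second, we combine network analysis methods with spatial econometric models, use an extended J test to identify the model specification under the different network structures, and examine how the spillover effects generated by the economic networks affect the economic growth of the urban agglomeration. The results show that under all three economic networks the Yangtze River Delta urban agglomeration exhibits a "core-periphery" network structure, with Shanghai, Hangzhou, Suzhou, Nanjing, and Wuxi located in the core layer of the urban agglomeration's economic network. The model identification results for the network structures show that central cities play an important role in the spillover effects of the economic network of the Yangtze River Delta urban agglomeration. Specifically, under the population mobility network, capital and government behavior have significantly negative network spillover effects; under the enterprise organization network, population size and openness to the outside world show significantly positive network spillover effects; under the e-commerce network, government behavior has a significantly negative network spillover effect, while openness to the outside world shows a significantly positive one.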

15.

Model-based clustering for social networks (total citations: 5; self-citations: 0; citations by others: 5)

Mark S. Handcock Adrian E. Raftery Jeremy M. Tantrum 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(2):301-354

Summary. Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity, meaning that two actors that have ties to a third actor are more likely to be tied than actors that do not, homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean 'social space', and the actors' locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters that are present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.
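The core idea of the latent position cluster model, tie probability decreasing with distance in a latent space whose points come from a mixture, can be sketched as below. This is a simplified illustration without covariates: the log-odds of a tie are taken to be an intercept minus the Euclidean distance, and positions are drawn from equally weighted spherical Gaussian clusters. All names and parameter values are assumptions, and the sketch is unrelated to the interface of the latentnet package.

```python
import math
import random

def tie_probability(z_i, z_j, intercept):
    """Logistic tie probability: log-odds = intercept - Euclidean distance."""
    dist = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(intercept - dist)))

def sample_positions(n, centers, sd, rng):
    """Draw latent positions from equally weighted spherical Gaussian clusters."""
    positions, clusters = [], []
    for _ in range(n):
        k = rng.randrange(len(centers))
        positions.append(tuple(rng.gauss(c, sd) for c in centers[k]))
        clusters.append(k)
    return positions, clusters

def simulate_network(n, centers, sd, intercept, seed=0):
    """Simulate an undirected network given hypothetical cluster centers."""
    rng = random.Random(seed)
    positions, clusters = sample_positions(n, centers, sd, rng)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < tie_probability(positions[i], positions[j], intercept)
    ]
    return positions, clusters, edges
```

Because actors in the same cluster sit close together, they are more likely to be tied to one another and to share neighbours, which is how a distance-based model of this kind produces clustering and transitivity at the same time.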

16.

Gender composition of friendship networks and age at first intercourse: a life-course data analysis

Francesco C. Billari Letizia Mencarini 《Statistical Methods and Applications》2004,12(3):377-390

We investigate the impact of some characteristics of friendship networks on the timing of first sexual intercourse. We assume that the gender-segregated composition of such networks explains part of the particularly late age at first intercourse in Italy. We use new data from a survey on the sexual behavior and reproductive health of Italian first- and second-year university students. The survey was carried out in 15 different universities in 2000-2001 and includes retrospective data on age at first intercourse, as well as retrospectively collected time-varying measures of the gender composition of the friendship network at different ages, for almost 5,000 cases. After describing the data as transition frequencies, we use a Cox proportional hazards model with time-varying covariates. Results are in accordance with the hypothesis that having friendship networks that include more members of the other gender and talking about sex with friends increase the relative risk of first sexual intercourse.

17.

A mixture of experts latent position cluster model for social network data

Isobel Claire Gormley Thomas Brendan Murphy 《Statistical Methodology》2010,7(3):385-405

Social network data represent the interactions between a group of social actors. Interactions between colleagues and friendship networks are typical examples of such data. The latent space model for social network data locates each actor in a network in a latent (social) space and models the probability of an interaction between two actors as a function of their locations. The latent position cluster model extends the latent space model to deal with network data in which clusters of actors exist: actor locations are drawn from a finite mixture model, each component of which represents a cluster of actors. A mixture of experts model builds on the structure of a mixture model by taking account of both observations and associated covariates when modeling a heterogeneous population. Herein, a mixture of experts extension of the latent position cluster model is developed. The mixture of experts framework allows covariates to enter the latent position cluster model in a number of ways, yielding different model interpretations. Estimates of the model parameters are derived in a Bayesian framework using a Markov Chain Monte Carlo algorithm. The algorithm is generally computationally expensive; surrogate proposal distributions which shadow the target distributions are derived, reducing the computational burden. The methodology is demonstrated through an illustrative example detailing relationships between a group of lawyers in the USA.

18.

Phase I monitoring of social network with baseline periods using poisson regression

Ebrahim Mazrae Farahani 《Communications in Statistics - Theory and Methods》2019,48(2):311-331

Social network analysis is an important analytic tool for forecasting social trends by modeling and monitoring the interactions between network members. This paper proposes an extension of a statistical process control method to monitor social networks by determining the baseline periods when the reference network set is collected. We consider the probability density profile (PDP) to identify baseline periods, using Poisson regression to model the communications between members. Also, Hotelling T² and likelihood ratio test (LRT) statistics are developed to monitor the network in Phase I. The results based on signal probability indicate a satisfactory performance for the proposed method.

19.

Network reliability for multipath TCP networks with a retransmission mechanism under the time constraint

Yi-Kuei Lin Chih-Li Pan Louis Cheng-Lu Yeng 《Journal of Statistical Computation and Simulation》2018,88(12):2273-2286

It is essential to reduce data latency and guarantee quality of service for modern computer networks. The emerging networking protocol, Multipath Transmission Control Protocol, can reduce data latency by transmitting data through multiple minimal paths (MPs) and ensure data integrity through its packet retransmission mechanism. The bandwidth of each edge can be considered as multi-state in computer networks because different situations, such as failures, partial failures and maintenance, exist. We evaluate network reliability for a multi-state retransmission flow network through which the data can be successfully transmitted by means of multiple MPs under the time constraint. By generating all minimal bandwidth patterns, the proposed algorithm can satisfy these requirements to calculate network reliability. An example and a practical case of the Pan-European Research and Education Network are used to demonstrate the proposed algorithm.

20.

Inferring gene regulatory networks by an order independent algorithm using incomplete data sets

Rosa Aghdam Parisa Niloofar Changiz Eslahchi 《Journal of Applied Statistics》2016,43(5):893-913

Analyzing incomplete data to infer the structure of gene regulatory networks (GRNs) is a challenging task in bioinformatics. Bayesian networks can be successfully used in this field. k-nearest neighbor, singular value decomposition (SVD)-based imputation, and multiple imputation by chained equations are three fundamental imputation methods for dealing with missing values. The path consistency (PC) algorithm based on conditional mutual information (PCA–CMI) is a well-known algorithm for inferring GRNs. This algorithm needs the data set to be complete. However, PCA–CMI is not a stable algorithm: when it is applied to permuted gene orders, different networks are obtained. We propose an order-independent algorithm, PCA–CMI–OI, for inferring GRNs. After imputation of missing data, the performances of PCA–CMI and PCA–CMI–OI are compared. Results show that networks constructed from data imputed by the SVD-based method and the PCA–CMI–OI algorithm outperform other imputation methods and PCA–CMI. PC-based algorithms produce an undirected or partially directed network. The mutual information test (MIT) score, which can deal with discrete data, is one of the well-known methods for directing the edges of the resulting networks. We also propose a new score, ConMIT, which is appropriate for analyzing continuous data. Results show that the precision of directing the edges of the skeleton is improved by applying the ConMIT score.