Similar literature
 20 similar articles found (search time: 31 ms)
1.
This article considers identification and estimation of social network models in a system of simultaneous equations. We show that, with or without row-normalization of the social adjacency matrix, the network model has different equilibrium implications, needs different identification conditions, and requires different estimation strategies. When the adjacency matrix is not row-normalized, the variation in the Bonacich centrality across nodes in a network can be used as an IV to identify social interaction effects and improve estimation efficiency. The number of such IVs depends on the number of networks. When there are many networks in the data, the proposed estimators may have an asymptotic bias due to the presence of many IVs. We propose a bias-correction procedure for the many-instrument bias. Simulation experiments show that the bias-corrected estimators perform well in finite samples. We also provide an empirical example to illustrate the proposed estimation procedure.
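Why row-normalization matters for the centrality-based IV can be seen in a small sketch. It assumes the standard definition of Bonacich centrality, b(β) = Σ_k β^k G^k 1, which the abstract does not spell out; the function names, the star network, and β = 0.2 are illustrative assumptions, not taken from the article.

```python
def bonacich(G, beta, n_terms=200):
    """Approximate b = sum_{k>=0} beta^k G^k 1 with a truncated power series.

    Assumes beta times the spectral radius of G is below 1, so the series converges.
    """
    n = len(G)
    term = [1.0] * n              # k = 0 term: the vector of ones
    total = term[:]
    for _ in range(n_terms):
        # Next term of the series: beta * G @ (previous term)
        term = [beta * sum(G[i][j] * term[j] for j in range(n)) for i in range(n)]
        total = [t + s for t, s in zip(total, term)]
    return total


def row_normalize(G):
    """Divide each row by its sum (every node must have at least one link)."""
    return [[g / sum(row) for g in row] for row in G]


# Hypothetical star network: node 0 is linked to nodes 1, 2 and 3.
G = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
beta = 0.2

raw = bonacich(G, beta)                   # differs between hub and leaves
norm = bonacich(row_normalize(G), beta)   # identical for every node: 1 / (1 - beta)
```

With the raw adjacency matrix the hub and the leaves receive different centralities, so there is cross-node variation to exploit. After row-normalization G1 = 1, so every node has centrality 1/(1 − β) and that variation disappears.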

2.
A mixture model for random graphs   Cited by: 1 (self-citations: 0, citations by others: 1)
The Erdős–Rényi model of a network is simple and possesses many explicit expressions for average and asymptotic properties, but it does not fit real-world networks well. The vertices of those networks are often structured in unknown classes (functionally related proteins or social communities) with different connectivity properties. The stochastic block structures model was proposed for this purpose in the context of social sciences, using a Bayesian approach. We consider the same model in a frequentist statistical framework. We give the degree distribution and the clustering coefficient associated with this model, a variational method to estimate its parameters and a model selection criterion to select the number of classes. This estimation procedure allows us to deal with large networks containing thousands of vertices. The method is used to uncover the modular structure of a network of enzymatic reactions.

3.
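He Jianfeng, Li Hongyu. Statistical Research (《统计研究》), 2021, 38(4): 131-144
In the era of the digital economy, social networks, as an important vehicle of the digital platform economy, have attracted wide attention from scholars in China and abroad. In the big-data setting, social networks have great commercial value, but because of their unprecedented scale, traditional network analysis methods are no longer applicable owing to their excessive computational cost. Obtaining a sample network through a network sampling algorithm and then making inferences about the whole network saves computing resources, so the quality of the sampling algorithm directly affects the accuracy of the conclusions of social network analysis. Existing social network sampling algorithms have several shortcomings: they ignore the internal topological structure of the network, easily become trapped in a local part of the network, and have low sampling efficiency. To remedy these shortcomings, this paper draws on the community structure of big-data social networks and proposes a clustered random walk sampling algorithm. The method first uses a community clustering algorithm to partition the nodes of the original network into communities, yielding multiple community networks, and then performs random walk sampling within each community to obtain the sample network. Results from both numerical simulation and a case application show that the clustered random walk sampling algorithm overcomes the drawbacks of traditional network sampling algorithms and preserves the structural features of the original network well while reducing network size. In addition, the algorithm can be run in parallel, which effectively improves sampling efficiency, and it has considerable practical significance for sampling large-scale social networks in the big-data setting.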
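The two steps described above (partition into communities, then random-walk sampling within each community) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the community partition is taken as a given input because the abstract does not name the clustering algorithm, and the function names, the sampling fraction, the restart rule for stalled walks, and the toy graph are all assumptions.

```python
import random


def random_walk_sample(adj, nodes, k, rng):
    """Random-walk sample of up to k nodes from the subgraph induced by `nodes`."""
    nodes = set(nodes)
    k = min(k, len(nodes))
    current = rng.choice(sorted(nodes))
    sampled = {current}
    stalled = 0
    while len(sampled) < k:
        nbrs = [v for v in adj[current] if v in nodes]
        if not nbrs or stalled > 10 * len(nodes):
            # Dead end or stuck walk: jump to an unvisited node of this community.
            current = rng.choice(sorted(nodes - sampled))
            stalled = 0
        else:
            current = rng.choice(nbrs)
        if current in sampled:
            stalled += 1
        else:
            sampled.add(current)
            stalled = 0
    return sampled


def clustered_random_walk_sample(adj, communities, fraction, seed=0):
    """Sample each community separately, then return the induced sample network."""
    rng = random.Random(seed)
    sample = set()
    for members in communities:
        k = max(1, round(fraction * len(members)))
        sample |= random_walk_sample(adj, members, k, rng)
    return {u: [v for v in adj[u] if v in sample] for u in sorted(sample)}


# Hypothetical network: two 4-node cliques joined by the bridge 3-4.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
communities = [[0, 1, 2, 3], [4, 5, 6, 7]]   # assumed output of a clustering step
sub = clustered_random_walk_sample(adj, communities, fraction=0.5)
```

Because each walk is confined to one community, a walk cannot become trapped in one region of the whole network, and the per-community loop is the part that could be distributed across workers.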

4.
Social network analysis is an important analytic tool to forecast social trends by modeling and monitoring the interactions between network members. This paper proposes an extension of a statistical process control method to monitor social networks by determining the baseline periods when the reference network set is collected. We consider a probability density profile (PDP) to identify baseline periods, using Poisson regression to model the communications between members. In addition, Hotelling's T² and likelihood ratio test (LRT) statistics are developed to monitor the network in Phase I. The results based on signal probability indicate a satisfactory performance for the proposed method.

5.
An affiliation network is one kind of two-mode social network with two different sets of nodes (namely, a set of actors and a set of social events) and edges representing the affiliation of the actors with the social events. The connections between actors and social events in many affiliation networks are only binary weighted, which cannot reveal the strength of the affiliation relationship. Although a number of statistical models have been proposed to analyze binary weighted affiliation networks, the asymptotic behaviors of the maximum likelihood estimator (MLE) are still unknown or have not been properly explored in weighted affiliation networks. In this paper, we study an affiliation model with the degree sequence as the exclusively natural sufficient statistic in the exponential family distributions. We derive the consistency and asymptotic normality of the maximum likelihood estimator in affiliation networks with finite discrete weights when the numbers of actors and events both go to infinity. Simulation studies and a real data example demonstrate our theoretical results.

6.
Randomized response is a misclassification design to estimate the prevalence of sensitive behaviour. Respondents who do not follow the instructions of the design are considered to be cheating. A mixture model is proposed to estimate the prevalence of sensitive behaviour and cheating in the case of a dual sampling scheme with direct questioning and randomized response. The mixing weight is the probability of cheating, where cheating is modelled separately for direct questioning and randomized response. For Bayesian inference, Markov chain Monte Carlo sampling is applied to sample parameter values from the posterior. The model makes it possible to analyse dual sample scheme data in a unified way and to assess cheating for direct questions as well as for randomized response questions. The research is illustrated with randomized response data concerning violations of regulations for social benefit.

7.
Acute respiratory diseases are transmitted over networks of social contacts. Large-scale simulation models are used to predict epidemic dynamics and evaluate the impact of various interventions, but the contact behavior in these models is based on simplistic and strong assumptions which are not informed by survey data. These assumptions are also used for estimating transmission measures such as the basic reproductive number and secondary attack rates. Development of methodology to infer contact networks from survey data could improve these models and estimation methods. We contribute to this area by developing a model of within-household social contacts and using it to analyze the Belgian POLYMOD data set, which contains detailed diaries of social contacts in a 24-hour period. We model dependency in contact behavior through a latent variable indicating which household members are at home. We estimate age-specific probabilities of being at home and age-specific probabilities of contact conditional on two members being at home. Our results differ from the standard random mixing assumption. In addition, we find that the probability that all members contact each other on a given day is fairly low: 0.49 for households with two 0-5 year olds and two 19-35 year olds, and 0.36 for households with two 12-18 year olds and two 36+ year olds. We find higher contact rates in households with 2-3 members, helping explain the higher influenza secondary attack rates found in households of this size.

8.
Exponential-family random graph models (ERGMs) provide a principled way to model and simulate features common in human social networks, such as propensities for homophily and friend-of-a-friend triad closure. We show that, without adjustment, ERGMs preserve density as network size increases. Density invariance is often not appropriate for social networks. We suggest a simple modification based on an offset which instead preserves the mean degree and accommodates changes in network composition asymptotically. We demonstrate that this approach allows ERGMs to be applied to the important situation of egocentrically sampled data. We analyze data from the National Health and Social Life Survey (NHSLS).
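The contrast between preserving density and preserving mean degree can be illustrated with the simplest case, an edges-only (Bernoulli) ERGM. The sketch below assumes the offset takes the form −log(n) added to the edge coefficient; the abstract does not state the form of the offset, so this and the function names are assumptions made for illustration.

```python
import math


def tie_probability(theta, n, offset=False):
    """Tie probability in an edges-only ERGM with edge coefficient theta.

    With offset=True, an assumed offset of -log(n) is added to the coefficient.
    """
    eta = theta - math.log(n) if offset else theta
    return 1.0 / (1.0 + math.exp(-eta))


def mean_degree(theta, n, offset=False):
    """Expected degree of a node: (n - 1) potential ties, each with the same probability."""
    return (n - 1) * tie_probability(theta, n, offset)


theta = math.log(3.0)   # illustrative value
for n in (100, 1000, 10000):
    print(n, mean_degree(theta, n), mean_degree(theta, n, offset=True))
```

Without the offset the tie probability is constant in n, so the mean degree grows roughly linearly with network size. With the assumed offset the tie probability equals exp(θ) / (exp(θ) + n), so the mean degree approaches exp(θ) as n grows.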

9.
This paper deals with a Bayesian analysis of a finite Beta mixture model. We present an approximation method to evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. Experimental results concern contextual and non-contextual evaluations. The non-contextual evaluation is based on synthetic histograms, while the contextual one models the class-conditional densities of pattern-recognition data sets. The Beta mixture is also applied to estimate the parameters of SAR image histograms.

10.
Summary. This paper introduces the paired comparison model as a suitable approach for the analysis of partially ranked data. For example, the Inglehart index, collected in international social surveys to examine shifts in post-materialistic values, generates such data on a set of attitude items. However, current analysis methods have failed to account for the complex shifts in individual item values, or to incorporate subject covariates. The paired comparison model is thus developed to allow for covariate subject effects at the individual level, and a reparameterization allows the inclusion of smooth non-linear effects of continuous covariates. The Inglehart index collected in the 1993 International Social Science Programme survey is analysed, and complex non-linear changes of item values with age, level of education and religion are identified. The model proposed provides a powerful tool for social scientists.

11.
This paper examines both theoretically and empirically whether the common practice of using OLS multivariate regression models to estimate average treatment effects (ATEs) under experimental designs is justified by the Neyman model for causal inference. Using data from eight large U.S. social policy experiments, the paper finds that estimated standard errors and significance levels for ATE estimators are similar under the OLS and Neyman models when baseline covariates are included in the models, even though theory suggests that this may not have been the case. This occurs primarily because treatment effects do not appear to vary substantially across study subjects.

12.
An affiliation network is one kind of two-mode social network with two different sets of nodes (namely, a set of actors and a set of social events) and edges representing the affiliation of the actors with the social events. Although a number of statistical models have been proposed to analyze affiliation networks, the asymptotic behaviors of the estimator are still unknown or have not been properly explored. In this article, we study an affiliation model with the degree sequence as the exclusively natural sufficient statistic in the exponential family distributions. We establish the uniform consistency and asymptotic normality of the maximum likelihood estimator when the numbers of actors and events both go to infinity. Simulation studies and a real data example demonstrate our theoretical results.

13.
We adapt existing statistical modeling techniques for social networks to study consumption data observed in trophic food webs. These data describe the feeding volume (non-negative) among organisms grouped into nodes, called trophic species, that form the food web. Model complexity arises from the large number of zeros in the data, as each node in the web is predator/prey to only a small number of other trophic species. Many of the zeros are regarded as structural (non-random) in the context of feeding behavior. The presence of basal prey and top predator nodes (those who never consume and those who are never consumed, with probability 1) adds further complexity to the statistical modeling. We develop a special statistical social network model to account for such network features. The model is applied to two empirical food webs; the focus is on the web for which the population size of seals is of concern to various commercial fisheries.

14.
Summary. A fully Bayesian analysis of directed graphs, with particular emphasis on applications in social networks, is explored. The model is capable of incorporating the effects of covariates, within and between block ties and multiple responses. Inference is straightforward by using software that is based on Markov chain Monte Carlo methods. Examples are provided which highlight the variety of data sets that can be entertained and the ease with which they can be analysed.

15.
Abstract

In general, survival data are time-to-event data, such as time to death, time to appearance of a tumor, or time to recurrence of a disease. Models for survival data have frequently been based on the proportional hazards model, proposed by Cox. The Cox model is applied extensively in the social, medical, behavioral and public health sciences. In this paper we propose a more efficient sampling method of recruiting subjects for survival analysis. We propose using a Moving Extreme Ranked Set Sampling (MERSS) scheme with ranking based on an easy-to-evaluate baseline auxiliary variable known to be associated with survival time. This paper demonstrates that this approach provides a more powerful testing procedure as well as a more efficient estimate of the hazard ratio than that based on simple random sampling (SRS). Theoretical derivation and simulation studies are provided. The Iowa 65+ Rural study data are used to illustrate the methods developed in this paper.

16.
Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) and Bayesian (Bayesian CART and Bayesian neural networks) approaches to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical consistency of the proposed approaches and derive the optimal value of a vital model parameter. The excellent performance of the newly proposed BNT models is shown using simulation studies. We also provide some illustrative examples using a wide variety of standard regression datasets from a publicly available machine learning repository to show the superiority of the proposed models in comparison to popularly used Bayesian CART and Bayesian neural network models.

17.
Missing data are often problematic in social network analysis, since what is missing may alter the conclusions about what we have observed: tie-variables need to be interpreted in relation to their local neighbourhood and the global structure. Some ad hoc methods for dealing with missing data in social networks have been proposed but here we consider a model-based approach. We discuss various aspects of fitting exponential family random graph (or p-star) models (ERGMs) to networks with missing data and present a Bayesian data augmentation algorithm for the purpose of estimation. This involves drawing from the full conditional posterior distribution of the parameters, something which is made possible by recently developed algorithms. With ERGMs already having complicated interdependencies, it is particularly important to provide inference that adequately describes the uncertainty, something that the Bayesian approach provides. To the extent that we wish to explore the missing parts of the network, the posterior predictive distributions, immediately available at the termination of the algorithm, are at our disposal, which allows us to explore the distribution of what is missing unconditionally on any particular parameter values. Some important features of treating missing data and of the implementation of the algorithm are illustrated using a well-known collaboration network and a variety of missing data scenarios.

18.
Family survival data can be used to estimate the degree of genetic and environmental contributions to the age at onset of a disease or of a specific event in life. The data can be modeled with a correlated frailty model in which the frailty variable accounts for the degree of kinship within the family. The heritability (degree of heredity) of the age at a specific event in life (or the onset of a disease) is usually defined as the proportion of variance of the survival age that is associated with genetic effects. If the survival age is (interval) censored, heritability as usually defined cannot be estimated. Instead, it is defined as the proportion of variance of the frailty associated with genetic effects. In this paper we describe a correlated frailty model to estimate the heritability and the degree of environmental effects on the age at which individuals contact a social worker for the first time and to test whether there is a difference between the survival functions of this age for twins and non-twins.

19.
Social network data represent the interactions between a group of social actors. Interactions between colleagues and friendship networks are typical examples of such data. The latent space model for social network data locates each actor in a network in a latent (social) space and models the probability of an interaction between two actors as a function of their locations. The latent position cluster model extends the latent space model to deal with network data in which clusters of actors exist: actor locations are drawn from a finite mixture model, each component of which represents a cluster of actors. A mixture of experts model builds on the structure of a mixture model by taking account of both observations and associated covariates when modeling a heterogeneous population. Herein, a mixture of experts extension of the latent position cluster model is developed. The mixture of experts framework allows covariates to enter the latent position cluster model in a number of ways, yielding different model interpretations. Estimates of the model parameters are derived in a Bayesian framework using a Markov Chain Monte Carlo algorithm. The algorithm is generally computationally expensive, so surrogate proposal distributions which shadow the target distributions are derived, reducing the computational burden. The methodology is demonstrated through an illustrative example detailing relationships between a group of lawyers in the USA.
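How a latent space model turns actor locations into interaction probabilities can be sketched in a few lines. The abstract only says the probability is "a function of their locations"; the sketch assumes the common distance form, logit P(tie) = β − ‖z_i − z_j‖, and the function name, the intercept value, and the example positions are illustrative assumptions.

```python
import math


def tie_probability(z_i, z_j, beta):
    """Probability of a tie under an assumed distance-based latent space model.

    The log-odds of a tie equal the intercept beta minus the Euclidean
    distance between the two latent positions.
    """
    eta = beta - math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-eta))


beta = 1.0                                          # illustrative intercept
near = tie_probability((0.0, 0.0), (0.5, 0.0), beta)   # actors in the same cluster
far = tie_probability((0.0, 0.0), (4.0, 3.0), beta)    # actors in distant clusters
```

Actors whose positions are drawn from the same mixture component tend to sit close together and are therefore more likely to interact, which is how the cluster structure of the latent positions shows up in the observed network.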

20.
Two statistical applications for estimation and prediction of flows in traffic networks are presented. In the first, the numbers of route users are assumed to be independent α-shifted gamma Γ(θ, λ0) random variables denoted H(α, θ, λ0), with common λ0. As a consequence, the link, OD (origin-destination) and node flows are also H(α, θ, λ0) variables. We assume that the main source of information is plate scanning, which permits us to identify, totally or partially, the vehicle route, OD and link flows by scanning their corresponding plate numbers at an adequately selected subset of links. A Bayesian approach using conjugate families is proposed that allows us to estimate different traffic flows. In the second application, a stochastic demand dynamic traffic model to predict some traffic variables and their time evolution in real networks is presented. The Bayesian network model considers that the variables are generalized Beta variables such that when marginally transformed to standard normal become multivariate normal. The model is able to provide a point estimate, a confidence interval or the density of the variable being predicted. Finally, the models are illustrated by their application to the Nguyen Dupuis network and the Vermont-State example. The resulting traffic predictions seem to be promising for real traffic networks and can be done in real time.
