1.

Kernel partial correlation: a novel approach to capturing conditional independence in graphical models for noisy data

Jihwan Oh Faye Zheng R. W. Doerge 《Journal of applied statistics》2018,45(14):2677-2696

Graphical models capture the conditional independence structure among random variables via existence of edges among vertices. One way of inferring a graph is to identify zero partial correlation coefficients, which is an effective way of finding conditional independence under a multivariate Gaussian setting. For more general settings, we propose kernel partial correlation which extends partial correlation with a combination of two kernel methods. First, a nonparametric function estimation is employed to remove effects from other variables, and then the dependence between remaining random components is assessed through a nonparametric association measure. The proposed approach is not only flexible but also robust under high levels of noise owing to the robustness of the nonparametric approaches.
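
The classical baseline this abstract refers to (a zero partial correlation indicating conditional independence in the Gaussian case) can be illustrated with a small sketch. This is not the authors' kernel method; the chain x -> y -> z, the sample size, the seed, and the function names are all assumptions made for illustration.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def partial_corr(x, z, y):
    """Partial correlation of x and z given one conditioning variable y."""
    r_xz, r_xy, r_zy = pearson(x, z), pearson(x, y), pearson(z, y)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))

# Hypothetical Gaussian chain x -> y -> z: x and z are correlated,
# but conditionally independent given y.
rng = random.Random(0)  # fixed seed so the toy example is reproducible
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]
z = [yi + rng.gauss(0, 1) for yi in y]

marginal = pearson(x, z)
conditional = partial_corr(x, z, y)
print(f"corr(x, z) = {marginal:.3f}, partial corr(x, z | y) = {conditional:.3f}")
```

In a graph inferred this way, the near-zero partial correlation would correspond to a missing edge between x and z, even though their marginal correlation is clearly nonzero.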

2.

Analysis of mixed correlated bivariate zero-inflated count and (k,l)-inflated beta responses with application to social network datasets

E. Tabrizi M. Ganjali 《Communications in Statistics - Theory and Methods》2019,48(7):1651-1681

This paper presents a new model that monitors the basic network formation mechanisms via the attributes through time. It considers the issue of joint modeling of longitudinal inflated (0, 1)-support continuous and inflated count response variables. For the joint model of these response variables, a correlated generalized linear mixed model is studied. The fraction response is inflated at two points k and l (k < l), and a k- and l-inflated beta distribution is introduced as its distribution. Also, the count response is inflated at zero, and we use some members of zero-inflated power series distributions, hurdle-at-zero, members of zero-inflated double power series distributions and the zero-inflated generalized Poisson distribution as our count response distribution. A full likelihood-based approach is used to yield maximum likelihood estimates of the model parameters, and the model is applied to a real social network obtained from an observational study where the rate of the ith node’s responsiveness to the jth node and the number of arrows or edges with some specific characteristics from the ith node to the jth node are the correlated inflated (0, 1)-support continuous and inflated count response variables, respectively. The effects of the sender and receiver positions in an office environment on the responses are investigated simultaneously.

3.

Edge Estimation in the Population of a Binary Tree Using Node-Sampling

D. Shukla Yashwant Singh Rajput 《Communications in Statistics - Theory and Methods》2014,43(13):2815-2829

Consider a finite population of vertices, each connected to one or more edges. This constitutes a graphical population of vertices and edges. As a special case, a graphical population in the form of a binary tree, in which each parent vertex has only two child vertices, is taken into consideration. The entire binary tree is divided into two sub-graphs: a group of left-nodes and a group of right-nodes. This paper combines graph structure with population sampling theory and presents a methodology for mean-edge-length estimation of the left sub-graph using the right edge sub-graph as an auxiliary source of information. A node-sampling procedure is developed for this purpose and a class of estimators is proposed containing several good estimators. Mathematical conditions for minimum bias and optimum mean squared error of the class are derived, and theoretical results are numerically supported with a test of 99% confidence intervals. It is shown that the suggested class has a sub-class of optimum estimators, and sample-based estimates are closer to the true value of the population parameter.

4.

Precise large deviations for aggregate claims

Yang Yang Liwei Sha 《Communications in Statistics - Theory and Methods》2013,42(10):2801-2809

Abstract

In this article, we consider a nonstandard renewal risk model, in which the claim sizes form a sequence of independent and identically distributed random variables; the inter-arrival times are negatively associated; and each pair of the claim size and its inter-arrival time follows a negative association or an arbitrary dependence structure. We establish some precise large-deviation formulas for the aggregate amount of claims in the heavy-tailed case.

5.

Designs for two-level factorial experiments with linear models containing main effects and selected two-factor interactions

《Journal of statistical planning and inference》1997,64(1):109-124

An orthogonal polynomial model is used to model the response influenced by n two-level factors. Such a model is represented by an undirected graph g with n vertices and e edges. The vertices identify the n main effects and the e edges identify the two-factor interactions of interest, which together with the mean are the parameters of interest. A g-design is a saturated design which can provide an unbiased estimator for these parameters, and its design matrix is called a g-matrix. The latter two concepts were introduced by Hedayat and Pesotan (Statistica Sinica 2 (1992), 453–464). In this paper methods of constructing g-matrices are studied, since such constructions are equivalent to the construction of g-designs. Some bounds on the absolute value of a determinant of a g-matrix are given and D-optimality results on certain classes of g-matrices are presented.

6.

A class of multivariate distribution-free tests of independence based on graphs

R. Heller M. Gorfine Y. Heller 《Journal of statistical planning and inference》2012

A class of distribution-free tests is proposed for the independence of two subsets of response coordinates. The tests are based on the pairwise distances across subjects within each subset of the response. A complete graph is induced by each subset of response coordinates, with the sample points as nodes and the pairwise distances as the edge weights. The proposed test statistic depends only on the rank order of edges in these complete graphs. The response vector may be of any dimension. In particular, the number of samples may be smaller than the dimension of the response. The test statistic is shown to have a normal limiting distribution with known expectation and variance under the null hypothesis of independence. The exact distribution-free null distribution of the test statistic is given for a sample of size 14, and its Monte-Carlo approximation is considered for larger sample sizes. We demonstrate in simulations that this new class of tests has good power properties for very general alternatives.

7.

On the Validity of the Markov Interpretation of Path Diagrams of Gaussian Structural Equations Systems with Correlated Errors

Jan T. A. Koster 《Scandinavian Journal of Statistics》1999,26(3):413-431

Pearl's d-separation concept and the ensuing Markov property are applied to graphs which may have, between each two different vertices i and j, any subset of {i ← j, i → j, i ↔ j} as edges. The class of graphs so obtained is closed under marginalization. Furthermore, the approach permits a direct proof of this theorem: "The distribution of a multivariate normal random vector satisfying a system of linear simultaneous equations is Markov w.r.t. the path diagram of the linear system".

8.

Neighborhood graphs, stripes and shadow plots for cluster visualization

Friedrich Leisch 《Statistics and Computing》2010,20(4):457-469

Centroid-based partitioning cluster analysis is a popular method for segmenting data into more homogeneous subgroups. Visualization can help tremendously to understand the positions of these subgroups relative to each other in higher dimensional spaces and to assess the quality of partitions. In this paper we present several improvements on existing cluster displays using neighborhood graphs with edge weights based on cluster separation and convex hulls of inner and outer cluster regions. A new display called shadow-stars can be used to diagnose pairwise cluster separation with respect to the distribution of the original data. Artificial data and two case studies with real data are used to demonstrate the techniques.

9.

Bayesian Model Choice in Exponential Survival Models

《Communications in Statistics - Theory and Methods》2013,42(12):2311-2330

ABSTRACT

Log-linear models for the distribution on a contingency table are represented as the intersection of only two kinds of log-linear models. One kind assumes that a certain group of the variables, if conditioned on all other variables, has a jointly independent distribution, and the other assumes that a certain group of the variables, if conditioned on all other variables, has no highest order interaction. The subsets entering into these models are uniquely determined by the original log-linear model. This canonical representation suggests considering joint conditional independence and conditional no highest order association as the elementary building blocks of log-linear models.

10.

The stability of several measures of association in small contingency tables

Thomas W. O'Gorman Robert F. Woolson 《Communications in Statistics - Theory and Methods》2013,42(3):1141-1155

Measures of association are often used to describe the relationship between row and column variables in two-dimensional contingency tables. It is not uncommon in biomedical research to categorize continuous variables to obtain a two-dimensional table. In these situations it is desirable that the measure of association not be too sensitive to changes in the number of categories or to the choice of cut points. To accomplish this objective we attempt to find a measure of association that closely approximates the corresponding measure of association for the underlying distribution. Measures that are close to the underlying measure for various table sizes and cut points are called stable measures.

11.

Network inference and community detection, based on covariance matrices, correlations, and test statistics from arbitrary distributions

Thomas E. Bartlett 《Communications in Statistics - Theory and Methods》2017,46(18):9150-9165

In this article we propose methodology for inference of binary-valued adjacency matrices from various measures of the strength of association between pairs of network nodes, or more generally pairs of variables. This strength of association can be quantified by sample covariance and correlation matrices, and more generally by test-statistics and hypothesis test p-values from arbitrary distributions. Community detection methods such as block modeling typically require binary-valued adjacency matrices as a starting point. Hence, a main motivation for the methodology we propose is to obtain binary-valued adjacency matrices from such pairwise measures of strength of association between variables. The proposed methodology is applicable to large high-dimensional data sets and is based on computationally efficient algorithms. We illustrate its utility in a range of contexts and data sets.
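
To make the input and output of such a procedure concrete, the sketch below binarizes a small association matrix with a fixed cutoff. This is a deliberately naive stand-in, not the methodology proposed in the article; the function name, the cutoff of 0.5, and the matrix values are made up for illustration.

```python
def adjacency_from_association(assoc, threshold):
    """Binarize a symmetric association matrix.

    An edge (1) is placed between i and j when |assoc[i][j]| >= threshold;
    the diagonal is always 0, so there are no self-loops.
    """
    n = len(assoc)
    return [
        [1 if i != j and abs(assoc[i][j]) >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]

# Toy correlation matrix for three variables (made-up values).
corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, -0.6],
    [0.1, -0.6, 1.0],
]
adj = adjacency_from_association(corr, threshold=0.5)
print(adj)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

The resulting 0/1 matrix is the kind of object that block-modeling community detection methods take as a starting point.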

12.

Reliability evaluation subject to assured accuracy rate and time for stochastic unreliable-node computer networks

《Journal of Statistical Computation and Simulation》2012,82(7):1530-1542

In many real-life networks such as computer networks, branches and nodes have multi-state capacity, lead time, and accuracy rate. The reliability of a network with unreliable nodes is more complex to evaluate because a node failure disables the adjacent branches. Such a network is named a stochastic unreliable-node computer network (SUNCN). Under the strict assumption that each component (branch and node) has a deterministic capacity, the quickest path (QP) problem is to find a path sending a specific amount of data with minimum transmission time. The accuracy rate is a critical index to measure the performance of a computer network because some packets are damaged or lost due to voltage instability, magnetic field effects, lightning, etc. Subject to both assured accuracy rate and time constraints, this paper extends the QP problem to discuss the system reliability of an SUNCN. An efficient algorithm based on a graphic technique is proposed to find the minimal capacity vector meeting such constraints. System reliability, the probability of sending a specific amount of data through multiple minimal paths subject to both assured accuracy rate and time constraints, can subsequently be computed.

13.

LARS-type algorithm for group lasso

Chun Yip Yau Tsz Shing Hui 《Statistics and Computing》2017,27(4):1041-1048

The least absolute shrinkage and selection operator (lasso) has been widely used in regression analysis. Based on the piecewise linear property of the solution path, least angle regression provides an efficient algorithm for computing the solution paths of lasso. Group lasso is an important generalization of lasso that can be applied to regression with grouped variables. However, the solution path of group lasso is not piecewise linear and hence cannot be obtained by least angle regression. By transforming the problem into a system of differential equations, we develop an algorithm for efficient computation of group lasso solution paths. Simulation studies are conducted for comparing the proposed algorithm to the best existing algorithm: the groupwise-majorization-descent algorithm.
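
For readers unfamiliar with group lasso, the sketch below shows the group-wise soft-thresholding operator (the proximal operator of the Euclidean-norm penalty applied to one group of coefficients). It is a standard building block of iterative group lasso solvers, not the LARS-type path algorithm developed in the article; the function name and example values are assumptions.

```python
import math

def group_soft_threshold(v, lam):
    """Shrink a whole coefficient group toward zero.

    Returns the zero vector when the group's Euclidean norm is at most lam;
    otherwise scales every coordinate by (1 - lam / ||v||).
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * c for c in v]

# A group with norm 5 is shrunk by the factor 1 - 1/5 = 0.8.
kept = group_soft_threshold([3.0, 4.0], lam=1.0)
# A group with norm 0.5 falls below the threshold and is removed entirely.
dropped = group_soft_threshold([0.3, 0.4], lam=1.0)
print(kept, dropped)
```

Unlike the coordinate-wise soft threshold of the ordinary lasso, the shrinkage factor here depends on the norm of the whole group, so variables in a group enter or leave the model together.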

14.

On the Multivariate Upcrossings Index

C. Viseu L. Pereira H. Ferreira 《Communications in Statistics - Theory and Methods》2014,43(6):1277-1292

The multivariate extremal index function is a measure of the clustering among the extreme values of a multivariate stationary sequence. In this article, we introduce a measure of the degree of clustering of upcrossings in a multivariate stationary sequence, called multivariate upcrossings index, which is a multivariate generalization of the concept of upcrossings index. We derive the main properties of this function, namely the relations with the multivariate extremal index and the clustering of upcrossings.

Imposing general local and asymptotic dependence restrictions on the sequence or on its marginals, we compute the multivariate upcrossings index from the marginal upcrossings indices and from the joint distribution of a finite number of variables. A couple of illustrative examples are given.

15.

Graphs for Margins of Bayesian Networks

Robin J. Evans 《Scandinavian Journal of Statistics》2016,43(3):625-648

Directed acyclic graph (DAG) models, also called Bayesian networks, are widely used in probabilistic reasoning, machine learning and causal inference. If latent variables are present, then the set of possible marginal distributions over the remaining (observed) variables is generally not represented by any DAG. Larger classes of mixed graphical models have been introduced to overcome this; however, as we show, these classes are not sufficiently rich to capture all the marginal models that can arise. We introduce a new class of hyper-graphs, called mDAGs, and a latent projection operation to obtain an mDAG from the margin of a DAG. We show that each distinct marginal of a DAG model is represented by at least one mDAG and provide graphical results towards characterizing equivalence of these models. Finally, we show that mDAGs correctly capture the marginal structure of causally interpreted DAGs under interventions on the observed variables.

16.

The enumeration of restricted random walks by Sheffer polynomials with applications to statistics

《Journal of statistical planning and inference》1986,14(1):95-114

Sheffer polynomials are solutions of certain systems of operator equations. Difference equations, which frequently occur in path enumeration, belong in that class. To find representations of the solutions, the restriction on the paths has to be in the form of boundaries. Such problems have applications in two-sample tests. We also consider paths with more than two step vectors. The gambler's ruin problem illustrates the method. If paths with a given area underneath are counted, q-binomial coefficients come into play. Eulerian Sheffer sequences solve some such problems.
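
The kind of counting problem described here can be made concrete with a brute-force difference-equation sketch: count the ±1 lattice paths that stay strictly between two absorbing boundaries, as in the gambler's ruin setting. This is plain dynamic programming, not the Sheffer-polynomial technique of the article; the function name and the boundary convention (0 and `upper` are forbidden) are assumptions.

```python
def count_restricted_paths(start, upper, steps):
    """Count +/-1 paths of the given length that stay strictly inside (0, upper).

    Returns a dict mapping each reachable end position to the number of
    admissible paths from `start` that end there.
    """
    if not 0 < start < upper:
        raise ValueError("start must lie strictly between 0 and upper")
    counts = [0] * (upper + 1)
    counts[start] = 1
    for _ in range(steps):
        nxt = [0] * (upper + 1)
        for pos in range(1, upper):
            if counts[pos]:
                for step in (-1, 1):
                    new = pos + step
                    if 0 < new < upper:  # paths touching a boundary are discarded
                        nxt[new] += counts[pos]
        counts = nxt
    return {pos: c for pos, c in enumerate(counts) if c}

# A gambler with 1 unit playing to a target of 3: exactly one 2-step path
# (1 -> 2 -> 1) avoids both ruin and the target.
print(count_restricted_paths(1, 3, 2))       # {1: 1}
# With the upper boundary out of reach, 6-step paths returning to 1 without
# hitting 0 are counted by the Catalan number C_3 = 5.
print(count_restricted_paths(1, 100, 6)[1])  # 5
```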

17.

The stochastic topic block model for the clustering of vertices in networks with textual edges

C. Bouveyron P. Latouche R. Zreik 《Statistics and Computing》2018,28(1):11-31

Due to the significant increase of communications between individuals via social media (Facebook, Twitter, LinkedIn) or electronic formats (email, web, e-publication) in the past two decades, network analysis has become an unavoidable discipline. Many random graph models have been proposed to extract information from networks based on person-to-person links only, without taking into account information on the contents. This paper introduces the stochastic topic block model, a probabilistic model for networks with textual edges. We address here the problem of discovering meaningful clusters of vertices that are coherent from both the network interactions and the text contents. A classification variational expectation-maximization algorithm is proposed to perform inference. Simulated datasets are considered in order to assess the proposed approach and to highlight its main features. Finally, we demonstrate the effectiveness of our methodology on two real-world datasets: a directed communication network and an undirected co-authorship network.

18.

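Multiple graphical models and their application to correlation analysis of macroeconomic indicators

Gao Wei, Cui Wanqi, Deng Xiaoxiao 《Statistics and Information Forum》2020,(1):40-44

A multiple graphical model represents the dependence relationships among the same set of random variables observed in different classes: nodes represent random variables, edges represent direct associations between variables, and the graphical model of each class reflects both its own dependence structure and the information shared across classes. Using a joint estimation method for multiple graphical models, data from different individuals are classified by their characteristics, the dependence structure among the variables within each class is assumed to follow a single Gaussian graphical model, and the group lasso and graphical lasso methods are applied to jointly estimate the graph structure of each class. Numerical simulations confirm the effectiveness of the joint estimation method. The multiple graphical model and the joint estimation method are then used to analyze the dependence structure of 13 macroeconomic indicators across 15 Chinese provinces. The results show that provinces at different levels of economic development share common associations among macroeconomic variables, reflecting the characteristics of China's current stage of economic development, while the dependence structure of each class reflects features specific to the economic development of the provinces in that class.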

19.

Random Accessibility as a Parallelism to Reliability Studies on Simple Graphs

《Communications in Statistics - Theory and Methods》2013,42(6):1423-1436

ABSTRACT

Graphs are classified according to the numbers of vertices and edges they contain. For simple graphs a concept of accessibility is defined and its relation to the graph reliability problem is investigated. Exact results are derived for special subclasses of bipartite and tripartite graphs in relation to accessibility. Several conjectures on the relation of accessibility and all-terminal reliability are numerically illustrated without proof.

20.

A network analysis of student mobility patterns from high school to master’s

Genova Vincenzo G. Tumminello Michele Aiello Fabio Attanasio Massimo 《Statistical Methods and Applications》2021,30(5):1445-1464

Human migration involves the movement of people from one place to another. An example of undirected migration is Italian student mobility where students move from the South to the Center-North. This kind of mobility has become of general interest, and this work explores student mobility from Sicily towards universities outside the island. The data used in this paper regard six cohorts of students, from 2008/09 to 2013/14. In particular, our goal is to study the 3-step migration path: the area of origin (Sicilian provinces), the regional university for the bachelor’s degree, and the regional university for the master’s. Our analysis is conducted by building a multipartite network with four sets of nodes: students; Sicilian provinces; bachelor region of studies; and the master region of studies. By projecting the students’ set onto the others, we obtain a tripartite network where the number of students represents the link weight. Results show that the big Sicilian cities (Palermo, Catania, and Messina) have different preferential paths compared to small Sicilian cities. Furthermore, the results reveal preferential paths of 3-step mobility that only in part reflect a south-north orientation in the transition from the region of study for the bachelor’s degree to that for the master’s.