Guided sampling for large graphs

Authors
Yousuf, Muhammad IrfanKim, Suhyun
Issue Date
2020-07
Publisher
SPRINGER
Citation
DATA MINING AND KNOWLEDGE DISCOVERY, v.34, no.4, pp.905 - 948
Abstract
Large real-world graphs claim lots of resources in terms of memory and computational power to study them and this makes their full analysis extremely challenging. In order to understand the structure and properties of these graphs, we intend to extract a small representative subgraph from a big graph while preserving its topology and characteristics. In this work, we aim at producing good samples with sample size as low as 0.1% while maintaining the structure and some of the key properties of a network. We exploit the fact that average values of degree and clustering coefficient of a graph can be estimated accurately and efficiently. We use the estimated values to guide the sampling process and extract tiny samples that preserve the properties of the graph and closely approximate their distributions in the original graph. The distinguishing feature of our work is that we apply traversal based sampling that utilizes only the local information of nodes as opposed to the global information of the network and this makes our approach a practical choice for crawling online networks. We evaluate the effectiveness of our sampling technique using real-world datasets and show that it surpasses the existing methods.
Keywords
RANDOM-WALKS; NETWORKS; RANDOM-WALKS; NETWORKS; Big graphs; Graph sampling; Social networks
ISSN
1384-5810
URI
https://pubs.kist.re.kr/handle/201004/118428
DOI
10.1007/s10618-020-00683-y
Appears in Collections:
KIST Article > 2020
Files in This Item:
There are no files associated with this item.
Export
RIS (EndNote)
XLS (Excel)
XML

qrcode

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

BROWSE