Guided sampling for large graphs
- Authors
- Yousuf, Muhammad Irfan; Kim, Suhyun
- Issue Date
- 2020-07
- Publisher
- SPRINGER
- Citation
- DATA MINING AND KNOWLEDGE DISCOVERY, v.34, no.4, pp.905 - 948
- Abstract
- Large real-world graphs claim lots of resources in terms of memory and computational power to study them and this makes their full analysis extremely challenging. In order to understand the structure and properties of these graphs, we intend to extract a small representative subgraph from a big graph while preserving its topology and characteristics. In this work, we aim at producing good samples with sample size as low as 0.1% while maintaining the structure and some of the key properties of a network. We exploit the fact that average values of degree and clustering coefficient of a graph can be estimated accurately and efficiently. We use the estimated values to guide the sampling process and extract tiny samples that preserve the properties of the graph and closely approximate their distributions in the original graph. The distinguishing feature of our work is that we apply traversal based sampling that utilizes only the local information of nodes as opposed to the global information of the network and this makes our approach a practical choice for crawling online networks. We evaluate the effectiveness of our sampling technique using real-world datasets and show that it surpasses the existing methods.
- Keywords
- RANDOM-WALKS; NETWORKS; RANDOM-WALKS; NETWORKS; Big graphs; Graph sampling; Social networks
- ISSN
- 1384-5810
- URI
- https://pubs.kist.re.kr/handle/201004/118428
- DOI
- 10.1007/s10618-020-00683-y
- Appears in Collections:
- KIST Article > 2020
- Files in This Item:
There are no files associated with this item.
- Export
- RIS (EndNote)
- XLS (Excel)
- XML
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.