Kamer Kaya | Influence Maximization

With their rapid growth, the study of effective information diffusion in networks becomes a fruitful area of research with several applications from many fields such as viral marketing, social media analysis, and recommendation systems. Since these networks have been used for educational, political, economical, and social purposes, the diffused information can have various importance levels. Furthermore, the diffusion can be a time-critical process, but it can be costly to increase its speed and coverage by other means. Hence, novel approaches to find good vertex sets which effectively spreads information are vital in practice.

The Influence Maximization (IM) problem is introduced by Kempe et al. Formally, it focuses on finding the most promising seed (vertex) set with a given cardinality that increases the expected number of influenced vertices. IM is proven to be NP-hard and there are various simplifications and heuristics proposed in the literature. It has also been shown that a greedy Monte-Carlo approach provides a constant approximation for the optimal solution.

For a graph with n vertices, the expected complexity of this greedy algorithm, estimating an influence score σ, running R simulations and selecting K seed vertices is O(KRnσ).Hence, for real-life networks with hundreds of thousands of vertices, the approach is expensive. However, these simulation-based, greedy algorithms provide the best possible approximation guarantees. Therefore they are considered as the gold standard for IM.

Performing the simulations of a greedy algorithm in parallel is an immediate and straightforward remedy to reduce the execution time of IM kernels and make them scalable for large-scale networks, but restructuring the kernels to leverage instruction-level parallelism has not been investigated before. Although modern compilers can efficiently and automatically utilize instruction-level parallelism for applications with regular memory access patterns, it is not a straightforward task for graph processing kernels due to their irregular memory accesses.

Fused sampling overlaps the sampling phase with the actual computation. Hence, although it involves a significant amount of recomputation, for probabilistic graph algorithms, one does not need to extract and store samples from graphs. When carefully used, sketch-based graph traversals can significantly reduce the complexity of many graph algorithms.

In addition, instead of performing graph traversals sketch-based approximate computing techniques can be used to find good seed sets for influence maximization. In this project, we aim to make the IM kernels faster on multicore CPUs and manycore GPUs by using various techniques such as fused sampling, memory access regularization, sketch-based computing and parallelization.

With error adaptive cascade and refreshing, one can ignore a significant amount of computation for a small error margin on the estimated influence score.

Related Publications:

Boosting parallel influence-maximization kernels for undirected networks with fusing and vectorization G Göktürk, K Kaya, IEEE Transactions on Parallel and Distributed Systems 32 (5), 1001-1013, 3, 2020
Fast and Error-Adaptive Influence Maximization based on Count-Distinct Sketches G Gokturk, K Kaya, arXiv preprint arXiv:2105.04023 (in submission)