Influence Maximization
How do you find the seed nodes in a network to increase the impact of a diffusion and the overall influence?
With their rapid growth, the study of effective information diffusion in networks becomes a fruitful area of research with several applications from many fields such as viral marketing, social media analysis, and recommendation systems. Since these networks have been used for educational, political, economical, and social purposes, the diffused information can have various importance levels. Furthermore, the diffusion can be a time-critical process, but it can be costly to increase its speed and coverage by other means. Hence, novel approaches to find good vertex sets which effectively spreads information are vital in practice.
The Influence Maximization (IM) problem is introduced by Kempe et al. Formally, it focuses on finding the most promising seed (vertex) set with a given cardinality that increases the expected number of influenced vertices. IM is proven to be NP-hard and there are various simplifications and heuristics proposed in the literature. It has also been shown that a greedy Monte-Carlo approach provides a constant approximation for the optimal solution.
For a graph with n vertices, the expected complexity of this greedy algorithm, estimating an influence score σ, running R simulations and selecting K seed vertices is O(KRnσ).Hence, for real-life networks with hundreds of thousands of vertices, the approach is expensive. However, these simulation-based, greedy algorithms provide the best possible approximation guarantees. Therefore they are considered as the gold standard for IM.
Performing the simulations of a greedy algorithm in parallel is an immediate and straightforward remedy to reduce the execution time of IM kernels and make them scalable for large-scale networks, but restructuring the kernels to leverage instruction-level parallelism has not been investigated before. Although modern compilers can efficiently and automatically utilize instruction-level parallelism for applications with regular memory access patterns, it is not a straightforward task for graph processing kernels due to their irregular memory accesses.


In addition, instead of performing graph traversals sketch-based approximate computing techniques can be used to find good seed sets for influence maximization. In this project, we aim to make the IM kernels faster on multicore CPUs and manycore GPUs by using various techniques such as fused sampling, memory access regularization, sketch-based computing and parallelization.

Related Publications:
-
Boosting parallel influence-maximization kernels for undirected networks with fusing and vectorization G Göktürk, K Kaya, IEEE Transactions on Parallel and Distributed Systems 32 (5), 1001-1013, 3, 2020
-
Fast and Error-Adaptive Influence Maximization based on Count-Distinct Sketches G Gokturk, K Kaya, arXiv preprint arXiv:2105.04023 (in submission)