Streaming Data Processing

On implementing high-performance data structures to process streaming data.

For many applications, data is streaming by nature: new items arrive continuously, and the data set grows over time. Paradigms such as the Internet of Things (IoT) and edge computing make such applications increasingly common and practical. This dynamic data can be tabular, with items that are unrelated to each other. In many well-known real-world problems, however, the data describe relationships among entities. Graphs and hypergraphs are powerful combinatorial structures that are frequently used to model such data.

Sketches are probabilistic data structures that provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional, exact approaches. This makes them well suited to streaming data analysis, even on architectures with limited memory such as single-board computers (SBCs). Even on high-end servers, a carefully tuned, high-performance design and implementation can significantly boost performance. In this project, we study the details of sketches and implement them on SBCs, CPUs and GPUs.
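To make the memory/accuracy trade-off concrete, here is a minimal sketch of one standard frequency sketch, the Count-Min sketch. It is illustrative only and is not claimed to be the structure or implementation used in the publications below; the class name, the blake2b fingerprinting, and the seeded Carter-Wegman hash family are assumptions chosen for a self-contained example. With width ceil(e/ε) and depth ceil(ln(1/δ)), an estimate never undercounts and, with probability at least 1 - δ, overcounts by at most ε times the total number of items added.

```python
import hashlib
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch (hypothetical class, not the project's code).

    Estimates never undercount; with probability at least 1 - delta, an estimate
    exceeds the true count by at most epsilon * (total count added so far).
    Counts passed to add() are assumed to be non-negative.
    """

    _PRIME = (1 << 61) - 1  # Mersenne prime used by the Carter-Wegman hash family

    def __init__(self, epsilon: float, delta: float, seed: int = 0) -> None:
        self.width = math.ceil(math.e / epsilon)       # columns per row
        self.depth = math.ceil(math.log(1.0 / delta))  # number of rows / hash functions
        self.table = [[0] * self.width for _ in range(self.depth)]
        rng = random.Random(seed)
        # One pairwise-independent hash h(x) = ((a*x + b) mod p) mod width per row.
        self.coeffs = [(rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
                       for _ in range(self.depth)]
        self.total = 0

    def _fingerprint(self, item: str) -> int:
        # Assumption: map a string to an integer key below the prime via blake2b.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self._PRIME

    def _columns(self, item: str):
        # Yield the counter column for this item in each row.
        x = self._fingerprint(item)
        for a, b in self.coeffs:
            yield ((a * x + b) % self._PRIME) % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter per row.
        for row, col in enumerate(self._columns(item)):
            self.table[row][col] += count
        self.total += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the row minimum is the tightest estimate.
        return min(self.table[row][col] for row, col in enumerate(self._columns(item)))


# Usage: 272 x 5 counters summarize a stream regardless of how many distinct items it has.
cms = CountMinSketch(epsilon=0.01, delta=0.01)
stream = ["sensor-a"] * 50 + ["sensor-b"] * 30 + [f"noise-{i}" for i in range(200)]
for element in stream:
    cms.add(element)
print(cms.width, cms.depth, cms.estimate("sensor-a"))
```

The table size depends only on ε and δ, not on the number of distinct items, which is what makes this kind of structure attractive on memory-constrained devices such as SBCs. Here the estimate for "sensor-a" is at least 50 and, with probability at least 0.99, at most 50 + 0.01 × 280.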

Related Publications:

  • Streaming Hypergraph Partitioning Algorithms on Limited Memory Environments. Fatih Taşyaran, Berkay Demireller, Kamer Kaya, Bora Uçar. HPCS, 2021.
  • Differentially Private Frequency Sketches for Intermittent Queries on Large Data Streams. S. Yıldırım, K. Kaya, S. Aydın, H. B. Erentuğ. IEEE International Conference on Big Data (Big Data), pp. 4083-4092, 2020.
  • One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers. Fatih Taşyaran, Kerem Yıldırır, Mustafa Kemal Taş, Kamer Kaya. European Conference on Parallel Processing, pp. 405-418, 2019.