Filtering and Accelerating: A Unified Framework for High-Performance Persistence Estimation (IEEE TKDE, Under Review)
|
Efficient data stream processing, particularly for persistence estimation, is crucial in handling high-velocity data streams characterized by skewed distributions of item frequencies. Unlike more straightforward frequency metrics, persistence captures items' recurrence across multiple time windows, posing a significant challenge to existing single-structure sketches where high-persistence and low-persistence items collide. To address this, we introduce the Hypersistent Sketch, a unified framework for high-performance estimation built on two decoupled mechanisms: filtering and accelerating. The filtering component, a Cold Filter, directly addresses the skewed nature of data streams. It separates hot items from the majority of cold ones, which allows for differential treatment. The accelerating component, a Burst Filter, then optimizes the processing of hot items. It significantly improves throughput by preventing repeated insertions within a single window. We demonstrate its generality by applying it to various state-of-the-art sketches (e.g., On-Off, Waving, P-Sketch), showing it consistently enhances their original performance. We also deploy our framework on Redis platforms, demonstrating the framework's broad applicability and scalability.
2025.7.12: The source code of the extended Hypersistent Sketch and the Appendix for TKDE have been uploaded.
Hypersistent Sketch: Enhanced Persistence Estimation via Fast Item Separation (IEEE ICDE, 2025)
|
Efficient data stream processing, particularly for persistence estimation, is crucial in handling high-velocity data streams characterized by skewed distributions of item frequencies. Unlike more straightforward frequency metrics, persistence captures items' recurrence across multiple time windows, requiring nuanced processing approaches. In response, we introduce the Hypersistent Sketch, an algorithm that significantly enhances persistence estimation through innovative filtering techniques. Our design incorporates a Cold Filter to address the skewed nature of data streams where a few high-frequency (hot) items dominate. This filter allows for differential treatment by using smaller counters for most low-frequency (cold) items, thus conservatively allocating memory resources that would otherwise be sized uniformly based on hot items. However, the Cold Filter can reduce throughput due to its segregative processing. To mitigate this, we implement a Burst Filter, which optimizes the processing of hot items. The Burst Filter significantly improves throughput by preventing repeated insertions within a single window--where persistence increases by at most one--and deferring the insertion until the window's end. Comparative evaluations demonstrate that the Hypersistent Sketch outperforms existing solutions like the On-Off Sketch, offering up to 3 times improved throughput while maintaining competitive accuracy and substantially reducing memory usage in handling large-scale data streams.
2025.7.12: Source code V1.1 of Hypersistent Sketch has been uploaded, tnd he revised code has optimized the implementation of the PSketch algorithm.
2025.3.31: The final paper of Hypersistent Sketch have been uploaded.
2025.3.30: The source code of Hypersistent Sketch and the Appendix have been uploaded.
2025.3.28: The website of Hypersistent Sketch is under construction and will be finished soon.
E-mail: wenjunli@pku.org.cn.