Filtering and Accelerating: A Unified Framework for High-Performance Persistence Estimation (IEEE TKDE, Under Review)

  • Authors: Qilong Shi*, Lu Cao*, Weiqiang Xiao, Nianfu Wang, Wenjun Li, Tong Yang, Zhijun Li, Weizhe Zhang, and Mingwei Xu
  • Qilong Shi and Lu Cao are co-first authors, and they conducted this work under the guidance of the corresponding authors Wenjun Li and Weizhe Zhang.

Abstract

Efficient data stream processing, particularly for persistence estimation, is crucial in handling high-velocity data streams characterized by skewed distributions of item frequencies. Unlike more straightforward frequency metrics, persistence captures items' recurrence across multiple time windows, posing a significant challenge to existing single-structure sketches where high-persistence and low-persistence items collide. To address this, we introduce the Hypersistent Sketch, a unified framework for high-performance persistence estimation built on two decoupled mechanisms: filtering and accelerating. The filtering component, a Cold Filter, directly addresses the skewed nature of data streams. It separates hot items from the majority of cold ones, which allows for differential treatment. The accelerating component, a Burst Filter, then optimizes the processing of hot items. It significantly improves throughput by preventing repeated insertions within a single window. We demonstrate its generality by applying it to various state-of-the-art sketches (e.g., On-Off, Waving, P-Sketch), showing it consistently enhances their original performance. We also deploy our framework on Redis platforms, demonstrating the framework's broad applicability and scalability.
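The Cold Filter's separation idea can be sketched roughly as follows. This is a toy Python illustration, not the paper's actual data structure: the threshold, slot count, and plain dictionary stand in for the real small-counter array and the full hot-part sketch.

```python
import zlib


class ColdFilter:
    """Toy cold-filter stage (illustrative only; sizes are assumptions).

    The skewed distribution means most items are cold, so they are counted
    in an array of small counters. Only when a counter saturates is the item
    promoted to the hot structure, which can afford wide counters because
    few items ever reach it.
    """

    THRESHOLD = 15   # e.g., the maximum value of a 4-bit counter
    SLOTS = 1024     # number of small-counter slots

    def __init__(self):
        self.cold = [0] * self.SLOTS  # small counters for the cold majority
        self.hot = {}                 # stands in for a full-width hot sketch

    def insert(self, item: str) -> None:
        if item in self.hot:
            # Hot items bypass the cold stage entirely.
            self.hot[item] += 1
            return
        # crc32 is used here only as a deterministic toy hash.
        idx = zlib.crc32(item.encode()) % self.SLOTS
        if self.cold[idx] < self.THRESHOLD:
            self.cold[idx] += 1  # cheap small-counter increment
        else:
            # Counter saturated: promote the item to the hot structure.
            self.hot[item] = self.cold[idx] + 1
```

For example, inserting the same item 20 times saturates its small counter at 15 and then promotes it, after which subsequent arrivals update only the hot structure.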

Downloads

News

2025.7.12: The source code of the extended Hypersistent Sketch and the Appendix for TKDE have been uploaded.


  Hypersistent Sketch: Enhanced Persistence Estimation via Fast Item Separation (IEEE ICDE, 2025)

  • Authors: Lu Cao*, Qilong Shi*, Weiqiang Xiao, Nianfu Wang, Wenjun Li, Zhijun Li, Weizhe Zhang, and Mingwei Xu
  • Lu Cao and Qilong Shi are co-first authors, and they conducted this work under the guidance of the corresponding authors Wenjun Li and Weizhe Zhang.

Abstract

Efficient data stream processing, particularly for persistence estimation, is crucial in handling high-velocity data streams characterized by skewed distributions of item frequencies. Unlike more straightforward frequency metrics, persistence captures items' recurrence across multiple time windows, requiring nuanced processing approaches. In response, we introduce the Hypersistent Sketch, an algorithm that significantly enhances persistence estimation through innovative filtering techniques. Our design incorporates a Cold Filter to address the skewed nature of data streams where a few high-frequency (hot) items dominate. This filter allows for differential treatment by using smaller counters for most low-frequency (cold) items, thus conservatively allocating memory resources that would otherwise be sized uniformly based on hot items. However, the Cold Filter can reduce throughput due to its segregative processing. To mitigate this, we implement a Burst Filter, which optimizes the processing of hot items. The Burst Filter significantly improves throughput by preventing repeated insertions within a single window--where persistence increases by at most one--and deferring the insertion until the window's end. Comparative evaluations demonstrate that the Hypersistent Sketch outperforms existing solutions like the On-Off Sketch, offering up to 3 times improved throughput while maintaining competitive accuracy and substantially reducing memory usage in handling large-scale data streams.
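The Burst Filter's deferred-insertion idea can be sketched as follows. This is a minimal Python illustration under simplifying assumptions: an exact dictionary stands in for the underlying persistence sketch, and a per-window set plays the role of the Burst Filter.

```python
class BurstFilteredPersistence:
    """Toy persistence tracker with a burst-filter stage (illustrative only).

    An item's persistence is the number of distinct windows in which it
    appears, so repeated arrivals inside one window cannot raise it further.
    The burst filter exploits this: the first arrival in a window is
    buffered, later duplicates are dropped at the cost of a set lookup, and
    one deferred update per distinct item is applied at the window's end.
    """

    def __init__(self):
        self.persistence = {}       # exact table, standing in for a sketch
        self.window_buffer = set()  # items already seen in this window

    def insert(self, item: str) -> None:
        # Duplicate arrivals within the window are absorbed here instead
        # of triggering repeated sketch updates.
        self.window_buffer.add(item)

    def end_window(self) -> None:
        # Flush: one persistence increment per distinct item, then reset.
        for item in self.window_buffer:
            self.persistence[item] = self.persistence.get(item, 0) + 1
        self.window_buffer.clear()
```

For instance, a window containing `a, a, a, b` followed by a window containing `a, c` yields persistence 2 for `a` and 1 for `b` and `c`, while the three duplicate arrivals of `a` in the first window never touch the persistence table.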

Downloads

News

2025.7.12: Source code V1.1 of Hypersistent Sketch has been uploaded, and the revised code has optimized the implementation of the PSketch algorithm.

2025.3.31: The final paper of Hypersistent Sketch has been uploaded.

2025.3.30: The source code of Hypersistent Sketch and the Appendix have been uploaded.

2025.3.28: The website of Hypersistent Sketch is under construction and will be finished soon.

Contact

E-mail: wenjunli@pku.org.cn