Cuckoo Counter: Adaptive Structure of Counters for Accurate Frequency and Top-k Estimation (IEEE/ACM Transactions on Networking, 2023)

  • Authors: Qilong Shi, Yuchen Xu, Jiuhua Qi, Wenjun Li, Tong Yang, Yang Xu, Yi Wang
  • Qilong Shi, Yuchen Xu and Jiuhua Qi contributed equally to this paper, and they conducted this work under the guidance of Wenjun Li and Tong Yang.
  • Corresponding authors: Wenjun Li (e-mail: wenjunli@pku.org.cn) and Tong Yang

 

Abstract

Sketch, as an emerging probabilistic structure, has been extensively investigated in the approximate computing of massive data stream processing, but few of them can work well on both frequency and top-k estimation, two fundamental tasks. By introducing a pre-filtering stage to isolate hot and cold flows, the recently proposed Augmented Sketch (ASketch) significantly improves accuracy for both tasks. However, it suffers from serious performance degradation because of frequent flow exchanges. In this paper, we propose Cuckoo Counter (CC), an adaptive structure that consists of several buckets organized in a specific way. The size of the entry in each bucket is carefully designed to match the actual distribution of streams. During processing, CC hashes a flow to buckets and uses the idea of cuckoo hashing to relocate the flow if an overflow or collision happens, which contributes to fully utilizing memory. Therefore, the replacement strategy helps CC precisely record hot flows and cover more cold flows, and also guarantees the throughput. Extensive experimental results show that CC has the highest accuracy, highest (top-k) precision, and competitive throughput compared to the state-of-the-art. Specifically, CC improves the throughput and accuracy by around 1 and 2 orders of magnitude respectively compared to the well-known ASketch.


Q&A for Readers

Q1: Why Coukoo Counter can acheive memory efficient?

A1: The frequency of real-life flows are often highly skewed. In other words, the majority of flows are cold (i.e., have a low frequency), while a few are hot (i.e., have a high frequency). In Cuckoo Counter, we take full advantage of this feature, the size of the entry in each bucket is carefully designed to count the frequencies of cold flows and hot flows respectively, which can handle skewed data streams efficiently and improve the memory utilization. Beside, during the online stream processing, Cuckoo Counter hashes a flow to buckets and uses the idea of cuckoo hashing to relocate the flow if an overflow or collision happens, which contributes to fully utilizing memory.


Q2: Why Coukoo Counter can acheive high speed?

A2: As we said in the abstract, 'By introducing a pre-filtering stage to isolate hot and cold flows, the recently proposed Augmented Sketch (ASketch) significantly improves accuracy for both tasks. However, it suffers from serious performance degradation because of frequent flow exchanges', we can see that flow exchange among stages is the main trouble maker for performance degradation, because each swap among stages may result in one more memory access in ASketch. In Coukoo Counter, we also try to isolate hot and cold flows by using different sized counters, and exchange (i.e., relocate) the flows if an overflow happens. However, the flow exchange in Coukoo Counter is firstly conducted in the same bucket within O(1) memory access, so that hot flows and cold flows can be relocated into suitable entries without sacrificing update performance. Only when all entries in the same bucket are overflowed, we then exchange(i.e.,kick out) the flows among different buckets, which may introduce one more memory access. Besides, the max kicking number is also limited by maxloop, therefore, the worst-case performance of Coukoo Counter can also be guaranteed.

Q3: Why Coukoo Counter can acheive high accuracy and high precision?

A3: When serious conflicts happens, Cuckoo Counter leverages the partial-key cuckoo hashing to kick out flows stored in the smallest entries (i.e., cold flows), fills each bucket as much as possible to improve memory utilization without losing too much accuracy, and naturally preserves more hot flows. Therefore, the replacement strategy (i.e., cuckoo-like kicking & expeling cold flows) helps Cuckoo Counter precisely record hot flows and cover more cold flows.


Q?: If you have any questions? Please feel free to contact with me (e-mail: wenjunli@g.harvard.edu).

A?: I will do my best to answer for you.


Downloads

 

News

2022.12.28: The final journal version of Cuckoo Counter [TON 2023] has been attached.

2022.12.7: The extended journal paper of Cuckoo Counter has been accepted by IEEE/ACM Transactions on Networking.

2022.09.15: We have updated the revised source code of Cuckoo Counter (i.e., Source_Code_V3.0_R2), thanks a lot to the insight comments from our anonymous reviewers. Particularly, to report the top-k most frequent flows, we add an extra heap. This heap is different from the min-heap in the paper of CM-sketch or HeavyKeeper, it uses the f_start field to record the frequency of a flow when it is first entered into the heap. By adding this field, we can improve the algorithm to filter out some false top-k flows. Experiments show that this optimization improves the accuracy of top-k estimation compared to ordinary min-heap. We also implement Cuckoo Counter on top of BESS (a programmable platform for vSwitch dataplane) to verify the portability of CC on network platforms, and we measure its packet rate and throughput under different memory. Feel free to contact with me if you have any questions.

2022.04.20: We have updated the revised source code of Cuckoo Counter (i.e., Source_Code_V2.0_R1), thanks a lot to the insight comments from our anonymous reviewers. Particularly, we modified the query function to return 0 if there is an empty entry in the searched bucket (It should be note that this modification has little effect on the following experimental results, because when the memory is small and the data is large, it is difficult to find an empty entry in CC), besides, we add some recently proposed algorithms for comparison in our revised source code. Feel free to contact with me if you have any questions.

2021.05.27: Some Questions and Answers (Q&A) have been attached in this page. Feel free to contact with me if you have any questions.

2021.05.26: The source code of Cuckoo Counter as well as all compared algorithms have been uploaded. More test datasets can also be downloaded from GitHub Releases (https://github.com/wenjunpaper/CuckooCounter/releases/tag/Datasets).

2021.05.26: The conference version of Cuckoo Counter [ANCS 2019] has been attached.

2021.05.25: The source code of Cuckoo Counter has been uploaded at GitHub (https://github.com/wenjunpaper/CuckooCounter). All test datasets can also be downloaded from GitHub Releases (https://github.com/wenjunpaper/CuckooCounter/releases/tag/Datasets).

2021.04.05: The website of Cuckoo Counter is under construction.

 

Contact

Email: wenjunli@pku.org.cn, wenjun0816@qq.com.