[Sigcomm05]Algorithms for Advanced Packet Classification with Ternary CAMs

Paper: Algorithms for Advanced Packet Classification with Ternary CAMs.pdf

Slide: Algorithms for Advanced Packet Classification with Ternary CAMs_ppt

会议: Sigcomm 2005;

引用:160+;

作者:K.Lakshminarayanan等,加州伯克利

级别:独立编码鼻祖论文,重要,精读;

本文总结:

本文核心创新点是range expansion问题 & multi-match问题,对于range存储,本文的Database Independent Range PreEncoding (DIRPE)是首个独立编码方案(文章编码方案是fence encoding),因此在可扩展性和更新上有优势,对于多匹配,本文提出了Multi-match Using Discriminators (MUD)算法,可避免预查找并适合多线程。

  • 本文中解决TCAM中range expansion问题的DIRPE算法motivation有两点:1)First, instead of representing a range as a set of prefixes, we can represent it as a set of arbitrary ternary values. 2)Second, additional unused bits in a TCAM array can be used to encode the ternary strings.基于上述两点,作者提出了fence encoding方案,table 3给出了编码方法,并且作者从理论上证明了这种编码方案存在这样一个定理:For achieving a worst-case row expansion of 1 for a W-bit range, 2expW-1 bits are necessary。查找的时候,只需要将具体的关键字采用table 3的方案第一行进行转换即可形成TCAM查找关键字。而这么大的比特位来编码肯定是不现实的,咋办?作者提出了切分成chunk的优化方案:To form the ternary representation of a range, let us divide the field into multiple chunks, where each chunk represents a contiguous portion of the bits of W.这样一来,各个chunk的指数就变小了,最后的总比特位用量就是各个chunk部分的用量总和。而对于分chunk后存在的range expansion问题,作者给出了最坏理论结果:We refer to the number of chunks, l, as the number of levels also。The worst-case expansion of a range is 2l+1。可见,chunk分得最多,比特位用量最少,但是range膨胀也会越厉害:As the number of levels l increases, the worst-case expansion increases,and since the chunk widths decrease, the width of the encoded range also decreases。最后,作者结合“database dependent schemes”提出了混合方案来进一步提升效能,也就是对出现较多的range采用类似于HotI02年的相关编码:the k most frequent ranges are computed from the database, and a single bit is assigned to each of the ranges, thus reducing the expansion of all those ranges to 1.此外,作者指出:For achieving a row expansion of 1, we have shown that 2expW-1 bits are necessary, but finding the optimal database-independent encoding is an open problem
  • 本文中解决multi-match问题的MUD算法的是通过增加指示器(Discriminators)实现的。作者首先对前人的multi-match工作做了总结,第一类是Entry-Invalidation Scheme;第二类是发表在HotI04上的Geometric Intersection-based Scheme。第一类做法很简单,就是给每个TCAM entry后面加一位valid bit, 初始时所有entry对应的valid bit都是设置为通配符*,查找关键字最后则增加一位关键字1再进行TCAM查找,因此相当于原始TCAM查找,而当找到一个匹配entry时,则将这个entry对应的valid bit修改为0(下次这条规则就会被忽略),再继续上述过程直到停止,这类算法问题是不利于多线程,可以通过增加比特位来实现,但是会增加TCAM位需求量。第二类则是对规则集进行预处理,但是问题是规则集大了后会出现严重的膨胀问题。作者提出的MUD就是对各个entry增加一个指示器,指示如果这个entry匹配了应该继续查找后面的entry!We first present the basic idea behind MUD. Let the result of searching a TCAM with a key be the rule with index j. To get the next matching result from the TCAM, we need to perform the search on all the entries after index j (指示器就是实现这么简单的道理,因此指示器就是rule序列号).因此,对于N条rule,log2N位就可以了,因此指示器很简单,难的关键是查找过程,关键字也有相应的指示器,初始时查找关键字的指示器位都设为通配符,因此第一次查找就相当于原始TCAM查找,关键是找到一个entry后(第i个),这时候关键字的指示器就需要修改,根据上面的basic idea可知,下次关键字指示器内容就是 >i,而TCAM不支持range,因此就需要转换为多个prefix并对多个prefix分别查找(参见伪代码)直到停止。可见,By using discriminators we have reduced the multimatch problem to the range representation problem.而优缺点则是The benefits that MUD offers—–support for multi-threaded environment and low update cost— –come at the expense of search cycles. 此外,作者又提出了一些改进优化方案,包括1)Assigning Discriminators to Sets of Entries; 2)Assigning Discriminators using DIRPE;3)A Set PruningBased Approach。方法一思想是if several rules can be grouped into a set such that any search key can match at most one rule in this set, then all the rules in this set can be given the same discriminator value.其实这是利用了TCAM查找特性,经典的应用就是相同前缀的IP地址在TCAM块中不分先后顺序。方法二则是将前面的DIRPE编码引进来,从而降低原始MUD中关键字指示器range转prefix的膨胀问题。方法三则是基于When some matches are found, the total number of entries that can possibly match the search key must reduce. Hence, after finding the first few matches, if the list of potential matches is small, then they can be searched quickly by a linear search.

Range expansion的Worst-case:本文指出,worst-case为1时,需要的extra bits是2^w-1位。如图8所示!

最后,作者指出了两个未来的研究方向:We believe that the following future directions are interesting. The first direction deals with finding the optimal encoding of ranges when a certain number of ternary bits are available. The second direction is to investigate how TCAMs and RAMs can be combined to achieve deterministic search throughput at low costs while scaling to real-life databases with millions of rules

Vincent点评:

本文的fence encoding在切分chunk后可以有效降低比特位需求量,但是会影响更新性能,本文的MUD本质是通过指示链实现线性查找得出多个匹配结果,因此会影响性能!不过本文的编码方案可以考虑往小field上用,也许会较好,此外,MUD的子集划分优化方案可以考虑能不能应用于多维度子集划分。

Review From A. Liu

fig2

 

文中图标:

fig3

Leave a Reply

Your email address will not be published.