Dynamic hierarchical header Grid

Hierarchical header is an header column/row that its meaning must be derived from multiple levels. Below is an example of hierarchical headers:

____________________________________________________
|                     IP Group 1                   |
____________________________________________________
|         1.1.1.1        |          1.1.1.2        |
____________________________________________________
|   RX       |     TX    |    RX       |     TX    |
____________________________________________________ 

Obviously, the meaning of column 1 cannot determined by its immediate header Rx. The meaning is RX of 1.1.1.1 of IP Group 1 (received packets by IP address 1.1.1.1 of IP Group1 ).  For the example above, hierarchical header is easy to implement by merging cell (1, 1) + (1,4), cell(2,1) + cell(2,2), (2,3) + (2,4), (2,3) + (2,4).

Static hierarchical headers is easy to implement as we know exactly what cells to merge.

Dynamic hierarchical headers is much harder as user or event can add/delete at any level.

For example, if a new computer w/ 3 NIC cards added to network, you get a whole new IP Group 2 w/ 3 IPs, each IP has 2 columns, making it 6 columns total. If just an IP got added to IP Group 1, you get 2 more columns. To make the worst, there are combined events such as adding a new group and removing a IP from existing group…

Using positional computation is not practical here.

So I invented Tree-To-Grid layout method.
Using a tree to represent the hierarchical headers for both column and row header (row header can be hierarchical too). All adding/removing  are done logically via the tree. If user wants to add a IP Group, get column root node and add to it. If a event wants to add just an IP to an IP Group, it searches for the IP Group in the tree, then add to it.

How to draw:
– Intialization: Traverse the tree once to get number of columns/rows from tree. To eliminate add/remove column/row frequently, we reset all merge cells; use number of columns/rows obtained previously to initialize the grid.
– Assume that all child node occupies 1 column or row.  Through recursion, we will be able to find out from what cell to what cell a parent node occupies and merge them accordingly.

Issues to consider (not cover here, may be next post):
– Match data with header
– Reverse lookup: cell to header tree node
– …

Dynamic hierarchical header Grid

My research on fast multi-pattern search algorithms

I was lucky to have a chance to research a way to improve my company text parser. The key to efficient parsing is lazy parsing that parses only what we need, ignore the rest, which essentially a string matching problem. So I did research on multi-pattern search algorithms. Here are summary of the research:
Single pattern matching algorithms
I include single pattern matching algorithms as a basis for understanding multi-pattern matching algorithms.

  • Boyer-Moore (BM) uses bad character heuristic. It is the foundation of the search algorithms. It is simple, easy to implement, yet effective and fast. To learn about search algorithm, you must understand BM. Since Boyer-Moore-Horspool (BMH) is simpler than BM, I recommend to learn about BMH first.
    It is used in Snort.
     Learn more from wikipedia
  • Boyer-Moore-Horspool (BMH): It is a simplification of the Boyer-Moore algorithm.  It is amazingly simple.
    Learn more from wikipedia
  • Markatos-Anagnostakis ExB:  exclusion based string matching algorithm. It has 2 assumptions:
    1/ Most traffic will not trigger the patterns.
    2/ The text is not too big b/c the chance the pattern appears will increase as the text gets bigger. 
    This algorithm can also be used in multi-pattern search by repeating single pattern search.
  • Markatos-Anagnostakis E2xB: an enhancement to previous ExB. It uses interger for variable exists[256].
Multi-pattern matching algorithm
  • Aho-Corasick algorithm used automata approach. The automaton consumes one character from the input text each time. The character will drive the automaton to a new state which represents all the partially matched patterns.
    It is used in UNIX fgrep & egrep (with -f option) and compilers.
  • Commentz-Walter  combined Boyer-Moore technique with the Aho-Corasick algorithm.
    It is implemented in gre and GNU fgrep 2.0.
  • Baeza-Yates combined Boyer-Moore-Horspool algorithm with the Aho-Corasick algorithm.
  • Manber-Wu uses the bad character heuristic of Boyer-Moore algorithm. By noticing that the large number of patterns will decrease the chance to get bigger movement, this algorithm use a block of characters to find a movement offset.
  • Kim-Kim: compact encoding and hash to decrease the number of potential patterns.
  • Fisk-Varghese proposed Setwise Boyer-Moore-Horspool. The idea is using bad character heuristic and suffix trie.
  • Coit-Staniford-McAlerney proposed AC-BM, almost the same as Setwise Boyer-Moore-Horspool. using bad character and good prefix heuristic  plus prefix trie.
My research on fast multi-pattern search algorithms

Hierarchical address arithmetic algorithm

I invented this algoritm a while back. It allows you to perform arithmetic operations ( add, subtract) on any hierarchical address scheme. 


IP example: 1.1.1.255 + 257 = 1.1.3.0
Generic example:  For a scheme: a[1-2]/b[3-4]/c[5-6]: a1/b3/c6 + 3 = a2/b3/c5

Notes:
A hierarchical address (HA) or multi-level address is a logical address divided into levels in left to right order of significance. IP address 1.1.1.1 and Termination ID tdm/3/28/24 (RFC 3525) are examples of hierarchical address.

Hierarchical Address can also be in binary form by using hierarchical address table to define level/significance.

Hierarchical address arithmetic algorithm