snatch is an experimental network intrusion detection system.
The aims are to produce a network IDS which supports various detection techniques, is reliable and easy to develop decoders for, and which evaluates large existing rule-sets efficiently.
Our rule parser is written in Python and outputs a C program which describes all of the rules as C data structures.
Right now the rules are compiled directly into the rule optimizer. When run, the optimizer produces a tree of C and x86-64 assembly files. The intermediate representation is a decision tree, which is optimised fully if it is small, or heuristically if it is large. The heuristics chosen seem to work better than established entropy-based systems such as ID3.
In future, it would be good to experiment with more detailed heuristics, as well as perhaps stochastic searches, or full optimisation of the bottom sub-trees.
Code generation is fairly straightforward, and for now we avoid register allocation; we may need it eventually, but it's not clear. The current scheme assigns pre-defined roles to registers, and can elide all repetitive loads.
Most nodes in the decision tree are integer nodes. For these, we emit code which binary-searches over intervals to find the next node.
While the generated code is large, the longest possible path through it is very short: proportional to the number of fields used in the rule-set. A representative rule-set for a popular IDS compiles into about 30-60MB of machine code.
C code is generated for the string-matching parts. Describe how hyperscan is used.
Interfacing with C code can be an issue: how variables are accessed differs depending on whether we are compiled -fPIE or not, for example. Likewise, LTO whole-program compiles cause issues (i.e. if we bring in the generated rule code, we're no longer -fwhole-program). It would be nice to generate C instead of assembly, but since so much code is generated, compile times would be very high; just adding each function to the symbol table would slow the compiler down massively.
There is a question of deliberately generating less efficient code in order to reduce the size of the compiled output, make better use of the I-cache, or speed up compilation. It remains to be seen whether this will be useful.
Another question is whether we want to break the big multi-string match down into clusters. Doing this per-leaf-node would be hugely wasteful (if even possible with today's hardware). But doing it over all strings seems like it could lead to bad cases too (although hyperscan seems to do a good job with it). Describe min-cuts algorithm approach.
The IDS is like a runtime environment for the rules. Describe the decoder structures. We will want to use DPDK here later, or at least af_packet v4 if it is ready by then.
There is a lot of experience to share about data structures for this, although DPDK has some interesting stuff.
Probably needs its own page.
Lots to be done here, but performance is around 10x that of existing systems. Most of the cost is still in string-matching; the rule-matching part is reduced to essentially zero cost and does not even show up as significant in profiles.
Copyright (c) Spanish Inquisition 1478-1834. All rights reversed.