The snatch Network IDS

By Gianni Tedesco. Last updated: 2018-04-09

snatch is an experimental network intrusion detection system. The main features are:

  1. a rule compiler which converts rules into decision trees, which are then lowered to x86-64 assembly and assembled into an x86-64 ELF object. String matching is done with Intel's hyperscan library.
  2. protocol decoders written in a protocol description language: a sort of markup/C-struct-like language which also allows us to define how messages depend on, and are affected by, per-flow or global state.

The aims are to produce a network IDS which supports various detection techniques, is reliable and easy to develop decoders for, and which evaluates large existing rule-sets efficiently.

Front End - Rule Parser

Our rule parser is written in Python. It parses a collection of rules and normalizes them into something like conjunctive normal form. The rules are output to a set of binary files, one per protocol, and these are the input to the rule-optimizer and code-generator.
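
To make the normalized form concrete, here is a hypothetical sketch of such a rule as a C structure: a conjunction of clauses, each clause a disjunction of single-field predicates. The names and layout are made up for illustration; the actual binary format is not documented here.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical representation of a normalized rule: an AND of
     * clauses, each clause an OR of interval predicates over one
     * protocol field. Not the real on-disk format.
     */
    struct predicate {
            uint16_t field;         /* protocol field id, e.g. tcp.dport */
            uint32_t min, max;      /* satisfied when min <= value <= max */
    };

    struct clause {                 /* OR of predicates */
            size_t nr_preds;
            struct predicate *preds;
    };

    struct rule {                   /* AND of clauses */
            uint32_t id;
            size_t nr_clauses;
            struct clause *clauses;
    };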

Rule Optimizer

When run, the optimizer produces a tree of C and x86-64 assembly files. The intermediate representation is a decision tree. If the decision tree is small it is optimised fully by brute force; if it is large, heuristically. The heuristics chosen seem to work better than established entropy-based methods such as ID3.

The decision trees output for complex rule-sets can be pretty large. Furthermore, because of the proliferation of leaf nodes, we are realistically constrained to producing a single set of fast-patterns. Because all of the fast-patterns are combined into a single hyperscan database we're hitting hyperscan's worst cases. Although still fast, we're spending upwards of 70% of the runtime in the fast-pattern match engine.
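
For reference, a single combined fast-pattern database at the hyperscan API level looks roughly like the following block-mode sketch. The patterns, ids and function names are made up; only the hs_* calls are real hyperscan API.

    #include <stdio.h>
    #include <hs/hs.h>

    /* Minimal sketch: compile all fast-patterns into one block-mode
     * database, then scan a packet payload against it.
     */
    static int on_match(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags,
                        void *ctx)
    {
            (void)from; (void)flags; (void)ctx;
            printf("fast-pattern %u matched, ending at %llu\n", id, to);
            return 0;       /* keep scanning */
    }

    int scan_payload(const char *buf, unsigned int len)
    {
            static const char *const pats[] = { "GET ", "admin" };
            static const unsigned int ids[] = { 1, 2 };
            static const unsigned int flags[] = { 0, 0 };
            hs_database_t *db;
            hs_compile_error_t *err;
            hs_scratch_t *scratch = NULL;

            if (hs_compile_multi(pats, flags, ids, 2, HS_MODE_BLOCK,
                                 NULL, &db, &err) != HS_SUCCESS) {
                    hs_free_compile_error(err);
                    return -1;
            }
            if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS) {
                    hs_free_database(db);
                    return -1;
            }
            hs_scan(db, buf, len, 0, scratch, on_match, NULL);
            hs_free_scratch(scratch);
            hs_free_database(db);
            return 0;
    }

In the real system the database would of course be compiled once at start-up and the scratch space allocated per worker thread; compiling per call here just keeps the sketch self-contained.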

Code Generation

Code generation is pretty straightforward, but we're avoiding doing register allocation. We may need it eventually, but it's not clear yet. The current scheme has pre-defined roles for registers and can elide all repetitive loads.

Most nodes in the decision tree are integer nodes. For those we emit code which does a binary search over intervals to find the next node.
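
In C terms the emitted code for an integer node is morally equivalent to the following (illustrative only: the real output is x86-64 assembly, and these structures are made up):

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative: pick the next decision-tree node for a field value
     * by binary search over sorted, disjoint intervals.
     */
    struct node;

    struct interval {
            uint32_t min, max;              /* inclusive bounds */
            const struct node *next;        /* subtree for this range */
    };

    struct node {
            size_t nr_intervals;
            const struct interval *tbl;     /* sorted by min */
            const struct node *dfl;         /* taken when nothing matches */
    };

    static const struct node *next_node(const struct node *n, uint32_t val)
    {
            size_t lo = 0, hi = n->nr_intervals;

            while (lo < hi) {
                    size_t mid = lo + (hi - lo) / 2;

                    if (val < n->tbl[mid].min)
                            hi = mid;
                    else if (val > n->tbl[mid].max)
                            lo = mid + 1;
                    else
                            return n->tbl[mid].next;
            }
            return n->dfl;
    }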

While the generated code is large, the longest possible path through it is very short: proportional to the number of fields used in the rule-set. A representative rule-set for a popular IDS compiles into about 30-60MB of machine code.

C code is generated for the string matching parts. Each rule consists of a chain of function calls emitted by the rule compiler. There are specialised cases for relative/absolute string matches. For example, matches where depth equals the pattern length are emitted as single u32 comparisons rather than a needless Boyer-Moore search. Likewise we have specialised implementations for byte_test, byte_extract and byte_jump. The code for each individual rule is all inlined to ensure that the compiler can take care of any further redundancies not avoided by these specialisations.
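
A sketch of that depth-equals-pattern-length specialisation for a 4-byte pattern (the function name and calling convention are made up):

    #include <stdint.h>
    #include <string.h>

    /* Illustrative: with depth == pattern length the pattern can only
     * sit at the very start of the search window, so a 4-byte content
     * check collapses to a single u32 comparison.
     */
    static inline int match_content_u32(const uint8_t *buf, size_t len,
                                        uint32_t pat)
    {
            uint32_t v;

            if (len < sizeof(v))
                    return 0;
            memcpy(&v, buf, sizeof(v));     /* no unaligned access UB */
            return v == pat;
    }

    /* e.g. content:"GET "; depth:4; becomes, on a little-endian host:
     *   match_content_u32(payload, payload_len, 0x20544547)
     */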

String-Set Clustering

A question arises of whether we want to break down the big multi-string match into clusters. Doing this per-leaf-node would be hugely wasteful (if even possible with today's hardware). But doing it over all strings certainly leads to bad cases and is the determining factor in performance.

The issue here is that if we don't want to risk running two or more fast-pattern searches on the same packet, then however we cluster the string-sets must be driven by the decision-tree construction algorithm, and will also be constrained by the internal structure of the rules themselves.

Many of the bad cases are gated behind flow-bits or other checks, which makes the "bad" rules (ie. those with very short fast-patterns) appear quite sensible. A quick option would be to leave such fast-patterns out of the main set and instead perform a single one-off check after doing the flowbits lookups, as in the sketch below.
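
A sketch of that one-off check (the flow structure, flowbit name and pattern are all hypothetical):

    #define _GNU_SOURCE             /* for memmem() */
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical flow state: flowbits held as a bitmask. */
    struct flow {
            uint64_t flowbits;
    };

    #define FLOWBIT_LOGIN_SEEN      (1ULL << 3)     /* made-up bit */

    /* Illustrative: a rule whose fast-pattern ("|!") is too short to go
     * into the main hyperscan set. Check it directly, once, and only
     * when the gating flowbit is already set on this flow.
     */
    static int eval_gated_rule(const struct flow *f,
                               const uint8_t *buf, size_t len)
    {
            if (!(f->flowbits & FLOWBIT_LOGIN_SEEN))
                    return 0;
            return memmem(buf, len, "|!", 2) != NULL;
    }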

Failing that, if we can't find a way to reduce the number of leaf nodes, another option would be to cluster fast-pattern sets by similarity: for example, combining all the HTTP-related ones together. In fact, it makes sense to move protocol-specific rules into separate decision-trees altogether, to avoid the remotest possibility of their fast-patterns being combined. The issue here is that rule-set authors may not be expecting such a change in semantics, and we would end up having to match all of the TCP and all of the HTTP rules against a single packet, too.

Network IDS Runtime

The IDS is essentially a runtime environment for the rules. The decoder structures remain to be described here. We will want to use DPDK here later.

State tracking

Lots of experience to share about data structures for this; DPDK also has some interesting stuff.

PLANG compiler

Probably needs its own page.

Results

Lots to be done here, but we're seeing around 4-5x the throughput of existing systems. Most of the cost is still in string-matching; the rule-matching part is basically reduced to zero cost and doesn't even show up as significant in profiles.

Copyright (c) Spanish Inquisition 1478-1834. All rights reversed.