FFW Dictionary Support

Small feature, much work.

While studying fuzzing papers, it appears that dictionary support greatly improves code coverage. It is like a bypass/alternative for symbolic execution. I decided to quickly implement it, but was surprised on how much work it took to finish.

The basic premise is simple: Having a list of words. If one of these words are found in a packet, exchange it with all the (other) words in the dictionary.

Example by code:

dictionary = [ 'test1', 'test2' ]

networkMessages = [ 
    'bla test1', 'blubb test2 test2' ]

Result:

modifiedNetworkMessage = [ 
    [ 'bla test2', 'blubb test2 test2' ],
    [ 'bla test1', 'blubb test1 test2' ],
    [ 'bla test1', 'blubb test2 test1' ]

Problem

The dictionary can be small, or very very big. Therefore I want to perform dictionary testing mixed with the normal fuzzing. But one interesting restriction is that the fuzzer has to know when the Dictionary mutator is "finished" - there are no more permutations to perform. This will allow FFW to focus on the other mutators.

This was kinda interesting algorithmic question: I could have dictionaries with 100'000 entries in size, with several 1000 corpuses, each maybe having 10's of these dictionary words in them. Storing all possible permutations is very storage intensive. Not storing anything and just calculating on the fly, e.g. via a index, seems to be too CPU intensive, and does not produce nice code. I made something in between.

Solution

I'll identify all occurrences of the Dictionary, and store their msg-index + byte offset in a python-dict. Additionally I store a wordlist-index: The index into the worldlist at the word which we will replace the next time the dictionary mutator is called. This should keep the store requirements to a reasonable level, while still performing quickly during fuzzing.

So in the example above, the python-dict is:

 {
            'networkMsg': 0,
            'byteOffset': 4,
            'word': 'test2',
            'replaceWordIdx': 0,
            'counter': 0,
 },
 {
            'networkMsg': 1,
            'byteOffset': 6,
            'word': 'test2',
            'replaceWordIdx': 0,
            'counter': 0,
 },
 {
            'networkMsg': 1,
            'byteOffset': 12,
            'word': 'test2',
            'replaceWordIdx': 0,
            'counter': 0,
}

networkMsg, byeOffset and word is to identify the location of the identified dictionary word. replaceWord is the index in the dictionary table - it shows which word from the dictionary will be taken (instead of the referenced one). Note that we have the exclude the actual word itself, or it would be an noop. We know we iterated through all possible words in the wordlist if replaceWordIdx is equal to len(dictionary) - 1 (again, the word itself is a noop). The counter is just a simple counter which increments on every word-exchange - it is only used to be stored in the fuzzed message as a kind of seed. With this seed it is possible to identify the actual transformation we performed.

FFW How-to

Create file with dictionary, separated by newline, called dictionary.txt in your bin path:

cd ~/ffw/<project>/
$ cat dictionary.txt
test1
test2

The fuzzer only uses it if the dictionary mutator is configured:

cat config.py:
[...]
mutators: [ 'Radamsa', 'Dictionary' ]
[...]

I also have a unit test in test/test_dictionaryfuzzer.py.