Friday, April 27, 2012

Binary Grep Program: SearchBin

SearchBin is a fast command-line program for searching within binary files. It's a bit like grep for binaries.

It supports three kinds of search:
-Search for bytes using hexadecimal
-Search for a plain text string
-Search for a smaller binary file

Search for the hex bytes "FF14DE" in the file gamefile.db:
$ ./ -p "FF14DE" gamefile.db
Match at offset:            907          38B in  gamefile.db
Match at offset:           1881          759 in  gamefile.db
Match at offset:           7284         1C74 in  gamefile.db
Match at offset:           7420         1CFC in  gamefile.db
Match at offset:           8096         1FA0 in  gamefile.db
The printed offsets are listed in decimal and hex formats.
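
To make the two columns concrete, here is a minimal Python sketch that prints one offset in both bases. The function name format_match and the column widths are assumptions guessed from the sample output above; this is not SearchBin's own code.

```python
def format_match(offset, filename):
    """Render one match line with the offset in decimal, then in hex.

    Hypothetical helper: the widths only approximate the sample output.
    """
    return "Match at offset: %14d %12X in  %s" % (offset, offset, filename)


if __name__ == "__main__":
    # The offsets from the example run above.
    for offset in (907, 1881, 7284):
        print(format_match(offset, "gamefile.db"))
```

Going the other way, int("38B", 16) gives 907, which is handy when an option expects a decimal number.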

You can also search for patterns with unknown bytes by using "??". Just insert it wherever you have an unknown byte:
$ ./ -p "FF??DE" gamefile.db
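
One way to picture wildcard matching is to turn the hex pattern into a byte-level regular expression. The sketch below does that in Python 3. The names hex_pattern_to_regex and find_offsets are made up for illustration, and this is only an assumption about how such a search could work, not a description of SearchBin's internals.

```python
import re


def hex_pattern_to_regex(pattern):
    """Compile a hex pattern such as 'FF??DE' into a bytes regex.

    Each '??' pair matches any single byte. A leading '0x' is ignored.
    """
    if pattern.lower().startswith("0x"):
        pattern = pattern[2:]
    if len(pattern) % 2 != 0:
        raise ValueError("hex pattern must have an even number of characters")
    parts = []
    for i in range(0, len(pattern), 2):
        pair = pattern[i:i + 2]
        if pair == "??":
            # With re.DOTALL, '.' matches every byte, including b'\n'.
            parts.append(b".")
        else:
            # int() raises ValueError for anything that is not valid hex.
            parts.append(re.escape(bytes([int(pair, 16)])))
    return re.compile(b"".join(parts), re.DOTALL)


def find_offsets(data, pattern):
    """Return the 0-based offsets of all non-overlapping matches in data."""
    return [m.start() for m in hex_pattern_to_regex(pattern).finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\xff\x14\xde\xff\x00\xde"
    print(find_offsets(sample, "FF??DE"))
    print(find_offsets(sample, "0xFF14DE"))
```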

You can search through multiple files at once, and search piped input:
$ ./ -p "FF??EE" gamefile.db supersecret.idx
$ cat gamefile.db | ./searchbin -p "0xFF??EE"

You can also search using regular text strings and other binary files.
$ ./ -t "hello" gamefile.db
$ ./ -f binaryfile gamefile.db

Options of SearchBin:

$ ./ --help

Optional Arguments:
  -h, --help            show help message and exit
  -f FILE, --file FILE  file to read search pattern from
  -t PATTERN, --text PATTERN
                        a (non-unicode case-sensitive) text string to search
  -p PATTERN, --pattern PATTERN
                        a hexidecimal pattern in format '0xFF'
  -b NUM, --buffer-size NUM
                        read buffer size (in bytes). 8MB default
  -s NUM, --start NUM   starting position in file to begin searching
  -e NUM, --end NUM     end search at this position, measuring from beginning
                        of file
  -m NUM, --max-count NUM
                        maximum number of matches to find
  -l FILE, --log FILE   write matched offsets to FILE, instead of standard
                        output
  -v, --verbose         verbose, output the number of bytes searched after
                        each buffer read
  -V, --version         print version information

Extra Notes:
One of the arguments -t, -p or -f is required. The -p argument accepts a
hexadecimal pattern string and allows for unknown bytes,
such as 'FF??FF'. When using the -f argument, the pattern file will
be read as a binary file (not hex strings). If no search files are
specified, SearchBin will read from standard input. The minimum memory
required is about 3 times the size of the pattern byte length.
Increasing buffer-size will increase program search speed for
large search files. All size arguments (-b -s -e) are read in decimal
format, for example: '-s 1024' will start searching after 1 kilobyte.
Pattern files do not allow for wildcard matching.
Reported matches are displayed as 0-based offsets.
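
The notes on buffer size and memory can be illustrated with a chunked search that carries a few bytes over between reads, so a match that straddles two buffers is still found. This is a minimal Python sketch under my own assumptions: the name search_stream is hypothetical, it handles only exact byte patterns (no wildcards), and it is not SearchBin's actual search loop.

```python
import io


def search_stream(stream, needle, buffer_size=8 * 1024 * 1024):
    """Yield 0-based offsets of needle in a binary stream, read in chunks."""
    if not needle:
        raise ValueError("needle must not be empty")
    keep = len(needle) - 1  # bytes carried over so boundary matches survive
    tail = b""
    base = 0                # stream offset of the first byte in window
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            break
        window = tail + chunk
        pos = window.find(needle)
        while pos != -1:
            yield base + pos
            pos = window.find(needle, pos + 1)
        # The tail is shorter than the needle, so it can never hold a
        # complete match and no offset is reported twice.
        tail = window[-keep:] if keep else b""
        base += len(window) - len(tail)


if __name__ == "__main__":
    data = io.BytesIO(b"xxABCxxxxABC")
    # A tiny buffer forces the first match to span two reads.
    print(list(search_stream(data, b"ABC", buffer_size=4)))
```

Only one buffer plus a needle-sized tail is held in memory at a time, which is why a scheme like this can work on files of any size.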

Further Examples:
Search for the text string "Tom" in myfile.exe. Text is case sensitive.
./ -t "Tom" myfile.exe

Search for the text string "T?m" in myfile.exe, where ? is a wildcard. This will match "Tom" "Tim" "Twm" and all other variations, including non-printing bytes.
./ -t "T?m" myfile.exe

Search for the hexadecimal pattern "AABBCCDDEE" in myfile.exe.
./ -p "AABBCCDDEE" myfile.exe

Search for the hexadecimal pattern "AA??CC??EE" in myfile.exe, where ?? can be any byte value.
./ -p "AA??CC??EE" myfile.exe

Takes the binary file pattern.bin, and searches for an exact match within myfile.exe.
./ -f pattern.bin myfile.exe
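
If you do not already have a pattern file, one way to produce it is to write the raw bytes from Python. This is a small sketch: the helper name write_pattern_file is hypothetical, and pattern.bin is simply the file name used in the example above.

```python
import binascii


def write_pattern_file(path, hex_string):
    """Write the raw bytes named by hex_string (e.g. 'AABBCCDDEE') to path.

    Returns the number of bytes written.
    """
    data = binascii.unhexlify(hex_string)
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)


if __name__ == "__main__":
    # Creates a 5-byte file containing AA BB CC DD EE.
    print(write_pattern_file("pattern.bin", "AABBCCDDEE"))
```

The file holds the bytes themselves rather than hex text, which is how the -f option is described as reading it.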

+No compiling necessary
+Requires Python 2.7 or Python 3
+Less code
+Search in files of unlimited size
keywords: hex hexidecimal binary like grep search seek find fast

DOWNLOAD it from here:


Report Problems, Suggestions, or Thanks to sepero 111 @ gmx . com

2012 Jun 19 Update:
Major overhaul of the search function. Dramatically increased search speed for wildcard patterns. Also added search functionality for regular text strings.

2012 Jun 28 Update:
Made updates to Readme file and added more comments to code for readability.
Also, I moved all files to Mercurial and

2012 Jul 07 Update:
Unifying all documentation. Publishing to

2013 Feb 02 Update:
Switched code indentation to tabs as it is a universal standard. Moved code back to GitHub due to its popularity.

2013 Oct 11 Update:
Added Python 3 support. Added more unit tests. Added more informative code comments. Updated documentation.
