/Go.HexDump

An implementation of hexdump in Go

Primary LanguageGoGNU General Public License v2.0GPL-2.0

hexdump

An implementation of hexdump in Go

Like my implementation of sed, I'm really just writing this for fun, and to make my life easier on random non-cygwin Windows machines. It also gives me a chance to really learn the nooks and crannies of these tools. For instance, investigating this implementation I learned a LOT about the types of fancy formats the linux hexdump utility lets you produce.

Status

I now consider the program "finished"... it does all I need from it. A couple features of the GNU hexdump are missing, and may get implemented one day just for completeness. Here are the command-line switches supported:

  • -s: skip ahead in the file
  • -n: only read so much of the file
  • -C: set the canned "canonical" hexdump format, with 16 hex bytes followed by those same bytes as characters.
  • -x: 8 16-bit words in hex (little-endian)
  • -x4: 4 32-bit dwords in hex (little-endian)
  • -e: define a custom format (see below)

Custom Format Strings

The format strings supported by hexdump-go are the same as the ones you'd give to the GNU tool, with the following exceptions:

  • The %_p special format is supported, but I haven't implemented %_c or %_u yet, since I don't use those.
  • Instead of using special format %_a for the location, use a @ (at-sign) before the format string.
  • I didn't implement the '_A' "only print once at the end" version of location, since I never use that option personally.
  • Since I use the tool on windows, where one uses double-quotes to surround an argument, I support either single or double quotes inside the format strings. So on UNIX you'd say -e ' 16/2 "%04X "' and on Windows you'd say -e " 16/2 '%04X '", and it works both ways.

Example custom format: two lines for each set of 16 bytes... the top line in octal, and the bottom line in characters (special characters replaced by '.'):

> hexdump-go -e "@ '%08X: ' 16 '%03o ' '\n'" -e "'\t  ' 16 '%-4_p' '\n'" .gitignore 
00000000: 043 040 103 157 155 160 151 154 145 144 040 117 142 152 145 143
          #       C   o   m   p   i   l   e   d       O   b   j   e   c
00000010: 164 040 146 151 154 145 163 054 040 123 164 141 164 151 143 040
          t       f   i   l   e   s   ,       S   t   a   t   i   c
00000020: 141 156 144 040 104 171 156 141 155 151 143 040 154 151 142 163
          a   n   d       D   y   n   a   m   i   c       l   i   b   s
00000030: 040 050 123 150 141 162 145 144 040 117 142 152 145 143 164 163
              (   S   h   a   r   e   d       O   b   j   e   c   t   s
00000040: 051 012 052 056 157 012 052 056 141 012 052 056 163 157 012 012
          )   .   *   .   o   .   *   .   a   .   *   .   s   o   .   .

(note the example shows how to use the @ indicator to mark the format string for the current location. I think it's better than the %_a marker, personally, though there is room to disagree.)

Implementation Notes

I developed the framework with user-defined formatting in mind. Everything is driven from the formatter interface:

type formatter interface {
	bytesNeeded() int
	format(loc uint64, bytes []byte) string
}

Composite formats come in "parallel" and "series" varieties. When you define a format with -e, you are making a serial list of formats -- for example: a location formatter, then a 16-byte double-word formatter. Each of those components takes some bytes and gives what's left to its neighbors in the series. All of those -e series formats are combined into a single parallel format, so that they each get to read all of the bytes.
Series and parallel slices of formats can be arbitrarily nested.

A parser (written with go tool yacc) turns string formats into instances of formatter. All of the canned formats (-x, -C, etc.) are stored as format strings in main.go exactly as they could have been given via the -e flag on the command line, and they get run through the same parser that -e formats use.