/pybfd

A Python interface to the GNU Binary File Descriptor (BFD) library.

Primary LanguagePythonGNU General Public License v2.0GPL-2.0

_______             _______   ________  _______  
|       \           |       \ |        \|       \ 
| $$$$$$$\ __    __ | $$$$$$$\| $$$$$$$$| $$$$$$$\
| $$__/ $$|  \  |  \| $$__/ $$| $$__    | $$  | $$
| $$    $$| $$  | $$| $$    $$| $$  \   | $$  | $$
| $$$$$$$ | $$  | $$| $$$$$$$\| $$$$$   | $$  | $$
| $$      | $$__/ $$| $$__/ $$| $$      | $$__/ $$
| $$       \$$    $$| $$    $$| $$      | $$    $$
 \$$       _\$$$$$$$ \$$$$$$$  \$$       \$$$$$$$ 
          |  \__| $$                              
           \$$    $$                              
            \$$$$$$                               

PyBFD

What is PyBFD?

A Python interface to the GNU Binary File Descriptor (BFD) library.

It's a complete (or at least tries to be) wrapper around the low level functionality provided by GNU Binutils libopcodes and libbfd. This allows the user to manipulate all the supported architectures and file formats that Binutils tools does.

Supported architectures

The PyBFD package installer (setup.py) will automatically detect the installed version of Binutils and extract all its supported architectures.

This includes THREE possibilities:

  • The number of supported architectures will depend on the installed version of Binutils and the targets specified when it was built. Most probably the default installation of Binutils will only support the host architecture (i.e. if your host is X86 then Binutils will support only x86).

  • On certain distributions there is a 'multiarch' package of Binutils. This is the same Binutils supporting the same host architecture as well as several well-known (and others not being in use since Woodstock :-P). It does not include ALL the supported architectures in Binutils but will include the ones you'll bump into 99% of the time so that should be enough in most of the cases. Also, on Mac OSX Binutils can be installed using MacPorts or Homebrew.

  • You can download and compile Binutils from GNU FTP server. The only rules to follow are that it must be compiled as a static library with PIC (Position-Independent Code) enabled. Binutils source code can be obtained from http://ftp.gnu.org/gnu/binutils/

Supported platforms

  • Linux (developed under Ubuntu but all Debian derived distros should work).
  • Mac OS X (MacPorts and Homebrew)
    • on OSX libintl must be installed to compile this extension (i.e. brew install gettext)

Notice: Microsoft Windows is not supported at this time but there are plans to support it as well in a near future.

Installation

There are a couple of ways to download and install PyBFD. Both of them include downloading the source code and compiling it.

To install it with the system Binutils it can be done with just one command using easy_install (this is the preferred way for new users !!!).

The other way consists on downloading (either from PyPi or grabbing the latest source code from GitHub) and compile and link against a version of Binutils installed on a specific location (i.e. other than /usr/lib).

Automatically download and install using easy_install

In the following example the package binutils-multiarch was installed:

$ easy_install pybfd

<... some output removed ...>

[+] pybfd 0.1.0 / Supported architectures:
        bfd_arch_m68k        : Motorola 68xxx
        bfd_arch_sparc       : SPARC
        bfd_arch_spu         : PowerPC SPU
        bfd_arch_mips        : MIPS Rxxxx
        bfd_arch_i386        : Intel 386
        bfd_arch_l1om        : Intel L1OM
        bfd_arch_k1om        : Intel K1OM
        bfd_arch_powerpc     : PowerPC
        bfd_arch_rs6000      : IBM RS/6000
        bfd_arch_hppa        : HP PA RISC
        bfd_arch_sh          : Renesas / SuperH SH (formerly Hitachi SH)
        bfd_arch_alpha       : Dec Alpha
        bfd_arch_arm         : Advanced Risc Machines ARM.
        bfd_arch_m32r        : Renesas M32R (formerly Mitsubishi M32R/D)
        bfd_arch_ia64        : HP/Intel ia64
        bfd_arch_s390        : IBM s390
        bfd_arch_aarch64     : ARM AArch64

Pull source code and compile with a specific Binutils

  • First we have to download and install Binutils with the required architectures (see supported architectures section).
  • Then grab the latest PyBFD version and compile/link against it.
    $ git clone https://github.com/Groundworkstech/pybfd.git
    $ cd pybfd
    $ python ./setup.py install --with-static-binutils=/path/to/specific/binutils

    <... some output removed ...>

    [+] pybfd 0.1.0 / Supported architectures:
        bfd_arch_m68k        : Motorola 68xxx
        bfd_arch_vax         : DEC Vax
        bfd_arch_i960        : Intel 960
        bfd_arch_or32        : OpenRISC 32
        bfd_arch_sparc       : SPARC
        bfd_arch_spu         : PowerPC SPU
        bfd_arch_mips        : MIPS Rxxxx
        bfd_arch_i386        : Intel 386
        bfd_arch_l1om        : Intel L1OM
        bfd_arch_k1om        : Intel K1OM
        bfd_arch_i860        : Intel 860
        bfd_arch_i370        : IBM 360/370 Mainframes
        bfd_arch_m88k        : Motorola 88xxx
        bfd_arch_h8300       : Renesas H8/300 (formerly Hitachi H8/300)
        bfd_arch_pdp11       : DEC PDP-11
        bfd_arch_powerpc     : PowerPC
        bfd_arch_rs6000      : IBM RS/6000
        bfd_arch_hppa        : HP PA RISC
        bfd_arch_d10v        : Mitsubishi D10V
        bfd_arch_d30v        : Mitsubishi D30V
        bfd_arch_dlx         : DLX
        bfd_arch_m68hc11     : Motorola 68HC11
        bfd_arch_z8k         : Zilog Z8000
        bfd_arch_h8500       : Renesas H8/500 (formerly Hitachi H8/500)
        bfd_arch_sh          : Renesas / SuperH SH (formerly Hitachi SH)
        bfd_arch_alpha       : Dec Alpha
        bfd_arch_arm         : Advanced Risc Machines ARM.
        bfd_arch_ns32k       : National Semiconductors ns32000
        bfd_arch_w65         : WDC 65816
        bfd_arch_tic30       : Texas Instruments TMS320C30
        bfd_arch_tic4x       : Texas Instruments TMS320C3X/4X
        bfd_arch_tic54x      : Texas Instruments TMS320C54X
        bfd_arch_tic6x       : Texas Instruments TMS320C6X
        bfd_arch_tic80       : TI TMS320c80 (MVP)
        bfd_arch_m32c        : Renesas M16C/M32C.
        bfd_arch_m32r        : Renesas M32R (formerly Mitsubishi M32R/D)
        bfd_arch_mn10200     : Matsushita MN10200
        bfd_arch_mn10300     : Matsushita MN10300
        bfd_arch_fr30        : Fujitsu FR30
        bfd_arch_frv         : Fujitsu FRV
        bfd_arch_moxie       : The moxie processor
        bfd_arch_mcore       : Motorola MCore
        bfd_arch_mep         : Toshiba Media Embedded Processor (MEP)
        bfd_arch_ia64        : HP/Intel ia64
        bfd_arch_iq2000      : Vitesse IQ2000.
        bfd_arch_mt          : Morpho Technologies MT
        bfd_arch_pj          : Pico Java architecture
        bfd_arch_avr         : Atmel AVR microcontrollers.
        bfd_arch_bfin        : ADI Blackfin
        bfd_arch_cr16        : National Semiconductor CompactRISC (ie CR16).
        bfd_arch_cr16c       : National Semiconductor CompactRISC.
        bfd_arch_crx         : National Semiconductor CRX.
        bfd_arch_rx          : Renesas RX.
        bfd_arch_s390        : IBM s390
        bfd_arch_score       : Sunplus score
        bfd_arch_openrisc    : OpenRISC
        bfd_arch_mmix        : Donald Knuth educational processor.
        bfd_arch_xstormy16   : XStormy16
        bfd_arch_msp430      : Texas Instruments MSP430 architecture.
        bfd_arch_xc16x       : Infineon XC16X Series.
        bfd_arch_xtensa      : Tensilica Xtensa cores.
        bfd_arch_z80         : Zilog Z80
        bfd_arch_lm32        : Lattice Mico32
        bfd_arch_microblaze  : Xilinx MicroBlaze.
        bfd_arch_tilepro     : Tilera TILEPro
        bfd_arch_tilegx      : Tilera TILE-Gx
        bfd_arch_aarch64     : ARM AArch64
        bfd_arch_rl78        : RL78

The --with-static-binutils command line parameter allows the installer script to use a binutils installation different from the one installed on the system. If the -with-static-binutils parameter is not specified then the installer script will lookup the system Binutils and detect its supported architectures and the resulting PyBFD installation will be the exact same as the one performed using easy_install.

Sample python scripts

There are a couple of sample scripts included in the PyBFD package including a partial (but completely useful) reimplementation of objdump in Python using PyBFD called objdump.py.

A basic sample code is located in bfd.py, opcodes.py and a more complete usage is shown in objdump.py.

Basic usage example

Below is the code located at the end of bfd.py script:

    #!/usr/bin/env python

    # Import the disassembly library (libopcodes)
    from pybfd.opcodes import Opcodes, OpcodesException
    from pybfd.bfd import Bfd, BfdException

    from sys import argv
    if len(argv) == 1:
        print "Usage : %s <filename>" % argv[0]
        return

    bfd = None

    try:
        #
        # Initialize BFD instance.
        # We can either pass a filename or a file descriptor and they will be used
        # in the same way.
        #
        print "[+] Creating BFD instance..."
        bfd = Bfd(argv[1])

        # Print the file format and in case that its an archive then just show
        # its files and leave.
        print "[+] File format     : %s" % bfd.file_format_name

        if bfd.is_archive:
            print "[-] List of internal files:"
            #for inner_filename in bfd.archive_filenames:
            #    print "\t%s" % inner_filename

            for inner_bfd in bfd.archive_files:
                print "\t%-40s - sections : %d - symbols : %s" % \
                    (inner_bfd.filename,
                    len(inner_bfd.sections),
                    len(inner_bfd.symbols))

            # The bfd.close() is executed below in the finally clause.
            return

        #
        # Display some information about the currently open file.
        #
        print "[+] Architecture    : %s (%d)" % \
            (bfd.architecture_name, bfd.architecture)
        print "[+] BFD target name : %s" % bfd.target
        print "[+] Entry point     : 0x%X" % bfd.start_address
        print "[+] Sections        : %d" % len(bfd.sections)
        #print "\n".join([str(s) for s in bfd.sections])

        #
        # Get the .text section for further usage.
        #
        section_name = ".text"
        section = bfd.sections.get(section_name)
        if not section:
            print "[-] No section \'%s\' available." % section_name
            return

        #
        # Display its name (we get it from the section instance) and its index
        # inside the binary file.
        #
        print "[+] Selected section information:"
        print "\tName   : %s" % section.name
        print "\tIndex  : %d" % section.index

        # Dump the section content to a buffer
        content = section.content

        # Display approximate section length.
        length = len(content) / 1024

        if length == 0:
            length = len(content) % 1024
            length_unit = "Bytes"
        else:
            length_unit = "Kbytes"

        print "\tLength : %(length)d %(length_unit)s" % vars()

        #
        # At this point we'll disassemble the entire section content. We'll obtain
        # instruction address, size, type and disassembly (in text format).
        #
        try:
            # Create opcodes instance to start code section disassembly.
            opcodes = Opcodes(bfd)

            # Set the code area we'll move through.
            opcodes.initialize_smart_disassemble(content, section.vma)

            # Set an internal -ready to use- callback function to print disassembly
            # information from the current section content.
            opcodes.start_smart_disassemble(0, opcodes.print_single_instruction_callback)

        except OpcodesException, err:
            print "[-] Opcodes exception : %s" % err

        #for vma, size, disasm in opcodes.disassemble(content, bfd.start_address):
        #    print "0x%X (%d)\t %s" % (vma, size, disasm)
        #    print disasm, opcodes.build_instruction(vma, size, disasm)
        #    pass

    except BfdException, err:
        #print_exc()
        print "Error : %s" % err

    finally:
        if bfd:
            # Check is we're working with an archive and close archived files
            # before closing the container.
            try:
                # Release inner BFD files in case we're an archive BFD.
                if bfd.is_archive:
                    [inner_bfd.close() for inner_bfd in bfd.archive_files]
            except TypeError, err:
                pass

            # Release the current BFD and leave. 
            bfd.close()