/hachoir_parser

Package of Hachoir parsers used to open binary files

Primary LanguagePython

hachoir-parser is a package of most common file format parsers written for Hachoir framework. Not all parsers are complete, some are very good and other are poor: only parser first level of the tree for example.

A perfect parser have no "raw" field: with a perfect parser you are able to know each bit meaning. Some good (but not perfect ;-)) parsers:

  • Matroska video
  • Microsoft RIFF (AVI video, WAV audio, CDA file)
  • PNG picture
  • TAR and ZIP archive

GnomeKeyring parser requires Python Crypto module: http://www.amk.ca/python/code/crypto.html

Website: http://bitbucket.org/haypo/hachoir/wiki/hachoir-parser

hachoir-parser 1.3.4 (2010-07-26)

  • update matroska parser to support WebM videos

hachoir-parser 1.3.3 (2010-04-15)

  • fix setup.py: don't use with statement to stay compatible with python 2.4

hachoir-parser 1.3.2 (2010-03-01)

  • Include the README file in the tarball
  • setup.py reads the README file instead of using README.py to break the build dependency on hachoir-core

hachoir-parser 1.3.1 (2010-01-28)

  • Create MANIFEST.in to include extra files: README.py, README.header, tests/run_testcase.py, etc.
  • Create an INSTALL file

hachoir-parser 1.3 (2010-01-20)

  • New parsers:
    • BLP: Blizzard Image
    • PRC: Palm resource
  • HachoirParserList() is no more a singleton: use HachoirParserList.getInstance() to get a singleton
  • Add tags optional argument to createParser(), it can be used for example to force a parser
  • Fix ParserList.print_(): first argument is now the title and not 'out'. If out is not specified, use sys.stdout.
  • MP3: support encapsulated objects (GEOB in ID3)
  • Create a dictionary: Windows codepage => charset name (CODEPAGE_CHARSET)
  • ASN.1: support boolean and enum types; fix bit string parser
  • MKV: use textHandler()
  • AVI: create index parser, use file size header to detect padding at the end
  • ISO9660: strip nul bytes in application name
  • JPEG: add ICC profile chunk name
  • PNG: fix transparency parser (tRNS)
  • BPLIST: support empty value for markers 4, 5 and 6
  • Microsoft Office summary: support more codepages (CP874, Windows 1250..1257)
  • tcpdump: support ICMPv6 and IPv6
  • Java: add bytecode parser, support JDK 1.6
  • Python: parse lnotab content, fill a string table for the references
  • MPEG Video: parse much more chunks
  • MOV: Parse file type header, create the right MIME type

hachoir-parser 1.2.1 (2008-10-16)

  • Improve OLE2 and MS Office parsers: - support small blocks - fix the charset of the summary properties - summary property integers are unsigned - use TimedeltaWin64 for the TotalEditingTime field - create minimum Word document parser
  • Python parser: support magic numbers of Python 3000 with the keyword only arguments
  • Create Apple/NeXT Binary Property List (BPLIST) parser
  • MPEG audio: reject file with no valid frame nor ID3 header
  • Skip subfiles in JPEG files
  • Create Apple/NeXT Binary Property List (BPLIST) parser by Robert Xiao

hachoir-parser 1.2 (2008-09-03)

  • Create FLAC parser, written by Esteban Loiseau
  • Create Action Script parser used in Flash parser, written by Sebastien Ponce
  • Create Gnome Keyring parser: able to parse the stored passwords using Python Crypto if the main password is written in the code :-)
  • GIF: support text extension field; parse image content (LZW compressed data)
  • Fix charset of IPTC string (guess it, it's not always ISO-8859-1)
  • TIFF: Sebastien Ponce improved the parser: parse image data, add many tags, etc.
  • MS Office: guess the charset for summary strings since it could be ISO-8859-1 or UTF-8

hachoir-parser 1.1 (2008-04-01)

Main changes: add "EFI Platform Initialization Firmware Volume" (PIFV) and "Microsoft Windows Help" (HLP) parsers. Details:

  • MPEG audio:
    • add createContentSize() to support hachoir-subfile
    • support file starting with ID3v1
    • if file doesn't contain any frame, use ID3v1 or ID3v2 to create the description
  • EXIF:
    • use "count" field value
    • create RationalInt32 and RationalUInt32
    • fix for empty value
    • add GPS tags
  • JPEG:
    • support Ducky (APP12) chunk
    • support Comment chunk
    • improve validate(): make sure that first 3 chunk types are known
  • RPM: use bzip2 or gzip handler to decompress content
  • S3M: fix some parser bugs
  • OLE2: reject negative block index (or special block index)
  • ip2name(): catch KeybordInterrupt and don't resolve next addresses
  • ELF: support big endian
  • PE: createContentSize() works on PE program, improve resource section detection
  • AMF: stop mixed array parser on empty key

hachoir-parser 1.0 (2007-07-11)

Changes:

  • OLE2: Support file bigger than 6 MB (support many DIFAT blocks)
  • OLE2: Add createContentSize() to guess content size
  • LNK: Improve parser (now able to parse the whole file)
  • EXE PE: Add more subsystem names
  • PYC: Support Python 2.5c2
  • Fix many spelling mistakes

Minor changes:

  • PYC: Fix long integer parser (negative number), add (disabled) code to disassemble bytecode, use self.code_info to avoid replacing self.info
  • OLE2: Add ".msi" file extension
  • OLE2: Fix to support documents generated on Mac
  • EXIF: set max IFD entry count to 1000 (instead of 200)
  • EXIF: don't limit BYTE/UNDEFINED IFD entry count
  • EXIF: add "User comment" tag
  • GIF: fix image and screen description
  • bzip2: catch decompressor error to be able to read trailing data
  • Fix file extensions of AIFF
  • Windows GUID use new TimestampUUID60 field type
  • RIFF: convert class constant names to upper case
  • Fix RIFF: don't replace self.info method
  • ISO9660: Write parser for terminator content

Parser list

Archive

  • 7zip: Compressed archive in 7z format
  • ace: ACE archive
  • bzip2: bzip2 archive
  • cab: Microsoft Cabinet archive
  • gzip: gzip archive
  • mar: Microsoft Archive
  • rar: Roshal archive (RAR)
  • rpm: RPM package
  • tar: TAR archive
  • unix_archive: Unix archive
  • zip: ZIP archive

Audio

  • aiff: Audio Interchange File Format (AIFF)
  • fasttracker2: FastTracker2 module
  • flac: FLAC audio
  • itunesdb: iPod iTunesDB file
  • midi: MIDI audio
  • mod: Uncompressed amiga module
  • mpeg_audio: MPEG audio version 1, 2, 2.5
  • ptm: PolyTracker module (v1.17)
  • real_audio: Real audio (.ra)
  • s3m: ScreamTracker3 module
  • sun_next_snd: Sun/NeXT audio

Container

  • asn1: Abstract Syntax Notation One (ASN.1)
  • matroska: Matroska multimedia container
  • ogg: Ogg multimedia container
  • ogg_stream: Ogg logical stream
  • real_media: RealMedia (rm) Container File
  • riff: Microsoft RIFF container
  • swf: Macromedia Flash data

File System

  • ext2: EXT2/EXT3 file system
  • fat12: FAT12 filesystem
  • fat16: FAT16 filesystem
  • fat32: FAT32 filesystem
  • iso9660: ISO 9660 file system
  • linux_swap: Linux swap file
  • msdos_harddrive: MS-DOS hard drive with Master Boot Record (MBR)
  • ntfs: NTFS file system
  • reiserfs: ReiserFS file system

Game

  • blp1: Blizzard Image Format, version 1
  • blp2: Blizzard Image Format, version 2
  • lucasarts_font: LucasArts Font
  • spiderman_video: The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV video
  • zsnes: ZSNES Save State File (only version 143)

Image

  • bmp: Microsoft bitmap (BMP) picture
  • gif: GIF picture
  • ico: Microsoft Windows icon or cursor
  • jpeg: JPEG picture
  • pcx: PC Paintbrush (PCX) picture
  • png: Portable Network Graphics (PNG) picture
  • psd: Photoshop (PSD) picture
  • targa: Truevision Targa Graphic (TGA)
  • tiff: TIFF picture
  • wmf: Microsoft Windows Metafile (WMF)
  • xcf: Gimp (XCF) picture

Misc

  • 3do: renderdroid 3d model.
  • 3ds: 3D Studio Max model
  • bplist: Apple/NeXT Binary Property List
  • chm: Microsoft's HTML Help (.chm)
  • gnomekeyring: Gnome keyring
  • hlp: Microsoft Windows Help (HLP)
  • lnk: Windows Shortcut (.lnk)
  • ole2: Microsoft Office document
  • pcf: X11 Portable Compiled Font (pcf)
  • pdf: Portable Document Format (PDF) document
  • tcpdump: Tcpdump file (network)
  • torrent: Torrent metainfo file
  • ttf: TrueType font

Program

  • elf: ELF Unix/BSD program/library
  • exe: Microsoft Windows Portable Executable
  • java_class: Compiled Java class
  • pifv: EFI Platform Initialization Firmware Volume
  • prc: Palm Resource File
  • python: Compiled Python script (.pyc/.pyo files)

Video

  • asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
  • flv: Macromedia Flash video
  • mov: Apple QuickTime movie
  • mpeg_ts: MPEG-2 Transport Stream
  • mpeg_video: MPEG video, version 1 or 2

Total: 78 parsers