hachoir-parser is a package of parsers for the most common file formats, written for the Hachoir framework. Not all parsers are complete: some are very good and others are poor, for example parsing only the first level of the tree.
A perfect parser has no "raw" field: with a perfect parser you know the meaning of every bit. Some good (but not perfect ;-)) parsers:
- Matroska video
- Microsoft RIFF (AVI video, WAV audio, CDA file)
- PNG picture
- TAR and ZIP archive
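
To make the "raw" field criterion concrete, here is a minimal sketch that counts how many bits of a parsed tree are still raw. It assumes that each field exposes `size` (in bits) and `is_field_set`, that field sets are iterable, and that unparsed data can be recognised by a class name starting with "Raw" (such as RawBytes or RawBits). The function name `raw_bit_count` and the stand-in classes are hypothetical and only exist so the example runs without Hachoir installed.

```python
def raw_bit_count(field_set):
    """Return (raw_bits, total_bits) for a tree of fields.

    Assumption: fields have `size` (bits) and `is_field_set`, and raw
    (unparsed) fields have a class name starting with "Raw".
    """
    raw = total = 0
    for field in field_set:
        if field.is_field_set:
            # Recurse into nested field sets and add up their counts.
            sub_raw, sub_total = raw_bit_count(field)
            raw += sub_raw
            total += sub_total
        else:
            total += field.size
            if type(field).__name__.startswith("Raw"):
                raw += field.size
    return raw, total


# Hypothetical stand-in classes (not the Hachoir ones).
class Field(object):
    is_field_set = False

    def __init__(self, size):
        self.size = size


class UInt32(Field):
    pass


class RawBytes(Field):
    pass


class FieldSet(list):
    is_field_set = True


tree = FieldSet([UInt32(32), FieldSet([UInt32(32), RawBytes(64)]), RawBytes(128)])
raw, total = raw_bit_count(tree)
print("%d of %d bits are raw" % (raw, total))
```

With this measure, a perfect parser would report zero raw bits for any file it accepts.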
The GnomeKeyring parser requires the Python Crypto module: http://www.amk.ca/python/code/crypto.html
Website: http://bitbucket.org/haypo/hachoir/wiki/hachoir-parser
- update matroska parser to support WebM videos
- fix setup.py: don't use the with statement, to stay compatible with Python 2.4
- Include the README file in the tarball
- setup.py reads the README file instead of using README.py to break the build dependency on hachoir-core
- Create MANIFEST.in to include extra files: README.py, README.header, tests/run_testcase.py, etc.
- Create an INSTALL file
- New parsers:
  - BLP: Blizzard Image
  - PRC: Palm resource
- HachoirParserList() is no longer a singleton: use HachoirParserList.getInstance() to get a singleton
- Add an optional tags argument to createParser(); it can be used, for example, to force a parser
- Fix ParserList.print_(): first argument is now the title and not 'out'. If out is not specified, use sys.stdout.
- MP3: support encapsulated objects (GEOB in ID3)
- Create a dictionary: Windows codepage => charset name (CODEPAGE_CHARSET)
- ASN.1: support boolean and enum types; fix bit string parser
- MKV: use textHandler()
- AVI: create index parser, use file size header to detect padding at the end
- ISO9660: strip nul bytes in application name
- JPEG: add ICC profile chunk name
- PNG: fix transparency parser (tRNS)
- BPLIST: support empty value for markers 4, 5 and 6
- Microsoft Office summary: support more codepages (CP874, Windows 1250..1257)
- tcpdump: support ICMPv6 and IPv6
- Java: add bytecode parser, support JDK 1.6
- Python: parse lnotab content, fill a string table for the references
- MPEG Video: parse many more chunks
- MOV: Parse file type header, create the right MIME type
- Improve OLE2 and MS Office parsers:
  - support small blocks
  - fix the charset of the summary properties
  - summary property integers are unsigned
  - use TimedeltaWin64 for the TotalEditingTime field
  - create minimum Word document parser
- Python parser: support magic numbers of Python 3000 with the keyword only arguments
- Create Apple/NeXT Binary Property List (BPLIST) parser
- MPEG audio: reject files with neither a valid frame nor an ID3 header
- Skip subfiles in JPEG files
- Create Apple/NeXT Binary Property List (BPLIST) parser by Robert Xiao
- Create FLAC parser, written by Esteban Loiseau
- Create Action Script parser used in Flash parser, written by Sebastien Ponce
- Create Gnome Keyring parser: able to parse the stored passwords using Python Crypto if the main password is written in the code :-)
- GIF: support text extension field; parse image content (LZW compressed data)
- Fix charset of IPTC string (guess it, it's not always ISO-8859-1)
- TIFF: Sebastien Ponce improved the parser: parse image data, add many tags, etc.
- MS Office: guess the charset for summary strings since it could be ISO-8859-1 or UTF-8
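
The charset guessing mentioned above for IPTC and MS Office summary strings can be illustrated with a small sketch. This is not the heuristic used by the parsers; it is one simple approach, assuming only two candidates: try UTF-8 first and fall back to ISO-8859-1. The function name `guess_decode` is hypothetical.

```python
def guess_decode(data):
    """Decode bytes as UTF-8 when valid, otherwise as ISO-8859-1.

    Returns a (text, charset_name) pair.
    """
    try:
        return data.decode("utf-8"), "UTF-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return data.decode("iso-8859-1"), "ISO-8859-1"


print(guess_decode(b"caf\xc3\xa9"))  # valid UTF-8 byte sequence
print(guess_decode(b"caf\xe9"))      # invalid UTF-8, decoded as ISO-8859-1
```

Trying UTF-8 first works because arbitrary ISO-8859-1 text with accented characters is rarely valid UTF-8, while the reverse fallback would never fail and so could never detect UTF-8.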
Main changes: add "EFI Platform Initialization Firmware Volume" (PIFV) and "Microsoft Windows Help" (HLP) parsers. Details:
- MPEG audio:
  - add createContentSize() to support hachoir-subfile
  - support files starting with ID3v1
  - if the file doesn't contain any frame, use ID3v1 or ID3v2 to create the description
- EXIF:
  - use "count" field value
  - create RationalInt32 and RationalUInt32
  - fix for empty value
  - add GPS tags
- JPEG:
  - support Ducky (APP12) chunk
  - support Comment chunk
  - improve validate(): make sure that the first 3 chunk types are known
- RPM: use bzip2 or gzip handler to decompress content
- S3M: fix some parser bugs
- OLE2: reject negative block index (or special block index)
- ip2name(): catch KeyboardInterrupt and stop resolving subsequent addresses
- ELF: support big endian
- PE: createContentSize() works on PE program, improve resource section detection
- AMF: stop mixed array parser on empty key
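
As background for the RationalInt32 and RationalUInt32 entry above: an EXIF (TIFF) rational is stored as two 32-bit integers, numerator then denominator, signed or unsigned. The sketch below decodes such a value with the standard library only. It is a standalone illustration, not the field classes of the parser; the function name `read_rational` and the choice to return None for a zero denominator are assumptions of this example.

```python
import struct
from fractions import Fraction


def read_rational(data, signed=False, big_endian=False):
    """Decode an 8-byte rational: 32-bit numerator, then 32-bit denominator.

    Returns a Fraction, or None when the denominator is zero
    (a choice made for this sketch).
    """
    fmt = (">" if big_endian else "<") + ("ii" if signed else "II")
    numerator, denominator = struct.unpack(fmt, data[:8])
    if denominator == 0:
        return None
    return Fraction(numerator, denominator)


exposure = struct.pack("<II", 1, 250)        # unsigned, little endian
print(read_rational(exposure))               # 1/250
bias = struct.pack(">ii", -2, 3)             # signed, big endian
print(read_rational(bias, signed=True, big_endian=True))  # -2/3
```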
Changes:
- OLE2: Support files larger than 6 MB (support many DIFAT blocks)
- OLE2: Add createContentSize() to guess content size
- LNK: Improve parser (now able to parse the whole file)
- EXE PE: Add more subsystem names
- PYC: Support Python 2.5c2
- Fix many spelling mistakes
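
The 6 MB figure in the OLE2 entry above presumably comes from the size of the file header: it has room for 109 FAT sector locations, and anything beyond that needs extra DIFAT blocks. A rough worked calculation, assuming the common layout with 512-byte sectors and 32-bit FAT entries (the constant names are only illustrative):

```python
SECTOR_SIZE = 512            # bytes per sector (assumed layout)
ENTRY_SIZE = 4               # each FAT entry is a 32-bit sector index
HEADER_DIFAT_ENTRIES = 109   # FAT sector locations stored in the header

# How many sectors one FAT sector can describe.
entries_per_fat_sector = SECTOR_SIZE // ENTRY_SIZE
# Sectors reachable using only the FAT sectors listed in the header.
addressable_sectors = HEADER_DIFAT_ENTRIES * entries_per_fat_sector
max_bytes = addressable_sectors * SECTOR_SIZE

print(entries_per_fat_sector)                 # 128
print(addressable_sectors)                    # 13952
print(max_bytes)                              # 7143424
print("%.1f MiB" % (max_bytes / float(2 ** 20)))  # 6.8 MiB
```

The result is an upper bound of roughly 6.8 MiB, since the FAT sectors themselves also occupy space in the file.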
Minor changes:
- PYC: Fix long integer parser (negative number), add (disabled) code to disassemble bytecode, use self.code_info to avoid replacing self.info
- OLE2: Add ".msi" file extension
- OLE2: Fix to support documents generated on Mac
- EXIF: set max IFD entry count to 1000 (instead of 200)
- EXIF: don't limit BYTE/UNDEFINED IFD entry count
- EXIF: add "User comment" tag
- GIF: fix image and screen description
- bzip2: catch decompressor error to be able to read trailing data
- Fix file extensions of AIFF
- Windows GUID: use the new TimestampUUID60 field type
- RIFF: convert class constant names to upper case
- Fix RIFF: don't replace self.info method
- ISO9660: Write parser for terminator content
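
For the bzip2 entry above, the idea of reading data that follows the compressed stream can be shown with the standard bz2 module. Note that this sketch takes a different route from the changelog entry: instead of catching a decompressor error, it relies on the `unused_data` attribute of `bz2.BZ2Decompressor`. The function name `split_bzip2` is hypothetical.

```python
import bz2


def split_bzip2(data):
    """Return (decompressed, trailing) for a bzip2 stream followed by extra bytes."""
    decompressor = bz2.BZ2Decompressor()
    content = decompressor.decompress(data)
    # Bytes found after the end-of-stream marker are kept in unused_data.
    return content, decompressor.unused_data


blob = bz2.compress(b"hello hachoir") + b"TRAILER"
content, trailing = split_bzip2(blob)
print(content)
print(trailing)
```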
- 7zip: Compressed archive in 7z format
- ace: ACE archive
- bzip2: bzip2 archive
- cab: Microsoft Cabinet archive
- gzip: gzip archive
- mar: Microsoft Archive
- rar: Roshal archive (RAR)
- rpm: RPM package
- tar: TAR archive
- unix_archive: Unix archive
- zip: ZIP archive
- aiff: Audio Interchange File Format (AIFF)
- fasttracker2: FastTracker2 module
- flac: FLAC audio
- itunesdb: iPod iTunesDB file
- midi: MIDI audio
- mod: Uncompressed Amiga module
- mpeg_audio: MPEG audio version 1, 2, 2.5
- ptm: PolyTracker module (v1.17)
- real_audio: Real audio (.ra)
- s3m: ScreamTracker3 module
- sun_next_snd: Sun/NeXT audio
- asn1: Abstract Syntax Notation One (ASN.1)
- matroska: Matroska multimedia container
- ogg: Ogg multimedia container
- ogg_stream: Ogg logical stream
- real_media: RealMedia (rm) Container File
- riff: Microsoft RIFF container
- swf: Macromedia Flash data
- ext2: EXT2/EXT3 file system
- fat12: FAT12 filesystem
- fat16: FAT16 filesystem
- fat32: FAT32 filesystem
- iso9660: ISO 9660 file system
- linux_swap: Linux swap file
- msdos_harddrive: MS-DOS hard drive with Master Boot Record (MBR)
- ntfs: NTFS file system
- reiserfs: ReiserFS file system
- blp1: Blizzard Image Format, version 1
- blp2: Blizzard Image Format, version 2
- lucasarts_font: LucasArts Font
- spiderman_video: The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV video
- zsnes: ZSNES Save State File (only version 143)
- bmp: Microsoft bitmap (BMP) picture
- gif: GIF picture
- ico: Microsoft Windows icon or cursor
- jpeg: JPEG picture
- pcx: PC Paintbrush (PCX) picture
- png: Portable Network Graphics (PNG) picture
- psd: Photoshop (PSD) picture
- targa: Truevision Targa Graphic (TGA)
- tiff: TIFF picture
- wmf: Microsoft Windows Metafile (WMF)
- xcf: Gimp (XCF) picture
- 3do: renderdroid 3d model
- 3ds: 3D Studio Max model
- bplist: Apple/NeXT Binary Property List
- chm: Microsoft's HTML Help (.chm)
- gnomekeyring: Gnome keyring
- hlp: Microsoft Windows Help (HLP)
- lnk: Windows Shortcut (.lnk)
- ole2: Microsoft Office document
- pcf: X11 Portable Compiled Font (pcf)
- pdf: Portable Document Format (PDF) document
- tcpdump: Tcpdump file (network)
- torrent: Torrent metainfo file
- ttf: TrueType font
- elf: ELF Unix/BSD program/library
- exe: Microsoft Windows Portable Executable
- java_class: Compiled Java class
- pifv: EFI Platform Initialization Firmware Volume
- prc: Palm Resource File
- python: Compiled Python script (.pyc/.pyo files)
- asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
- flv: Macromedia Flash video
- mov: Apple QuickTime movie
- mpeg_ts: MPEG-2 Transport Stream
- mpeg_video: MPEG video, version 1 or 2
Total: 78 parsers