
Digital Database of Microbial Phenotypes. Like an online Bergey's Manual.


This database is composed of phenotypic information on Bacteria and Archaea, as listed in the primary literature and Bergey's Manual of Systematic Bacteriology.
Contributors to date: R. Eric Collins

volume2-tables.txt: Bergey's Manual Volume 2, The Proteobacteria
volume3-tables.txt: Bergey's Manual Volume 3, The Firmicutes
volume4-tables.txt: Bergey's Manual Volume 4, The Bacteroidetes, Spirochaetes, Tenericutes (Mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae, and Planctomycetes

Volume 1, The Archaea and Deeply-Branching Bacteria, is not available in digital format
Volume 5, The Actinobacteria, will be released in early 2012

 - get text out of PDFs
 -- pdftotext -layout -nopgbrk -enc UTF-8 <file>

 - get tables out of text
 -- grep -h -A 100 -B 1 -e 'TABLE' <file>

 - decode garbled Unicode in Volume 2
 -- unicode2not.pl <file>

 - fix whitespace
 -- tabit.pl <file>

 - fix formatting
 -- manually in Kate using Block Selection mode
 -- examples include cleaning up 1-column tables and multi-page tables
 -- multi-line captions were replaced using regex:
 --- Find: (TABLE.*)\n([a-z]{2,}.*)\s*
 --- Replace: \1 \2
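
The caption fix above was done by hand in Kate. For anyone who wants to script it, here is a minimal Python sketch of the same Find/Replace. It is not part of this repository: the function name join_caption_lines and the sample text are made up for illustration, and the pattern is adapted slightly (trailing spaces and tabs are trimmed instead of using \s*, so the newline after the caption is kept). The loop is also an assumption, added so that captions spanning three or more lines are joined.

```python
import re

# Adapted from the Find pattern above. Assumption: only trailing spaces/tabs
# are consumed, so the line after the caption stays on its own line.
CAPTION_BREAK = re.compile(r"(TABLE.*)\n([a-z]{2,}.*?)[ \t]*$", re.MULTILINE)


def join_caption_lines(text: str) -> str:
    """Join a TABLE caption with continuation lines that start in lowercase."""
    previous = None
    # Repeat until nothing changes; each substitution removes one newline,
    # so the loop always terminates.
    while previous != text:
        previous = text
        text = CAPTION_BREAK.sub(r"\1 \2", text)
    return text


if __name__ == "__main__":
    # Hypothetical sample, not taken from the actual table files.
    sample = (
        "TABLE 2. Differential\n"
        "characteristics of\n"
        "the species\n"
        "Characteristic  A  B\n"
    )
    print(join_caption_lines(sample))
```

Lines that start with an uppercase letter, a digit, or a symbol are left alone, which matches the intent of the [a-z]{2,} test in the original pattern.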