Ever wondered what it would look like if Australian legislation was available in git / GitHub?
gitlaw-au is my 2015 #govhack project.
I didn't quite make it for GovHack.... oh well!
Text is extracted, but some odd formatting and leftover style information remain, and much of the document structure is still missing (no table conversion is attempted).
- Get a list of all current acts and their ComLawIDs (acts_current.txt)
- Get a list of all the RTF/DOC/DOCx versions and volumes of those acts (details_current.json)
- Download all the relevant RTF/DOC/DOCx files (Amazon S3)
- Extract structure of documents and convert to Markdown (in progress)
  - Read the DOCx format and extract indents and font sizes
  - Convert these to Markdown indents and heading sizes
  - Extract table structures
- Write to Markdown, using git commits dated to when each version of the legislation came into force
- Access the historical series of each act to build its history
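
The idea of dating each commit to when the legislation came into force could be sketched roughly as below. This is an illustration only, not code from this repo: `backdated_env` and `commit_version` are hypothetical names, midnight is an arbitrary choice of time, and the sketch assumes the repository already exists and git has a user name and email configured. It relies on git's `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables.

```python
import os
import subprocess
from datetime import date
from pathlib import Path


def backdated_env(in_force: date) -> dict:
    """Copy the current environment and pin both git dates to the in-force date."""
    stamp = f"{in_force.isoformat()}T00:00:00"  # midnight is an arbitrary assumption
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env


def commit_version(repo: Path, rel_path: str, markdown: str,
                   in_force: date, message: str) -> None:
    """Write one version of an act and commit it as if made on `in_force`."""
    target = repo / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    # Stage only this file, then commit with the overridden dates.
    subprocess.run(["git", "add", "--", rel_path], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", message],
                   cwd=repo, env=backdated_env(in_force), check=True)
```

Calling `commit_version` once per version of an act, oldest first, would give each file a history that reads in the same order the versions came into force.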
- spider.py: Crawl legislation by year and get the ComLawID
- download.py: Get the legislation detail from the ComLawID
- convert.py: The actual conversion to Markdown (messy!)
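
To make the font-size and indent idea concrete, here is a minimal sketch of one way the mapping to Markdown could work. It is not the code in convert.py: `Para`, `to_markdown` and the 11 pt body size are all assumptions, the sample text is made up, and the paragraphs are assumed to have already been pulled out of the DOCx file with their font size and indent depth.

```python
from typing import NamedTuple


class Para(NamedTuple):
    text: str
    font_size: float  # points
    indent: int       # nesting depth, 0 = flush left


def to_markdown(paras, body_size=11.0):
    """Larger-than-body fonts become headings; indented text becomes nested list items."""
    # Rank the distinct heading sizes: the largest maps to '#', the next to '##', and so on.
    heading_sizes = sorted({p.font_size for p in paras if p.font_size > body_size},
                           reverse=True)
    lines = []
    for p in paras:
        text = p.text.strip()
        if not text:
            continue  # skip empty paragraphs
        if p.font_size > body_size:
            level = min(heading_sizes.index(p.font_size) + 1, 6)  # Markdown stops at h6
            lines.append("#" * level + " " + text)
        elif p.indent > 0:
            lines.append("  " * (p.indent - 1) + "- " + text)
        else:
            lines.append(text)
    return "\n\n".join(lines)


sample = [
    Para("Example Act", 18, 0),
    Para("Part 1", 14, 0),
    Para("Body text of a section.", 11, 0),
    Para("(a) first item", 11, 1),
    Para("(i) nested item", 11, 2),
]
markdown = to_markdown(sample)
```

Ranking sizes relative to each other, instead of hard-coding point values, is one way to cope with documents that use different absolute sizes for the same heading level.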