scraping all of the chairman's speeches daily from http://jhsjk.people.cn/, because you never know when one of them is going to vanish or be altered.
the json index of the pages just gets dumped as-is; individual speeches are saved as html files with no processing, named with the unique speech id and the first 20 chars of the title.
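
a minimal sketch of how the naming and saving could look, not the actual scraper code. the function names `speech_filename` and `save_speech`, the underscore separator, the `.html` extension and the replacement of `/` in titles are all assumptions made for illustration; only "speech id + first 20 chars of the title" and "no processing" come from the description above.

```python
from pathlib import Path


def speech_filename(speech_id, title, max_title_chars=20):
    """Build a file name from the speech id and the start of the title.

    Assumed layout: "<id>_<first 20 chars of title>.html".
    The separator and extension are illustrative choices.
    """
    # Slicing a str counts characters (code points), not bytes,
    # so a Chinese title is cut at 20 characters as intended.
    # "/" is replaced so the title cannot create a subdirectory (assumption).
    short_title = title[:max_title_chars].replace("/", "_")
    return f"{speech_id}_{short_title}.html"


def save_speech(out_dir, speech_id, title, raw_html):
    """Write the response body to disk exactly as received (bytes in, bytes out)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / speech_filename(speech_id, title)
    # No parsing and no re-encoding: the file is a byte-for-byte copy.
    path.write_bytes(raw_html)
    return path
```

keeping the id in the name means two speeches whose titles share the same first 20 chars still end up in separate files, and writing raw bytes means a later diff shows exactly what changed on the site.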