harvard-lil/capstone

Extract HTML for S3 files

kilbergr opened this issue · 0 comments

This is a follow-on to issue 2144.

Having created the HTML with paragraph and bounding box attributes, create a script to extract it to S3 files, abiding by the structure specified in the transition doc. This will likely be an add-on to the script created in issue 2139.

AC:

  • Can create S3 files for each case, containing the HTML with full attributes
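
A rough sketch of what the upload step might look like, to make the acceptance criterion concrete. Everything named here is an assumption for illustration: the `CaseHtml` fields, the `build_key` layout, and the `upload_cases` helper are hypothetical, and the real key structure has to come from the transition doc. The only real API used is the boto3 S3 client's `put_object` call; the client is passed in so the function can be exercised without AWS credentials.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CaseHtml:
    # Hypothetical fields; the real identifiers are defined by the transition doc.
    reporter: str
    volume: str
    case_id: str
    html: str  # HTML that already carries the paragraph and bounding box attributes


def build_key(case: CaseHtml) -> str:
    """Build an S3 object key. Placeholder layout, not the transition doc's structure."""
    return f"{case.reporter}/{case.volume}/html/{case.case_id}.html"


def upload_cases(s3_client, bucket: str, cases: Iterable[CaseHtml]) -> List[str]:
    """Write one HTML object per case and return the keys that were written."""
    keys = []
    for case in cases:
        key = build_key(case)
        # put_object is the standard boto3 S3 client call for writing a single object.
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=case.html.encode("utf-8"),
            ContentType="text/html; charset=utf-8",
        )
        keys.append(key)
    return keys
```

With boto3 installed, this could be called as `upload_cases(boto3.client("s3"), "some-bucket", cases)`, where the bucket name is again a placeholder. Returning the list of keys gives a simple way to check that every case produced a file.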