First foray into manipulating current S3 dir structure into static file structure
kilbergr opened this issue · 1 comments
kilbergr commented
Given the success of last week's spike, we will keep working toward a static file site that pulls from S3. The problem? Our current file structure in S3 is not the one we ultimately want.
We've already built the file structure we expect by hand. Now we will experiment with generating it programmatically.
AC:
- Either get access to current CAP resources in S3 OR create a fake setup with public buckets that Bex can access. @bensteinberg will determine what is appropriate here.
- Create transformation script that will:
  - move files and directories as required
  - The following has been relegated to a separate ticket:
    - put HTML section of each case into a separate file
    - Remove `\n` and escape `\` from HTML.
    - Add attributes for paragraphs and bounding boxes to HTML
- Demonstrate this can work on a limited subset (could be the same amount as used in the last ticket).
- Can be done in language of choice
- Use the same file structure doc, although we may change it.
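As a rough sketch of how the "move files and directories" step could be tried on a limited subset: compute a dry-run plan of (source key, destination key) pairs first, and only copy objects once the plan looks right. Everything named here (`plan_moves`, `example_remap`, the `old/` prefix) is hypothetical; the real source layout in S3 isn't described in this ticket, so the remap rule is a stand-in.

```python
from typing import Callable, Iterable, Optional


def plan_moves(
    keys: Iterable[str],
    remap: Callable[[str], Optional[str]],
    limit: Optional[int] = None,
) -> list[tuple[str, str]]:
    """Return (source_key, destination_key) pairs without touching storage.

    remap returns the new key for a source key, or None to skip that key.
    limit caps the number of planned moves, for trying a small subset.
    """
    plan: list[tuple[str, str]] = []
    claimed: dict[str, str] = {}  # destination key -> source key that claimed it
    for src in keys:
        if limit is not None and len(plan) >= limit:
            break
        dst = remap(src)
        if dst is None:
            continue
        if dst in claimed:
            # Two sources landing on one destination would silently overwrite data.
            raise ValueError(f"{src!r} and {claimed[dst]!r} both map to {dst!r}")
        claimed[dst] = src
        plan.append((src, dst))
    return plan


# Hypothetical remap rule: "old/" is a placeholder, not the real S3 layout.
def example_remap(key: str) -> Optional[str]:
    if not key.startswith("old/"):
        return None
    return "redacted/" + key[len("old/"):]
```

Keeping the plan separate from the copy step means the subset demo can print and review the mapping before anything in the bucket changes.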
kilbergr commented
Ok so the pieces done up to this point are:
redacted/
    Reporters.json
    Volumes.json
    ${reporter_id}/           # aka Reporter Folder; e.g. "pa-d-c"; shortcode already in case.law urls
        Metadata.json
        Cases.jsonl
        Volume.pdf
        ${volume_id}/         # aka Volume Folder; e.g. "6"; already in case.law urls
            Metadata.json
            Cases.jsonl
            case/
                1.json        # file names named after page case starts on; similar to case.law urls
                6.json
                ...
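For reference, a small sketch of how destination keys for this layout could be built. It assumes volume folders nest inside reporter folders and `case/` sits inside each volume folder; the helper names (`volume_dir`, `case_json_key`) are made up for illustration and are not from the actual script.

```python
from pathlib import PurePosixPath

# S3 keys always use "/" separators, so PurePosixPath is used on every platform.
ROOT = PurePosixPath("redacted")


def volume_dir(reporter_id: str, volume_id: str) -> PurePosixPath:
    """Volume Folder, assumed to live inside its Reporter Folder."""
    return ROOT / reporter_id / volume_id


def case_json_key(reporter_id: str, volume_id: str, first_page: int) -> str:
    """Key for one case file, named after the page the case starts on."""
    return str(volume_dir(reporter_id, volume_id) / "case" / f"{first_page}.json")


print(case_json_key("pa-d-c", "6", 1))  # redacted/pa-d-c/6/case/1.json
```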
The pieces that remain are:
redacted/
    ${reporter_id}/           # aka Reporter Folder; e.g. "pa-d-c"; shortcode already in case.law urls
        Volumes.json
        ${volume_id}/         # aka Volume Folder; e.g. "6"; already in case.law urls
            case/
                1.html        # file names named after page case starts on; similar to case.law urls
                6.html
                ...
vendor/
    ${volume_id}.tar          # compression?
    ${volume_id}.csv
    ${volume_id}.tar.sha256
misc/
    [stuff from https://case.law/download/]
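For the `${volume_id}.tar.sha256` piece, one possible approach is sketched below. The sidecar format (hex digest, two spaces, file name, as `sha256sum` prints it) is an assumption, since the ticket doesn't specify one, and `write_sha256_sidecar` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def write_sha256_sidecar(path: Path, chunk_size: int = 1 << 20) -> Path:
    """Write '<name>.sha256' next to path and return the sidecar's path."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large volume tarballs are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    sidecar = path.with_name(path.name + ".sha256")
    # Assumed format: "<hex digest>  <file name>", the layout sha256sum emits.
    sidecar.write_text(f"{digest.hexdigest()}  {path.name}\n")
    return sidecar
```

With that format, running `sha256sum -c 6.tar.sha256` in the same directory as the tarball should be able to verify it.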