WGBH-MLA/AAPB2

Add text versions of transcripts from Bill Moyers collection to AAPB

Closed this issue · 4 comments

Details

Bill Moyers' team gave us (Miranda) hundreds of beautiful transcripts in a zip file. The transcripts have no time codes; they are just Word or plain-text files.

In the AAPB meeting on April 30, Karen backed a decision to go ahead and post these transcripts on the public AAPB site instead of the time-synchronized machine-generated transcripts that are currently up.

This request is for Kevin to take the transcripts from Miranda's zip file, convert files to the appropriate plain text format, upload them to the right location in S3, update the transcript locations, and reindex those asset records.

Submitted by: Kevin
Priority: Medium (within this month)
URL:
Slack message thread:

  1. processed DOCX into TXT
  2. reprocessed the TXT files provided to remove \x{0D} (carriage returns) and other text insanity that chokes the AAPB ingest
  3. narrowed scope of uploads to only assets currently utilizing transcript JSON
  4. collected all stats.txt files from affected assets on S3 (for later reporting fun)
  5. removed from S3 all ASR-generated objects for affected assets
  6. uploaded reprocessed TXT to S3 for affected assets
  7. reindexed AAPB for affected assets
  8. celebrated
    INFO [2024-05-02 19:12:23]: Starting one big commit...
    INFO [2024-05-02 19:12:29]: Finished one big commit.
    INFO [2024-05-02 19:12:29]: SUMMARY: DETAIL
    INFO [2024-05-02 19:12:29]: SUMMARY: STATS
    INFO [2024-05-02 19:12:29]: (Look just above for details on each error.)
    INFO [2024-05-02 19:12:29]: 582 (100.0%) succeeded
    INFO [2024-05-02 19:12:29]: DONE
    ############################ ENDING HOST 52.55.103.243 ############################
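
For reference, the cleanup in step 2 could look roughly like the sketch below. This is not the script that was actually run for this batch: the function name `clean_transcript_text`, the Windows-1252 fallback, and the exact set of control characters stripped are all assumptions about what "text insanity" covers.

```python
import re
import unicodedata

# Control characters other than tab (\x09) and newline (\x0A).
# Assumption: this is the kind of debris that upsets the ingest.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_transcript_text(raw: bytes) -> str:
    """Return UTF-8-friendly text with LF line endings and no stray control chars."""
    # Decode as UTF-8, dropping a BOM if present; fall back to Windows-1252,
    # which is common for text that started life in Word (an assumption).
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = raw.decode("cp1252", errors="replace")
    # Normalize CRLF and bare CR (\x0D) to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop any remaining control characters.
    text = _CONTROL_CHARS.sub("", text)
    # Normalize to NFC so accented characters are encoded consistently.
    return unicodedata.normalize("NFC", text)
```

Applied to the raw bytes of each TXT file before upload, this leaves only LF line endings, so the uploaded file should contain no \x{0D} at all.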

Oops, still to do:

  • do batch update on ams2 "Transcript Status" = "Correct"
  • reindex on AAPB

Miranda has sent some redone transcripts. See the email from May 20, 3:32 pm, for the zipped DOCX files.

Done:

  • converted to TXT
  • uploaded with backups to S3
  • metadata updated
  • reindexed on AAPB
  • updated all assignments in AAPB_Enhancements
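
A minimal sketch of the "converted to TXT" step, using only the Python standard library. The tool actually used for the conversion is not recorded in this issue, so `docx_to_text` is a hypothetical helper. It reads only the main document body (`word/document.xml`) and ignores headers, footers, and footnotes.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(path) -> str:
    """Extract body text from a .docx file, one paragraph per line."""
    # A .docx file is a zip archive; the body lives in word/document.xml.
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{{{W_NS}}}p"):
        parts = []
        for node in para.iter():
            if node.tag == f"{{{W_NS}}}t":        # run of text
                parts.append(node.text or "")
            elif node.tag == f"{{{W_NS}}}tab":    # explicit tab
                parts.append("\t")
            elif node.tag in (f"{{{W_NS}}}br", f"{{{W_NS}}}cr"):  # line break
                parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines) + "\n"
```

`path` can be a filename or an open binary file object. Paragraphs nested inside text boxes would be emitted twice by this sketch, so any output from such files needs a manual check.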

Batch Ingest 3008

INFO [2024-05-21 19:09:47]: Starting one big commit...
INFO [2024-05-21 19:09:47]: Finished one big commit.
INFO [2024-05-21 19:09:47]: SUMMARY: DETAIL
INFO [2024-05-21 19:09:47]: SUMMARY: STATS
INFO [2024-05-21 19:09:47]: (Look just above for details on each error.)
INFO [2024-05-21 19:09:47]: 42 (100.0%) succeeded
INFO [2024-05-21 19:09:47]: DONE
############################ ENDING HOST 52.55.103.243 ############################