andybega/icews

Unusual January 2022 data file in weekly repo is causing error

andybega opened this issue · 1 comments

On Dataverse, the 2021 data were recently (June 23rd) moved out of the weekly repo and into a single annual file. As part of that process, the January 2022 events in the weekly repo were repackaged into one larger monthly file. The package does not recognize that file name, which causes an error:

Error in list_local_files(raw_file_dir) : 
  unexpected non-data file(s) found in 'data/raw':
  202201-icews-events.tab

In the weekly repo:

[image: Screen Shot 2022-07-25 at 11 10 24]

And see this new annual file in the annual repo:

[image: Screen Shot 2022-07-25 at 11 11 03]

(Handling the year transition is related to #61)
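One way to think about the file-name check is as a small classifier over the names seen so far. The sketch below is only an illustration and not the package's actual `list_local_files()` logic: `classify_icews_file()` is a hypothetical helper, the annual and monthly patterns are inferred from the two file names quoted in this issue, and the weekly pattern (an 8-digit date prefix) is an assumption.

```r
# Hypothetical helper, not part of the icews package.
# Returns "annual", "monthly", "weekly", or "unknown" for a single file name.
classify_icews_file <- function(filename) {
  patterns <- c(
    # e.g. events.2021.20220623.tab (name taken from this issue)
    annual  = "^events\\.[0-9]{4}\\.[0-9]{8}\\.tab$",
    # e.g. 202201-icews-events.tab (name taken from this issue)
    monthly = "^[0-9]{6}-icews-events\\.tab$",
    # ASSUMPTION: weekly files use an 8-digit date prefix
    weekly  = "^[0-9]{8}-icews-events\\.tab$"
  )
  hits <- vapply(patterns, grepl, logical(1), x = filename)
  if (any(hits)) names(patterns)[which(hits)[1]] else "unknown"
}

classify_icews_file("202201-icews-events.tab")   # "monthly"
classify_icews_file("events.2021.20220623.tab")  # "annual"
classify_icews_file("README.md")                 # "unknown"
```

With something like this, only names that come back as "unknown" would need to trigger the "unexpected non-data file(s)" error.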

Some progress. The update now gets further and fails during ingest instead:

> update_icews(dryrun = FALSE)
Ingesting records from 'events.2021.20220623.tab'
 Error: UNIQUE constraint failed: events.event_id, events.event_date

The yearly file contains rows that are exact duplicates of other rows, which is what trips the UNIQUE constraint. I'm adding a check that removes them (with a warning).

> table(duplicated(foo))

 FALSE   TRUE 
671235   5789
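A minimal sketch of what such a check could look like, using base R only. `drop_duplicate_rows()` is a hypothetical name and the toy data frame is made up for illustration; this is not necessarily how the package implements it. Note that it only removes rows that are identical in every column, so rows that share `event_id` and `event_date` but differ elsewhere would still violate the constraint.

```r
# Hypothetical helper, not the package's actual code: drop rows that are
# exact duplicates across all columns and warn about how many were removed.
drop_duplicate_rows <- function(events) {
  dupes <- duplicated(events)
  if (any(dupes)) {
    warning(sprintf("Removing %d fully duplicated row(s)", sum(dupes)),
            call. = FALSE)
    events <- events[!dupes, , drop = FALSE]
  }
  events
}

# Toy data (made up): the second and third rows are identical.
foo <- data.frame(
  event_id   = c(1L, 2L, 2L, 3L),
  event_date = c("2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03"),
  stringsAsFactors = FALSE
)

# Warnings are suppressed here only to keep the example output quiet.
clean <- suppressWarnings(drop_duplicate_rows(foo))
nrow(clean)  # 3
```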