harvard-lil/capstone

When Docker has too little allocated memory, you get the error: "Couldn't apply scheduled task update-elasticsearch: '<' not supported between instances of 'NoneType' and 'NoneType'"

kilbergr opened this issue · 2 comments

When Docker is allocated too little memory, it kills the refresh_case_body_cache task (which is run as part of setting up the dev environment) before the task completes, and it does so without an error message. If you continue with the setup sequence and run fab rebuild_search_index, you'll receive the following error:

 File "/app/capapi/documents.py", line 209, in prepare_casebody_data
    cites_by_id = {k: list(v) for k,v in groupby(sorted(outbound_cites, key=lambda c: c.opinion_id), lambda c: c.opinion_id)}
celery.beat.SchedulingError: Couldn't apply scheduled task update-elasticsearch: '<' not supported between instances of 'NoneType' and 'NoneType'

This is because in the ExtractedCitation model, opinion_id can be created as None: https://github.com/harvard-lil/capstone/blob/develop/capstone/capdb/models.py#L2081. If the extract_citations function does not complete during the refresh_case_body_cache run, the following line may never run and opinion_id may never be assigned: https://github.com/harvard-lil/capstone/blob/develop/capstone/scripts/extract_cites.py#L58. The sorted() call in prepare_casebody_data then has to compare None with None, which Python 3 does not support.

Essentially, you'll have a number of partially extracted citations.
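
A minimal sketch of the failure, outside of Django: the Cite namedtuple and group_by_opinion helper below are stand-ins I made up for illustration (they are not the real ExtractedCitation model or capapi code), but the sort/groupby expression mirrors the line in the traceback.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for ExtractedCitation; only opinion_id matters here.
Cite = namedtuple("Cite", ["cite", "opinion_id"])


def group_by_opinion(outbound_cites):
    # Same shape as the expression in prepare_casebody_data: sort by
    # opinion_id, then group the sorted citations by opinion_id.
    return {
        k: list(v)
        for k, v in groupby(
            sorted(outbound_cites, key=lambda c: c.opinion_id),
            lambda c: c.opinion_id,
        )
    }


# Fully extracted citations sort and group without trouble.
complete = [Cite("1 U.S. 1", 1), Cite("2 U.S. 2", 0), Cite("3 U.S. 3", 1)]
grouped = group_by_opinion(complete)

# Partially extracted citations still have opinion_id=None, so sorted()
# ends up comparing None with None and raises TypeError.
partial = [Cite("1 U.S. 1", None), Cite("2 U.S. 2", None)]
try:
    group_by_opinion(partial)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The TypeError message here is the same text that celery wraps in the SchedulingError above.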

I believe the best option is to add an error in the refresh_case_body_cache task that indicates the issue is with memory allocation.
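
As a rough sketch of what such an error could look like: a task that Docker has killed cannot report anything itself, so this assumes the check would live in a later step that reads the citations. Cite, PartialExtractionError, and check_cites_complete are hypothetical names for illustration, not existing capstone code.

```python
from collections import namedtuple

# Hypothetical stand-in for ExtractedCitation; only opinion_id matters here.
Cite = namedtuple("Cite", ["cite", "opinion_id"])


class PartialExtractionError(RuntimeError):
    """Raised when some citations were never assigned an opinion_id."""


def check_cites_complete(cites):
    # Expects a list. Citations left with opinion_id=None suggest that
    # extraction was interrupted partway through.
    missing = [c for c in cites if c.opinion_id is None]
    if missing:
        raise PartialExtractionError(
            f"{len(missing)} of {len(cites)} citations have no opinion_id. "
            "refresh_case_body_cache may have been killed before finishing; "
            "try allocating more memory to Docker and rerunning it."
        )
    return cites
```

Calling check_cites_complete(outbound_cites) before sorting would replace the opaque NoneType comparison error with a message that points at memory allocation.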

Never mind, it sounds like the best we can do is update the README 😢

Fixed here: 9381382