[feature] add safeguards when re-fetching PDF articles to guarantee content integrity
vmeylan commented
Right now, every manual fetch of articles via `get_articles_content` overwrites the previous content with the latest fetched version, which propagates past edits.
We need to add safeguards so that the scraping process cannot lose a significant chunk of the content if it malfunctions.
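
One possible shape for such a safeguard, as a minimal sketch only: compare the length of the newly fetched text against the stored one and refuse the overwrite when too much has disappeared. The helper names `is_safe_to_overwrite` and `safe_update` and the 0.8 threshold are hypothetical and not part of the existing code; how this would hook into `get_articles_content` is left open.

```python
def is_safe_to_overwrite(previous: str, fetched: str, min_ratio: float = 0.8) -> bool:
    """Hypothetical check: accept the fetched text only if it keeps
    at least `min_ratio` of the previous content's length."""
    if not previous.strip():
        # Nothing stored yet, so there is nothing to lose.
        return True
    if not fetched.strip():
        # An empty fetch almost certainly means the scraper failed.
        return False
    return len(fetched) >= min_ratio * len(previous)


def safe_update(previous: str, fetched: str, min_ratio: float = 0.8) -> str:
    """Return the content to store: the fetched text when it passes the
    check, otherwise the previous text (the overwrite is skipped)."""
    if is_safe_to_overwrite(previous, fetched, min_ratio):
        return fetched
    return previous
```

A length ratio is a crude signal. It catches truncated or empty fetches but not garbled text of similar length, so a diff-based or hash-based comparison might be worth considering as well.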