opencleveland/drocer-webapp

Changing the regex pattern for incoming PDFs

skorasaurus opened this issue · 2 comments

While running the extractor on my setup, I noticed that it skipped all of the PDFs I had added, because the regex pattern for input files only accepts PDFs named like those already in the repo.

Should the pattern change or should we rename the files instead?

This is the first step toward getting a new instance up.

For consistency's sake, we had previously discussed using the ISO 8601 (YYYY-MM-DD) date pattern, as we had used at https://github.com/opencleveland/drocer, although this was

I rewrote the regex pattern in https://github.com/skorasaurus/drocer-webapp/tree/fix-9-regex-input so that we can index PDFs named with the ISO 8601 pattern, and I'm now double-checking the code to see whether this change has any other consequences.
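For reference, here is a minimal Python sketch of what an ISO 8601 filename filter could look like. It is not the pattern from the fix-9-regex-input branch; it assumes filenames that start with a YYYY-MM-DD date, optionally followed by `-` or `_` and a suffix, and end in `.pdf`. The names `ISO_PDF_PATTERN`, `match_iso_pdf` and `select_pdfs` are hypothetical and not part of drocer-webapp.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical pattern (not drocer-webapp's): "2016-05-02.pdf" or "2016-05-02_minutes.pdf".
# The regex only checks the shape (month 01-12, day 01-31); real calendar
# validity is checked separately with date.fromisoformat().
ISO_PDF_PATTERN = re.compile(
    r"^(?P<date>\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01]))(?:[-_].*)?\.pdf$",
    re.IGNORECASE,
)


def match_iso_pdf(filename):
    """Return the filename's ISO 8601 date as a datetime.date, or None if it doesn't qualify."""
    m = ISO_PDF_PATTERN.match(filename)
    if m is None:
        return None
    try:
        # Rejects impossible dates such as 2016-02-30 that the regex lets through.
        return date.fromisoformat(m.group("date"))
    except ValueError:
        return None


def select_pdfs(directory):
    """Yield (path, date) for files whose names match; non-matching files are skipped."""
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        file_date = match_iso_pdf(path.name)
        if file_date is None:
            continue  # name doesn't follow the assumed ISO 8601 convention
        yield path, file_date
```

With this sketch, `match_iso_pdf("2016-05-02_minutes.PDF")` returns `date(2016, 5, 2)`, while `"May-2-2016.pdf"` and `"2016-02-30.pdf"` both return `None`. Splitting the check between a shape-only regex and `date.fromisoformat` keeps the pattern readable without trying to encode month lengths and leap years in the regex itself.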

The only noticeable consequence appears when viewing results in the left-side window; the formatting is