nlmatics/nlm-ingestor

Lost pages

sailxjx opened this issue · 0 comments

pythonlearn.pdf

I used a local Docker server to parse the document above, which has 239 pages. However, the ingestor parsed only 158 pages and discarded the remaining content. Is this a bug?

Here are the logs:

processing page: 140 Number of p_tags.... 178
processing page: 141 Number of p_tags.... 4
processing page: 142 Number of p_tags.... 251
processing page: 143 Number of p_tags.... 303
processing page: 144 Number of p_tags.... 322
processing page: 145 Number of p_tags.... 287
processing page: 146 Number of p_tags.... 330
processing page: 147 Number of p_tags.... 308
processing page: 148 Number of p_tags.... 265
processing page: 149 Number of p_tags.... 312
processing page: 150 Number of p_tags.... 298
processing page: 151 Number of p_tags.... 346
processing page: 152 Number of p_tags.... 412
processing page: 153 Number of p_tags.... 287
processing page: 154 Number of p_tags.... 193
processing page: 155 Number of p_tags.... 5
processing page: 156 192.168.65.1 - - [18/Apr/2024 14:24:54] "POST /api/parseDocument?renderFormat=all HTTP/1.1" 200 -
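For anyone triaging this, here is a minimal sketch for locating where processing stopped in a log like the one above. It relies only on the line format shown in these logs. The function name `summarize_log` is hypothetical and is not part of nlm-ingestor. Whether the logged page numbers are zero-based is not confirmed here, so treat the reported number as a log index rather than a PDF page number.

```python
import re

# Matches lines such as "processing page: 141 Number of p_tags.... 4".
# The p_tags part is optional because the final log line above lacks it.
LINE_RE = re.compile(
    r"processing page:\s*(\d+)(?:\s+Number of p_tags\.+\s*(\d+))?"
)


def summarize_log(log_text):
    """Return the highest page index logged and any pages with no p_tags count.

    Hypothetical helper for reading the pasted log; not an nlm-ingestor API.
    """
    pages = []
    incomplete = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a "processing page" line
        page = int(match.group(1))
        pages.append(page)
        if match.group(2) is None:
            # The page was started, but no p_tags count was logged for it.
            incomplete.append(page)
    return {
        "last_page": max(pages) if pages else None,
        "incomplete": incomplete,
    }


if __name__ == "__main__":
    sample = (
        "processing page: 154 Number of p_tags.... 193\n"
        "processing page: 155 Number of p_tags.... 5\n"
        'processing page: 156 192.168.65.1 - - [18/Apr/2024 14:24:54] '
        '"POST /api/parseDocument?renderFormat=all HTTP/1.1" 200 -\n'
    )
    print(summarize_log(sample))
```

Run against the full log, this should show whether the last page entry was cut off before its p_tags count was logged, which would suggest the parser stopped partway through that page rather than reaching the end of the document.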