lareferencia/dspace-stats-collector

posible deadlock al procesar un core de estadísticas con más de 30 días sin eventos

Opened this issue · 4 comments

En CIC-DIGITAL (dspace7) https://digital.cic.gba.gob.ar/ tenemos eventos registrados desde 2014. Dado que este módulo de DSpace tuvo varias fallas en el pasado, tenemos intervalos/ventanas de tiempo en los que no se registraron eventos o sí pero debieron ser eliminados.

Ahora tenemos un caso puntual del stat-collector procesando correctamente los eventos hasta que llegó al 2020-08-03, momento en el cual comienza una ventana de 80 días sin eventos. Ahí, se quedó colgado hace meses y cada vez que se levanta desde la cronjob, consulta por los mismos días y termina, quedándose en este punto para siempre.

Creo que el problema se da en https://github.com/lareferencia/dspace-stats-collector/blob/master/dspace_stats_collector/solrinput.py#L87 con las variables retryToLookAhead y self.maxDaysToLookForEvents (30).
retryToLookAhead comienza en 1 e incrementa de a 1 cada vez que resulta en una página sin documentos hasta llegar a maxDaysToLookForEvents . Cuando esto sucede termina y no procesa más.
Agrego log de ejemplo.

openaire@digital7:~$ cat dspace-stats-collector/var/logs/dspace-stats-collector.log 
2023-02-09 08:20:02,346 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.286 seconds, with status 200
2023-02-09 08:20:02,351 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B1DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B1DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,353 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B2DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B2DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,355 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B3DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B3DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,358 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B4DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B4DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,360 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B5DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B5DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,362 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B6DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B6DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,365 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B7DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B7DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,367 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B8DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B8DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,369 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B9DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B9DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,371 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B10DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B10DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,374 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B11DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B11DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,376 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B12DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B12DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,378 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B13DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B13DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,381 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B14DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B14DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,383 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B15DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B15DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,385 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B16DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B16DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,387 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B17DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B17DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,389 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B18DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B18DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,392 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B19DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B19DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,394 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B20DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B20DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,397 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B21DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B21DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,399 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B22DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B22DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,401 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B23DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B23DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,403 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B24DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B24DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,406 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B25DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B25DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,408 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B26DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B26DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,410 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B27DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B27DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,412 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B28DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B28DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,415 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B29DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B29DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,417 INFO: Finished 'http://localhost:8983/solr/statistics/select/?q=%2A+%2Btime%3A%7B%222020-08-03T09%3A46%3A42.176Z%2B30DAYS%22+TO+%222020-08-03T09%3A46%3A42.176Z%2B30DAYS%2B1DAYS%22%5D&sort=time+asc&start=0&wt=json&fq=%2Bstatistics_type%3A%22view%22+%2Btype%3A%280+OR+2%29+%2BisBot%3Afalse&fl=id%2Cip%2CowningItem%2Creferrer%2Ctime%2Ctype%2CuserAgent&rows=500' (get) with body '' in 0.002 seconds, with status 200
2023-02-09 08:20:02,417 INFO: DSpace Stats Collector finished processing 0 events from 2020-08-03T09:46:42.176Z to 2020-08-03T09:46:42.176Z. Breakdown: 0 events sent succesfully, 0 events discarted as robot

@arieljlira
Sí, hace poco hice yo esa nueva implementación porque las queries con ventanas grandes nos estaban tirando los solr en otros repos grandes. Entones ahora la estrategia es esta búsqueda en ventanas de 1 día en forma incremental. Creo que el problema es que no altere el timestamp al llegar al maxDaysToLookAhead. Le pusimos 30 entendiendo que si pasaba un caso como el tuyo manualmente podés adelantar el timestamp ejecutando con -f nuevamente.
Que te parece pongo que en los casos donde llega a max haga un update del timestamp al terminar ? Eso haría que en el proximo run siguiera buscando para adelante otros 30 días, eventualmente recorrería hasta encontrar algo o estar al día.
Me interesa tu opinion. Gracias y saludos

Hola Lautaro, si, hacer el guardado del timestamp al completar la ventana solucionaría el problema de base. Es decir, sería cuando retryToLookAhead == self.maxDaysToLookForEvents
En cuanto al incremento de 1 día capaz se podría probar con una estrategia exponencial. Ej
retryToLookAhead=retryToLookAhead*2
En general en los cores no hay blancos, y cuando los hay son de varios días.
Saludos y gracias!

Confirmo que avanzando un poco el lastTrackedEventTimestamp de dspace-stats-collector/var/timestamp/default.dat (hasta el cierre de la ventana), soluciona el problema.