Error fetching the annual value for a city with an accent in its name
anapaulagomes opened this issue · 4 comments
Traceback (most recent call last):
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
yield next(it)
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/utils/python.py", line 353, in __next__
return next(self.data)
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/utils/python.py", line 353, in __next__
return next(self.data)
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
for x in result:
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "/home/ana/workspace/documentos-tcmba/tcmba/spiders/consulta_publica.py", line 297, in get_detailed_results
filename=f"{uuid4()}-{self.normalize_text(texts[1])}",
IndexError: list index out of range
When printing the values:
[<Selector xpath='./td' data='<td colspan="5">No records found.</td>'>] # columns
['No records found.'] # texts
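Whatever the root cause, the IndexError itself happens because an empty search renders a single placeholder cell. That leaves `texts` with only one element, so `texts[1]` is out of range. Below is a minimal defensive sketch. It assumes a hypothetical helper `extract_row_texts`, which is not part of the spider shown in the traceback, and which the spider would call before building the filename:

```python
from typing import List, Optional

# Text of the placeholder cell seen when the search returns nothing
NO_RECORDS = "No records found."


def extract_row_texts(texts: List[str]) -> Optional[List[str]]:
    """Hypothetical guard: return the row's cell texts, or None for an empty result.

    An empty result renders as a single <td colspan="5">No records found.</td>,
    so `texts` has one element and `texts[1]` would raise IndexError.
    """
    if len(texts) < 2 or texts[0].strip() == NO_RECORDS:
        return None
    return texts


print(extract_row_texts(["No records found."]))  # None
```

In the spider, a `None` result could be logged and skipped instead of building a filename from `texts[1]`. This only hides the symptom; it does not address the accent mismatch.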
I suspect it's because of the accent. When I inspect the element I find SÃO GONÇALO DOS CAMPOS, but our crawler shows the parameter as SAO GONCALO DOS CAMPOS.
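The helpers.py snippet is not reproduced in this thread. The following is therefore only a hypothetical illustration of how accent stripping turns the name shown on the site into the one the crawler sends. It uses Unicode NFKD decomposition and then drops the combining marks. `strip_accents` is an illustrative name, not the project's helper:

```python
import unicodedata


def strip_accents(text: str) -> str:
    # NFKD splits "Ã" into "A" + U+0303 (combining tilde) and "Ç" into "C" + U+0327 (combining cedilla)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("SÃO GONÇALO DOS CAMPOS"))  # SAO GONCALO DOS CAMPOS
```

If the site matches the city name exactly, the stripped value no longer equals the accented value it expects, which would explain the "No records found." page.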
Does the site send it with the accent in the request? I'm AFK, but I can test this case tomorrow.
No worries, no rush. Yes, it sends it with the accent, @Laerte.
@anapaulagomes In the city filter we remove the accents. Is there a reason for that? Could we perhaps use a regular expression that only allows letters (including accented ones), spaces, and hyphens?
tcm-ba/tcmba/spiders/helpers.py
Lines 8 to 9 in 1babb6e
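Here is a minimal sketch of the proposed regular-expression filter. It assumes the goal is to drop every character except letters (accented or not), whitespace, and hyphens, while leaving the accents intact. `DISALLOWED` and `normalize_city` are hypothetical names. Python's `re` has no `\p{L}`, so "letter" is expressed here as a Unicode word character that is not a digit or underscore:

```python
import re

# Matches any character that is NOT a letter, whitespace or hyphen:
# either something outside [\w\s-], or a digit/underscore (which \w would otherwise allow).
# For str patterns, \w and \d are Unicode-aware by default, so "Ã" and "Ç" count as letters.
DISALLOWED = re.compile(r"(?:[^\w\s-]|[\d_])")


def normalize_city(text: str) -> str:
    """Hypothetical city filter: keep letters (with accents), spaces and hyphens."""
    cleaned = DISALLOWED.sub("", text)
    # Collapse runs of whitespace left behind by removed characters
    return " ".join(cleaned.split())


print(normalize_city("SÃO GONÇALO DOS CAMPOS"))  # SÃO GONÇALO DOS CAMPOS
```

Unlike accent stripping, this keeps the name exactly as the site displays it and still rejects digits and punctuation.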
I removed it here and it worked correctly. Arguments used:
scrapy crawl consulta_publica -a periodicidade=anual -a competencia=2019 -a cidade="SÃO GONÇALO DOS CAMPOS"
Man, if there was a reason, I don't remember 😂 Go for it! @Laerte