DadosAbertosDeFeira/tcm-ba

Error fetching the annual value for a city with an accent in its name

anapaulagomes opened this issue · 4 comments

Traceback (most recent call last):
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ana/.local/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/ana/workspace/documentos-tcmba/tcmba/spiders/consulta_publica.py", line 297, in get_detailed_results
    filename=f"{uuid4()}-{self.normalize_text(texts[1])}",
IndexError: list index out of range

Printing the values gives:

[<Selector xpath='./td' data='<td colspan="5">No records found.</td>'>]  # columns
['No records found.']  # texts

I suspect the accent is the cause. When I inspect the element on the site I find SÃO GONÇALO DOS CAMPOS, but our crawler shows the parameter as SAO GONCALO DOS CAMPOS.
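Independently of the accent question, the traceback shows that a "No records found." row yields a single cell, so `texts[1]` does not exist. A minimal sketch of a guard, assuming the row texts arrive as a plain list of strings; `build_filename` is a hypothetical helper, not a function from the repository, and the spider's `normalize_text` step is left out here:

```python
from typing import List, Optional
from uuid import uuid4


def build_filename(texts: List[str]) -> Optional[str]:
    """Build a filename from a result row (hypothetical helper).

    Returns None when the row has fewer than two cells, which is the
    shape of the 'No records found.' placeholder row.
    """
    if len(texts) < 2:
        # Placeholder row: there is no second cell to name the file after.
        return None
    # The spider also normalizes texts[1]; that step is omitted in this sketch.
    return f"{uuid4()}-{texts[1]}"
```

With a guard like this the spider could log the empty result and move on instead of raising `IndexError`.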

Does the site send the name with the accent in its request? I'm AFK right now, but I can test this case tomorrow.

No worries, no rush. Yes, it sends the name with the accent, @Laerte.

@anapaulagomes In the city filter we strip the accents. Is there a reason for that? Could we use a regular expression instead, one that only allows letters, accented characters, spaces, and hyphens?

city = strip_accents(city.strip().upper())
return city.ljust(limit)

I removed the `strip_accents` call here and it worked correctly. Arguments used:

scrapy crawl consulta_publica -a periodicidade=anual -a competencia=2019 -a cidade="SÃO GONÇALO DOS CAMPOS"
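The regular-expression idea suggested above could look roughly like the following. This is only a sketch: `format_city` and `CITY_PATTERN` are hypothetical names, the pattern is an assumption about which characters should be accepted, and `limit` stands in for whatever padding width the filter already uses.

```python
import re
import unicodedata

# One or more words made of letters (accented ones included),
# separated by a single space or hyphen. Assumed pattern, not the project's.
CITY_PATTERN = re.compile(r"[^\W\d_]+(?:[ -][^\W\d_]+)*")


def format_city(city: str, limit: int) -> str:
    """Upper-case and pad a city name while keeping its accents."""
    # NFC composes "A" + combining tilde into "Ã" so it counts as one letter.
    city = unicodedata.normalize("NFC", city.strip().upper())
    if not CITY_PATTERN.fullmatch(city):
        raise ValueError(f"Invalid city name: {city!r}")
    return city.ljust(limit)
```

City names containing other punctuation, such as an apostrophe, would be rejected by this pattern and would need it widened.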

Man, if there is a reason, I don't remember it 😂 Go for it, @Laerte!