stanford-oval/storm

[BUG] Outputting numerous 403 errors when running Co-STORM

ColtonBehannon opened this issue · 1 comments

Describe the bug
When running the Co-STORM example, I get numerous 403 errors output in the terminal. These errors are then followed by some trafilatura errors and errors complaining about 'The API deployment for this resource does not exist'.

Despite all this, the final report is seemingly output just fine. The only issue is the terminal is impossible to follow as a result of the errors.

This issue is similar to #133, where I also commented because I had seen similar results with STORM in the past. I have tried multiple networks, and it made no difference.

Are the 403 errors a result of these sites not allowing scraping, and are those sites therefore left out of the final report?

To Reproduce
Report following things

  1. Set up the environment according to run_costorm_gpt.py
  2. Run it

Screenshots
Error while requesting URL 403

followed by

Trafilatura errors and 'An error occurred for text: root, <topic_here>' with 404 code

Environment:

  • OS: Ubuntu [WSL2]
  • Retriever: Bing (though I tested with You.com and got a similar result)
  • LLM Provider: Azure OpenAI

This happens because WebPageHelper fails to process some URLs. This could be because it cannot fetch the URL content or because it fails to parse the content.

If you don't want to see the errors, you can add the following code to the script you run:

import logging

logging.basicConfig(level=logging.CRITICAL)

However, it's generally suggested to log the error/warning. If you don't want to see them in the console output, you can write them to a file. See https://docs.python.org/3/library/logging.html for more info.
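
For example, here is a minimal sketch of sending warnings and errors to a file instead of the console. It assumes the messages go through Python's standard logging module (as the snippet above implies) and that the script has no other logging setup you need to keep. The helper name log_to_file and the file name costorm_errors.log are made up for illustration.

```python
import logging


def log_to_file(path: str, level: int = logging.WARNING) -> None:
    """Send root-logger records at `level` and above to `path` instead of the console."""
    logging.basicConfig(
        filename=path,  # a FileHandler replaces the default console handler
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # drop handlers configured earlier (requires Python 3.8+)
    )


if __name__ == "__main__":
    # Hypothetical file name; call this near the top of the script you run.
    log_to_file("costorm_errors.log")
    logging.getLogger("demo").warning("Error while requesting URL: 403")
```

With this in place the console stays readable, and the failed URLs can still be inspected in the log file afterwards. Anything a library prints directly to stdout or stderr, rather than logging it, would not be redirected.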