stanford-oval/storm

[BUG] DuckDuckGoSearchRM is not working - AttributeError

AIconicInsight opened this issue · 2 comments

Describe the bug
When using DuckDuckGoSearchRM in the STORMWikiRunner, I always get an AttributeError in STORMWikiRunner.run(...): 'AssertionError' object has no attribute 'message'. The underlying error comes from the _text_api function of the DDGS class in the duckduckgo_search library (duckduckgo_search.py): AssertionError: keywords is mandatory. The AttributeError itself is only raised afterwards, when the backoff giveup handler tries to read .message on that AssertionError (see the traceback below).

I used an LLM from Ollama: llama3.1:8b-instruct-fp16.
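
For reference, the assertion can be reproduced in isolation by calling the search client with an empty query, which appears to be what reaches the retriever here. A minimal sketch (assuming the installed duckduckgo_search version matches the one in the traceback; the example query is arbitrary):

from duckduckgo_search import DDGS

ddgs = DDGS()
# A non-empty query returns results as expected.
print(ddgs.text("deep neural networks", max_results=3))
# An empty query trips the same assertion shown in the traceback below:
# AssertionError: keywords is mandatory
ddgs.text("", max_results=3)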

To Reproduce
See 'Code' section below

Environment:

  • OS: Ubuntu 22.04.5 LTS
  • Python v3.10.0
Code
  
import os
from knowledge_storm import (
    STORMWikiRunnerArguments,
    STORMWikiRunner,
    STORMWikiLMConfigs,
)
from knowledge_storm.lm import OllamaClient
from knowledge_storm.rm import DuckDuckGoSearchRM
from dotenv import load_dotenv

load_dotenv()

output_dir = os.getenv("OUTPUT_DIR")

lm_configs = STORMWikiLMConfigs()
engine_args = STORMWikiRunnerArguments(output_dir=output_dir)

ollama_kwargs = {
    "temperature": 0.7,
    "top_p": 0.9,
}

fast_model_name = os.getenv("FAST_MODEL_NAME")
fast_model_host = os.getenv("FAST_MODEL_HOST")
fast_model_port = int(os.getenv("FAST_MODEL_PORT"))
fast_model_max_tokens = int(os.getenv("FAST_MODEL_MAX_TOKENS"))
strong_model_name = os.getenv("STRONG_MODEL_NAME")
strong_model_host = os.getenv("STRONG_MODEL_HOST")
strong_model_port = int(os.getenv("STRONG_MODEL_PORT"))
strong_model_max_tokens = int(os.getenv("STRONG_MODEL_MAX_TOKENS"))

rm = DuckDuckGoSearchRM()

fast_model = OllamaClient(
    model=fast_model_name,
    url=fast_model_host,
    port=fast_model_port,
    max_tokens=fast_model_max_tokens,
    **ollama_kwargs,
)
strong_model = OllamaClient(
    model=strong_model_name,
    url=strong_model_host,
    port=strong_model_port,
    max_tokens=strong_model_max_tokens,
    **ollama_kwargs,
)

lm_configs.set_conv_simulator_lm(fast_model)
lm_configs.set_question_asker_lm(fast_model)
lm_configs.set_outline_gen_lm(strong_model)
lm_configs.set_article_gen_lm(strong_model)
lm_configs.set_article_polish_lm(strong_model)

runner = STORMWikiRunner(args=engine_args, lm_configs=lm_configs, rm=rm)

topic = "Deep Neural Networks"
runner.run(  # <--- Error is thrown here
    topic=topic,
    do_research=True,
    do_generate_outline=True,
    do_generate_article=True,
    do_polish_article=True,
)

Error Message
{
	"name": "AttributeError",
	"message": "'AssertionError' object has no attribute 'message'",
	"stack": "---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/backoff/_sync.py:105, in retry_exception.<locals>.retry(*args, **kwargs)
    104 try:
--> 105     ret = target(*args, **kwargs)
    106 except exception as e:

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/rm.py:787, in DuckDuckGoSearchRM.request(self, query)
    778 @backoff.on_exception(
    779     backoff.expo,
    780     (Exception,),
   (...)
    785 )
    786 def request(self, query: str):
--> 787     results = self.ddgs.text(
    788         query, max_results=self.k, backend=self.duck_duck_go_backend
    789     )
    790     return results

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/duckduckgo_search/duckduckgo_search.py:243, in DDGS.text(self, keywords, region, safesearch, timelimit, backend, max_results)
    242 if backend == \"api\":
--> 243     results = self._text_api(keywords, region, safesearch, timelimit, max_results)
    244 elif backend == \"html\":

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/duckduckgo_search/duckduckgo_search.py:275, in DDGS._text_api(self, keywords, region, safesearch, timelimit, max_results)
    258 \"\"\"DuckDuckGo text search. Query params: https://duckduckgo.com/params.
    259 
    260 Args:
   (...)
    273     TimeoutException: Inherits from DuckDuckGoSearchException, raised for API request timeouts.
    274 \"\"\"
--> 275 assert keywords, \"keywords is mandatory\"
    277 vqd = self._get_vqd(keywords)

AssertionError: keywords is mandatory

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[6], line 2
      1 topic = \"Deep Neural Networks\"
----> 2 runner.run(
      3     topic=topic,
      4     do_research=True,
      5     do_generate_outline=True,
      6     do_generate_article=True,
      7     do_polish_article=True,
      8 )

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/engine.py:394, in STORMWikiRunner.run(self, topic, ground_truth_url, do_research, do_generate_outline, do_generate_article, do_polish_article, remove_duplicate, callback_handler)
    392 information_table: StormInformationTable = None
    393 if do_research:
--> 394     information_table = self.run_knowledge_curation_module(
    395         ground_truth_url=ground_truth_url, callback_handler=callback_handler
    396     )
    397 # outline generation module
    398 outline: StormArticle = None

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/interface.py:499, in Engine.log_execution_time_and_lm_rm_usage.<locals>.wrapper(*args, **kwargs)
    496 @functools.wraps(func)
    497 def wrapper(*args, **kwargs):
    498     start_time = time.time()
--> 499     result = func(*args, **kwargs)
    500     end_time = time.time()
    501     execution_time = end_time - start_time

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/engine.py:219, in STORMWikiRunner.run_knowledge_curation_module(self, ground_truth_url, callback_handler)
    212 def run_knowledge_curation_module(
    213     self,
    214     ground_truth_url: str = \"None\",
    215     callback_handler: BaseCallbackHandler = None,
    216 ) -> StormInformationTable:
    218     information_table, conversation_log = (
--> 219         self.storm_knowledge_curation_module.research(
    220             topic=self.topic,
    221             ground_truth_url=ground_truth_url,
    222             callback_handler=callback_handler,
    223             max_perspective=self.args.max_perspective,
    224             disable_perspective=False,
    225             return_conversation_log=True,
    226         )
    227     )
    229     FileIOHelper.dump_json(
    230         conversation_log,
    231         os.path.join(self.article_output_dir, \"conversation_log.json\"),
    232     )
    233     information_table.dump_url_to_info(
    234         os.path.join(self.article_output_dir, \"raw_search_results.json\")
    235     )

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:379, in StormKnowledgeCurationModule.research(self, topic, ground_truth_url, callback_handler, max_perspective, disable_perspective, return_conversation_log)
    377 # run conversation
    378 callback_handler.on_information_gathering_start()
--> 379 conversations = self._run_conversation(
    380     conv_simulator=self.conv_simulator,
    381     topic=topic,
    382     ground_truth_url=ground_truth_url,
    383     considered_personas=considered_personas,
    384     callback_handler=callback_handler,
    385 )
    387 information_table = StormInformationTable(conversations)
    388 callback_handler.on_information_gathering_end()

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:340, in StormKnowledgeCurationModule._run_conversation(self, conv_simulator, topic, ground_truth_url, considered_personas, callback_handler)
    338     for future in as_completed(future_to_persona):
    339         persona = future_to_persona[future]
--> 340         conv = future.result()
    341         conversations.append(
    342             (persona, ArticleTextProcessing.clean_up_citation(conv).dlg_history)
    343         )
    345 return conversations

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:438, in Future.result(self, timeout)
    436     raise CancelledError()
    437 elif self._state == FINISHED:
--> 438     return self.__get_result()
    440 self._condition.wait(timeout)
    442 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:390, in Future.__get_result(self)
    388 if self._exception:
    389     try:
--> 390         raise self._exception
    391     finally:
    392         # Break a reference cycle with the exception in self._exception
    393         self = None

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/thread.py:52, in _WorkItem.run(self)
     49     return
     51 try:
---> 52     result = self.fn(*self.args, **self.kwargs)
     53 except BaseException as exc:
     54     self.future.set_exception(exc)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:318, in StormKnowledgeCurationModule._run_conversation.<locals>.run_conv(persona)
    317 def run_conv(persona):
--> 318     return conv_simulator(
    319         topic=topic,
    320         ground_truth_url=ground_truth_url,
    321         persona=persona,
    322         callback_handler=callback_handler,
    323     )

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dspy/primitives/program.py:26, in Module.__call__(self, *args, **kwargs)
     25 def __call__(self, *args, **kwargs):
---> 26     return self.forward(*args, **kwargs)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:69, in ConvSimulator.forward(self, topic, persona, ground_truth_url, callback_handler)
     67 if user_utterance.startswith(\"Thank you so much for your help!\"):
     68     break
---> 69 expert_output = self.topic_expert(
     70     topic=topic, question=user_utterance, ground_truth_url=ground_truth_url
     71 )
     72 dlg_turn = DialogueTurn(
     73     agent_utterance=expert_output.answer,
     74     user_utterance=user_utterance,
     75     search_queries=expert_output.queries,
     76     search_results=expert_output.searched_results,
     77 )
     78 dlg_history.append(dlg_turn)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dspy/primitives/program.py:26, in Module.__call__(self, *args, **kwargs)
     25 def __call__(self, *args, **kwargs):
---> 26     return self.forward(*args, **kwargs)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/storm_wiki/modules/knowledge_curation.py:214, in TopicExpert.forward(self, topic, question, ground_truth_url)
    212 queries = queries[: self.max_search_queries]
    213 # Search
--> 214 searched_results: List[Information] = self.retriever.retrieve(
    215     list(set(queries)), exclude_urls=[ground_truth_url]
    216 )
    217 if len(searched_results) > 0:
    218     # Evaluate: Simplify this part by directly using the top 1 snippet.
    219     info = \"\"

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/interface.py:314, in Retriever.retrieve(self, query, exclude_urls)
    309     return local_to_return
    311 with concurrent.futures.ThreadPoolExecutor(
    312     max_workers=self.max_thread
    313 ) as executor:
--> 314     results = list(executor.map(process_query, queries))
    316 for result in results:
    317     to_return.extend(result)

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:608, in Executor.map.<locals>.result_iterator()
    605 while fs:
    606     # Careful not to keep a reference to the popped future
    607     if timeout is None:
--> 608         yield fs.pop().result()
    609     else:
    610         yield fs.pop().result(end_time - time.monotonic())

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:438, in Future.result(self, timeout)
    436     raise CancelledError()
    437 elif self._state == FINISHED:
--> 438     return self.__get_result()
    440 self._condition.wait(timeout)
    442 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/_base.py:390, in Future.__get_result(self)
    388 if self._exception:
    389     try:
--> 390         raise self._exception
    391     finally:
    392         # Break a reference cycle with the exception in self._exception
    393         self = None

File ~/miniconda3/envs/thesis/lib/python3.10/concurrent/futures/thread.py:52, in _WorkItem.run(self)
     49     return
     51 try:
---> 52     result = self.fn(*self.args, **self.kwargs)
     53 except BaseException as exc:
     54     self.future.set_exception(exc)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/interface.py:295, in Retriever.retrieve.<locals>.process_query(q)
    294 def process_query(q):
--> 295     retrieved_data_list = self.rm(
    296         query_or_queries=[q], exclude_urls=exclude_urls
    297     )
    298     local_to_return = []
    299     for data in retrieved_data_list:

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dspy/retrieve/retrieve.py:30, in Retrieve.__call__(self, *args, **kwargs)
     29 def __call__(self, *args, **kwargs):
---> 30     return self.forward(*args, **kwargs)

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/knowledge_storm/rm.py:813, in DuckDuckGoSearchRM.forward(self, query_or_queries, exclude_urls)
    809 collected_results = []
    811 for query in queries:
    812     #  list of dicts that will be parsed to return
--> 813     results = self.request(query)
    815     for d in results:
    816         # assert d is dict
    817         if not isinstance(d, dict):

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/backoff/_sync.py:111, in retry_exception.<locals>.retry(*args, **kwargs)
    107 max_tries_exceeded = (tries == max_tries_value)
    108 max_time_exceeded = (max_time_value is not None and
    109                      elapsed >= max_time_value)
--> 111 if giveup(e) or max_tries_exceeded or max_time_exceeded:
    112     _call_handlers(on_giveup, **details, exception=e)
    113     if raise_on_giveup:

File ~/miniconda3/envs/thesis/lib/python3.10/site-packages/dsp/modules/mistral.py:28, in giveup_hdlr(details)
     26 def giveup_hdlr(details):
     27     \"\"\"wrapper function that decides when to give up on retry\"\"\"
---> 28     if \"rate limits\" in details.message:
     29         return False
     30     return True

AttributeError: 'AssertionError' object has no attribute 'message'"
}

I also tried GoogleSearch instead of DuckDuckGoSearchRM. It logs a lot of errors during the search phase, but it does run through. The code is the same; I only adjusted the rm variable:

rm = GoogleSearch(
    google_search_api_key=os.getenv("GOOGLE_SEARCH_API_KEY"),
    google_cse_id=os.getenv("GOOGLE_CSE_ID"),
)

Generated output:
generated.zip

Google Search Error Logs
  
root : ERROR    : Error occurred while searching query : 
`Error while requesting URL('https://www.sciencedirect.com/topics/computer-science/deep-neural-network') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/topics/computer-science/deep-neural-network'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")`
`root : ERROR    : Error occurred while searching query History of deep learning: `
`Error while requesting URL('https://ofac.treasury.gov/faqs') - ReadTimeout('The read operation timed out')`
`root : ERROR    : Error occurred while searching query : 
root : ERROR    : Error occurred while searching query : The read operation timed out
root : ERROR    : Error occurred while searching query Here are the queries I would type in the search box to find information on computing error gradients during backpropagation and handling vanishing or exploding gradients:: The read operation timed out
root : ERROR    : Error occurred while searching query : The read operation timed out
root : ERROR    : Error occurred while searching query backpropagation error gradient computation: The read operation timed out`
`Error while requesting URL('https://www.tandfonline.com/doi/full/10.1080/10888691.2018.1537791') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.tandfonline.com/doi/full/10.1080/10888691.2018.1537791'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting URL('https://www.sciencedirect.com/science/article/pii/S0268401223000233') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/science/article/pii/S0268401223000233'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
Error while requesting URL('https://www.sciencedirect.com/science/article/pii/S266734522300024X') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/science/article/pii/S266734522300024X'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")`
`root : ERROR    : Error occurred while searching query : 
root : ERROR    : Error occurred while searching query Here are some queries I would use to find relevant information:: [SSL] internal error (_ssl.c:2536)
root : ERROR    : Error occurred while searching query **What makes a Deep Neural Network "Deep"?**: The read operation timed out
root : ERROR    : Error occurred while searching query Here are the Google search queries I would use:: The read operation timed out
root : ERROR    : Error occurred while searching query : `
`Error while requesting URL('https://www.sciencedirect.com/topics/computer-science/deep-neural-network') - HTTPStatusError("Client error '403 Forbidden' for url 'https://www.sciencedirect.com/topics/computer-science/deep-neural-network'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")`
`root : ERROR    : Error occurred while searching query Here are some Google search queries I would use to find information on the concept of gradient descent and its relation to backpropagation in deep neural networks:: The read operation timed out
root : ERROR    : Error occurred while searching query : The read operation timed out
trafilatura.utils : ERROR    : parsed tree length: 1, wrong data type or not valid HTML
trafilatura.core : ERROR    : empty HTML tree: None
trafilatura.core : WARNING  : discarding data: None
root : ERROR    : Error occurred while searching query : 
root : ERROR    : Error occurred while searching query : [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2536)
trafilatura.utils : ERROR    : parsed tree length: 1, wrong data type or not valid HTML
trafilatura.core : ERROR    : empty HTML tree: None
trafilatura.core : WARNING  : discarding data: None
root : ERROR    : Error occurred while searching query Here are the queries I would use to find information on common loss functions used in deep neural networks:: The read operation timed out
knowledge_storm.interface : INFO     : run_knowledge_curation_module executed in 263.2473 seconds
knowledge_storm.interface : INFO     : run_outline_generation_module executed in 7.4509 seconds
sentence_transformers.SentenceTransformer : INFO     : Use pytorch device_name: cuda
sentence_transformers.SentenceTransformer : INFO     : Load pretrained SentenceTransformer: paraphrase-MiniLM-L6-v2`
  

Sorry for the delay. Could you check whether the rm can output content for some queries you write yourself?

Given you are using a quantized model, the model may fail to output correct queries.
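
A minimal way to run that check (sketch): call the retrieval module directly with a hand-written query, the same way Retriever.retrieve invokes it in the traceback above. Passing exclude_urls=[] is an assumption here.

from knowledge_storm.rm import DuckDuckGoSearchRM

rm = DuckDuckGoSearchRM()
# Invoke the rm directly with a hand-written query, bypassing the LLM-generated queries.
results = rm(query_or_queries=["Deep Neural Networks"], exclude_urls=[])
for r in results:
    print(r)

If that works, it may also help to log the queries the model generates before they reach the rm; the blank "searching query :" entries in the Google logs above suggest some of them are empty.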

The RM works fine. After some more testing, I narrowed the issue down to the LLMs. I tested Ollama models requiring up to 80 GB of VRAM (both quantized and full-precision). The only model that didn't run into errors was Llama 3.2 3b (fp); however, the output was only the introduction section. I also used the DSPy templates from the ollama example.

Have you found success with a model from the Ollama model library? If so, which one(s)?

Tested LLMs:

  • Llama 3.1 8b (fp)
  • Llama 3.1 70b (q8)
  • Llama 3.2 3b (fp)
  • Nemotron Mini 4b (fp)
  • Gemma 2 27b (q8)
  • Qwen 2.5 14b (q8)
  • Qwen 2.5 72b (q8)