Improve & Address Bugs in the `test_retrieval` Batch Test Question DAG

davidgxue opened this issue 9 months ago

Bug

Describe the bug

Traceback

[2024-02-13, 17:31:11 EST] {taskinstance.py:2699} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 433, in _execute_task
    result = execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/decorators/base.py", line 242, in execute
    return_value = super().execute(context)
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 199, in execute
    return_value = self.execute_callable()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 216, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dags/monitor/test_retrieval.py", line 210, in generate_test_answers
    questions_df[["askastro_answer", "askastro_references", "langsmith_link"]] = questions_df.question.apply(
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4079, in __setitem__
    self._setitem_array(key, value)
  File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4138, in _setitem_array
    self._iset_not_inplace(key, value)
  File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4157, in _iset_not_inplace
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

question_number_subset
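- After adding debug logging, it turns out that questions_df is empty. This only happens when someone supplies a subset of question IDs. With an empty questions_df, questions_df.question.apply(...) returns an empty Series instead of three columns, which is why the assignment at line 210 raises "Columns must be same length as key".
- The question_number_subset param is not parsed correctly: the code uses json.loads() to try to turn the input string into a list of ints, but the conversion does not work as intended, so no questions end up being selected.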
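The issue does not show the current parsing code, so the following is only a hypothetical sketch of a more tolerant parser. It assumes the param may arrive either as a string such as "[1,2,3]" or "1,2,3", or as an actual list, and that question IDs are ints. The helper names `parse_question_subset` and `select_questions` are illustrative, not part of the existing DAG. The second helper fails with an explicit message when the subset matches nothing, instead of letting an empty questions_df reach the pandas assignment.

```python
import json
from typing import Iterable, List, Optional, Union


def parse_question_subset(raw: Union[str, Iterable[int], None]) -> Optional[List[int]]:
    """Hypothetical parser for the question_number_subset param.

    Returns None (meaning "run all questions") for None or an empty string,
    otherwise a list of ints.
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        text = raw.strip()
        if not text:
            return None
        if text.startswith("["):
            # JSON array form, e.g. "[1,2,3]"; json.loads yields a Python list.
            values = json.loads(text)
        else:
            # Bare comma-separated form, e.g. "1, 2, 3"; skip empty pieces.
            values = [part for part in text.split(",") if part.strip()]
    else:
        # Already a list-like value (e.g. a list param).
        values = list(raw)
    # int() tolerates surrounding whitespace and numeric strings like "2".
    return [int(v) for v in values]


def select_questions(question_ids: Iterable[int], subset: Optional[List[int]]) -> List[int]:
    """Hypothetical filter: keep requested IDs, fail loudly if none match."""
    if subset is None:
        return list(question_ids)
    wanted = set(subset)
    selected = [qid for qid in question_ids if qid in wanted]
    if not selected:
        raise ValueError(f"question_number_subset {subset} matched no questions")
    return selected
```

For example, `parse_question_subset("[1,2,3]")` and `parse_question_subset("1, 2,3")` both give `[1, 2, 3]`. Raising early is a design choice: a clear error naming the param is easier to act on than the pandas "Columns must be same length as key" message.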

To Reproduce
Steps to reproduce the behavior:

1. Configure the environment variables required by the test_retrieval DAG.
2. Trigger the DAG.
3. In the parameter prompt, enter a subset of question IDs, such as [1,2,3].
4. The DAG run fails with the error above.

Expected behavior
The DAG run completes without errors.

Improvements

- The references saved in the CSV are in an arbitrary, incorrect order. This is probably because they are put into a set (using {}) somewhere.
- The multi-query references and the Weaviate search references are not relevant. They provide no useful information, yet they slow down the pipeline and add cost.
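If the ordering problem described above does come from a set, one common fix is an order-preserving de-duplication. This is a minimal sketch assuming the references are hashable values (for example URL strings) collected in the order they should appear; `dedupe_preserving_order` is a hypothetical helper name, and the issue does not identify where the set is actually built.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_preserving_order(items: Iterable[T]) -> List[T]:
    # dict keys keep insertion order (guaranteed since Python 3.7), so the first
    # occurrence of each reference wins. A set has no such guarantee, and for
    # strings its iteration order can change between runs due to hash randomization.
    return list(dict.fromkeys(items))
```

For example, `dedupe_preserving_order(["b.md", "a.md", "b.md", "c.md"])` returns `["b.md", "a.md", "c.md"]`, keeping the retrieval order while still dropping duplicates.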