Improve & Address Bugs in `test_retrieval`, the Batch Test Question DAG
davidgxue opened this issue · 0 comments
Bug
Describe the bug
- Traceback
[2024-02-13, 17:31:11 EST] {taskinstance.py:2699} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 433, in _execute_task
result = execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/decorators/base.py", line 242, in execute
return_value = super().execute(context)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 199, in execute
return_value = self.execute_callable()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airflow/operators/python.py", line 216, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/dags/monitor/test_retrieval.py", line 210, in generate_test_answers
questions_df[["askastro_answer", "askastro_references", "langsmith_link"]] = questions_df.question.apply(
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4079, in __setitem__
self._setitem_array(key, value)
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4138, in _setitem_array
self._iset_not_inplace(key, value)
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4157, in _iset_not_inplace
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
- question_number_subset
  - After adding debug logging, `questions_df` turns out to be empty. This only occurs if someone puts in a subset of question ids.
  - The `question_number_subset` param isn't parsed correctly: the `json.loads()` code that attempts to parse the string into a list of ints does not do so correctly, leading to no questions being added here.
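A minimal sketch of one way the parsing could be made more tolerant, assuming the param may arrive either as a JSON string such as `"[1,2,3]"` or as an already-parsed list. The helper name `parse_question_subset` is hypothetical and is not taken from the DAG code:

```python
import json


def parse_question_subset(raw):
    """Return a list of int question ids, or None when no subset was given.

    Hypothetical helper: the name and the accepted input shapes are
    assumptions, not the DAG's actual implementation.
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        raw = raw.strip()
        if not raw:
            return None
        # A JSON array string such as "[1,2,3]" becomes a Python list here.
        raw = json.loads(raw)
    if isinstance(raw, int) and not isinstance(raw, bool):
        # Accept a single id as a one-element subset.
        raw = [raw]
    if not isinstance(raw, list):
        raise ValueError(f"Expected a list of question ids, got {type(raw).__name__}")
    # Coerce every entry to int so ids like "2" still match integer ids.
    ids = [int(item) for item in raw]
    return ids or None
```

Returning `None` for an empty or missing subset lets the caller fall back to running every question instead of filtering down to an empty `questions_df`.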
To Reproduce
Steps to reproduce the behavior:
- Have proper configuration of environment variables for the `test_retrieval` DAG
- Trigger the DAG
- Put a list of subset question ids in the parameter prompt, such as `[1,2,3]`
- The DAG run errors out
Expected behavior
No errors
Improvements
- The references saved in the CSV are in a random, incorrect order. This is probably because they are put into a set using `{}` somewhere.
- The multi-query references and the Weaviate search references are not relevant. They don't provide useful info, but they delay the pipeline and incur cost.
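If the set is only there to drop duplicate references, one possible fix is to deduplicate while keeping first-seen order. This is a sketch under that assumption; the helper name `dedupe_keep_order` is hypothetical:

```python
def dedupe_keep_order(references):
    """Drop duplicate references while preserving their original order.

    Hypothetical helper: dict keys keep insertion order (Python 3.7+),
    unlike a set, whose iteration order is arbitrary.
    """
    return list(dict.fromkeys(references))
```

For example, `dedupe_keep_order(["b", "a", "b", "c"])` keeps `"b"` ahead of `"a"`, whereas iterating over `{"b", "a", "b", "c"}` gives no ordering guarantee.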