Saving `is_training_set_available` in `sys_info` during `get_overall_statistics()`
Although this issue occurs in the web interface, I'm writing it here as it's mainly SDK-related.
### Problem
In the web interface:
```
[0] File "/Users/oscar/opt/anaconda3/envs/exb/lib/python3.9/site-packages/explainaboard/processors/processor.py", line 252, in perform_analyses
[0]   my_analysis.perform(
[0] File "/Users/oscar/opt/anaconda3/envs/exb/lib/python3.9/site-packages/explainaboard/analysis/analyses.py", line 191, in perform
[0]   raise RuntimeError(f"bucket analysis: feature {self.feature} not found.")
```
In the SDK:
The function `_gen_cases_and_stats()` in `conditional_generation.py` (called by `processor.py`'s `get_overall_statistics()`) skips saving example-level features that have `require_training_set=True`. However, the names of these skipped features are still saved in `sys_info.analysis_levels[0]`.
This causes `perform()` in `BucketAnalysis` (in `analyses.py`) to attempt to look up these features and throw the above error, because the features cannot be found in the actual cases (since they are skipped).
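To make the mismatch concrete, here is a minimal self-contained sketch; the feature names and dict shapes are illustrative stand-ins, not the actual ExplainaBoard objects:

```python
# Feature names recorded in sys_info.analysis_levels[0]; the
# require_training_set=True feature is still listed here.
declared_features = ["output_length", "train_frequency"]

# Example-level features actually saved for a case; "train_frequency"
# was skipped because no training set was provided.
case_features = {"output_length": 12}

for feature in declared_features:
    if feature not in case_features:
        # Mirrors the RuntimeError raised in BucketAnalysis.perform()
        raise RuntimeError(f"bucket analysis: feature {feature} not found.")
```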
### Quick fix
Set `skip_failed_analyses=True` in `Processor.process`, as sketched below.
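For illustration, a hedged call-site sketch; only the `skip_failed_analyses` parameter and the `Processor.process` method are taken from this issue, while `processor`, `metadata`, and `sys_output` stand in for whatever arguments the caller already passes:

```python
# Hedged sketch: assumes `processor`, `metadata`, and `sys_output` are
# already set up as usual for ExplainaBoard processing. With
# skip_failed_analyses=True, analyses that fail (e.g. on missing
# features) are skipped instead of raising.
sys_info = processor.process(metadata, sys_output, skip_failed_analyses=True)
```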
### Long-term solution
Following up on #410, we should save a flag like `is_training_set_available` in `sys_info`. If it is set to false, we should skip features with `require_training_set=True` during bucket analysis, as sketched after this section.
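A minimal sketch of what that check could look like; the flag name, its placement on `sys_info`, and the `require_training_set` attribute access are assumptions drawn from this issue, not existing API:

```python
def bucketable_features(sys_info, level):
    """Hypothetical helper (sketch only, not the real API): drop
    training-set-dependent features from bucket analysis when no
    training set is available."""
    if getattr(sys_info, "is_training_set_available", True):
        return level.features
    # Keep only features that do not depend on the training set.
    return {
        name: feature
        for name, feature in level.features.items()
        if not feature.require_training_set
    }
```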
@OscarWang114 Thanks for reporting the issue!
First, could `skip_failed_analyses=True` in `Processor.process` be a quick fix, or does it not satisfy the use case?
I also agree with having more specific control around feature groups (in this case, train-only or not). Is the flag name just `is_trainint_set` rather than `is_training_set_available`?
@odashi Thanks! Yes, `skip_failed_analyses=True` is a valid quick fix; I updated the issue description. And thanks for catching the typo (also updated).