Improve Narrative Section Extraction
cragwolfe opened this issue · 0 comments
Right now, a SECSection regex is used to identify a TOC section in get_section_narrative. That generally works pretty well. The matching TOC title text is then used to look for the section in the content. However, rather than sticking with the original regex, a more lenient match condition is ultimately used for 10-Ks and 10-Qs, via match_10k_toc_title_to_section. The better approach is likely to stick with the original matching regex.
The lenient post-TOC match is why the EHC test fails for the BUSINESS section, and may be the reason for other failures as well.
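To illustrate the general failure mode, here is a minimal sketch of how a lenient title match can land on a different heading than the original regex would. Everything in it is hypothetical: BUSINESS_PATTERN is a stand-in and not the project's actual SECSection pattern, and lenient_match is an invented prefix rule, not the logic in match_10k_toc_title_to_section. It is not a reproduction of the EHC failure.

```python
import re

# Hypothetical stand-in for a SECSection-style pattern (assumption, not the real regex).
BUSINESS_PATTERN = re.compile(r"^(item\s*1\.?\s*)?business\.?$", re.IGNORECASE)


def clean(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def lenient_match(toc_title, heading):
    """Invented lenient rule: the heading only has to start with the TOC title."""
    return clean(heading).lower().startswith(clean(toc_title).lower())


def strict_match(heading):
    """Reuse the same regex that identified the TOC entry."""
    return BUSINESS_PATTERN.match(clean(heading)) is not None


def find_section_start(headings, predicate):
    """Return the index of the first heading accepted by predicate, or None."""
    for index, heading in enumerate(headings):
        if predicate(heading):
            return index
    return None


# Made-up headings, in document order.
toc_title = "Business"
headings = [
    "Business Combinations and Goodwill",  # unrelated heading sharing a prefix
    "Item 1. Business",                    # the section actually wanted
]

lenient_idx = find_section_start(headings, lambda h: lenient_match(toc_title, h))
strict_idx = find_section_start(headings, strict_match)

print(lenient_idx)  # 0: the lenient rule stops at the unrelated heading
print(strict_idx)   # 1: the regex only accepts the real section heading
```

In this toy case, applying the same pattern in both places keeps the TOC lookup and the content lookup in agreement about which heading counts as the section start.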
Definition of Done
- Section extraction logic is updated such that fewer tests are marked as xfail, in particular the EHC BUSINESS case mentioned above.