Improve Narrative Section Extraction

Question

cragwolfe opened this issue 2 years ago · 0 comments

Right now, get_section_narrative uses a SECSection regex to identify the matching entry in the table of contents (TOC), and that generally works pretty well. The matched TOC title text is then used to look for the section in the document content. For 10-Ks and 10-Qs, however, that lookup does not stick with the original regex: it goes through match_10k_toc_title_to_section, which applies a more lenient match condition. The better approach is likely to reuse the original matching regex for the content lookup as well.

The lenient post-TOC match is why the EHC test fails for the BUSINESS section, and it may be the reason for other failures as well.
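
To make the difference concrete, here is a minimal, self-contained sketch of how a lenient title match can pick a different heading than a strict regex. Everything in it is an assumption for illustration: BUSINESS_PATTERN, lenient_match, strict_match, find_section_start, and the sample headings are hypothetical and do not reproduce the actual SECSection pattern, the real logic in match_10k_toc_title_to_section, or the EHC filing.

```python
import re

# Hypothetical stand-in for the BUSINESS section regex (not the real SECSection pattern).
BUSINESS_PATTERN = re.compile(r"^(item\s+1\.?\s*)?business\.?$", re.IGNORECASE)


def _normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def lenient_match(toc_title: str, heading: str) -> bool:
    """Hypothetical lenient check: the heading merely starts with the TOC title."""
    return _normalize(heading).startswith(_normalize(toc_title))


def strict_match(pattern: re.Pattern, heading: str) -> bool:
    """Strict check: the heading must match the section regex itself."""
    return pattern.match(heading.strip()) is not None


def find_section_start(headings, is_match):
    """Return the index of the first heading accepted by is_match, or None."""
    for index, heading in enumerate(headings):
        if is_match(heading):
            return index
    return None


# Made-up headings as they might appear in the body of a filing.
headings = [
    "Business Combinations and Acquisitions",
    "Item 1. Business",
    "Item 1A. Risk Factors",
]
toc_title = "Business"  # title text recovered from the TOC

lenient_idx = find_section_start(headings, lambda h: lenient_match(toc_title, h))
strict_idx = find_section_start(headings, lambda h: strict_match(BUSINESS_PATTERN, h))

print(headings[lenient_idx])
print(headings[strict_idx])
```

In this sketch the lenient check stops at the first heading that begins with "Business", while the strict regex skips it and lands on the Item 1 heading. Whether this is the exact failure mode in the EHC case would need to be confirmed against the actual filing.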

Definition of Done

Section extraction logic is updated so that fewer tests are marked as xfailed, in particular the EHC case mentioned above.