Repeated optimization passes
Closed this issue · 2 comments
jpeddicord commented
The optimize_bounds method of TextData is capable of isolating a window within input text to identify a single license chunk. It'd be nice to find multiple licenses within a file, in the case of dual licenses, etc.
My initial thought:
A new method that calls optimize_bounds repeatedly, storing the result of each call and removing (or blanking out) the matched text from the original. The next iteration then tries optimize_bounds again on what remains. Repeat until there's no identifiable text left (above, say, 0.8 confidence).
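To make the idea concrete, here is a rough sketch of that loop in Python. It is not the project's code: best_window is a toy stand-in for optimize_bounds that slides a fixed-size window of lines and scores it with difflib, and the names best_window, find_all_licenses, and templates are all made up for illustration. Only the 0.8 threshold comes from the proposal above.

```python
from difflib import SequenceMatcher


def best_window(lines, templates):
    """Toy stand-in for optimize_bounds (hypothetical, not the real API).

    Returns (name, start, end, score) for the best-scoring window of lines,
    or (None, 0, 0, 0.0) if nothing can be scored.
    """
    best = (None, 0, 0, 0.0)
    for name, template in templates.items():
        size = len(template.splitlines())
        if size == 0:
            continue  # an empty template would trivially match an empty window
        for start in range(len(lines) - size + 1):
            window = "\n".join(lines[start:start + size])
            score = SequenceMatcher(None, window, template).ratio()
            if score > best[3]:
                best = (name, start, start + size, score)
    return best


def find_all_licenses(text, templates, threshold=0.8):
    """Repeat the single-window search, blanking each match before the next pass."""
    lines = text.splitlines()
    found = []
    while True:
        name, start, end, score = best_window(lines, templates)
        if name is None or score < threshold:
            break  # nothing identifiable is left above the threshold
        found.append((name, start, end, score))
        # Blank out the matched lines so the next pass cannot match them again.
        for i in range(start, end):
            lines[i] = ""
    return found


if __name__ == "__main__":
    templates = {
        "MIT": "Permission is hereby granted\nfree of charge",
        "Apache": "Licensed under the Apache License\nVersion 2.0",
    }
    text = (
        "Dual licensed.\n"
        "Permission is hereby granted\nfree of charge\n"
        "\n"
        "Licensed under the Apache License\nVersion 2.0\n"
        "End."
    )
    for name, start, end, score in find_all_licenses(text, templates):
        print(name, start, end, round(score, 2))
```

Blanking rather than deleting keeps the line offsets of later matches valid against the original file, which seems like the friendlier choice if the bounds are reported back to the user.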
jpeddicord commented
This is starting to happen via "strategies": 05bd6ac
jpeddicord commented
Closing this as strategies have landed in master. Should make it out with the next release soon. \o/