Typo in extractor#isHighlinkDensity?
dminkovsky opened this issue · 1 comments
Line 378 of extractor.coffee is currently:
linkText = sb.join('')
which doesn't really make sense... shouldn't it be this?:
linkText = sb.join(' ')
Changing it to this, however, causes the polygon_video test to break. I'm not sure how to address this, or, really, how to make complete sense of the isHighlinkDensity function. The idea behind the function is perfectly clear, but how the function is "tuned" doesn't really make sense:
linkDivisor = numberOfLinkWords / wordsNumber
score = linkDivisor * numberOfLinks
score >= 1.0
Maybe a bad question to ask about a heuristic, but why is this the heuristic?
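To make the effect of the separator concrete, here is a rough sketch of the scoring in Python rather than the project's CoffeeScript. Only the three scoring lines come from the snippet above. The helper names, the whitespace-based word counting, and the sample text are assumptions made for illustration, not the actual code in extractor.coffee.

```python
def count_words(text):
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def link_density_score(node_text, link_texts, separator=""):
    """Score a node from its full text and the text of each link inside it."""
    words_number = count_words(node_text)
    if words_number == 0 or not link_texts:
        return 0.0
    # The line in question: join('') fuses the last word of one link
    # with the first word of the next, so fewer link words are counted.
    link_text = separator.join(link_texts)
    number_of_link_words = count_words(link_text)
    number_of_links = len(link_texts)
    link_divisor = number_of_link_words / words_number
    return link_divisor * number_of_links


def is_high_link_density(node_text, link_texts, separator=""):
    return link_density_score(node_text, link_texts, separator) >= 1.0


# Hypothetical 16-word paragraph containing three two-word links.
NODE_TEXT = ("See the full review and the launch trailer "
             "or read our earlier preview of the game")
LINKS = ["full review", "launch trailer", "earlier preview"]

# join(''):  4 link words -> (4 / 16) * 3 = 0.75  -> not high density
# join(' '): 6 link words -> (6 / 16) * 3 = 1.125 -> high density
score_fused = link_density_score(NODE_TEXT, LINKS, "")
score_spaced = link_density_score(NODE_TEXT, LINKS, " ")
```

Under these assumptions, the same paragraph lands on opposite sides of the 1.0 threshold depending on the separator. That is one plausible way a fixture could start failing after the change, though it is not a diagnosis of the polygon_video test.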
Yes, nice find! Thanks for pointing out that typo. I'm pushing a fix.
As for the heuristic, that section is based on the original Goose code. I cleaned up the naming to make the flow a little easier to follow, but I didn't come up with that heuristic originally.