[RFC] we need more debugging printouts
wkoszek opened this issue · 6 comments
Hello,
I enabled middleman-blog-similar on my website with this commit:
https://github.com/wkoszek/me/commit/2b8479586c3af100ea006b00f37a4a635f1d1658
But I don't have any indication of why it picked certain pages. It somehow generated the same results for every page. You can observe it here:
Click on any 3-4 links and scroll to the "You may also like" section. The choices picked are:
01001011, or on the art of snare drum patterns
How to write a good Google Summer of Code Proposals
Funny mistakes and The Toyota Way
Fixing Middleman-spellcheck
Why it's not about self-driving
and they're the same for every page.
Somehow it worked differently here:
The default algorithm finds similar articles by splitting each article body on spaces and comparing the resulting strings.
How about using TreeTagger instead?
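To make the idea of space-separated word matching concrete, here is a minimal sketch of a word-frequency comparison that also prints the score for each candidate, which is the kind of debugging output this issue asks for. It is not the plugin's actual code: the helper names (`word_frequencies`, `cosine_similarity`, `rank_similar`), the use of cosine similarity, and the sample titles and bodies are all assumptions made for illustration.

```ruby
# Hypothetical helper: count how often each whitespace-separated token occurs.
def word_frequencies(body)
  body.downcase.split.each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

# Hypothetical helper: cosine similarity between two word-count hashes.
# Returns 0.0 when either hash is empty.
def cosine_similarity(a, b)
  dot = a.sum { |word, count| count * b.fetch(word, 0) }
  norm = ->(counts) { Math.sqrt(counts.values.sum { |c| c * c }) }
  denominator = norm.call(a) * norm.call(b)
  denominator.zero? ? 0.0 : dot / denominator
end

# Hypothetical helper: rank the other articles by similarity to the target,
# highest score first. `others` maps a title to its body text.
def rank_similar(target_body, others)
  target = word_frequencies(target_body)
  others
    .map { |title, body| [title, cosine_similarity(target, word_frequencies(body))] }
    .sort_by { |_title, score| -score }
end

# Made-up sample data, only to show the printout.
others = {
  'drums' => 'drum patterns for snare',
  'cars'  => 'self driving cars'
}

rank_similar('snare drum patterns', others).each do |title, score|
  puts format('%-10s %.3f', title, score)
end
```

If every page ends up with the same ranking, printing the scores like this should show whether the scores really are identical or whether the same candidates merely win by a small margin each time.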
I just tried TreeTagger. I have it enabled with:
+activate :similar, :algorithm => :'word_frequency/tree_tagger'
and I followed the TreeTagger installation steps from the README.md. The results are similar. One thing I see:
- blog entries tagged as "articles" all have the same "similar" articles
- blog entries tagged as "books" all have the same "similar" articles
Is this normal? Do you have any other website where I could see middleman-blog-similar used?