ngs/middleman-blog-similar

[RFC] we need more debugging printouts

wkoszek opened this issue · 6 comments

Hello,

I enabled middleman-blog-similar on my website with this commit:

https://github.com/wkoszek/me/commit/2b8479586c3af100ea006b00f37a4a635f1d1658

But I don't have any indication of why it picked certain pages. It seems to produce the same results for every page. You can observe it here:

http://www.koszek.com/

Click on any 3-4 links and scroll to the "You may also like" section. The choices picked are:

01001011, or on the art of snare drum patterns
How to write a good Google Summer of Code Proposals
Funny mistakes and The Toyota Way
Fixing Middleman-spellcheck
Why it's not about self-driving

and they're the same for every page.

ngs commented

The default algorithm looks up similar articles by comparing the words of each article body, split on spaces.

How about using TreeTagger instead?
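
To make the default behaviour easier to reason about, here is a minimal sketch of whitespace-split word-frequency similarity. It assumes each article is a plain hash with `:title` and `:body`, and the method names `word_counts`, `similarity` and `similar_articles` are hypothetical. It is not the gem's actual code. It also prints each candidate's score when `debug: true`, which is the kind of debugging output this issue asks for.

```ruby
# Hypothetical sketch of word-frequency similarity; NOT middleman-blog-similar's code.

# Split the body on whitespace and count how often each lowercased word appears.
def word_counts(body)
  body.downcase.split.tally
end

# Score two articles by summing min(count_a, count_b) over the words they share.
def similarity(a, b)
  ca = word_counts(a[:body])
  cb = word_counts(b[:body])
  (ca.keys & cb.keys).sum { |w| [ca[w], cb[w]].min }
end

# Rank every other article by score (ties broken by title). With debug: true,
# print each score to stderr so it is visible why an article was picked.
def similar_articles(target, articles, limit: 5, debug: false)
  others = articles.reject { |a| a.equal?(target) }
  ranked = others.map { |a| [a, similarity(target, a)] }
                 .sort_by { |a, score| [-score, a[:title]] }
  ranked.each { |a, score| warn "#{target[:title]} -> #{a[:title]}: #{score}" } if debug
  ranked.first(limit).map { |a, _| a }
end
```

With a plain whitespace split, very common words such as "the" and "a" count toward the shared total, so long posts tend to score well against every other post. That is one plausible (unconfirmed) way every page could end up with the same list.
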

I just tried TreeTagger. I have it enabled with:

+activate :similar, :algorithm => :'word_frequency/tree_tagger'

I also followed the TreeTagger installation steps from README.md. The results are about the same as before. One thing I notice:

  • blog entries tagged as articles all get the same "similar articles"
  • blog entries tagged as books all get the same "similar articles"

Is this normal? Is there another website where I could see middleman-blog-similar in use?

@ngs ping. :-)

@ngs Ping. I didn't get any noticeably better results with TreeTagger. Could we work on improving the documentation? I'd like to try two strategies:

  • random selection
  • semi-random selection based on article tags
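
The two proposed strategies could look roughly like the sketch below. It assumes each article is a hash with `:title` and `:tags`; `random_similar` and `tag_similar` are hypothetical names, not part of middleman-blog-similar. Passing a seeded `Random` keeps the picks reproducible between builds.

```ruby
# Hypothetical sketches of the two proposed strategies; not part of the gem.

# Strategy 1: pick `limit` other articles uniformly at random.
def random_similar(target, articles, limit: 5, rng: Random.new)
  articles.reject { |a| a.equal?(target) }.sample(limit, random: rng)
end

# Strategy 2: semi-random. Articles sharing more tags with the target come first;
# within each group of equal overlap the order is shuffled. Articles with no
# shared tags are kept last as a fallback so the list can still be filled.
def tag_similar(target, articles, limit: 5, rng: Random.new)
  others = articles.reject { |a| a.equal?(target) }
  groups = others.group_by { |a| (a[:tags] & target[:tags]).size }
  ranked = groups.keys.sort.reverse.flat_map { |overlap| groups[overlap].shuffle(random: rng) }
  ranked.first(limit)
end
```

The tag-based version only falls back to random order among articles that share the same number of tags, so posts tagged "books" would mostly recommend other "books" posts while still varying from page to page.
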

ngs commented

Refactored in #15. This may no longer reproduce; please re-open if you see this error again. Thanks.