algolia/hn-search

Why does the Algolia HN API return this comment's text with `\\n`s instead of `<p>`s?

spookyuser opened this issue · 1 comment

Until now, every comment I've requested from the Algolia HN API has returned its text field as HTML, specifically with `<p>` elements for new lines in the comment. However, I just stumbled upon this comment which, when requested through Algolia's API, returns text formatted like this:

```
<p>SORT-OF SUMMARY (I didn&#x27;t write this - it was posted on What.cd&#x27;s forum by a user)</p><p>The story so far. Corrections welcome -- just add them.</p><p>Three short stories by J.D. Salinger (1919-2010) were to remain unpublished\\nuntil 2060, but were released onto the Internet late 27Nov2013 and removed\\nearly the next day (Thanksgiving morning in the USA).</p><p>TWO LIVES</p><p>J.D. Salinger (1919-2010) enjoyed his personal creativity better if he\\npublished less because he felt the presence of a public following -- not to\\nmention reviewers and critics -- was a constraint on his freedom. Salinger\\nstopped publishing and instructed...
```

(Click here to get the comment through the Algolia API.)

While there are some `<p>` elements, many of the line breaks appear as `\\n` instead.

Is this expected behavior, or is something going weird with the API?
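When reading the raw response, `\\n` can mean two different things: a JSON `\n` escape decodes to a real newline character, while a JSON `\\n` escape decodes to a literal backslash followed by `n`. The minimal Python sketch below, using only the standard `json` module, tells the two apart. The `break_kind` helper and both payloads are hypothetical (not actual API output), and the `text` field name is an assumption about the shape of an item response.

```python
import json


def break_kind(text: str) -> str:
    """Classify the line breaks in an already-decoded comment text (hypothetical helper)."""
    has_newline = "\n" in text   # a real newline character (U+000A)
    has_literal = "\\n" in text  # two characters: a backslash, then the letter "n"
    if has_newline and has_literal:
        return "both"
    if has_newline:
        return "newline"
    if has_literal:
        return "literal"
    return "none"


# Hypothetical raw JSON payloads. In the first, the JSON text contains the
# escape \n; in the second, it contains the escape \\n.
raw_newline = '{"text": "unpublished\\nuntil 2060"}'
raw_literal = '{"text": "unpublished\\\\nuntil 2060"}'

for raw in (raw_newline, raw_literal):
    print(break_kind(json.loads(raw)["text"]))
```

Decoding first and inspecting the resulting string avoids confusing JSON escaping with the characters actually stored in the comment.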

@spookyuser I just checked the indexing code, and we don't seem to apply any transformation to the strings, so it's possible that this is what the original comment looks like. Since it isn't happening anywhere else, I'd sadly say this is expected. If you have ideas on how we could fix or improve this, don't hesitate to drop a PR; the crawling code is in `app/workers/hacker_news_realtime_crawler.rb`.
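Since the index stores the text as-is, one option is to normalize it on the client. The following is a minimal Python sketch of a hypothetical `normalize_breaks` helper (not part of hn-search): it replaces both real newline characters and literal backslash-`n` sequences with a configurable string, while leaving `<pre>...</pre>` spans untouched, on the assumption that code blocks in comment HTML arrive wrapped in `<pre>` and that their newlines are meaningful.

```python
import re

# A real newline (optionally preceded by \r) or the literal two-character
# sequence backslash + "n".
_BREAK = re.compile(r"\r?\n|\\n")

# <pre>...</pre> spans, assumed to hold code whose line breaks must survive.
_PRE = re.compile(r"(<pre>.*?</pre>)", re.DOTALL)


def normalize_breaks(html_text: str, replacement: str = " ") -> str:
    """Replace stray line breaks outside <pre> blocks with `replacement`."""
    # Because the pattern has one capturing group, re.split keeps the matched
    # <pre> blocks in the result, at the odd indices.
    parts = _PRE.split(html_text)
    return "".join(
        # A function replacement stops re.sub from interpreting backslashes
        # in `replacement` as escapes.
        part if i % 2 else _BREAK.sub(lambda _m: replacement, part)
        for i, part in enumerate(parts)
    )


sample = "<p>unpublished\\nuntil 2060</p><pre><code>x\ny</code></pre>"
print(normalize_breaks(sample))
print(normalize_breaks(sample, "<br>"))
```

Joining with a space suits hard-wrapped text like the What.cd excerpt; passing `"<br>"` instead keeps the visual line breaks. Note that a comment which legitimately contains a literal `\n` outside a `<pre>` block, such as inline code, would also be rewritten.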