This is the Rails 5 application providing HN Search. It's leveraging react on the frontend, algoliasearch-rails for the search and uses wkhtmltoimage to crawl+render thumbnails.
We love pull-requests :)
# clone the repository
git clone https://github.com/algolia/hn-search
cd hn-search
# install dependencies
bundle install
# setup credentials
cp config/database.example.yml config/database.yml # feel free to edit, default configuration is OK for search-only
cp config/application.example.yml config/application.yml # feel free to edit, default configuration is OK for search-only
# setup your (sqlite3) database
bundle exec rake db:migrate
# start contributing enjoying Guard (watchers, livereload, notifications, ...)
bundle exec guard
# done!
open http://localhost:3000
If you want to contribute to the UI, the only directory you need to look at is app/assets
. This directory contains all the JS, HTML & CSS code.
To deploy, we're using capistrano and therefore you need SSH access to the underlying machines and run from your own computer:
bundle exec cap deploy
There is currently (December 2018) a bug with bluepill
stopping the deployment. To workaround it, you need to force a restart with the following command instead:
bundle exec cap deploy:restart
There seems to as well be an issue with thin server, where after deployment orphaned thin processes are not killed. This means that the server tries serving previous version of the app and causes ChunkLoadErrors
as the manifest points to no longer existing files. To fix the intermittent errors, you need to ssh to both servers, check for any orphaned thin processes and kill them manually.
ps aux | grep thin
kill <insert old thin process pid's>
The indexing is configured using the following algoliasearch
block:
class Item < ActiveRecord::Base
include AlgoliaSearch
algoliasearch per_environment: true do
# the list of attributes sent to Algolia's API
attribute :created_at, :title, :url, :author, :points, :story_text, :comment_text, :author, :num_comments, :story_id, :story_title
attribute :created_at_i do
created_at.to_i
end
# `title` is more important than `{story,comment}_text`, `{story,comment}_text` more than `url`, `url` more than `author`
# btw, do not take into account position in most fields to avoid first word match boost
attributesToIndex ['unordered(title)', 'unordered(story_text)', 'unordered(comment_text)', 'unordered(url)', 'author', 'created_at_i']
# list of attributes to highlight
attributesToHighlight ['title', 'story_text', 'comment_text', 'url', 'story_url', 'author', 'story_title']
# tags used for filtering
tags do
[item_type, "author_#{author}", "story_#{story_id}"]
end
# use associated number of HN points to sort results (last sort criteria)
customRanking ['desc(points)', 'desc(num_comments)']
# controls the way results are sorted sorting on the following 4 criteria (one after another)
# I removed the 'exact' match critera (improve 1-words query relevance, doesn't fit HNSearch needs)
ranking ['typo', 'proximity', 'attribute', 'custom']
# google+, $1.5M raises, C#: we love you
separatorsToIndex '+#$'
end
def story_text
item_type_cd != Item.comment ? text : nil
end
def story_title
comment? && story ? story.title : nil
end
def story_url
comment? && story ? story.url : nil
end
def comment_text
comment? ? text : nil
end
def comment?
item_type_cd == Item.comment
end
end
- HackerNews
- Firebase for the real-time crawling API
- wkhtmltoimage to back the thumbnails' crawl+rendering