This gem removes the surplus “clutter” (boilerplate, templates) around the main textual content of a web page, in a pure Ruby implementation. It is designed primarily for content from news websites. It can also extract microdata and other HTML metadata.
gem install BoilerpipeArticle
### Usage Example
require 'boilerpipe_article'
require 'net/http'
uri = URI('')
html = Net::HTTP.get(uri)
parser =
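# Main article text, with the surrounding boilerplate removed
article_text = parser.getArticle
# HTML meta data and microdata found in the page
metas = parser.getMetas
microdata = parser.getMicroData
all_text = parser.getAllText
puts article_text
puts metas
puts microdata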
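
The example above downloads the page with `Net::HTTP.get`, which does not follow HTTP redirects and returns the body of whatever response it receives, including error pages. News sites often redirect (for example from `http` to `https`), so it may help to fetch the HTML more defensively before handing it to the parser. The following is only a minimal sketch using the Ruby standard library; `fetch_html` and its `limit` parameter are hypothetical names introduced here and are not part of this gem.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of the gem): return the body of a page,
# following at most `limit` HTTP redirects.
def fetch_html(url, limit = 5)
  raise ArgumentError, 'too many HTTP redirects' if limit <= 0

  response = Net::HTTP.get_response(URI(url))

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    fetch_html(URI.join(url, response['location']).to_s, limit - 1)
  else
    raise "Request failed: #{response.code} #{response.message}"
  end
end
```

The returned string can then be used in place of `html` in the example above, e.g. `html = fetch_html('https://example.com/some-article')`, where the URL is a placeholder.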
Dependencies: nokogiri = 1.6.8, mida = 0.3.9
Check out for the latest updates and API