We’ve all been sitting in the back of a CS lecture, looked up from our laptop to actually listen, and taken a quick peek around at everyone else’s screens. More likely than not, quite a few of them were displaying the all-too-familiar Hacker News orange. Maybe we should all pay more attention to the speaker, but cool new news always seems to take precedence. So what if you were determined to never miss a single article? Or what if you wanted every update from the site so you could automate based on that new information? By leveraging the power of PubNub’s real-time global network, and scraping a little RSS, no one ever has to miss a new Hacker News article again. If you want to see it working live, there is a quick and dirty demo you can see here. It uses the PubNub JavaScript SDK and displays updates to the Hacker News feed. To see it in action locally, clone the source from GitHub and run the Python scraper from the command line.
The first task is to grab the RSS feed from Hacker News. There are plenty of ways to do this, and you could quickly write your own RSS scraper if you wanted, but I decided to use Python and feedparser. With a quick “pip install feedparser” we have our RSS.
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Get Hacker News RSS
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
import feedparser

def current_hn(rss):
    # Fetch and parse the feed at the given URL
    return feedparser.parse(rss)
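To try it, pass in the feed URL; Hacker News serves its RSS at https://news.ycombinator.com/rss, and feedparser hands back an object whose entries attribute holds the posts:

rss = current_hn("https://news.ycombinator.com/rss")
print(rss.entries[0].title)  # title of the current top post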
The feed gives you lots of information, and if you want, take it all. I decided the most interesting pieces were the rank of the post, the title, the link to the article, and the comments link.
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Store interesting information
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
message = []
for index, entry in enumerate(rss.entries):
    post = {}
    post["rank"] = index + 1
    post["title"] = entry.title
    post["link"] = entry.link
    post["comments"] = entry.comments
    message.append(post)
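After one pass, message is a list of dicts, one per front-page post, ordered by rank, along these lines (the values here are placeholders):

message = [
    {"rank": 1, "title": "...", "link": "...", "comments": "..."},
    {"rank": 2, "title": "...", "link": "...", "comments": "..."},
]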
The Python argparse module handles the command line options, giving you robust option parsing with very little code.
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Argparse
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
from argparse import ArgumentParser

parser = ArgumentParser(description="Options to parse RSS Feed")
parser.add_argument("-t", "--time", dest="time_to_wait", type=int, default=10)
args = parser.parse_args()
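The parsed values come back as attributes on the result of parse_args(), so the polling interval is available as args.time_to_wait.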
You can run

python hn.py --help

to see descriptions of all the options from the command line. The script lets you specify how often to poll Hacker News for changes, and whether you want the entire page after every change to the site or just the new posts that appear. For instance, to poll every five seconds and get the entire page, you would run:
python hn.py --mode entire --time 5
argparse also supplies defaults for everything, so to use them just run:
python hn.py
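To give a sense of how those two options drive the scraper, here is a minimal sketch of the polling loop. build_message and publish_update are hypothetical helpers standing in for the message-building loop above and the publish call in the next section; the real script’s internals may differ:

# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Polling loop (sketch)
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
import time

def poll(url, time_to_wait, mode):
    previous = []
    while True:
        rss = current_hn(url)
        message = build_message(rss)  # hypothetical: the rank/title/link/comments loop above
        if mode == "entire":
            publish_update(message)   # hypothetical: send the whole front page each poll
        else:
            seen = [p["link"] for p in previous]
            fresh = [p for p in message if p["link"] not in seen]
            if fresh:
                publish_update(fresh)  # send only the posts that just appeared
        previous = message
        time.sleep(time_to_wait)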
Now that we have the information that is important to us, and know how to run the scraper locally, it’s time to take it global. PubNub provides an incredibly simple API to publish the message. A quick “pip install pubnub” and we can publish our information from Hacker News.
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Publish to PubNub
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
from Pubnub import Pubnub

# The publicly available demo keys work for trying this out
pubnub = Pubnub(publish_key="demo", subscribe_key="demo")

pubnub.publish({
    "channel": "hacker-news",
    "message": message
})
Now it’s up to you. PubNub offers over 50 different SDKs for your use. Take your pick. To consume the information, simply subscribe to the channel (in our case “hacker-news”) and you’re off. There are publicly available demo publish and subscribe keys to use.
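For example, here is a sketch of the consuming side with the same Python SDK; the exact subscribe signature varies between SDK versions, so this assumes the same dict style as the publish snippet above:

# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# Subscribe to updates (sketch)
# -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
from Pubnub import Pubnub

pubnub = Pubnub(publish_key="demo", subscribe_key="demo")

def receive(message):
    # message is the list of post dicts published by the scraper
    for post in message:
        print(post["rank"], post["title"])
    return True  # keep listening

pubnub.subscribe({
    "channel": "hacker-news",
    "callback": receive
})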
If you want to dive further into PubNub, we have lots of tutorials and walkthroughs. Happy Hacking.