Exploring Hacker News Posts

Project Description

In this project, I compare two types of posts from a popular site Hacker News to determine:

Which of them receive more comments on average?
Do posts created at a certain time receive more comments on average?

The types of posts I'm interested in are Ask HN (created to ask a question to the community) and Show HN (created to show the community a project that you've created).

Data Set

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

The original dataset can be found on Kaggle. For this project, the original dataset was downsampled to this set. The number of rows was reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that didn't receive any comments and then randomly sampling from the remaining submissions.

The descriptions of the columns

id: the unique identifier from Hacker News for the post
title: the title of the post
url: the URL that the posts links to, if the post has a URL
num_points: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
num_comments: the number of comments on the post
author: the username of the person who submitted the post
created_at: the date and time of the post's submission

Technologies

Python:
- data analysis: working with strings, OOP (Object-Oriented Programming), working with dates and times
Jupyter Notebook

gaiaengineer/hacker_news_posts

Exploring Hacker News Posts

Project Description

Data Set

The descriptions of the columns

Technologies