In this project, I compare two types of posts from the popular site Hacker News to determine:
- Which type receives more comments on average?
- Do posts created at a certain time receive more comments on average?
The types of posts I'm interested in are Ask HN (created to ask the community a question) and Show HN (created to show the community a project that you've created).
Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.
The original dataset can be found on Kaggle. For this project, it was downsampled to the set used here: the number of rows was reduced from almost 300,000 to approximately 20,000 by removing all submissions that didn't receive any comments and then randomly sampling from the remaining submissions.
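The two downsampling steps described above can be sketched roughly as follows. This is only an illustration, not the script that produced the actual dataset: the function name `downsample`, the `seed` parameter, and the representation of rows as dictionaries with string values are all assumptions.

```python
import random


def downsample(rows, sample_size, seed=0):
    """Drop posts without comments, then draw a random sample of the rest.

    `rows` is assumed to be a list of dicts with a "num_comments" key,
    e.g. as produced by csv.DictReader.
    """
    # Step 1: keep only submissions that received at least one comment.
    commented = [row for row in rows if int(row["num_comments"]) > 0]

    # Step 2: sample without replacement; a fixed seed keeps it reproducible.
    rng = random.Random(seed)
    k = min(sample_size, len(commented))
    return rng.sample(commented, k)
```

Calling `downsample(rows, 20000)` on the full set of rows would return at most 20,000 commented submissions.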
- id: the unique identifier from Hacker News for the post
- title: the title of the post
- url: the URL that the post links to, if the post has a URL
- num_points: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
- num_comments: the number of comments on the post
- author: the username of the person who submitted the post
- created_at: the date and time of the post's submission
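To make the two questions concrete, here is a minimal sketch of how the title, num_comments, and created_at columns could be used to answer them. The function names are hypothetical, matching post types by a case-insensitive "ask hn" / "show hn" title prefix is an assumption, and so is the date format `%m/%d/%Y %H:%M` (adjust it to whatever created_at actually contains).

```python
from collections import defaultdict
from datetime import datetime


def post_type(title):
    """Classify a post by its title prefix (assumed convention)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


def average_comments_by_type(rows):
    """Return {post type: mean number of comments}."""
    totals = defaultdict(lambda: [0, 0])  # type -> [comment sum, post count]
    for row in rows:
        entry = totals[post_type(row["title"])]
        entry[0] += int(row["num_comments"])
        entry[1] += 1
    return {kind: total / count for kind, (total, count) in totals.items()}


def average_comments_by_hour(rows, kind, date_format="%m/%d/%Y %H:%M"):
    """Return {hour of day: mean number of comments} for one post type.

    The default date_format is an assumption about the created_at column.
    """
    totals = defaultdict(lambda: [0, 0])  # hour -> [comment sum, post count]
    for row in rows:
        if post_type(row["title"]) != kind:
            continue
        hour = datetime.strptime(row["created_at"], date_format).hour
        totals[hour][0] += int(row["num_comments"])
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in totals.items()}
```

Both functions expect rows as dictionaries keyed by column name, for example the output of `csv.DictReader`. Comparing `average_comments_by_type(rows)["ask"]` with `["show"]` addresses the first question, and sorting the result of `average_comments_by_hour(rows, "ask")` by value addresses the second.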
- Python:
- data analysis: working with strings, OOP (Object-Oriented Programming), working with dates and times
- Jupyter Notebook