/hacker-news-posts

This is a guided project under Dataquest (Data Engineering track) where I analyzed a data set of submissions to Hacker News and determined which posts are most likely to be commented on and upvoted.

Primary language: Jupyter Notebook

Exploring Hacker News Posts

This is a guided project under Dataquest (Data Engineering track) where I analyzed a data set of submissions to Hacker News (HN), a website focusing on computer science and entrepreneurship. I was interested in two specific categories of user-submitted posts: Ask HN posts, where users ask the HN community a specific question, and Show HN posts, where users present an interesting project or product to the HN community. I wanted to know the following:

  • Do Ask HN or Show HN posts receive more comments on average?
  • Do posts created at a certain time receive more comments on average?
  • Do Ask HN or Show HN posts receive more upvotes or points on average?
  • Do posts created at a certain time receive more points on average?
  • Do posts other than Ask HN or Show HN receive more comments and points on average?

I performed the following for the data analysis:

  • Opening and exploration of the hacker_news.csv data set
  • Extraction of Ask HN and Show HN posts
  • Calculation of the average number of comments for Ask HN and Show HN posts
  • Determination of the number of Ask HN posts and comments by hour created
  • Calculation of the average number of comments for Ask HN posts by hour
  • Calculation of the average number of upvotes or points for Ask HN and Show HN posts
  • Determination of the number of points by hour created for Ask HN posts
  • Calculation of the average number of points for Ask HN posts by hour
  • Calculation of the average number of comments and points for other posts
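
The extraction and averaging steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `split_posts` and `average`, the column order, and the sample rows are all assumptions made for the example, and the numbers are not taken from hacker_news.csv.

```python
# Sketch only: column order and sample rows are assumptions for illustration,
# not data or code taken from the notebook.

def split_posts(rows, title_idx=1):
    """Split rows into Ask HN, Show HN, and other posts by title prefix."""
    ask, show, other = [], [], []
    for row in rows:
        title = row[title_idx].lower()  # compare case-insensitively
        if title.startswith("ask hn"):
            ask.append(row)
        elif title.startswith("show hn"):
            show.append(row)
        else:
            other.append(row)
    return ask, show, other


def average(rows, col_idx):
    """Mean of an integer-valued column; returns 0.0 for an empty list."""
    if not rows:
        return 0.0
    return sum(int(row[col_idx]) for row in rows) / len(rows)


# Hypothetical rows: [id, title, url, num_points, num_comments, author, created_at]
sample_rows = [
    ["1", "Ask HN: How do you learn SQL?", "", "10", "6", "a", "8/4/2016 15:10"],
    ["2", "Show HN: My side project", "http://example.com", "20", "2", "b", "8/4/2016 9:30"],
    ["3", "ask hn: Favorite editor?", "", "4", "4", "c", "8/5/2016 15:45"],
    ["4", "A post about compilers", "http://example.com", "50", "1", "d", "8/5/2016 2:00"],
]

ask_posts, show_posts, other_posts = split_posts(sample_rows)

# Column 4 holds comments and column 3 holds points in this assumed layout.
avg_ask_comments = average(ask_posts, 4)
avg_show_comments = average(show_posts, 4)
avg_other_points = average(other_posts, 3)

print("Ask HN avg comments:", avg_ask_comments)
print("Show HN avg comments:", avg_show_comments)
print("Other posts avg points:", avg_other_points)
```

With the real data set, the rows would presumably come from reading hacker_news.csv (for example with `csv.reader`) and skipping the header row before calling `split_posts`.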

The results of the analysis suggest that, to maximize the number of comments and upvotes it receives, a post should be an Ask HN post created between 3:00 and 4:00 EST.
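
A by-hour result like this one can be derived by grouping posts on the hour they were created and ranking the hourly averages. The sketch below shows one way to do that. The function name `average_by_hour`, the `created_at` format string, the column positions, and the sample rows are assumptions for illustration, so the output does not reproduce the notebook's figures.

```python
from collections import defaultdict
from datetime import datetime


def average_by_hour(rows, value_idx, created_idx=6, fmt="%m/%d/%Y %H:%M"):
    """Return {hour: average of the value column} for the given rows."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        # Parse the timestamp and keep only the hour of day (0-23).
        hour = datetime.strptime(row[created_idx], fmt).hour
        totals[hour] += int(row[value_idx])
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in counts}


# Hypothetical Ask HN rows:
# [id, title, url, num_points, num_comments, author, created_at]
ask_rows = [
    ["1", "Ask HN: How do you learn SQL?", "", "10", "6", "a", "8/4/2016 15:10"],
    ["3", "Ask HN: Favorite editor?", "", "4", "4", "c", "8/5/2016 15:45"],
    ["5", "Ask HN: Best laptop?", "", "2", "1", "e", "8/6/2016 9:05"],
]

# Column 4 holds the comment count in this assumed layout.
by_hour = average_by_hour(ask_rows, value_idx=4)

# Rank hours from highest to lowest average.
ranked = sorted(by_hour.items(), key=lambda item: item[1], reverse=True)

for hour, avg in ranked:
    print(f"{hour:02d}:00  {avg:.2f} average comments per post")
```

Passing the points column as `value_idx` instead gives the same ranking for upvotes.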

Please see the hacker_news.csv data set and the full exploratory data analysis in the Project 2.ipynb notebook above.