/Mars-Web-Scrape-Info-Page

In this project, I built a web application that scrapes various websites for data related to the Mission to Mars and displays the information on a single HTML page.

Primary LanguageJupyter Notebook

Mars Web Scrape Info Page

In this project I built a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. The following outlines what I did.

Technolgies: Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter, MongoDB, Python, Flask, SQLAlchemy

Step 1 - Scraping

  1. NASA Mars News: https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest
  • First I scraped the NASA Mars News Site and collected the latest News Title and Paragraph Text.
  • I then Assigned the text to variables to be referenced later.
  1. JPL Mars Space Images - Featured Image: https://www.jpl.nasa.gov/images?search=&category=Mars
  • Next I Visited the url for JPL Featured Space Image.
  • I then used splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable.
  1. Mars Facts: https://space-facts.com/mars/
  • The next items I grabed were on the Mars Facts webpage.
  • I used Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.
  • I then used Pandas to convert the data to a HTML table string.
  1. Mars Hemispheres: https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars ((This sign is currently down and may cause issues))
  • Lastly I visited the USGS Astrogeology site here to obtain high resolution images for each of Mar's hemispheres.
  • I used splinter to click each of the links to the hemispheres in order to find the image url to the full resolution image.
  • I then saved both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name.
  • I used a Python dictionary to store the data.
  • Then I appended the dictionary with the image url string and the hemisphere title to a list.
  • This list contains one dictionary for each hemisphere.

Step 2 - MongoDB and Flask Application

  1. First I utilized MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

  2. I then converted my Jupyter notebook into a Python script with a function called scrape that will execute all of the scraping code from above and return one Python dictionary containing all of the scraped data.

  3. Next, I built a route called /scrape that will import the python script and called my scrape function.

  4. I Stored the return value in Mongo as a Python dictionary.

  5. Next I created a root route / that queries my your Mongo database and passes the mars data into an HTML template to display the data.

  6. Lastly I custom built a template HTML file called index.html that takes the mars data dictionary and display's all of the data in the appropriate HTML elements.

Here's what the final product looked like!