I was hired to automate a web browser that visits several websites and extracts data about the Mission to Mars. I scraped articles with their titles, images of Mars and its hemispheres, and a Mars Facts data table. The completed work scrapes the newest featured article, pictures, and tables. Web scraping is a method used by organizations worldwide to extract online data for analysis, and it also helps automate tedious tasks on smaller projects. The goal is to apply these methods and start an HTML portfolio so prospective employers can see my capabilities.
- https://data-class-mars.s3.amazonaws.com/Mars/index.html
- https://spaceimages-mars.com
- https://galaxyfacts-mars.com
- https://marshemispheres.com/
- Python
- Jupyter Notebook
- Pandas, BeautifulSoup, Splinter, ChromeDriverManager, Flask, PyMongo, DateTime
- MongoDB
- HTML5/CSS
- Bootstrap 3
The first versions of app.py, scraping.py, and the HTML index page returned the newest Mars article, a featured picture, and the Mars facts table.
Figure 1: The initial local web page shows the basic button in a header, the article and its title, the featured image, and the table.
Then I was asked to add one more scrape to pull in the beautiful shots of the hemispheres of Mars. After ensuring the code worked in Jupyter Notebook (see below), I refactored and cleaned it up, making it into a more useful tool.
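As a rough illustration of the hemisphere step, here is a minimal sketch of parsing a listing page with BeautifulSoup. The markup in `SAMPLE_HTML`, the class names `item` and `thumb`, and the function name `parse_hemispheres` are assumptions made for this example; the real site's structure may differ, and the actual project drives a browser with Splinter rather than parsing a fixed string.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a hemisphere listing page.
# The real site's tags and class names may differ.
SAMPLE_HTML = """
<div class="item">
  <a href="cerberus.html"><img class="thumb" src="images/cerberus_thumb.png"></a>
  <div class="description"><a href="cerberus.html"><h3>Cerberus Hemisphere Enhanced</h3></a></div>
</div>
<div class="item">
  <a href="schiaparelli.html"><img class="thumb" src="images/schiaparelli_thumb.png"></a>
  <div class="description"><a href="schiaparelli.html"><h3>Schiaparelli Hemisphere Enhanced</h3></a></div>
</div>
"""


def parse_hemispheres(html, base_url):
    """Return a list of {"title", "img_url"} dicts, one per hemisphere item."""
    soup = BeautifulSoup(html, "html.parser")
    hemispheres = []
    for item in soup.find_all("div", class_="item"):
        title = item.find("h3").get_text(strip=True)
        img_src = item.find("img", class_="thumb")["src"]
        # Image paths in the sample are relative, so resolve them against the site URL.
        hemispheres.append({"title": title, "img_url": urljoin(base_url, img_src)})
    return hemispheres


if __name__ == "__main__":
    for hemisphere in parse_hemispheres(SAMPLE_HTML, "https://marshemispheres.com/"):
        print(hemisphere["title"], hemisphere["img_url"])
```

Returning a list of dictionaries keeps the result easy to store in MongoDB and to loop over in an HTML template.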
After cleaning and updating the HTML, I added a picture behind the button, changed the heading color for readability, and added some styling to the table.
It was also nice to see the MongoDB mars_app working in the background. Here you can see what has been loaded into the mars collection.
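For context, one way scraped data could end up in the mars collection is a single upserted document that each new scrape overwrites. The sketch below is an assumption about how that might look, not the project's actual code: the helper name `save_mars_data`, the `last_modified` field, and the treatment of mars_app as the database name are all hypothetical. It uses PyMongo's `update_one` with `upsert=True`.

```python
import datetime as dt


def save_mars_data(collection, mars_data):
    """Upsert one document so the collection always holds the latest scrape.

    `collection` is expected to behave like a PyMongo collection.
    """
    document = dict(mars_data)
    # Hypothetical field recording when the scrape was stored.
    document["last_modified"] = dt.datetime.now()
    # An empty filter matches any existing document; upsert creates one if none exists.
    collection.update_one({}, {"$set": document}, upsert=True)
    return document


# Example usage (assumes a local MongoDB server and that mars_app is the database name):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# save_mars_data(client["mars_app"]["mars"], {"news_title": "Example title"})
```

Keeping a single document means the web page only ever has to read one record to display the latest results.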