This project involves scraping data from the Stanford Neurology Faculty webpage and organizing it in a Google Sheets document. The data includes faculty names, individual profile links, and email addresses.
The data is scraped from the following URL: 'https://med.stanford.edu/neurology/faculty/overview.html?tab=proxy'
The following data is collected from the website:
- Names: The names of the faculty members are scraped from the main page.
- Profile Links: Each faculty member's profile link is taken from the href attribute of the link wrapping their name on the main page.
- Emails: The email addresses of the faculty members are collected from their individual profile pages.
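The three fields above could be extracted roughly as follows. This is a minimal sketch, not the project's actual script: the `h3 a[href]` selector, the assumption that emails appear as `mailto:` links, and the function names `extract_faculty` and `extract_email` are all hypothetical and would need to be adjusted to the real page markup. Downloading the pages is left out; both functions take HTML that has already been fetched.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://med.stanford.edu/neurology/faculty/overview.html?tab=proxy"


def extract_faculty(listing_html, base_url=BASE_URL):
    """Return (name, profile_url) pairs found in the listing page HTML."""
    soup = BeautifulSoup(listing_html, "html.parser")
    faculty = []
    # Hypothetical selector: assumes each name is an <a> inside an <h3>.
    for link in soup.select("h3 a[href]"):
        name = link.get_text(strip=True)
        if name:
            # Resolve relative hrefs against the listing page URL.
            faculty.append((name, urljoin(base_url, link["href"])))
    return faculty


def extract_email(profile_html):
    """Return the first mailto: address on a profile page, or None."""
    soup = BeautifulSoup(profile_html, "html.parser")
    # Assumption: the email is exposed as a mailto: link.
    link = soup.select_one('a[href^="mailto:"]')
    if link is None:
        return None
    # Drop the scheme and any "?subject=..." query string.
    return link["href"][len("mailto:"):].split("?")[0].strip()
```

Keeping parsing separate from downloading makes it possible to try the selectors on a saved copy of the page before running against the live site.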
- Web Scraping: Python's BeautifulSoup library is used for web scraping.
- Data Storage: The gspread library and the Google Sheets API are used to store and organize the data in a Google Sheets document.
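Writing the results to a sheet with gspread might look like the sketch below. It assumes a Google service account whose JSON key file has access to an existing spreadsheet; the column headers, the function names `build_rows` and `upload_rows`, and the `spreadsheet_name` and `credentials_path` parameters are illustrative and not taken from the project.

```python
def build_rows(records):
    """Turn (name, profile_url, email) tuples into sheet rows with a header."""
    rows = [["Name", "Profile Link", "Email"]]
    for name, profile_url, email in records:
        # Missing emails are stored as empty cells.
        rows.append([name, profile_url, email or ""])
    return rows


def upload_rows(rows, spreadsheet_name, credentials_path):
    """Replace the contents of the first worksheet with the given rows."""
    # Imported here so build_rows can be used without gspread installed.
    import gspread

    # Assumption: authentication uses a service account key file.
    client = gspread.service_account(filename=credentials_path)
    worksheet = client.open(spreadsheet_name).sheet1
    worksheet.clear()
    worksheet.append_rows(rows)
```

Sending all rows in a single `append_rows` call, rather than one call per faculty member, keeps the number of API requests low.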
- Clone the GitHub repository.
- Install the necessary Python libraries (BeautifulSoup, gspread). On PyPI, BeautifulSoup is published as beautifulsoup4.
- Run the Python script to start the web scraping process.
- Check the Google Sheets document for the scraped and organized data.
Please ensure you have the necessary permissions to scrape data from the website and use the data responsibly.