/WebScrapingProject

A web scraping project that uses Scrapy, Pandas, PyYAML, Selenium Wire, and PyMongo to extract variations and mutations of COVID-19 from https://cov-lineages.org/lineage_list.html, store them in a MongoDB database, and provide a Python GUI for tracking and viewing the data.

Primary LanguagePython

WebScrapingProject

A web scraping project that uses Scrapy, Pandas, PyYAML, Selenium Wire, and PyMongo to extract variations and mutations of COVID-19 from https://cov-lineages.org/lineage_list.html, store them in a MongoDB database, and provide a Python GUI for tracking and viewing the data. First, we would use Scrapy, a web scraping framework in Python, to extract the HTML content from https://cov-lineages.org/lineage_list.html. We would then use XPath selectors to extract the relevant information, including the lineage, the mutation, the amino acid change, and the frequency.

Next, we would use Pandas, a data manipulation library in Python, to clean and organize the extracted data into a tabular format. This would make it easier to work with the data and perform further analysis.

We would then use PyYAML, a YAML parser and emitter for Python, to load and parse a configuration file that contains the database connection information. This would enable us to connect to a MongoDB database and store the scraped data.

To track the web scraping process, we would use Selenium and Selenium Wire, which are libraries in Python for automated web testing. We would set up a headless browser and use it to navigate to https://cov-lineages.org/lineage_list.html, simulate user behavior, and capture any errors or exceptions that occur during the scraping process.

Finally, we would use PyMongo, a Python library for working with MongoDB, to insert the scraped data into the database. We would also use a Python GUI, such as PyQt or Tkinter, to provide a user-friendly interface for querying and viewing the scraped data in the database.

Overall, this web scraping project would automate the process of extracting COVID-19 variation and mutation data from https://cov-lineages.org/lineage_list.html, clean and organize the data, store it in a MongoDB database, and provide a user-friendly interface for querying and viewing the data.