After reading through the BeautifulSoup documentation, I realised that many common scraping operations are not built into the module. I therefore filled in as many of those gaps as I could, applying OOP principles to make my web scraper easy to extend. Its features include the following:
- Extract all tables from a webpage and merge those that share the same column names
- Extract hrefs and insert them into the surrounding text using delimiters such as brackets (the same can be done for tables and lists)
- Standardise all hrefs into absolute URLs rather than relative ones
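For the table-merging feature, here is a minimal sketch of the idea. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies. It assumes well-formed, non-nested tables whose first row holds the column names. `TableCollector` and `merge_tables` are hypothetical names for illustration, not the scraper's actual API.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects every <table> as a list of rows, each row a list of cell strings.

    Assumes well-formed HTML (closed <td>/<th>/<tr> tags) and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>: list of rows
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td>/<th> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def merge_tables(html):
    """Group the rows of all tables whose header rows are identical.

    Returns a dict mapping a tuple of column names to the combined data rows.
    """
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    merged = {}
    for table in collector.tables:
        if not table:
            continue
        header, *rows = table  # assumption: the first row is the header
        merged.setdefault(tuple(header), []).extend(rows)
    return merged
```

Keying on the exact tuple of column names means two tables only merge when their headers match in both name and order. A looser rule (for example, sorting the names first) would be a design choice layered on top of this.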
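Inserting hrefs into the text can be sketched as follows. This sketch again swaps in the standard library's `html.parser` in place of BeautifulSoup. With BeautifulSoup itself, the same idea would be to find each link with `find_all("a", href=True)` and `replace_with` it by its text plus the delimited href. `LinkInliner` and `inline_links` are hypothetical names, and the bracket defaults are just an assumption.

```python
from html.parser import HTMLParser


class LinkInliner(HTMLParser):
    """Collects page text, appending each link's href right after its anchor text."""

    def __init__(self, open_delim="[", close_delim="]"):
        super().__init__()
        self.open_delim = open_delim
        self.close_delim = close_delim
        self.parts = []        # text fragments in document order
        self._href_stack = []  # hrefs of the <a> tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href_stack.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "a" and self._href_stack:
            href = self._href_stack.pop()
            if href:  # links without an href contribute only their text
                self.parts.append(f" {self.open_delim}{href}{self.close_delim}")

    def handle_data(self, data):
        self.parts.append(data)


def inline_links(html, open_delim="[", close_delim="]"):
    """Return the text of `html` with every link's href inserted after its text."""
    parser = LinkInliner(open_delim, close_delim)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(inline_links('<p>See <a href="/docs">the docs</a> for more.</p>'))
# prints: See the docs [/docs] for more.
```

Passing the delimiters as parameters makes it easy to switch from brackets to parentheses or any other marker without changing the parsing logic.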
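Standardising hrefs into absolute URLs amounts to resolving each one against the URL of the page it was scraped from. A minimal sketch uses the standard library's `urllib.parse.urljoin`. `standardise_hrefs` is a hypothetical helper name, and the base URL below is only an example.

```python
from urllib.parse import urljoin


def standardise_hrefs(base_url, hrefs):
    """Resolve every href against the page URL it was found on.

    Relative paths, root-relative paths and scheme-relative (//host) links
    become absolute; hrefs that are already absolute are returned unchanged.
    """
    return [urljoin(base_url, href) for href in hrefs]


base = "https://example.com/docs/page.html"
print(standardise_hrefs(base, ["intro.html", "/about", "../img/logo.png"]))
# prints: ['https://example.com/docs/intro.html', 'https://example.com/about', 'https://example.com/img/logo.png']
```

Delegating to `urljoin` avoids hand-rolling the resolution rules for `..` segments and scheme-relative links.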
I use this module more than any of my other earlier projects, so I consider it the most impactful of them (to me, at least).