This project explains how to use web scraping techniques to build a dataset that is then used in a TF-IDF model.
If you would rather use the notebook, you can view it on Google Colab:
https://colab.research.google.com/drive/1qMP-2Ekh4rqUxUK9bK3V6qwk9AML7lu6?usp=sharing
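
To give a feel for the overall flow, here is a minimal sketch of the two stages, using only the Python standard library. It is not the notebook's code: the `PAGES` strings are hypothetical stand-ins for scraped HTML (no network request is made), the helper names `TextExtractor`, `html_to_tokens`, and `tf_idf` are invented for illustration, and the weighting is the plain textbook form `tf * log(N / df)`, which may differ from the variant the notebook or a library such as scikit-learn uses.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of nested script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_tokens(html):
    """Turn one HTML page into a list of lowercase word tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return re.findall(r"[a-z0-9]+", text)


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero on an empty page
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical pages standing in for the scraped HTML.
PAGES = [
    "<html><body><h1>Cats</h1><p>Cats purr and cats sleep.</p></body></html>",
    "<html><body><h1>Dogs</h1><p>Dogs bark and dogs sleep.</p>"
    "<script>var x = 1;</script></body></html>",
]

DOCS = [html_to_tokens(page) for page in PAGES]
WEIGHTS = tf_idf(DOCS)

for page_weights in WEIGHTS:
    top_term = max(page_weights, key=page_weights.get)
    print(top_term, round(page_weights[top_term], 3))
```

In this toy example, words that occur on both pages, such as "sleep", get a weight of zero, while words specific to one page, such as "cats" or "dogs", score highest. With real scraped pages you would replace `PAGES` with the downloaded HTML.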