HTML-Text-Processing-and-Unique-Word-Extraction

This Python script extracts the text content of an HTML page, processes it, and collects the unique words from the processed text. The processing steps include cleaning, normalization, tokenization, lemmatization or stemming, and stop word removal.
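The pipeline described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the names `TextExtractor`, `naive_stem`, `unique_words`, and the tiny `STOP_WORDS` set are all hypothetical, and the suffix-stripping function is a crude stand-in for a real stemmer or lemmatizer.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop word list (assumption, not from the repo).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on", "it"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def naive_stem(word):
    """Crude suffix stripping; a placeholder for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unique_words(html):
    """Return the sorted unique stems found in the visible text of `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Cleaning and normalization: join the text fragments and lowercase them.
    text = " ".join(parser.parts).lower()
    # Tokenization: keep runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text)
    # Stop word removal, then stemming, then deduplication via a set.
    return sorted({naive_stem(t) for t in tokens if t not in STOP_WORDS})
```

Calling `unique_words(page_source)` on a page's HTML string returns a sorted list of stems. Because the stemmer here is so simple, it produces truncated forms (for example, "chasing" becomes "chas"); a dedicated stemming or lemmatization library would give cleaner results.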

Primary Language: Jupyter Notebook
