Overview • Project Structure • Requirements • Installation • License
The nlp-script-test repository contains scripts for analyzing and processing text data. It consists of two main Python scripts (manage_classes.py and key_terms.py), which use Natural Language Processing (NLP) to analyze articles. The project utilizes libraries such as NLTK, pandas, scikit-learn, and lxml for text data processing and analysis.
- manage_classes.py: Contains class definitions and methods for processing text data, including tokenization, lemmatization, filtering stopwords, and calculating most common words.
- key_terms.py: The main script that runs the analysis of text data loaded from an XML file. Uses TfidfVectorizer to analyze the importance of words in documents.
- news.xml: An XML file containing text data for analysis.
The project requires Python version 3.8 or newer, along with the following libraries:
- pandas
- nltk
- scikit-learn
- lxml
To set up and run the project:
- Create a Python virtual environment.
- To install the required dependencies, run the following command in the terminal:
pip install -r requirements.txt
- To run the script, execute the command:
python key_terms.py
This project is licensed under the MIT License - see the LICENSE file for details.