NLP script test

Overview • Project Structure • Requirements • Installation • License

Overview

The nlp-script-test repository contains scripts for analyzing and processing text data. It consists of two main Python scripts (manage_classes.py and key_terms.py), which use Natural Language Processing (NLP) to analyze articles. The project relies on NLTK, pandas, scikit-learn, and lxml for text processing and analysis.

Project Structure

manage_classes.py: Contains class definitions and methods for processing text data, including tokenization, lemmatization, filtering stopwords, and calculating most common words.
key_terms.py: The main script, which runs the analysis on text data loaded from an XML file. It uses scikit-learn's TfidfVectorizer to score the importance of words in each document.
news.xml: An XML file containing text data for analysis.
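
As a rough illustration of the steps listed above, the sketch below loads articles from XML, tokenizes them, filters stopwords, counts the most common words, and computes TF-IDF weights. It is a standard-library stand-in, not the code in manage_classes.py or key_terms.py: the XML layout (corpus, news, title, text), the tiny stopword list, and every function name are assumptions made for the example, and lemmatization is left out because it needs NLTK data. The tfidf function uses raw term counts with the smoothed IDF and L2 normalization that scikit-learn documents as the TfidfVectorizer defaults.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for news.xml; the real file's layout may differ.
SAMPLE_XML = """
<corpus>
  <news><title>Markets</title><text>The markets rallied as the banks reported strong profits.</text></news>
  <news><title>Weather</title><text>The storm brought heavy rain and the rain flooded roads.</text></news>
</corpus>
"""

# Tiny illustrative stopword list (NLTK ships a much larger one).
STOPWORDS = {"the", "as", "and", "a", "of", "to", "in"}


def load_articles(xml_text):
    """Return {title: text} for every <news> element (assumed layout)."""
    root = ET.fromstring(xml_text)
    return {n.findtext("title"): n.findtext("text") for n in root.iter("news")}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def most_common_words(text, n=3):
    """Return the n most frequent non-stopword tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)


def tfidf(docs):
    """Compute TF-IDF with smoothed IDF and L2 normalization.

    docs maps a title to its token list; returns {title: {term: weight}}.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for title, tokens in docs.items():
        tf = Counter(tokens)
        raw = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        scores[title] = {t: w / norm for t, w in raw.items()}
    return scores


articles = load_articles(SAMPLE_XML)
tokens = {title: tokenize(text) for title, text in articles.items()}
for title, weights in tfidf(tokens).items():
    # Sort by descending weight, breaking ties alphabetically.
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    print(title, [term for term, _ in top])
```

With the actual dependencies installed, lxml.etree offers the same fromstring, iter, and findtext calls used here, and TfidfVectorizer would replace the hand-written tfidf function.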

Requirements

The project requires Python version 3.8 or newer, along with the following libraries:

pandas
nltk
scikit-learn
lxml
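
One way to confirm these requirements before running the scripts is a small check like the one below. It is a sketch only: the function name and the mapping are illustrative and not part of the repository. Note that the scikit-learn package is imported under the name sklearn.

```python
import sys
from importlib.util import find_spec

# Distribution name -> import name; scikit-learn is imported as "sklearn".
REQUIRED = {"pandas": "pandas", "nltk": "nltk", "scikit-learn": "sklearn", "lxml": "lxml"}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose import name cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or newer is required")
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```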

Installation

Create and activate a Python virtual environment (for example, with python -m venv .venv).
Then install the required dependencies by running the following command in the terminal:

pip install -r requirements.txt

To run the script, execute the command:

python key_terms.py

License

This project is licensed under the MIT License - see the LICENSE file for details.
