Welcome to the NLP Practice repository! This repository contains hands-on NLP (Natural Language Processing) exercises built with the Python libraries NLTK and SpaCy. It covers:
- Stemming: Reducing words to their root form using NLTK.
- Lemmatization: Converting words to their base form using SpaCy.
- POS Tagging: Assigning parts of speech to words using SpaCy.
- Named Entity Recognition (NER): Identifying and categorizing entities in text using SpaCy.
Requirements:
- Python 3.x
- NLTK
- SpaCy
Installation:

- Clone the repository:

  git clone https://github.com/YourUsername/NLP-Practice.git
  cd NLP-Practice

- Create a virtual environment (optional but recommended):

  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required libraries (a sample requirements.txt is sketched after these steps):

  pip install -r requirements.txt

- Download SpaCy's English model:

  python -m spacy download en_core_web_sm
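The contents of requirements.txt are not reproduced in this README; a minimal version covering the exercises below would simply list the two libraries (unpinned here; pin versions as needed):

  nltk
  spacy

Note that the SpaCy English model (en_core_web_sm) is installed by the separate download command above rather than through requirements.txt.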
The script demonstrates how to perform stemming using NLTK's PorterStemmer. Stemming reduces words to their root forms, which may not always be actual words.
from nltk.stem import PorterStemmer

# Reduce each word to its stem with the Porter algorithm
stemmer = PorterStemmer()
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
for word in words:
    print(word, "|", stemmer.stem(word))
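NLTK also provides SnowballStemmer, an updated revision of the Porter algorithm; running it on the same word list is an easy extension and is not part of the original script:

from nltk.stem import SnowballStemmer

# Snowball ("Porter2") is a revised version of the Porter stemming algorithm
stemmer = SnowballStemmer("english")
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]
for word in words:
    print(word, "|", stemmer.stem(word))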
Lemmatization converts words to their base or dictionary form. Unlike stemming, lemmatization provides real words as output.
import spacy

# Load SpaCy's small English pipeline and print each token's lemma
nlp = spacy.load("en_core_web_sm")
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")
for token in doc:
    print(token, " | ", token.lemma_)
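To see the difference in practice, the two snippets above can be combined into a small side-by-side comparison; this sketch is an illustration rather than part of the original exercises:

import spacy
from nltk.stem import PorterStemmer

# Compare NLTK stems with SpaCy lemmas for the same words
stemmer = PorterStemmer()
nlp = spacy.load("en_core_web_sm")
for word in ["eating", "ate", "adjustable", "ability", "meeting", "better"]:
    lemma = nlp(word)[0].lemma_  # lemma of the single-token document
    print(word, "|", stemmer.stem(word), "|", lemma)

The stem column can contain fragments that are not dictionary words (for example, "ability" may stem to "abil"), while the lemma column stays within real words.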
Part of Speech (POS) tagging assigns a grammatical category to each token in a text, such as noun, verb, adjective, etc.
import spacy

# Print the coarse-grained POS tag for each token, with a readable description
nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon flew to mars yesterday. He carried biryani masala with him")
for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_))
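SpaCy also exposes fine-grained tags on token.tag_, which spacy.explain can describe as well; this extra snippet is a suggested addition rather than part of the original exercise:

import spacy

# Show both the coarse POS tag and the fine-grained tag for each token
nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon flew to mars yesterday. He carried biryani masala with him")
for token in doc:
    print(token.text, "|", token.pos_, "|", token.tag_, "|", spacy.explain(token.tag_))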
NER identifies and classifies named entities in text into categories such as person names, organizations, locations, monetary values, etc.
import spacy

# Print each named entity with its label and a readable description
nlp = spacy.load("en_core_web_sm")
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))
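Each entity also carries its character offsets in the original string, which is handy for highlighting matches; this follow-up snippet is an optional addition:

import spacy

# Print each entity with its character span and label
nlp = spacy.load("en_core_web_sm")
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
for ent in doc.ents:
    print(f"{ent.text!r} [{ent.start_char}:{ent.end_char}] -> {ent.label_}")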
Contributions are welcome! Please open an issue or submit a pull request if you'd like to contribute to this repository.
This repository is licensed under the MIT License.