/NLP-Natural-Language-Processing-

This repository is a collection of six minor projects focused on Natural Language Processing (NLP), along with the relevant datasets. The projects are designed to help individuals gain a better understanding of NLP by applying its concepts to real-world problems. The repository also includes a file that provides a comprehensive overview of NLP.

Primary language: Jupyter Notebook. License: MIT.

NLP (Natural Language Processing)

What is NLP?

Natural Language Processing (NLP) is a field of artificial intelligence (AI) that focuses on enabling machines to understand, interpret, and generate human language. NLP involves analyzing and processing large amounts of human language data, such as written text or speech, and extracting meaning and insights from it.


Applications of NLP

  • chatbots
  • voice assistants
  • sentiment analysis
  • language translation
  • text summarization, and many more

With the growing popularity of digital assistants and chatbots, NLP has become an essential tool for businesses to provide efficient and personalized customer service.


1) Regular Expression in NLP

  • A regular expression (regex) is a pattern, written in a compact pattern-matching language, that is used to find, extract, and manipulate text in NLP. It consists of a sequence of ordinary characters and metacharacters that together describe a particular pattern in a text string.
  • For example, regular expressions can be used to extract all email addresses or phone numbers from a text document, or to remove all punctuation marks or stop words from a piece of text.
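As a small illustration of the cleanup use case mentioned above, here is a minimal sketch that removes punctuation with Python's built-in re module. The sample sentence is made up for this example, and the pattern is only one reasonable choice, not necessarily the one used in the notebook.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Hello, world! NLP is fun... isn't it?"

# [^\w\s] matches any character that is neither a word character
# (letter, digit, underscore) nor whitespace, i.e. punctuation.
cleaned = re.sub(r"[^\w\s]", "", text)

print(cleaned)
```

Note that this also strips apostrophes, so "isn't" becomes "isnt"; whether that is acceptable depends on the task.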
Extracting Phone Numbers

In the code for this example, we extract 10-digit numbers. The token '\d' matches a single digit, and '{n}' repeats the preceding element exactly n times, so '\d{10}' matches ten consecutive digits; replace 10 with however many digits you need.
We use the findall function, which returns every part of the text that matches our pattern.
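The idea can be sketched as follows with Python's re module. The sample text and phone numbers are invented for illustration and are not taken from the notebook.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Call 9876543210 or 1234567890 for details; order id 12345."

# \d matches one digit; {10} repeats it exactly ten times.
pattern = r"\d{10}"

# findall returns a list of all non-overlapping matches.
phones = re.findall(pattern, text)

print(phones)  # ['9876543210', '1234567890']
```

The pattern would also match the first ten digits of a longer run of digits; wrapping it in word boundaries, as in '\b\d{10}\b', is one way to avoid that.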

Matching Random Pattern

[Image: matching a random pattern]

Extracting Email Address

Here we match the text against a pattern designed for email addresses: '[a-zA-Z0-9_]+@[a-zA-Z0-9_]+\.[a-zA-Z]+'
In this pattern, a-z means any character from a to z, and similarly for A-Z and 0-9; '+' means one or more of the preceding characters, and '\.' matches a literal dot.
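A minimal sketch of email extraction with re.findall is shown below. The pattern uses '+' quantifiers and an escaped dot so that it matches whole addresses; the sample text and addresses are made up for this example.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Contact alice_01@example.com or bob@mail.org for access."

# One or more letters/digits/underscores, an @ sign, a domain name,
# a literal dot, and a top-level domain made of letters.
pattern = r"[a-zA-Z0-9_]+@[a-zA-Z0-9_]+\.[a-zA-Z]+"

emails = re.findall(pattern, text)

print(emails)  # ['alice_01@example.com', 'bob@mail.org']
```

This is deliberately simple: it does not handle dots or hyphens in the name part, or subdomains such as mail.example.co.in, so real-world use would need a broader pattern.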

You can view the full regular expression notebook here:
https://github.com/meet5398/NLP-Natural-Language-Processing-/blob/57611b2b14c58a205c3f93a264daa88f31acc341/regular%20expression%20in%20NLP.ipynb

2) Text Tokenization using spaCy and NLTK

Text tokenization involves breaking text into smaller units, or tokens, such as words or sentences. This process enables computers to analyze and understand human language.

  • Tokenization is a crucial step in many natural language processing tasks, including sentiment analysis, named entity recognition, and machine translation.
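To make the idea concrete without any third-party library, here is a deliberately naive sketch of sentence and word tokenization using only regular expressions. The function names simple_sentence_tokenize and simple_word_tokenize are hypothetical helpers written for this example; spaCy and NLTK handle many cases (abbreviations, contractions, and so on) that this ignores.

```python
import re

def simple_sentence_tokenize(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

# Hypothetical sample text, invented for illustration.
text = "I love NLP. It is fun!"

sentences = simple_sentence_tokenize(text)
words = simple_word_tokenize(sentences[1])

print(sentences)  # ['I love NLP.', 'It is fun!']
print(words)      # ['It', 'is', 'fun', '!']
```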

Difference between spaCy and NLTK

spaCy is an open-source software library for advanced natural language processing, written in Python and Cython. It provides a variety of tools for language understanding and processing, including named entity recognition, dependency parsing, and word vectors. Its tokenizer returns objects (such as Doc, Span, and Token) rather than plain strings.

NLTK (Natural Language Toolkit) is a leading platform for building Python programs that work with human language data. It provides a range of tools for text processing and analysis, including tokenization, stemming, tagging, and parsing. Its tokenizers return plain strings (typically a list of strings).

Prerequisites

Before running the code, make sure you have the following installed:

  • Python 3.x
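  • spaCy library (can be installed via pip install spacy)
  • English language model for spaCy (can be downloaded via python -m spacy download en_core_web_sm; the older shortcut python -m spacy download en is no longer supported as of spaCy 3.x)
  • NLTK library (can be installed via pip install nltk)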

Some important screenshots:

[Image: sentence tokenization with spaCy]
In this code we use spaCy, and the output shows that each sentence is returned as an object (a Span), not as a plain string.

[Image: sentence tokenization with NLTK]
In this code we use NLTK, and the output shows that each sentence is returned as a plain string.
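The difference in return types can be sketched as follows, assuming spaCy 3.x and NLTK are installed. To avoid depending on downloaded models or data, this sketch uses spacy.blank("en") with the rule-based sentencizer and NLTK's regex-based wordpunct_tokenize; these are stand-ins chosen for this example and are not necessarily what the screenshots used. The sample text is also made up.

```python
import spacy
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text, invented for illustration.
text = "NLP is fun. Tokenization splits text into pieces."

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
# Rule-based sentence splitter (spaCy 3.x add_pipe syntax).
nlp.add_pipe("sentencizer")

doc = nlp(text)
spacy_sentences = list(doc.sents)  # Span objects
spacy_tokens = list(doc)           # Token objects

# NLTK returns an ordinary list of Python strings.
nltk_tokens = wordpunct_tokenize(text)

print(type(spacy_sentences[0]).__name__)  # Span
print(type(spacy_tokens[0]).__name__)     # Token
print(type(nltk_tokens[0]).__name__)      # str
```

Because spaCy returns objects, each token carries attributes such as token.text and token.is_punct, whereas the NLTK result can be used directly wherever a list of strings is expected.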

For more topics and code, you can view the full repository, which also includes six NLP projects: https://github.com/meet5398/NLP-Natural-Language-Processing-