/Keyword_Extractor

The keyword extraction process helps us in identifying the important words. It also effective in topic modeling tasks. You can know a lot about your text data by only a few keywords. These keywords will help you to determine whether you want to read an article or not.

Primary LanguagePython

Keyword Extractor

The keyword extraction technique will sift through the whole set of data in minutes and obtain the words and phrases that best describe each subject. This way, you can easily identify which parts of the available data cover the subjects you are looking for while saving your teams many hours of manual processing.

Streamlit app: https://ankitkalauni-streamlitkeywordexe-main-tgj0pu.streamlitapp.com/

How to download

  1. Clone this repo to your local machine

  2. make virtualenv (recommended)

  3. open and change terminal location to the project directory

  4. run the below command after activating the virtualenv

     pip install -r requirement.txt
    
  5. now run the below command

     streamlit run main.py
    

The following text will be shown

$ streamlit run main.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://192.168.0.105:8501

Open a browser with the local URL given in a terminal

Home page

  • Below is the home page of the keyword Extractor Streamlit app

Home Page

Upload a doc file

  • you can upload doc/docx file (max 200mb)

Upload doc

Process and edit the document online

  • process the file and edit it online

edit doc

Extract the keywords and download them as a PDF/DOC file

  • Apply keyword extraction to the edited doc file and download it as a PDF/DOC file.
  • For now, only spacy with textrank,Rake and tf-idf are implemented.

Currently using Spacy with textrank layer


download file

Save document with Keywords at the end of the download document

  • The kewords will be appended to the last of the downloaded document.

keywords

Preprocessing

Data preprocessing is an essential step in building a Machine Learning model and depending on how well the data has been preprocessed; the results are seen.

In NLP, text preprocessing is the first step in the process of building a model.

The various text preprocessing steps are:

Tokenization Lower casing Stop words removal Stemming Lemmatization These various text preprocessing steps are widely used for dimensionality reduction.

Learn more about - Text Preprocessing in Natural Language Processing

Contribution

  • If you have better idea and want to update this tool. please contribute to this project.