/extractorChinese

NLP model for extracting chinese datas from the documents

Primary LanguagePython

extract-chinese

Extract Chinese and English from 2 documents and matching them by same meaning sentences.

Getting Started

This project is a python project to extract two chinese and english sentences text from 2 PDFs. And to match the sentences by cosine score created embedding values.

pip install pdfplumber pip install nltk pip install jieba pip install sentence_transformers ...

Open python console

import nltk nltk.download('punkt')

and set some env values