Cleantextclip

A library for preparing text for machine learning and NLP tasks. It originated from the text preprocessing used for the CLIP model, with a few extra rules added.

Installation

pip install -U ternaus_cleantext

Cleans text in a way similar to, but stricter than, the CLIP model:

from ternaus_cleantext.ternaus_cleantext import clean_text
print(clean_text("This is a test https://ternaus.com <b>bold</b>"))

This prints:

this is a test bold
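To give a feel for the kind of rules involved, here is a minimal standard-library sketch that reproduces the output of the example above. It is not the library's actual implementation: the function name `sketch_clean_text` and the specific rules (unescape HTML entities, drop URLs, drop HTML tags, lowercase, collapse whitespace) are assumptions inferred from that single example, and the real `clean_text` may apply more or different rules.

```python
import html
import re

# Hypothetical patterns, chosen only to match the README example.
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # http(s) links and bare www. links
TAG_RE = re.compile(r"<[^>]+>")                # simple HTML tags such as <b> or </b>
WS_RE = re.compile(r"\s+")                     # any run of whitespace


def sketch_clean_text(text: str) -> str:
    """Illustrative cleaner; NOT the ternaus_cleantext implementation."""
    # Unescape twice so double-escaped entities like "&amp;amp;" become "&".
    text = html.unescape(html.unescape(text))
    # Replace URLs and tags with a space so neighboring words do not merge.
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = text.lower()
    # Collapse whitespace runs and trim the ends.
    return WS_RE.sub(" ", text).strip()


print(sketch_clean_text("This is a test https://ternaus.com <b>bold</b>"))
# prints: this is a test bold
```

URLs are removed before tags because the URL pattern consumes everything up to the next whitespace, so a tag glued directly to a link would be dropped along with it.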