This project pre-processor web server for NLP.
num2words
waitress
Unidecode
contractions
emoji
flask
torch
spacy (and "en_core_web_sm" spacy model)
* First, Upload text file
* Second, Select pre-processing option.
* If the option requires an additional value, fill in the "value".
* If the option requires an additional value, fill in the "value2".
space_normalizer: Whitespace normalization. (Including newline removal). No additional value required.
to_capitalize: Capitalize only the first letter of the line, and change the rest to lowercase. No additional value required.
to_lower: Change all words to lowercase. No additional value required.
accent: Replace accented characters like ï with regular characters. No additional value required.
expander: Increases the abbreviation(can't → can not). No additional value required.
emoji_remover: Remove emojis. No additional value required.
emoji_to_text: Change emojis to text. No additional value required.
lemmatizer: Turns all words into basic form(is -> be, bats -> bat). But, very slow option. No additional value required.
html_tag_remover: Remove the html tag. No additional value required.
url_remover: Remove url. No additional value required.
number_to_text: Turn numbers into words. No additional value required.
number_normalizer: Replace the number with another one you entered. Need additional value.
short_line_remover: It also takes a number and removes the lines shorter than the number. Need additional value.
short_word_remover: It also takes a number and removes words shorter than the number. Need additional value.
full_stop_normalizer: It receives special characters and turns them all into dot. Need additional value.
comma_normalizer: Receive special characters and turn them all into commas. Need additional value.
special_remover: Receive the special characters to be deleted and delete them all. Need additional value.
special_replacer: Input and replace special characters and words to be replaced. Need additional value, value2.
word_replacer: Replace the word with another one you entered. Need additional value, value2.
text_file: Text file you want to pre-process.
option: The pre-processing option.
value: The value required by the specific option.
value2: Second value required by specific option.
text/plain
Sample data download: GitHub
Hello, guys my name is Kïm. And I have 10-blocks.
I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
<hr>
Okay; I love it. I can eat 23 oranges at same time.
lol
Oh.... you can't? sorry😢.
-
input: option=space_normalizer, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=space_normalizer" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay; I love it. I can eat 23 oranges at same time. lol Oh.... you can't? sorry😢.
-
input: option=to_capitalize, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=to_capitalize" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is kïm. And i have 10-blocks. I like orange🍊🍊🍊 & banana🍌 @ peach🍑🍑 that yummy...!
Okay; i love it. I can eat 23 oranges at same time.
Lol Oh.... You can't? sorry😢.
-
input: option=to_lower, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=to_lower" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
hello, guys my name is kïm. and i have 10-blocks. i like orange🍊🍊🍊 & banana🍌 @ peach🍑🍑 that yummy...!
okay; i love it. i can eat 23 oranges at same time.
lol oh.... you can't? sorry😢.
-
input: option=accent, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=accent" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kim. And I have 10-blocks. I like Orange & Banana @ peach that yummy...!
Okay; I love it. I can eat 23 oranges at same time.
lol Oh.... you can't? sorry.
-
input: option=expander, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=expander" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay; I love it. I can eat 23 oranges at same time.
lol Oh.... you can not? sorry😢.
-
input: option=emoji_remover, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=emoji_remover" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Km. And I have 10-blocks. I like Orange & Banana @ peach that yummy...!
Okay; I love it. I can eat 23 oranges at same time.
lol Oh.... you can't? sorry.
-
input: option=emoji_to_text, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=emoji_to_text" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange tangerine tangerine tangerine & Banana banana @ peach peach peach that yummy...!
Okay; I love it. I can eat 23 oranges at same time.
lol Oh.... you can't? sorry crying_face.
-
input: option=lemmatizer, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=lemmatizer" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
hello , guy my name be Kïm . and I have 10 - block . I like Orange 🍊 🍊 🍊 & Banana 🍌 @ peach 🍑 🍑 that yummy ... ! < hr >
okay ; I love it . I can eat 23 orange at same time .
lol oh .... you ca n't ? sorry 😢 .
-
input: option=html_tag_remover, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=html_tag_remover" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output:
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay; I love it. I can eat 23 oranges at same time.
lol Oh.... you can't? sorry😢.
-
input: option=number_to_text, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=number_to_text" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have ten-blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay; I love it. I can eat twenty-three oranges at same time.
lol Oh.... you can't? sorry😢.
-
input: option=number_normalizer, value=number, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=number_normalizer" -F "value=number" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have number-blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay; I love it. I can eat number oranges at same time.
lol Oh.... you can't? sorry😢.
-
input: option=short_line_remover, value=4, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=short_line_remover" -F "value=4" -F "value=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay; I love it. I can eat 23 oranges at same time. Oh.... you can't? sorry😢.
-
input: option=short_word_remover, value=3, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=short_word_remover" -F "value=3" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys name have-blocks. like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay love oranges same time.
? sorry😢.
-
input: option=full_stop_normalizer, value=;!?, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=full_stop_normalizer" -F "value=;!?" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy....
Okay. I love it. I can eat 23 oranges at same time.
lol Oh.... you can't. sorry😢.
-
input: option=comma_normalizer, value=&;-@, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=comma_normalizer" -F "value=&;-@" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 , Banana🍌 , peach🍑🍑 that yummy...! ,hr,
Okay, I love it. I can eat 23 oranges at same time.
lol Oh.... you can't, sorry😢.
-
input: option=special_remover, value=:;,-, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=special_remover" -F "value=:;,-" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello guys my name is Kïm. And I have 10 blocks. I like Orange🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay I love it. I can eat 23 oranges at same time.
lol Oh.... you can't? sorry😢.
-
input: option=special_replacer, value=&@, value2=and
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=special_replacer" -F "value=&@" -F "value2=and" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 and Banana🍌 and peach🍑🍑 that yummy...!
Okay; I love it. I can eat 23 oranges at same time.
lol Oh.... you can't? sorry😢.
-
input: option=word_replacer, value=Orange, value2=Apple
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=word_replacer" -F "value=Orange" -F "value2=Apple" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Apple🍊🍊🍊 & Banana🍌 @ peach🍑🍑 that yummy...!
Okay; I love it. I can eat 23 oranges at same time.
lol Oh.... you can't? sorry😢.
-
input: option=special_replacer, value=&@, value2=and, option=html_tag_remover, value=whatever, value2=whatever, option=space_normalizer, value=whatever, value2=whatever
curl -X POST "https://master-pre-processor-fpem123.endpoint.ainize.ai/dpp" -H "accept: text/plain" -H "Content-Type: multipart/form-data" -F "option=special_replacer" -F "value=&@;" -F "value2=and" -F "option=html_tag_remover" -F "value=" -F "value2=" -F "option=space_normalizer" -F "value=" -F "value2=" -F "text_file=@sample.txt;type=text/plain"
-
output
Hello, guys my name is Kïm. And I have 10-blocks. I like Orange🍊🍊🍊 and Banana🍌 and peach🍑🍑 that yummy...! Okay; I love it. I can eat 23 oranges at same time. lol Oh.... you can't? sorry😢.
API page: In Ainize
Demo page: End-point