Bots are essential for modern chat services. Text classification is the main tool for giving automated answers or answer suggestions. The big problem with text classification is answering in multiple languages with the same system. Luckily, with fastText we can create bots for every language. Here is a simple training and answering tool that works with every common language, since fastText is character based.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
You will need Python 3.6 or higher. pip3 is also necessary.
pip3 install fasttext
pip3 install flask
pip3 install requests
Run server.py on your local machine.
python3 server.py
If you would like to use a port other than 80, for example 8080, use:
python3 server.py --port 8080
Now check your localhost.
http://localhost/demo
If you are using a nonstandard port such as 8080, check:
http://localhost:8080/demo
In order to test the system, we first need to train our model with some data.
Copy the contents of sampleTurkish.json and paste them into the FAQ area. Enter 2000 as the epoch count and 20 as the word vector size. Hit the train button. After a while you will see "training completed".
Now you can test your bot. Enter a query and click on test. The server should respond with JSON like the following.
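For readers who want to see what such a training step could look like outside the web form, here is a minimal sketch. It assumes the FAQ has already been loaded into a plain dict mapping each class name to a list of example questions; that shape, the file name train.txt, and the helper names `to_fasttext_lines` and `train` are hypothetical and are not taken from server.py. The `__label__QnA<n>` naming mirrors the labels in the server's responses, and the epoch and vector size are the values entered in the form.

```python
from pathlib import Path


def to_fasttext_lines(faq):
    """Convert {class_name: [questions]} (assumed shape) into fastText lines.

    Returns the training lines and a mapping from fastText label to class name.
    """
    lines = []
    label_to_class = {}
    for index, (class_name, questions) in enumerate(faq.items()):
        label = f"__label__QnA{index}"
        label_to_class[label] = class_name
        for question in questions:
            # fastText reads one example per line, so collapse all whitespace.
            text = " ".join(question.split())
            lines.append(f"{label} {text}")
    return lines, label_to_class


def train(faq, path="train.txt", epoch=2000, dim=20):
    """Write the training file and fit a supervised fastText model."""
    import fasttext  # imported here so the helper above works without it

    lines, label_to_class = to_fasttext_lines(faq)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    model = fasttext.train_supervised(input=path, epoch=epoch, dim=dim)
    return model, label_to_class
```

Calling `train(faq)` returns the model together with the label mapping, so a later `model.predict(query, k=3)` can be translated back into class names.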
[{
"predName":"__label__QnA0",
"score":0.9986332654953003,
"className":"1101xxxxxxbexx1"
},
{
"predName":"__label__QnA46",
"score":0.00092932244297117,
"className":"NotFound",
"classActual":"1101xxxxxxbexx6"
},
{
"predName":"__label__QnA3",
"score":0.0002752315194811672,
"className":"NotFound",
"classActual":"1101xxxxxxbexx4"
}]
If you look at the responses above, you will see that the best result comes first. The className values come from the sample data.
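A client that consumes this response might pick the answer as in the sketch below. It assumes the response body is already available as text; the helper name `best_answer` is hypothetical, and the sample string is a shortened copy of the response shown above.

```python
import json

# Shortened copy of the sample response.
response_text = """[
  {"predName": "__label__QnA0", "score": 0.9986332654953003,
   "className": "1101xxxxxxbexx1"},
  {"predName": "__label__QnA46", "score": 0.00092932244297117,
   "className": "NotFound", "classActual": "1101xxxxxxbexx6"}
]"""


def best_answer(text):
    """Return the top className, or None when nothing usable was predicted."""
    results = json.loads(text)
    if not results:
        return None
    top = results[0]  # the server lists the best result first
    if top["className"] == "NotFound":
        return None
    return top["className"]


print(best_answer(response_text))  # prints 1101xxxxxxbexx1
```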
fastText performs better with an extra NotFound class containing a large number of arbitrary sentences. If a prediction scores below 0.45, we treat the query as not understood and the program returns NotFound as the className. The threshold parameter can be updated in the code.
fastText supports Ubuntu. We recommend using it, as it is hard to get fastText running on other systems.
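The threshold rule can be sketched as follows. This is an illustration of the behaviour described here and visible in the sample response, not the code from server.py: the function name `label_results`, the constant name `THRESHOLD`, and the input shapes (a list of `(label, score)` pairs plus a label-to-class dict) are assumptions.

```python
THRESHOLD = 0.45  # value quoted in the text; the real constant name may differ


def label_results(predictions, label_to_class, threshold=THRESHOLD):
    """Build response entries from (label, score) pairs.

    Predictions scoring below the threshold are reported as NotFound,
    with the real class kept in classActual, as in the sample response.
    """
    results = []
    for pred_name, score in predictions:
        actual = label_to_class.get(pred_name, "NotFound")
        entry = {"predName": pred_name, "score": score}
        if score < threshold:
            entry["className"] = "NotFound"
            entry["classActual"] = actual
        else:
            entry["className"] = actual
        results.append(entry)
    return results
```

Keeping the real class in `classActual` makes low-confidence predictions easy to inspect when tuning the threshold.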
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
- B Ozan Bozkurt - Initial work - bozanbozkurt
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Special thanks to the SOR'UN team for creating the sample data.