The respository contains serveral encoder decoder model with Luong's Attention for Hinglish text normalization. Since the data was propriety, this repository includes just the models as samples.
virtualenv venv
source <path-to-venv>/bin/activate
pip install requirements.txt
python app.py
df, data_info = readData( 'data/output.noisy.processed', 'data/output.normalized.processed')
filename = <prefix-path-to-model-storage>
modelObj = <MODEL_CLASS>(data_info, attention = True)
modelObj.buildModel()
train(filename, modelObj, df[0:10], data_info, batchGenerator, 100, 0, 5, encoding="padd0")
Encoding can be "padd0" or "paddNL"
df, data_info = readData( 'data/output.noisy.processed', 'data/output.normalized.processed')
filename = <prefix-path-to-model-storage>
modelObj = <MODEL_CLASS>(data_info, attention = True)
modelObj.loadModel(filename)
evaluate(modelObj, filename, df, data_info, encoding="padd0", batch_size=5000)
df, data_info = readData( 'data/output.noisy.processed', 'data/output.normalized.processed')
filename = <prefix-path-to-model-storage>
modelObj = <MODEL_CLASS>(data_info, attention = True)
modelObj.loadModel(filename)
decodeSample(filename, modelObj, <DATAFRAME|LIST-OF-STRINGs|LIST-OF-DF-INDICES>, data_info, encoding="padd0", batch_size=5000)