Speedtoxify is a wrapper around Detoxify that speeds up inference by 2-4x by using ONNX Runtime.
Detoxify is an NLP library for detecting toxic, inappropriate, or profane text. Speedtoxify uses Detoxify's pretrained models and runs them in ONNX Runtime for much faster inference, which makes it the better option for production use.
Speedtoxify provides the same Python API as Detoxify, so it can be used as a drop-in replacement.
However, if your focus is on fine-tuning / re-training the models with your own data, please refer to Detoxify.
Model | Batch size | Device | Detoxify (ms/sample) | Speedtoxify (ms/sample) | Speedup |
---|---|---|---|---|---|
original-small | 8 | cpu | 13.34 | 5.43 | 2.46x |
original-small | 1 | cpu | 31.07 | 13.03 | 2.38x |
original-small | 8 | cuda | 1.55 | 0.79 | 1.98x |
original-small | 1 | cuda | 11.17 | 3.24 | 3.44x |
original | 8 | cpu | 22.99 | 5.39 | 4.26x |
original | 1 | cpu | 31.48 | 13.11 | 2.40x |
original | 8 | cuda | 1.60 | 0.75 | 2.12x |
original | 1 | cuda | 12.13 | 3.37 | 3.60x |
The evaluation script can be found in test_speed.py.
The evaluation was run on my laptop with an AMD 4900HS and an Nvidia 2060 Max-Q.
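To make the ms/sample and speedup columns concrete, here is a minimal timing sketch. It is not the contents of test_speed.py; the helper names `ms_per_sample` and `speedup` and the single warm-up batch are assumptions made for illustration.

```python
import time
from typing import Callable, List, Sequence


def ms_per_sample(
    predict: Callable[[List[str]], object],
    texts: Sequence[str],
    batch_size: int,
    warmup_batches: int = 1,
) -> float:
    """Return the average wall-clock milliseconds spent per input text."""
    if not texts:
        raise ValueError("texts must not be empty")
    batches = [list(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]
    # Warm-up calls are not timed, so one-time setup cost does not skew the average.
    for batch in batches[:warmup_batches]:
        predict(batch)
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(texts)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of baseline latency to candidate latency (higher is better)."""
    return baseline_ms / candidate_ms
```

Any callable that accepts a list of strings can be timed this way, for example `ms_per_sample(model.predict, texts, batch_size=8)`. Applying `speedup(13.34, 5.43)` to the first table row gives roughly 2.46.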
pip install speedtoxify
For inference on GPUs, please additionally install onnxruntime-gpu. This requires CUDA to be installed on the machine.
pip install onnxruntime-gpu
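After installing, it can be useful to check whether ONNX Runtime reports a CUDA execution provider. The sketch below uses `onnxruntime.get_available_providers()`; the helper name `cuda_provider_available` is hypothetical. A `True` result only means the installed build lists the provider, not that the CUDA libraries on the machine will load successfully.

```python
def cuda_provider_available() -> bool:
    """Return True if the installed ONNX Runtime build lists the CUDA provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        # ONNX Runtime is not installed at all.
        return False
    return "CUDAExecutionProvider" in ort.get_available_providers()


if __name__ == "__main__":
    print("CUDA provider available:", cuda_provider_available())
```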
Speedtoxify provides the same Python API as Detoxify:
from speedtoxify import Speedtoxify
model = Speedtoxify("original-small")
# Exporting to onnx format to ~/.cache/detoxify_onnx/original-small.onnx...
# Using framework PyTorch: 1.11.0+cu102
# Removing shared weights from ~/.cache/detoxify_onnx/original-small.onnx...
# Validating ONNX model...
# -[✓] ONNX model output names match reference model ({'logits'})
# - Validating ONNX Model output "logits":
# -[✓] (2, 6) matches (2, 6)
# -[✓] all values close (atol: 1e-05)
res = model.predict("I hate you!")
print(res)
# {'toxicity': 0.9393415, 'severe_toxicity': 0.015587699, 'obscene': 0.039672945, 'threat': 0.0733101, 'insult': 0.15676126, 'identity_attack': 0.019178415}
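The returned dictionary maps each label to a score. As a small sketch of how such a result might be consumed, the helper below lists the labels whose score meets a cutoff. The function name `flagged_labels` and the 0.5 threshold are assumptions chosen for illustration, not part of the Speedtoxify or Detoxify API.

```python
from typing import Dict, List


def flagged_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score is at or above the threshold."""
    return [label for label, score in scores.items() if score >= threshold]


# Scores copied from the example output above.
res = {
    "toxicity": 0.9393415,
    "severe_toxicity": 0.015587699,
    "obscene": 0.039672945,
    "threat": 0.0733101,
    "insult": 0.15676126,
    "identity_attack": 0.019178415,
}
print(flagged_labels(res))  # ['toxicity']
```

A suitable threshold depends on the application and should be chosen on your own data.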
Please refer to detoxify for available model types.
The first time Speedtoxify("original-small") is called, an ONNX model is exported and stored at ~/.cache/detoxify_onnx.
This directory can be customized with the cache_dir argument to Speedtoxify().
Please refer to the docs.
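The sketch below illustrates the caching behaviour under two assumptions: that the exported file is named `<model type>.onnx`, as the export log above suggests, and that `cache_dir` accepts a string path. The helpers `exported_model_path`, `needs_export`, and `load_model` are hypothetical names, not part of the library.

```python
from pathlib import Path

# Default location stated above.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "detoxify_onnx"


def exported_model_path(model_type: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> Path:
    """Path where the exported ONNX file is expected (assumed naming scheme)."""
    return Path(cache_dir) / f"{model_type}.onnx"


def needs_export(model_type: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> bool:
    """True if no exported file exists yet, so the first call would export one."""
    return not exported_model_path(model_type, cache_dir).exists()


def load_model(model_type: str, cache_dir: Path):
    """Load a model with a custom cache directory (requires speedtoxify)."""
    # Imported lazily so the path helpers above work without the package.
    from speedtoxify import Speedtoxify

    # Assumption: cache_dir is passed as a keyword argument and accepts a string.
    return Speedtoxify(model_type, cache_dir=str(cache_dir))
```

For example, `load_model("original-small", Path("/data/onnx_cache"))` would be expected to export the model into that directory on first use and reuse the file afterwards.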
The memory usage is much higher for original-small and unbiased-small in ONNX Runtime (300-400MB) than in Detoxify (vanilla PyTorch, 30-40MB).
This is most likely because the lightweight models are based on ALBERT, which shares layers to reduce memory usage. However, these shared layers seem to be duplicated (instead of shared) in the ONNX model graph, which leads to the much higher memory usage.
Other models such as original (BERT), unbiased (RoBERTa), or multilingual (XLM-R) do not have this issue.
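To compare memory usage on your own machine, one rough approach is to record the process's peak resident set size before and after loading a model. The sketch below uses the standard-library `resource` module, which is available on Unix only. The helper names are hypothetical, and this is not how the figures above were measured.

```python
import resource
import sys
from typing import Any, Callable, Tuple


def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def peak_rss_growth_mb(load: Callable[[], Any]) -> Tuple[Any, float]:
    """Call `load` and return its result with the growth in peak RSS (MB)."""
    before = peak_rss_mb()
    loaded = load()
    after = peak_rss_mb()
    return loaded, after - before
```

Because peak RSS is a high-water mark, the measurement is most meaningful in a fresh process that loads a single model, for example `peak_rss_growth_mb(lambda: Speedtoxify("original-small"))`.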