chaodreaming/Simple-LaTeX-OCR

LaTeX without style components

hovkaren opened this issue · 4 comments

Hi @chaodreaming
Thank you for this repository.

Is it possible to make the API predict LaTeX from an image without style components?

For example, for this input image:

[image: math formula]

the model currently outputs this LaTeX:

(\mathfrak{a}+\mathfrak{b})^{2}=\mathfrak{a}^{2}+2\mathfrak{a}\mathfrak{b}+\mathfrak{b}^{2}

and without style components it would be:

({a}+{b})^{2}={a}^{2}+2{a}{b}+{b}^{2}

This is caused by the model's insufficient generalization capability and by a training dataset that is not clean enough. I am currently working on improving generalization. Please contact me if you know of a tool that can normalize the dataset's LaTeX into a uniform representation.

Thanks for the answer, @chaodreaming. I need to get only the math formulas without style components: just variables, math operations, and functions. As I understand it, we would need to create new ONNX models to get those results. Is there any documentation on how to do that?

It is possible to remove all styles with regular expressions. The underlying issue, however, is the model's insufficient generalization ability, which means it does not fully understand the image features.
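A minimal sketch of the regular-expression approach, applied as a post-processing step to the LaTeX string the model returns. It assumes style commands always take a braced argument (as in `\mathfrak{a}`); the list `STYLE_COMMANDS` and the function `strip_styles` are hypothetical names for illustration, not part of Simple-LaTeX-OCR.

```python
import re

# Hypothetical list of LaTeX font/style commands to strip; extend as needed.
STYLE_COMMANDS = (
    "mathfrak", "mathbf", "mathrm", "mathit", "mathcal",
    "mathbb", "mathsf", "mathtt", "boldsymbol", "bm",
)

# Matches a style command only when it is followed (after optional spaces)
# by an opening brace, so "\mathfrak{a}" becomes "{a}" and braces stay
# balanced. Unbraced forms like "\mathbf a" are deliberately left alone.
_STYLE_RE = re.compile(r"\\(?:" + "|".join(STYLE_COMMANDS) + r")\s*(?=\{)")


def strip_styles(latex: str) -> str:
    """Remove style commands from a LaTeX string, keeping their arguments."""
    return _STYLE_RE.sub("", latex)


if __name__ == "__main__":
    example = r"(\mathfrak{a}+\mathfrak{b})^{2}=\mathfrak{a}^{2}+2\mathfrak{a}\mathfrak{b}+\mathfrak{b}^{2}"
    print(strip_styles(example))
    # ({a}+{b})^{2}={a}^{2}+2{a}{b}+{b}^{2}
```

The leftover braces such as `{a}` are harmless in LaTeX, so they are kept rather than risk unbalancing the expression. This only cleans up the output; it does not address the model's generalization problem described above.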

Hi @chaodreaming. Thanks for the answer. I think I will do that with regular expressions.