Modality purification

Question

dginev opened this issue 9 years ago · 1 comment

Similarly to KWARC/deprecated-LLaMaPUn#2, we should port the old modality purification to Rust and run it prior to doing any NLP analyses.

For instance, the naive arXMLiv token model I just generated shows 75,000 unique words that contain "mathformula". A purification step can denoise that.
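To make the idea concrete, here is a minimal sketch of what such a denoising pass could look like. It is not the old purification logic, whose rules are not described in this issue. It assumes one simple heuristic: any whitespace-delimited word that contains the placeholder is collapsed to the bare placeholder. The names `MATH_TOKEN`, `purify_word`, `purify_line` and `purified_vocabulary` are hypothetical.

```rust
use std::collections::HashSet;

/// Placeholder that stands in for a formula in the token stream
/// (spelling taken from the issue text).
const MATH_TOKEN: &str = "mathformula";

/// Assumed heuristic, not the original rule set: a word that contains the
/// placeholder anywhere (case-insensitively) is collapsed to the bare
/// placeholder. Anything glued to it, such as punctuation or a suffix, is dropped.
fn purify_word(word: &str) -> &str {
    if word.to_lowercase().contains(MATH_TOKEN) {
        MATH_TOKEN
    } else {
        word
    }
}

/// Purify every whitespace-delimited word of a line and re-join with single spaces.
fn purify_line(line: &str) -> String {
    line.split_whitespace()
        .map(purify_word)
        .collect::<Vec<_>>()
        .join(" ")
}

/// Collect the set of distinct words that remain after purification.
fn purified_vocabulary(lines: &[&str]) -> HashSet<String> {
    lines
        .iter()
        .flat_map(|line| line.split_whitespace())
        .map(|word| purify_word(word).to_owned())
        .collect()
}
```

Calling `purify_line` on each sentence before building the token model would merge all the noisy variants into one vocabulary entry. The heuristic is lossy, since a word like "mathformula-dimensional" loses its suffix, so a real port would likely need finer rules.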

Answer 1 · 2019-07-31T19:21:13.000Z

... but we have seen that, given enough data, good results can be obtained without going the heuristic preprocessing route, so maybe this can be allowed to rest without a new Rust implementation.