from_Sefaria_to_Passim

A text preparation pipeline that produces digital witnesses for training text recognition models. It retrieves texts from Sefaria.org, analyzes their structure, cleans and concatenates them, and builds an index of the text content. The resulting texts are ready for alignment search against OCR results with Passim.
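The cleaning, concatenation and indexing steps can be illustrated with a minimal sketch. It assumes segments have already been retrieved from Sefaria as `(ref, raw_text)` pairs that may contain inline HTML markup, and that Passim reads JSON-lines records with `id`, `series` and `text` fields. The function names (`clean_segment`, `concatenate_with_index`, `to_passim_record`) and the offset-based index format are hypothetical and are not taken from this repository.

```python
import html
import json
import re
from typing import Iterable

# Assumption: raw segments may contain inline HTML tags and entities.
TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_segment(raw: str) -> str:
    """Replace HTML tags with spaces, decode entities, collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return SPACE_RE.sub(" ", text).strip()


def concatenate_with_index(
    segments: Iterable[tuple[str, str]],
) -> tuple[str, list[dict]]:
    """Join cleaned segments with single spaces and record each segment's
    character span, so alignments on the joined text can be mapped back
    to their source refs (hypothetical index format)."""
    parts: list[str] = []
    index: list[dict] = []
    offset = 0
    for ref, raw in segments:
        cleaned = clean_segment(raw)
        if not cleaned:
            continue  # skip segments that were only markup or whitespace
        if parts:
            offset += 1  # account for the joining space
        index.append({"ref": ref, "start": offset, "end": offset + len(cleaned)})
        parts.append(cleaned)
        offset += len(cleaned)
    return " ".join(parts), index


def to_passim_record(doc_id: str, series: str, text: str) -> str:
    """Serialize one document as a JSON line (assumed Passim input fields)."""
    return json.dumps(
        {"id": doc_id, "series": series, "text": text}, ensure_ascii=False
    )
```

With toy input such as `[("Genesis 1:1", "<b>בראשית</b> ברא"), ("Genesis 1:2", "והארץ   היתה")]`, `concatenate_with_index` returns the joined string `"בראשית ברא והארץ היתה"` plus one index entry per ref, and slicing the joined text with an entry's `start`/`end` recovers that segment. Keeping offsets rather than inserting separators into the text leaves the string Passim sees free of markup.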

Primary language: Jupyter Notebook
