name of media should be kept
Hi,
I noticed that there is a method that removes all the HTML from the card, leaving only the actual text.
I don't think it's a good idea to remove the media completely. Personally, I sometimes use a screenshot as a source in a "source" field in my cloze cards. I think the parsing should keep the name of the picture, sound file, or other media as a word in the card, since it is relevant when deciding whether or not to bury the cards.
What do you think?
Is this the method?
    def _preprocess(self, a: str) -> str:
        # replace the HTML entity that frequently gets entered in cloze cards
        a = a.replace("&nbsp;", " ")
        return CLOZE_EXTRACT.sub(r"\g<answer>", a).lower()
If so, it does a lot less than it sounds like. In my testing, adding cloze cards often replaces a single space character with the &nbsp; entity, which throws off some of the rules. This method just swaps that entity back to a plain space (and replaces cloze fields with their answers).
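For illustration, here is a minimal standalone sketch of that behaviour. It assumes the entity in question is `&nbsp;`, and the `CLOZE_EXTRACT` pattern below is a guess, since the real one is not shown in this thread; `preprocess` is a free-function stand-in for the method.

```python
import re

# Assumed pattern: the add-on's actual CLOZE_EXTRACT is not shown in this thread.
# It matches {{c1::answer}} and {{c1::answer::hint}} and captures the answer.
CLOZE_EXTRACT = re.compile(r"\{\{c\d+::(?P<answer>.*?)(?:::.*?)?\}\}")


def preprocess(a: str) -> str:
    """Swap &nbsp; for a plain space, then replace each cloze with its answer."""
    a = a.replace("&nbsp;", " ")
    return CLOZE_EXTRACT.sub(r"\g<answer>", a).lower()


print(preprocess("The&nbsp;capital of {{c1::France::country}} is {{c2::Paris}}"))
```

Note that nothing here touches HTML tags or media references; only the entity and the cloze markers change.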
Hi, thanks for the answer.
I was not talking about that method, sorry for not being clear.
I'm talking about line 206 of the file main.py, which contains the following:
    yield note_id, stripHTMLMedia(value)
The function "stripHTMLMedia" removes the HTML from the text.
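To make the request concrete, here is a rough sketch of what keeping the media name as a word could look like. It is not the `stripHTMLMedia` helper called above; `strip_html_keep_media` and the three patterns are hypothetical names written only for this example, and the regexes cover just simple `<img>` tags and `[sound:...]` references.

```python
import re

# Hypothetical patterns for illustration; not taken from the add-on or from Anki.
IMG_TAG = re.compile(r"<img[^>]*?\bsrc=[\"']?([^\"'>\s]+)[\"']?[^>]*>", re.IGNORECASE)
SOUND_TAG = re.compile(r"\[sound:([^\]]+)\]")
ANY_TAG = re.compile(r"<[^>]+>")


def strip_html_keep_media(text: str) -> str:
    """Remove HTML tags but keep each media filename as a plain word."""
    text = IMG_TAG.sub(r" \1 ", text)    # <img src="x.png"> becomes x.png
    text = SOUND_TAG.sub(r" \1 ", text)  # [sound:x.mp3] becomes x.mp3
    text = ANY_TAG.sub(" ", text)        # drop every remaining tag
    return " ".join(text.split())        # collapse runs of whitespace


print(strip_html_keep_media('<b>Source:</b> <img src="screenshot_42.png"> [sound:word.mp3]'))
```

With something along these lines, two cards that share the same screenshot in their "source" field would also share a word, which is what matters for the burying decision.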