name of media should be kept

Question

Opened this issue 4 years ago · 2 comments

thiswillbeyourgithub commented 4 years ago

Hi,

I noticed that there is a method that removes all the HTML from the card, leaving only the actual text.

I don't think it's a good idea to remove the media completely. Personally, I sometimes use a screenshot as the source in a "source" field of my cloze cards. I think the parsing should keep the name of the picture, sound, or other media file as a word in the card, since it is relevant when deciding whether or not to bury the cards.

What do you think?

Answer 1 · 2021-06-25T00:58:49.000Z

Is this the method?

    def _preprocess(self, a: str) -> str:
        # replace html entity that gets frequently entered in cloze cards
        a = a.replace("&nbsp;", " ")

        return CLOZE_EXTRACT.sub(r"\g<answer>", a).lower()

If so, it does a lot less than it sounds like. In my testing, adding cloze cards often replaces a single space character with the &nbsp; entity, which throws off some of the rules. This method just replaces that entity with a regular space (and replaces cloze fields with their answers).
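
For illustration, here is a minimal standalone sketch of that behaviour. The CLOZE_EXTRACT pattern below is an assumption based on Anki's usual cloze syntax, since the add-on's actual regex isn't shown in this thread, and the function is written as a plain function rather than a method so it can run on its own.

```python
import re

# Assumed pattern for cloze markup such as {{c1::answer}} or
# {{c1::answer::hint}}; the add-on's real CLOZE_EXTRACT may differ.
CLOZE_EXTRACT = re.compile(r"\{\{c\d+::(?P<answer>.*?)(?:::.*?)?\}\}")


def preprocess(a: str) -> str:
    # replace the html entity that frequently gets entered in cloze cards
    a = a.replace("&nbsp;", " ")
    # swap each cloze field for its answer, then lowercase everything
    return CLOZE_EXTRACT.sub(r"\g<answer>", a).lower()


print(preprocess("The capital of&nbsp;France is {{c1::Paris::city}}"))
```

Note that nothing here strips HTML tags: under this sketch an <img> tag would pass through untouched apart from being lowercased.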

Answer 2 · 2021-06-25T14:17:52.000Z

Hi, thanks for the answer.

I was not talking about that method, sorry for not being clear.

I'm talking about line 206 of main.py, which contains the following:

            yield note_id, stripHTMLMedia(value)

The function "stripHTMLMedia" removes the HTML from the text.
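
To make the request concrete, here is a rough sketch of the kind of stripping I have in mind, where the file name of each picture or sound is kept as a word. The helper name strip_html_keep_media and its regexes are hypothetical: they are not taken from Anki or from this add-on, and they only handle simple <img src="..."> tags and [sound:...] references.

```python
import html
import re

# Hypothetical helper, not part of Anki or of the add-on.
IMG_TAG = re.compile(
    r"""<img[^>]*?\bsrc\s*=\s*["']?([^"'>\s]+)["']?[^>]*>""", re.IGNORECASE
)
SOUND_TAG = re.compile(r"\[sound:([^\]]+)\]")
ANY_TAG = re.compile(r"<[^>]+>")


def strip_html_keep_media(text: str) -> str:
    # Replace each media reference with its file name so it survives as a word.
    text = IMG_TAG.sub(r" \1 ", text)
    text = SOUND_TAG.sub(r" \1 ", text)
    # Drop the remaining tags, decode entities, and collapse whitespace.
    text = ANY_TAG.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())


field = '<div>Source: <img src="screenshot.png"></div> [sound:audio.mp3]'
print(strip_html_keep_media(field))
```

With something like this, two cards that share the same screenshot as a source would still have the file name in common after parsing.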