lukesmurray/markdown-anki-decks

`soup.prettify()` causes the generated HTML to have improper meaning

ZBQtesla opened this issue · 1 comments

In the current version of mdankideck, inline code has no background color by default, so nothing distinguishes it from normal text. For example:

With this markdown snippet

## Will inline code have a different background color?

This `code` should have a different background color.

produces this Anki card:
[Screenshot: Snipaste_2022-03-22_16-05-34]

I want inline code to have a different background color, as it does on GitHub. Thanks to the custom CSS feature, I can provide a CSS file, inline-code-styles.css, that defines a background color for inline code:

/* define a background-color for inline-code */
code {
  background-color: rgba(175,184,193,0.2);
}

/* make code block have a separate background-color */
.codehilite code {
  background-color: #f8f8f8;
}

The markdown file is:

---
css: inline-code-styles.css
---

# Deck with inline code

## Will inline code have a different background color?

This `code` should have a different background color.

And `inline-code`'s background-color should not interfere the background-color of code block:

```python
def hello():
    print("Hello mdankideck!")

for _ in range(2):
    hello()
```

The generated Anki card looks like this:
[Screenshot: Snipaste_2022-03-22_16-02-06]

I got what I wanted, except that there is an extra space after every piece of inline code:
[Screenshot: Snipaste_2022-03-22_16-03-42]

After some debugging, I found the real problem.

When we generate the HTML of questions and answers

fields=[soup_to_html_string(question), soup_to_html_string(answer)],

We use the prettify() method of BeautifulSoup:

# convert beautiful soup object to a string
def soup_to_html_string(soup):
    return soup.prettify(formatter="html5")

But prettify() is not suitable for this scenario, as the BeautifulSoup documentation notes:

Since it adds whitespace (in the form of newlines), prettify() changes the meaning of an HTML document and should not be used to reformat one. The goal of prettify() is to help you visually understand the structure of the documents you work with.

The more suitable method seems to be str(soup), as indicated in the BeautifulSoup documentation and on Stack Overflow.
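To make the difference concrete, here is a minimal sketch comparing the two serializations. It assumes the `bs4` package is installed and uses the built-in `html.parser`. The HTML string is a hand-written approximation of what the markdown renderer might emit for the sentence in the example deck, not mdankideck's actual output.

```python
from bs4 import BeautifulSoup

# Hand-written approximation of the rendered markdown (an assumption).
html = "<p>And <code>inline-code</code>'s background-color</p>"
soup = BeautifulSoup(html, "html.parser")

# prettify() re-indents the tree and inserts newlines between nodes.
pretty = soup.prettify(formatter="html5")

# str() serializes the tree without inserting any whitespace.
plain = str(soup)

print(repr(pretty))
print(repr(plain))
```

A newline between an inline element and the text that follows it is rendered as a space, which would explain the gap that appears after each piece of inline code on the card.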

But we can't simply replace soup.prettify(formatter="html5") with str(soup) in:

# convert beautiful soup object to a string
def soup_to_html_string(soup):
    return soup.prettify(formatter="html5")

Because the note's GUID relies on fields[0] (i.e. soup_to_html_string(question)):

class FrontIdentifierNote(genanki.Note):
    def __init__(self, deck_id, model=None, fields=None, sort_field=None, tags=None):
        guid = genanki.guid_for(fields[0], deck_id)

If we use str(soup) inside soup_to_html_string() instead, existing cards will get different GUIDs, because soup.prettify() and str(soup) return different strings. Every card in every existing deck would then be duplicated, and users would lose their review history.
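The GUID problem can be illustrated without genanki. In the sketch below, `stand_in_guid` is a hypothetical substitute for `genanki.guid_for`, assumed only to be a deterministic hash of its inputs. The deck id and the two question strings are made up for illustration. The `make_note` helper shows one possible way to keep GUIDs stable (hash the legacy string, display the clean one). It is not necessarily the approach taken in the planned PR.

```python
import hashlib


def stand_in_guid(*values):
    """Hypothetical stand-in for genanki.guid_for: a deterministic hash of its inputs."""
    joined = "__".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


deck_id = 1234567890  # made-up deck id

# Two serializations of the same question, roughly what prettify() and str() return.
pretty_question = "<h2>\n Will inline code have a different background color?\n</h2>\n"
plain_question = "<h2>Will inline code have a different background color?</h2>"

old_guid = stand_in_guid(pretty_question, deck_id)
new_guid = stand_in_guid(plain_question, deck_id)

# Different input strings give different GUIDs, so the card would look new.
print(old_guid != new_guid)  # True


def make_note(pretty_q, plain_q, plain_a, deck_id):
    """One possible workaround: derive the GUID from the legacy string only."""
    return {"guid": stand_in_guid(pretty_q, deck_id), "fields": [plain_q, plain_a]}


note = make_note(pretty_question, plain_question, "<p>Yes.</p>", deck_id)
print(note["guid"] == old_guid)  # True
```

With this split, the displayed fields no longer carry the extra whitespace, while the identifier matches what earlier versions would have generated for the same question.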

I have actually figured out a solution and tested it. I will open a PR as soon as possible.

Wow, thank you for such an in-depth explanation of your changes! I'll get back to you as soon as I've read this in depth, but this looks fairly reasonable.