wormi4ok/evernote2md

Consecutive lines incorrectly exported as paragraphs

BurningDog opened this issue · 2 comments

I have a very simple note split into two paragraphs. The first is a count of books I've read each year since 2017, the second is a list of books I've read this year. It looks like this in Evernote:

[Image: Screenshot 2023-12-12 at 09 26 33]

Here's the HTML export from Evernote:
Books I've read in 2023.html.zip

However, the markdown produced by evernote2md results in this:

# Books I've read in 2023

2017: 16

2018: 10

2019: 7

2020: 16

2021: 22

2022: 14

2023: 14

2 Jan \- Break point: SAS who dares wins, Ollie Ollerton

26 Jan \- Jingo, Terry Pratchett

11 Feb \- Eric, Terry Pratchett

8 April \- Tiamat's Wrath \- The Expanse book 8 re\-read

13 April \- Leviathan Falls \- The Expanse book 9

4 May \- Equal Rites \- Terry Pratchett

18 June \- Maskerade \- Terry Pratchett

25 July \- Thud \- Terry Pratchett

9 Aug \- Going postal \- Terry Pratchett

5 Sept \- Making Money \- Terry Pratchett

7 Oct \- Lords and Ladies \- Terry Pratchett

6 Nov \- Moving Pictures \- Terry Pratchett

30 Nov \- The Fifth Elephant \- Terry Pratchett

10 Dec \- Nine Princes in Amber \- Roger Zelazny

Every line is now its own paragraph! And each - has become \-.

Hey @BurningDog! Because the books are not formatted as a list in the note, it's very difficult to infer that each line is not a paragraph of its own. Evernote wraps every line in a <div> tag in the exported file, and <div> is a block element. In many cases a newline after a block element gives better formatting, though unfortunately in your case it doesn't give the expected result.

As for - becoming \-: this is escaping. It prevents text that was not originally formatted as a list from accidentally being converted into one (- in Markdown can denote a list item). It has already been reported a few times that this behaviour is unexpected. I'll try to find time to implement a flag that disables this escaping.

To resolve your problem with formatting, I can suggest running an additional script on your file (or any other converted markdown) to reformat it in a more readable way:

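sed -i '/^[[:space:]]*$/d; s/\\-/-/g' books.md
# or, if you use the BSD version of sed
sed -i '' '/^[[:space:]]*$/d; s/\\-/-/g' books.md

It will remove empty lines (/^[[:space:]]*$/d) and replace each escaped \- with a plain - (s/\\-/-/g). The POSIX class [[:space:]] is used instead of \s because not every sed implementation understands \s.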
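If you would rather preview the result before overwriting the file, the same cleanup can be wrapped in a small shell function that reads stdin and writes stdout. This is only a sketch, assuming a POSIX shell and sed; `clean_md` is a hypothetical helper name and not part of evernote2md.

```shell
#!/bin/sh
# Hypothetical helper (not part of evernote2md): reads Markdown on stdin
# and writes the cleaned-up text to stdout, leaving the input file untouched.
clean_md() {
  # First expression: delete lines that are empty or whitespace-only.
  # Second expression: turn every escaped hyphen "\-" back into "-".
  # [[:space:]] is the POSIX character class, so this should behave the
  # same under GNU and BSD sed.
  sed -e '/^[[:space:]]*$/d' -e 's/\\-/-/g'
}

# Example (assumed file names):
#   clean_md < books.md > books.clean.md
```

Note that once the blank lines are gone, a Markdown renderer will treat consecutive lines as one paragraph, so check the output in the viewer you actually use.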

There's a similar issue #44 that addresses this problem. I proposed a workaround in a comment over there.