samueldg/clippings

AttributeError: 'NoneType' object has no attribute 'group'

Closed this issue · 4 comments

(base) bob@rja15 kindle % clippings -o dict ~/Dropbox/kindle\ appunti/My\ Clippings_20201211.txt
Traceback (most recent call last):
  File "/opt/miniconda3/bin/clippings", line 11, in <module>
    sys.exit(main())
  File "/opt/miniconda3/lib/python3.7/site-packages/clippings/parser.py", line 218, in main
    clippings = parse_clippings(args.file)
  File "/opt/miniconda3/lib/python3.7/site-packages/clippings/parser.py", line 165, in parse_clippings
    metadata = Metadata.parse(metadata_line)
  File "/opt/miniconda3/lib/python3.7/site-packages/clippings/parser.py", line 122, in parse
    category = match.group('category')
AttributeError: 'NoneType' object has no attribute 'group'

I have melted my old brain to find the cause :)
My clippings show a change of format between 2012 and 2015!

Here's a snippet:

God's Bankers: A History of Money and Power at the Vatican (Gerald Posner)
- Your Highlight on Page 10 | Location 140-142 | Added on Saturday, September 15, 2012 7:55:46 AM

As Elliot Welles, an Auschwitz survivor and a Nazi hunter for the Anti-Defamation League, told me, “Profits. They matter as much in the church as they do inside IBM. Don’t forget it.”
==========
God's Bankers: A History of Money and Power at the Vatican (Gerald Posner)
- Your Highlight on page 120 | Location 1839-1840 | Added on Thursday, April 30, 2015 2:25:56 AM

III. The Obelisk of Axum was placed in a central Roman square, in front of what would become the United Nations Food and Agriculture Organization. Italy resisted returning it for decades, but finally did so in 2005.
==========

As you can see, the 2012 entry capitalizes the word as "Page", while the 2015 entry uses a lowercase "page".

As my clippings.txt file has a couple hundred highlights from 2012, your program gave that error.
If I remove them it works fine.

I was not able to modify your regex pattern to accommodate this (I tried (Pp) instead of p), but it gives an error.
Thanks
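
A note on the `(Pp)` attempt: in a regular expression, parentheses group a literal sequence, so `(Pp)age` only matches the text "Ppage". A character class `[Pp]` or an alternation `(P|p)` matches either letter. The sketch below uses only the standard `re` module; the sample strings and shortened patterns are illustrative and are not the library's actual pattern. When `re.match` finds nothing it returns `None`, and calling `.group(...)` on that result is what raises the `AttributeError` in the traceback.

```python
import re

# Illustrative fragments of a metadata line (not the library's real input).
samples = ["Page 10", "page 120"]

# (Pp) is a group holding the literal sequence "Pp", so neither sample matches.
grouped = [bool(re.match(r"(Pp)age \d+", s)) for s in samples]

# [Pp] is a character class: exactly one character, either "P" or "p".
char_class = [bool(re.match(r"[Pp]age \d+", s)) for s in samples]

# (P|p) is an alternation and behaves the same way here.
alternation = [bool(re.match(r"(P|p)age \d+", s)) for s in samples]

print(grouped)      # [False, False]
print(char_class)   # [True, True]
print(alternation)  # [True, True]
```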

OK, here is the change needed so that the code also works on the old clippings:

# Accept both "Page"/"page" and "Location"/"location" in the metadata line.
PATTERN = re.compile(r'^- Your (?P<category>\w+) ' +
                     r'(on|at) ((P|p)age (?P<page>\d+) \| )?' +
                     r'(L|l)ocation (?P<location>\d+(-\d+)?) \| ' +
                     r'Added on (?P<timestamp>.+)$')
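
As a quick check, the pattern can be run against the two metadata lines from the snippet in this issue. This is a minimal sketch, assuming the pattern is applied with `.match()` to a single metadata line, as the traceback suggests; the `lines` and `parsed` names are only for illustration and are not part of the library.

```python
import re

# The proposed pattern, copied from the comment above.
PATTERN = re.compile(
    r'^- Your (?P<category>\w+) '
    r'(on|at) ((P|p)age (?P<page>\d+) \| )?'
    r'(L|l)ocation (?P<location>\d+(-\d+)?) \| '
    r'Added on (?P<timestamp>.+)$'
)

# Metadata lines from the 2012 ("Page") and 2015 ("page") highlights.
lines = [
    "- Your Highlight on Page 10 | Location 140-142 | "
    "Added on Saturday, September 15, 2012 7:55:46 AM",
    "- Your Highlight on page 120 | Location 1839-1840 | "
    "Added on Thursday, April 30, 2015 2:25:56 AM",
]

parsed = []
for line in lines:
    match = PATTERN.match(line)
    if match is None:
        # Fail with a readable message instead of an AttributeError on None.
        raise ValueError(f"Unrecognized metadata line: {line!r}")
    # groupdict() returns only the named groups.
    parsed.append(match.groupdict())

for fields in parsed:
    print(fields["category"], fields["page"], fields["location"])
```

Checking for `None` before calling `.group()` is a design choice that makes any future format change easier to diagnose, since the offending line appears in the error message.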

Hey @rjalexa! Thanks a lot for the report and investigation ❤️

It took me some time to fix this because I dreaded having to port everything from Travis CI to GitHub Actions, but now it's done!

You can find the fix in version 0.7.0, which I just uploaded to PyPI: https://pypi.org/project/clippings/

Here are the full release notes if you're interested: https://github.com/samueldg/clippings/releases/tag/0.7.0

Also, I used the snippet you shared in your comment above as a regression test to make sure it's correctly parsed:
https://github.com/samueldg/clippings/blob/0.7.0/tests/resources/clippings-new-format.txt

Let me know if you'd prefer I use something else.

Thank you very much.
May the Force be with you :)