samastur/GReader-hoover

Special characters in titles

bobharris opened this issue · 4 comments

First, I'd like to say I really appreciate what you've done here. It's exactly what I wanted. I have one problem, though, and my Unicode/ASCII/Python knowledge is limited.

Filename generation on Windows throws exceptions if any of these special characters appear in the feed name: \ / : * ? " < > |
I was able to fix this by filtering the filename with a loop, as described here: http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python
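A minimal sketch of that kind of loop, assuming the whitelist approach from the linked answer (keep only letters, digits and a few safe punctuation marks). The helper name `safe_filename` is hypothetical and not part of GReader-hoover.

```python
import string

# Characters that are kept; everything else (including \ / : * ? " < > |)
# is dropped. This mirrors the whitelist used in the linked answer.
VALID_CHARS = "-_.() %s%s" % (string.ascii_letters, string.digits)


def safe_filename(feed_label):
    """Hypothetical helper: drop every character not in VALID_CHARS."""
    return "".join(c for c in feed_label if c in VALID_CHARS)


print(safe_filename('News: "Top" <stories>?'))  # prints: News Top stories
```

Note that a whitelist also drops every non-ASCII letter, so a title written entirely in, say, Cyrillic would end up as an empty file name.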

Then, special characters in the feed name throw exceptions at line 70 in the main file. I got an error with the Unicode bullet character '\u2022'. After I removed the bullet from one of my feeds, another feed threw an exception at the same place. I haven't figured out how to fix this yet. I tried what was suggested here: http://ltslashgt.com/2007/07/16/string-sanitization-in-python/
but haven't gotten it to work yet.
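The likely cause, assuming line 70 is Python 2 code of the form `"{0}.json".format(feed_label)` (which the replacement suggested in the next comment implies): formatting a unicode label into a byte-string template makes Python 2 encode it implicitly with the ASCII codec, and U+2022 has no ASCII representation. The sketch below reproduces that encode step explicitly in Python 3; `fits_in_ascii` is a hypothetical name.

```python
def fits_in_ascii(feed_label):
    """Return True if feed_label survives the implicit ASCII encode."""
    try:
        # Python 2 performs this encode implicitly when a unicode label is
        # formatted into a byte-string template such as "{0}.json".
        feed_label.encode("ascii")
    except UnicodeEncodeError as exc:
        print("failed:", exc.reason)  # e.g. "ordinal not in range(128)"
        return False
    return True


fits_in_ascii(u"\u2022 My feed")  # prints the error reason, returns False
fits_in_ascii(u"My feed")         # returns True
```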

Any help with this issue would be greatly appreciated. Time is somewhat of the essence, too. :D Either way, many thanks for the program. I was able to get my most important feeds out with all their data.

Replace line 70 with this:
return "{0}.json".format(feed_label.encode('utf-8'))
This works with some special characters, but not all of them.
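Why the UTF-8 change only helps partly: encoding the label explicitly avoids the implicit ASCII encode, so labels such as the bullet no longer raise, but characters that Windows forbids in file names pass through untouched. A Python 3 sketch of both effects follows; the helper names are hypothetical, and the bytes result stands in for the Python 2 byte string that the suggested line produces.

```python
# Characters Windows does not allow in file names.
WINDOWS_RESERVED = set('\\/:*?"<>|')


def label_to_filename_bytes(feed_label):
    """Python 3 rendering of the suggested change: UTF-8 encode the label,
    then append the extension (bytes, like a Python 2 str)."""
    return feed_label.encode("utf-8") + b".json"


def has_reserved_chars(feed_label):
    """True if the label still contains a character Windows rejects."""
    return any(c in WINDOWS_RESERVED for c in feed_label)


print(label_to_filename_bytes(u"\u2022 News"))  # encodes without error
print(has_reserved_chars(u"Tech: News"))        # True: ':' is still there
```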

I made a change to "slugify" feed_label; see my fork (pull request submitted).
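A sketch of what a slugify step for feed labels might look like. It follows the behavior described later in this thread (each run of punctuation becomes a single dash), but it is not vjeantet's code: the merged version reportedly uses unidecode, which is swapped here for the standard library's NFKD decomposition so the example has no third-party dependency.

```python
import re
import unicodedata


def slugify(feed_label):
    """Sketch only: stdlib NFKD stands in for unidecode."""
    # Split accented letters into base letter + combining mark, then drop
    # anything that still has no ASCII representation (e.g. U+2022).
    text = unicodedata.normalize("NFKD", feed_label)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Replace each run of characters other than a-z and 0-9 with one dash,
    # then trim dashes from both ends.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


print(slugify(u"Caf\u00e9: News & Views?"))  # prints: cafe-news-views
```

Unlike unidecode, NFKD cannot transliterate scripts such as Cyrillic or CJK; those characters are simply dropped here.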

Sorry for the late response. I only just noticed this issue.

I have already merged vjeantet's changes, and I'll take a closer look this weekend to see whether the problem remains.

Some of the punctuation marks listed above don't seem to be allowed by Google anyway, but that shouldn't matter: vjeantet's changes should handle all of them. I only needed to add a cast to unicode so that unidecode would not fail on byte strings.
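The exact cast in the merged code isn't shown in this thread. A minimal sketch, assuming labels may arrive as UTF-8 byte strings: decode them to text before handing them to the transliteration step. The helper name `to_text` is hypothetical; in Python 2 the equivalent call is `unicode(label, 'utf-8')`.

```python
def to_text(feed_label, encoding="utf-8"):
    """Hypothetical helper: return feed_label as text (unicode),
    decoding byte strings first and passing text through unchanged."""
    if isinstance(feed_label, bytes):
        return feed_label.decode(encoding)
    return feed_label


print(to_text(b"\xe2\x80\xa2 News"))  # decoded to text: the bullet plus " News"
```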

Could you please try the latest changes and let me know whether they work?

There's one caveat. The new slugify function collapses each run of consecutive punctuation marks into a single dash (-). As a result, tag names that differ only in punctuation can map to the same file name. The last one backed up would overwrite the earlier ones, and some data would be lost. I considered this problem academic enough not to fix. Do let me know if it isn't.
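The collision is easy to reproduce: "C++ news", "C# news" and "C news" all collapse to the same slug. One possible fix, sketched below with hypothetical names, is to track the file names already used in a run and append a counter on collision. The slug rule here is a simplified ASCII-only version of the one described above.

```python
import re


def simple_slug(label):
    # Each run of characters other than a-z and 0-9 becomes one dash.
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def unique_filenames(labels):
    """Map each label to a .json name, adding -2, -3, ... on collisions."""
    used = set()
    names = []
    for label in labels:
        base = simple_slug(label) or "untitled"  # hypothetical fallback
        name = base + ".json"
        n = 2
        # Keep counting until the name is free, so "C news 2" cannot clash
        # with a generated "c-news-2.json" either.
        while name in used:
            name = "%s-%d.json" % (base, n)
            n += 1
        used.add(name)
        names.append(name)
    return names


print(unique_filenames(["C++ news", "C# news", "C news"]))
# prints: ['c-news.json', 'c-news-2.json', 'c-news-3.json']
```

The resulting names depend on the order in which feeds are processed, so a stable per-feed identifier, as the last comment suggests, would be more robust across runs.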

Maybe you could append the feed ID (a number) to the end of the feed's title.
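A sketch of that suggestion, assuming each feed has a unique numeric ID available at backup time (this thread doesn't show where such an ID would come from). `filename_with_id` is a hypothetical name. Titles that slugify identically still get distinct file names because the ID differs.

```python
import re


def filename_with_id(feed_label, feed_id):
    """Hypothetical: slug of the title plus the feed's numeric ID."""
    slug = re.sub(r"[^a-z0-9]+", "-", feed_label.lower()).strip("-")
    return "{0}-{1}.json".format(slug, feed_id)


print(filename_with_id("C++ news", 17))  # prints: c-news-17.json
print(filename_with_id("C# news", 42))   # prints: c-news-42.json
```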