brainkim/archieml-python

Problems with Google Docs


I have been playing with downloading an ArchieML-formatted Google Doc as plain text, then parsing it with archieml-python.

It pretty much works, except for the very first line. I have found that any time I have a key/value pair on line 1 of the text file, the parser doesn't recognize it.

If, however, I add a blank line to the top of the document, then the key/value on line 2 WILL be parsed.

After some more digging, it seems clear that when I download a file as "text/plain" from Google Docs, it's actually a UTF-8 file with a BOM.

So, if I change the following line in archieml-python's __init__.py:
line = line.decode('utf-8')
to:
line = line.decode('utf-8-sig')

Then it works.
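
Here is a minimal sketch of why that change helps, assuming the downloaded file starts with the UTF-8 BOM bytes. The sample key/value line is made up for illustration, and this is not archieml-python's actual code:

```python
import codecs

# Hypothetical first line of a Google Docs "text/plain" export:
# the UTF-8 BOM bytes followed by a key/value pair.
raw_line = codecs.BOM_UTF8 + b"title: Hello\n"

# Plain 'utf-8' keeps the BOM as a leading U+FEFF character,
# so the line no longer starts with the key.
plain = raw_line.decode("utf-8")

# 'utf-8-sig' drops a leading BOM if one is present.
sig = raw_line.decode("utf-8-sig")

print(repr(plain))
print(repr(sig))
```

With plain 'utf-8' the first line begins with an invisible U+FEFF character rather than the key, which would explain why a key/value pair on line 1 goes unrecognized.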

Not sure if you might want to consider adding BOM detection, à la http://stackoverflow.com/questions/13590749/reading-unicode-file-data-with-bom-chars-in-python.
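
For what it's worth, explicit BOM detection could look roughly like the sketch below. The helper name `decode_without_bom` is hypothetical (it is not part of archieml-python), and it only handles the UTF-8 BOM:

```python
import codecs


def decode_without_bom(raw):
    """Decode UTF-8 bytes, removing a leading UTF-8 BOM if one is present.

    Hypothetical helper for illustration only.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Skip the three BOM bytes before decoding.
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")


with_bom = decode_without_bom(codecs.BOM_UTF8 + b"key: value")
without_bom = decode_without_bom(b"key: value")
print(with_bom == without_bom)
```

Decoding with 'utf-8-sig' gives the same result for both inputs, so the explicit check is mostly useful if you also want to know whether a BOM was there.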

Thanks for opening this issue, Kirkman! I have never been good at parsing string encoding issues and I haven't written Python in maybe a year or so, so if you want to submit a pull request, please do!

Brian Kim

Okay, made a pull request. #11

This has been merged, lemme know if it doesn't work.