Problems with Google Docs
I have been playing with downloading an ArchieML-formatted Google Doc as plain text, then parsing it with archieml-python.
It pretty much works, except for the very first line. I have found that any time I have a key/value pair on line 1 of the text file, the parser doesn't recognize it.
If, however, I add a blank line to the top of the document, then the key/value on line 2 WILL be parsed.
After some more digging, it seems clear that when I download a file as "text/plain" from Google Docs, it's actually a UTF-8 file with a BOM.
So, if I change the following line in archieml-python's __init__.py:
line = line.decode('utf-8')
to:
line = line.decode('utf-8-sig')
Then it works.
Not sure if you might want to consider adding BOM detection, à la http://stackoverflow.com/questions/13590749/reading-unicode-file-data-with-bom-chars-in-python.
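For anyone who wants to reproduce this without a Google Doc, here is a minimal sketch of the difference between the two codecs. The sample bytes and the `decode_line` helper are made up for illustration and are not taken from archieml-python; only `codecs.BOM_UTF8` and the `utf-8` / `utf-8-sig` codec names come from the Python standard library.

```python
import codecs

# Hypothetical sample: what a "text/plain" export might look like on disk,
# i.e. a UTF-8 BOM followed by a key/value pair on line 1.
raw = codecs.BOM_UTF8 + b"title: Hello\nauthor: Kirkman\n"
first_line = raw.split(b"\n")[0]

# Plain 'utf-8' keeps the BOM as an invisible U+FEFF character,
# so the key on line 1 no longer starts with a letter.
plain = first_line.decode("utf-8")

# 'utf-8-sig' drops a leading BOM if one is present and otherwise
# behaves like 'utf-8'.
sig = first_line.decode("utf-8-sig")

print(repr(plain))
print(repr(sig))


def decode_line(line: bytes) -> str:
    """Hypothetical helper: decode one line, tolerating a leading BOM."""
    return line.decode("utf-8-sig")
```

Because `utf-8-sig` leaves BOM-less input untouched, using it on every line should be harmless for files that never had a BOM in the first place.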
Thanks for opening this issue, Kirkman! I have never been good at parsing string encoding issues and I haven't written Python in maybe a year or so, so if you want to submit a pull request, please do!
Brian Kim
This has been merged; lemme know if it doesn't work.