Google Drive upload not working
mkmcconnell opened this issue · 12 comments
Hello,
Firstly, thank you for writing this script! I got the spider.py working to the point where it attempts to upload to my Google Drive. However, I don't believe the script ever prompted me via web browser to generate an auth_token.json. Here's the backtrace (occurs after successful download of .pdf) --
[-] <type 'exceptions.AttributeError'> 'module' object has no attribute 'from_file' | spider.py@54
Traceback (most recent call last):
File "script/spider.py", line 54, in main
upload.run(packpub.info['paths'])
File "/Users/mmcconnell/_HerokuProjects/packtpub-crawler/script/upload.py", line 26, in run
self.service.upload(path)
File "/Users/mmcconnell/_HerokuProjects/packtpub-crawler/script/drive.py", line 125, in upload
self.__guess_info(file_path)
File "/Users/mmcconnell/_HerokuProjects/packtpub-crawler/script/drive.py", line 28, in __guess_info
'mime_type': magic.from_file(file_path, mime=True),
AttributeError: 'module' object has no attribute 'from_file'
[-] something weird occurred, exiting...
I'm a python newb, so any assistance is appreciated.
Thanks,
Michael
Hi,
Thanks. Just to be sure, can you tell me which command you ran, and confirm that you configured your Drive account exactly as described in the README (folder paths, Drive API, client_secrets, ...) and that you have read/write/execute permissions on your project root directory? Do you have Chrome installed? I'm asking because it should be pretty straightforward: the script should simply open a new browser window.
The error says that it cannot correctly retrieve the MIME type (pdf/epub/mobi) of the downloaded file, so check, for example, that the directories are fully accessible.
By the way, I'm not a Python ninja either. I just used the most suitable tool for this job!
I just checked: mime_type should be, for example, application/pdf, but I would suggest first verifying that you have python-magic installed.
If this is the problem, I will also update requirements.txt.
Thanks
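For anyone running the same check, here is a minimal sketch. The helper name describe_module is hypothetical and not part of packtpub-crawler. It reports whether a module named magic can be imported, where it was loaded from, and whether it exposes from_file. One possible cause of this AttributeError, which nothing in this thread confirms, is a different package that also installs a module called magic and shadows python-magic.

```python
import importlib


def describe_module(module_name="magic"):
    """Report whether a module imports, where it lives, and if it has from_file."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is missing, or one of its own dependencies failed to load.
        return {"installed": False, "path": None, "has_from_file": False}
    return {
        "installed": True,
        # The path shows which installed package actually provides the module.
        "path": getattr(module, "__file__", None),
        "has_from_file": hasattr(module, "from_file"),
    }
```

If describe_module() returns installed as True but has_from_file as False, the path entry shows which package the interpreter is really importing.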
@mkmcconnell do you still have the error or did you manage to make it work?
Thanks
Hi Niq,
Apologies for the delay in reply to your last message… work got in the way ☹
I did have python-magic installed and just in case, I did an upgrade of the module using ‘pip’. I’m currently at version 0.4.12 --
Metadata-Version: 2.0
Name: python-magic
Version: 0.4.12
Summary: File type identification using libmagic
Home-page: http://github.com/ahupp/python-magic
Author: Adam Hupp
Author-email: adam@hupp.org
Installer: pip
License: MIT
Location: /usr/local/lib/python2.7/site-packages
Requires:
Classifiers:
Intended Audience :: Developers
License :: OSI Approved :: MIT License
Programming Language :: Python
Programming Language :: Python :: 2
Programming Language :: Python :: 3
Unfortunately, I’m still getting the same error. Which version of python-magic do you have on your setup?
Thanks,
— Michael
From: niqdev notifications@github.com
Reply-To: niqdev/packtpub-crawler reply@reply.github.com
Date: Friday, August 12, 2016 at 8:12 AM
To: niqdev/packtpub-crawler packtpub-crawler@noreply.github.com
Cc: Michael McConnell mmcconnell@juniper.net, Mention mention@noreply.github.com
Subject: Re: [niqdev/packtpub-crawler] Google Drive upload not working (#12)
My current version is 0.4.11, so let's try a quick hack to find out whether the problem is this library or something else.
I haven't tested it, but try replacing line 28 in drive.py
'mime_type': magic.from_file(file_path, mime=True)
with
'mime_type': 'application/pdf'
and tell me whether it works when you run python script/spider.py -c config/prod.cfg -u drive
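A slightly less hard-coded variant of the same workaround, as a sketch only: the names EBOOK_MIME_TYPES, mime_from_extension and guess_mime_type are hypothetical and do not exist in drive.py. It tries magic.from_file first and falls back to the file extension when that call is unavailable. The MIME strings for epub and mobi are my assumption of the usual values.

```python
import os

# Assumed MIME types for the three formats mentioned in this thread.
EBOOK_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".epub": "application/epub+zip",
    ".mobi": "application/x-mobipocket-ebook",
}


def mime_from_extension(file_path, default="application/octet-stream"):
    """Guess the MIME type from the file extension only."""
    extension = os.path.splitext(file_path)[1].lower()
    return EBOOK_MIME_TYPES.get(extension, default)


def guess_mime_type(file_path):
    """Prefer python-magic; fall back to the extension if from_file is missing."""
    try:
        import magic
        return magic.from_file(file_path, mime=True)
    except (ImportError, AttributeError):
        # Covers both "no magic module" and "magic module without from_file".
        return mime_from_extension(file_path)
```

The extension-based guess does not look at the file contents, so it cannot notice a file that is not really a PDF.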
Good news -- that worked. I got past the mime_type error and Chrome launched a browser window with the authentication token. I copied and pasted that into the script prompt, and now it appears that the script is having a problem with the path I specified in my Drive:
[+] new credentials saved
[+] uploading file...
-Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/mmcconnell/_HerokuProjects/packtpub-crawler/script/drive.py", line 108, in __insert_file
file = self.__drive_service.files().insert(body=body, media_body=media_body).execute()
File "/usr/local/lib/python2.7/site-packages/oauth2client/util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/googleapiclient/http.py", line 804, in execute
_, body = self.next_chunk(http=http, num_retries=num_retries)
File "/usr/local/lib/python2.7/site-packages/oauth2client/util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/googleapiclient/http.py", line 971, in next_chunk
return self._process_response(resp, content)
File "/usr/local/lib/python2.7/site-packages/googleapiclient/http.py", line 998, in _process_response
raise HttpError(resp, content, uri=self.uri)
HttpError: <HttpError 404 when requesting https://www.googleapis.com/upload/drive/v2/files?uploadType=resumable&alt=json returned "File not found: E-books/Packt">
/
\ [path] ebooks/Mastering_Python_Regular_Expressions.pdf
[name] Mastering_Python_Regular_Expressions.pdf
[mime_type] application/pdf
Hmm... I suspect both errors are related to your path ("File not found: E-books/Packt"). Check that it exists and is correct; maybe the "-" in the name is the cause. It seems silly, but it is worth investigating.
@mkmcconnell did you check if the problem was the path or something else?
Thanks
Hi @niqdev,
I was able to overcome the file upload path issue -- I continued to use the modified drive.py (with the explicit ‘application/pdf’ instead of the magic.from_file call) and changed my config file back to the default settings for drive folders:
drive.default_folder=packtpub
#drive.upload_folder=E-books (<--commented this out)
Upon script execution, this created a folder called ‘packtpub’ in the root of my Google Drive, and nested inside that was the downloaded PDF file.
I noticed something in the script output after the script ran past the prior ‘from_file’ break point:
[+] Please add this line after [drive] in your configuration file:
drive.upload_folder=0B_nKYsh3DW5VVlhoMktfeTZtdTQ
After modifying my config to include the folder ID instead of the folder name, subsequent script runs put multiple file copies within the ‘packtpub’ folder.
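To illustrate why the folder ID works where the name "E-books/Packt" returned a 404, here is a sketch of a request body for the Drive v2 files().insert call seen in the traceback. The function build_insert_body is hypothetical, and I have not checked how drive.py actually builds its body. It assumes the v2 file resource fields title, mimeType and parents, where each parent is referenced by its id.

```python
def build_insert_body(title, mime_type, parent_folder_id=None):
    """Build a Drive v2 file resource body; the parent is given by ID, not by path."""
    body = {"title": title, "mimeType": mime_type}
    if parent_folder_id:
        # Drive v2 expects a list of parent references, each with an "id" key.
        body["parents"] = [{"id": parent_folder_id}]
    return body


body = build_insert_body(
    "Mastering_Python_Regular_Expressions.pdf",
    "application/pdf",
    parent_folder_id="0B_nKYsh3DW5VVlhoMktfeTZtdTQ",  # ID printed by the script above
)
```

Without a parent ID the body carries no parents entry at all, and Drive then decides where the file goes.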
So, I think I have the Drive upload problem sorted out. However, it appears that another issue has surfaced -- some of the files that are downloaded (using either the modified drive.py or the original drive.py) arrive as HTML/ASCII text and not as binary PDF:
Python 2.7.11 (default, Jan 22 2016, 08:28:37)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> with magic.Magic() as m:
...     pass
...
>>> with magic.Magic() as m:
...     m.id_filename('ebooks/Linux_Mint_Essentials.pdf')
...
'PDF document, version 1.6'
>>> with magic.Magic() as m:
...     m.id_filename('ebooks/PhoneGap_for_Enterprise.pdf')
...
'HTML document, ASCII text, with CRLF line terminators'
I’m wondering if Packtpub has changed something recently on their end that affects how your script is able to process the files properly?
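A quick way to spot such files without any magic library is to look at the first bytes, since a PDF normally starts with the marker %PDF-. This is only a sketch with a hypothetical helper name, looks_like_pdf. It does not explain why the HTML was returned; a login or error page saved under a .pdf name is one possibility, which nothing in this thread confirms.

```python
def looks_like_pdf(file_path):
    """Return True if the file starts with the PDF header marker."""
    with open(file_path, "rb") as handle:
        # Read only the first five bytes and compare with the PDF signature.
        return handle.read(5) == b"%PDF-"
```

Running it over every file in the ebooks directory would list the downloads that need to be fetched again.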
Yep @mkmcconnell,
the path naming could be better, but it was changed later:
drive.default_folder=DRIVE_NAME_FOLDER
drive.upload_folder=DRIVE_GENERATED_ID_FOLDER
So, if I understood correctly, Drive is configured and working fine now, and the problem is with the downloaded file, right?
I just checked today's PDF and those from the past few days, and everything looks fine for me. I don't understand why you have a problem with python-magic, but the issue is probably related to that.
So, if you need any help with other tests, just let me know and I will try to help; otherwise I will close the issue.
Thanks
Yeah that’s what I’m thinking too… I’ll try running the script from a different box and see if the issue persists. Thanks for the help.
No problem. If you need any help, just reopen this issue or open another one.
If you have any improvements, you are welcome to contribute!
Thanks