wkentaro/gdown

Can't download big file

DimaZhu opened this issue · 15 comments

Hi! You've designed a pretty nice tool. It works great with small files but can't handle big ones. For example, I can't download this file:
gdown https://drive.google.com/uc?id=11SzYIezaF8yaIVKAml7kPdqgncna2vj7
It seems that it can't get past the "Can't scan file for viruses" warning.

I used version 3.8.1 with Python 3.6.

Actually, it works in my env.

(screenshot)

I just tried the same thing and can confirm what DimaZhu reported. It does not work for me either. I tried Python 3.6 with gdown 3.8.3, and after that I also tried your setup (Python 3.5, gdown 3.8.1). The problem is that it ends up in an endless loop in download.py: it tries to follow the redirect with the confirmation token but keeps landing on Google's "file too big for virus scan" page.

Any suggestions on this?

(I'd like to reopen this issue please)
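
For context, here is roughly the confirm-token flow gdown attempts, sketched with plain requests and assuming the older mechanism where Google sets a download_warning_* cookie carrying the confirmation token (this is not gdown's actual code, and Google's behaviour has changed over time). For heavily shared files the confirmed request can bounce straight back to the warning page, which is the loop described above.

import requests

# File ID from the report above.
FILE_ID = "11SzYIezaF8yaIVKAml7kPdqgncna2vj7"
URL = "https://drive.google.com/uc?export=download"

session = requests.Session()
response = session.get(URL, params={"id": FILE_ID}, stream=True)

# The "can't scan this file for viruses" page puts a confirmation token
# in a download_warning_* cookie (older behaviour; an assumption here).
token = None
for key, value in response.cookies.items():
    if key.startswith("download_warning"):
        token = value

if token is not None:
    # Retry with the confirmation token. For very popular files this can
    # still land on the warning page again, hence the endless loop.
    response = session.get(URL, params={"id": FILE_ID, "confirm": token}, stream=True)

with open("output.bin", "wb") as f:
    for chunk in response.iter_content(chunk_size=32768):
        if chunk:
            f.write(chunk)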

+1, does not work for big files

+1, does not work for files that are too large for virus scanning; it gives a permission error.
@wkentaro e.g.:
gdown --id 1XRAWYpMJeaVVXNN442xDgXnAa3pLBUvv

I could reproduce it with @AnselmC's example.

I found this happens with large data that gets many accesses (e.g., a public dataset): #42. However, I have no idea how to fix this.

Yeah, it seems that sometimes these files then aren't even downloadable through the browser. I think #42 is the best you can do. Thanks for looking into it!

👍

I have the same issue.
It seems like it can't get past the "too large to scan for viruses" warning.
I can download it in a private browsing window if I click "Download anyway".

Hello, I had the same issue and solved it 5 months ago. See here: https://github.com/wayne931121/tensorflow_remove_person_background_project/blob/main/Export_Model_Function.ipynb

https://colab.research.google.com/drive/1rDcVczczKy8IbUnfA4aAVrM_AKSwuJl7

First compress the file or folder into a zip file and create a new folder. Split the zip file into pieces inside the new folder and upload that folder to Google Drive. Finally, download the files in the Google Drive folder one by one and recombine them.

1. Compress the large file or folder into a zip file.
2. Create a new folder.
3. Split the zip file into the new folder (split the large zip into small files of less than 25 MB each, e.g. one 900 MB file into 90 files of 10 MB).
4. Upload the new folder to Google Drive.
5. Use gdown to download the files in the Google Drive folder one by one.
6. Combine the split files back into the zip file.
7. Unzip.

On Computer:

import os
import zipfile

def zip_folder(path):
    # Compress a folder into "<path>.zip".
    with zipfile.ZipFile('{}.zip'.format(path), 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            for file_name in files:
                zf.write(os.path.join(root, file_name))

path = "detection_model"  # folder to compress; produces detection_model.zip used below
zip_folder(path)

# Prevent colab/gdown downloads from failing because the file is too large
# to be scanned for viruses.
def export_for_google_drive(file, target_folder, size=10 * 1024 * 1024):
    target_folder_files = os.path.join(target_folder, "%d")

    # Make sure the target folder is empty.
    for e in os.listdir(target_folder):
        os.remove(os.path.join(target_folder, e))

    # Read the source file and write it out as numbered chunk files.
    with open(file, "rb") as f:
        # Each chunk is `size` bytes (default 10 MB).
        i = 1
        while True:
            tmp = f.read(size)
            if len(tmp) == 0:
                break
            with open(target_folder_files % i, "wb") as gf:
                gf.write(tmp)
            i += 1

# Keep each chunk under the 25 MB per-file limit used above (20 MB here).
file = "detection_model.zip"
target = "detection model for google drive"
os.makedirs(target, exist_ok=True)
export_for_google_drive(file, target, 20 * 1024 * 1024)
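
Optionally (not part of the original recipe), you can record a checksum of detection_model.zip before splitting so the recombined copy on Colab can be verified later; a small sketch using only the standard library:

import hashlib

def md5sum(path, chunk_size=1024 * 1024):
    # Hash the file in chunks so a large zip does not need to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(md5sum("detection_model.zip"))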

On Google Colab:

!gdown --no-cookies --folder "ID"

import os
import zipfile

def export_from_split_file(source_folder, target_file):
    # Create an empty target file.
    with open(target_file, "wb") as gf:
        gf.write(b"")

    # Chunk files are named 1, 2, 3, ..., so sort them numerically.
    export = sorted(int(i) for i in os.listdir(source_folder))

    # Read each chunk and append it to the target file.
    with open(target_file, "ab") as f:
        for num in export:
            chunk_path = os.path.join(source_folder, str(num))
            with open(chunk_path, "rb") as f1:
                f.write(f1.read())

export_from_split_file("detection model for google drive", "detection_model.zip")

with zipfile.ZipFile("detection_model.zip", 'r') as zip_ref:
    zip_ref.extractall("/content")

!rm -r "detection model for google drive"
!rm "detection_model.zip"

See: #43 (comment), #43

Google Drive anonymous downloads have a daily limit; if you exceed it, Google will say:

Sorry, you can't view or download this file at this time.

Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.

Maybe 4.8 GB per file per day. I wrote this script to test it: when downloading a very big file, it only downloads 4.8 GB.
Script: https://github.com/wayne931121/download_googledrive_file

To solve this problem, you can provide a cookie to log in (a rough sketch of this is included after the options below). The same cookie can only be used for one request at a time, or the server will respond with 429 Too Many Requests (see here: HTTP error 429 (Too Many Requests)).

Way 2 is to use another cloud drive instead.

Way 3 is to split your large file into many small files and download those. This is the best way I can suggest, and I also wrote a script to do it (see here: #26 (comment)).
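
As a rough illustration of the logged-in option from Way 1 (this is not gdown's own cookie handling), you could export the browser's cookies to a Netscape-format cookies.txt and feed them to a plain requests download; the cookie file name, the output file name, and YOUR_FILE_ID below are all placeholders:

import http.cookiejar
import requests

# cookies.txt exported from a logged-in browser session (Netscape format).
jar = http.cookiejar.MozillaCookieJar("cookies.txt")
jar.load()

# Large files may still hit the virus-scan confirmation page
# (see the confirm-token sketch earlier in the thread).
url = "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"
with requests.get(url, cookies=jar, stream=True) as r:
    r.raise_for_status()
    with open("output.bin", "wb") as f:
        for chunk in r.iter_content(chunk_size=32768):
            f.write(chunk)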

try this command:
gdown https://drive.google.com/uc?id=

(screenshot of the error)
How do I fix this?

For anyone visiting this issue in 2024: I was having the same problem with a very large file (> 28 GB). After many failed attempts with gdown, I tried to download the file from Google Drive in the browser. When the pop-up window warning that Google Drive is unable to scan the file for viruses appeared, I clicked Cancel. Next, I tried downloading via gdown again and it worked.