royorel/FFHQ-Aging-Dataset

About Downloading with PyDrive

xuedue opened this issue · 19 comments

Hello author, thank you for sharing.

I ran into a "quota exceeded" error, so I want to download with PyDrive. But in Step 2, after clicking to enable the Drive API, I can't find the Desktop app option.
How do I select "Desktop app" and download the client configuration?

Thanks again.

Hi @xuedue,

It looks like Google has updated that page. Just follow the instructions in the prerequisites section, specifically bullets 3 & 4. What PyDrive really needs is the credentials file.

I will update the README accordingly soon. Sorry about the confusion.
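For anyone following along, here is a minimal sketch of how the credentials file could be sanity-checked before running the script with --pydrive. It assumes the file is named client_secrets.json and uses the usual Google OAuth layout (a top-level "installed" or "web" section). The helper names check_client_secrets and authenticate are hypothetical, and the repository's own pydrive_utils may authenticate differently.

```python
import json
from pathlib import Path


def check_client_secrets(path="client_secrets.json"):
    """Return the OAuth client type ('installed' or 'web') found in the file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    for client_type in ("installed", "web"):
        section = data.get(client_type)
        if section and "client_id" in section and "client_secret" in section:
            return client_type
    raise ValueError(f"{path} does not look like an OAuth client secrets file")


def authenticate():
    """Run the browser consent flow and return an authenticated GoogleDrive object."""
    # Imported lazily so the check above works even without PyDrive installed.
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    gauth = GoogleAuth()        # by default looks for client_secrets.json in the working directory
    gauth.LocalWebserverAuth()  # opens a browser window for the OAuth consent step
    return GoogleDrive(gauth)
```

A file downloaded for a desktop client would normally report "installed" here.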

Thank you for your reply.

I have downloaded client_secrets.json.

After adding the --pydrive flag, I ran the script but hit two errors.

Sometimes the error is TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Sometimes it is the error mentioned in another issue, ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.

I have tried many times but still don't know how to solve this problem.

@xuedue

These errors look like local connection or firewall issues. There is nothing I can do about them.

Thanks for your reply. I have resumed the download now, but is the dataset really as large as the size shown during the download?

Assuming each image is about 100 KB, the dataset should be close to 7 GB.

@xuedue,

The script downloads the in-the-wild images of the original FFHQ dataset, because we take slightly larger crops than the original FFHQ dataset. The downloaded in-the-wild images are deleted after processing and aren't saved, but their overall size is indeed 0.93TB. The final dataset size depends on the output resolution of the images (defined by the user). If you save 1024x1024 images, the size will be about 90GB, slightly larger than the original FFHQ dataset because of the segmentation maps. For 256x256 resolution, the final size should indeed be around 7.5GB.
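As a rough check of these numbers, here is a small sketch of the arithmetic. It assumes FFHQ's 70,000 images and decimal units (1 GB = 1000 MB); the totals are the approximate figures quoted in this thread, not measurements.

```python
NUM_IMAGES = 70_000  # number of images in FFHQ


def per_image_mb(total_gb):
    """Average size in MB per image entry for a dataset of total_gb gigabytes."""
    return total_gb * 1000 / NUM_IMAGES


# The 100 KB-per-image guess from the question: 70,000 * 100 KB.
estimate_gb = NUM_IMAGES * 100 / 1_000_000

# Approximate totals quoted above, in GB.
totals_gb = {
    "in-the-wild downloads (0.93 TB)": 930,
    "final 1024x1024 output": 90,
    "final 256x256 output": 7.5,
}

if __name__ == "__main__":
    print(f"100 KB/image estimate: {estimate_gb:.1f} GB")
    for name, gb in totals_gb.items():
        print(f"{name}: about {per_image_mb(gb):.2f} MB per image")
```

So the 100 KB guess lands at 7 GB, consistent with the 7.5GB figure, while the in-the-wild originals average roughly 13 MB each.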

Thank you for your reply.

I just ran get_ffhq_aging.bat without any modification, so the dataset should be about 7.5GB after downloading.

Sorry to ask for confirmation again.

I want to know whether my disk only needs a bit more than 7.5GB of free space. Apart from the 256x256 images that are saved, will any other images be kept on my disk? I am a little afraid that all 0.93TB of data will end up stored on my disk.

You shouldn't need much more than 7.5 GB of disk space. You can look at the download script and see that each thread deletes the in-the-wild image right after processing is done.

def _download_thread(spec_queue, exception_queue, stats, dst_dir, output_size, drive, download_kwargs):
    with requests.Session() as session:
        while not spec_queue.empty():
            spec = spec_queue.get()
            try:
                if drive != None:
                    pydrive_utils.pydrive_download(drive, spec['file_url'], spec['file_path'])
                else:
                    download_file(session, spec, stats, **download_kwargs)
                if spec['file_path'].endswith('.png'):
                    align_in_the_wild_image(spec, dst_dir, output_size)
                    os.remove(spec['file_path'])
            except:
                exception_queue.put(sys.exc_info())
            with stats['lock']:
                stats['files_done'] += 1

So during download, the maximum number of in-the-wild images on your disk will be num_threads (default is 32).
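To illustrate why the temporary disk usage stays bounded, here is a self-contained sketch of the same download, process, delete pattern. It is not the repository's code: run_workers is a hypothetical name, the "download" just writes a 1 KB placeholder file, and no real alignment is performed.

```python
import os
import queue
import tempfile
import threading


def run_workers(num_files, num_threads, work_dir):
    """Simulate download -> process -> delete; return the peak number of files held at once."""
    jobs = queue.Queue()
    for i in range(num_files):
        jobs.put(os.path.join(work_dir, f"{i:05d}.png"))

    lock = threading.Lock()
    counts = {"live": 0, "peak": 0}

    def worker():
        while True:
            try:
                path = jobs.get_nowait()  # never blocks if another thread took the last job
            except queue.Empty:
                return
            with lock:
                counts["live"] += 1
                counts["peak"] = max(counts["peak"], counts["live"])
            with open(path, "wb") as f:  # stand-in for downloading an in-the-wild image
                f.write(b"\0" * 1024)
            # ... alignment and cropping would happen here ...
            os.remove(path)  # the temporary file is removed as soon as it is processed
            with lock:
                counts["live"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["peak"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        peak = run_workers(num_files=200, num_threads=8, work_dir=tmp)
        print("peak files on disk:", peak, "left over:", os.listdir(tmp))
```

Each thread holds at most one temporary file at a time, so the peak can never exceed the number of threads, regardless of how many files are queued.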

Sorry to ask again.

When downloading with PyDrive, I ran into a problem: partway through the download, the program fails with an error. I am confused that it does not fail at the start but only during the download.

I have spent two days retrying the download and searching for solutions, but the problem is still unsolved.

Hi @xuedue,

This also seems like a local machine issue that doesn't relate to the downloading code.

Googling your error message suggests it might be a proxy issue. Here is the most relevant result:
aws/aws-cli#5773

Well, but I'm just using the school network.

I'd like to know whether the 256x256 dataset differs from the original NVIDIA FFHQ dataset in any way other than the resolution.

Could you provide the resize code for me?

If I resize the original FFHQ images to 256x256, does that mean I have obtained the image data and annotation data (ffhq_aging_labels.csv) used in your paper?

Sorry to bother you again.

Well, but I'm just using the school network.

It might be an issue with your school's network, then.

I'd like to know whether the 256x256 dataset differs from the original NVIDIA FFHQ dataset in any way other than the resolution.

There are differences in the dataset: we take larger crops, which is why we start from the in-the-wild images.

Could you provide the resize code for me?

The alignment code is this function:

def align_in_the_wild_image(spec, dst_dir, output_size, transform_size=4096, enable_padding=True):

If I resize the original FFHQ images to 256x256, does that mean I have obtained the image data and annotation data (ffhq_aging_labels.csv) used in your paper?

No, since the crops are different. If you wish, you can run the segmentation code on the original FFHQ dataset to get the correct segmentation maps. The annotation data is correct regardless of the image size or cropping method (the age, gender and the rest of the labels don't change). However, since the crop size is different, you won't get exactly the same results that we got in the paper, because tighter crops don't capture the change in head shape over the years as well.

@xuedue
I ran into the same problem. How did you solve it?

@woshixiaozhou, please use English in comments.

Hi there!
I have a similar issue. After creating an OAuth2 API key and downloading client_secrets.json, I can authorize the script.
But I get the quota limit error:
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v2/files/1Tkyob6bsb0POmg8gg-XXXXXXXXX?alt=json returned "User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=2260XXXXXX". Details: "[{'domain': 'usageLimits', 'reason': 'userRateLimitExceeded', 'message': 'User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=2260XXXXXX', 'extendedHelp': 'https://console.developers.google.com/apis/api/drive.googleapis.com/quotas?project=2260XXXXX'}]">
It sounds like the script tries to access the Google Drive of Tero Karras and not my own.

Hi @ahripanto,

What we saw initially, and what led us to add the PyDrive option, was a quota limit error with the original download script (which matches the script provided by Nvidia), even though we were able to manually download the same file from Nvidia's Google Drive. Using PyDrive eliminated this issue, because it uses exactly the same API as Google Drive itself. However, if the quota limit is indeed exceeded and you can't download a file manually, PyDrive will not solve the issue. In that case it is a hard limit imposed by Google, and there's nothing that can be done about it except waiting for the quota to be released.

The script indeed tries to access Nvidia's Google Drive and not your own. The dataset is just shared with you; it's not actually located on your personal Google Drive. There are 2 reasons for that:

  1. Storing the full dataset would require about 1TB of space, and not everyone has that in their personal Google Drive account.
  2. To avoid any copyright issues. If a person requests that their image be deleted from the original FFHQ dataset (Nvidia provides that option), it will automatically be removed from our dataset as well. If you hold a copy of the dataset, that image won't be removed, and you (and we, for providing a script to do that) would be violating copyright.
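For the rate-limit variant of the error quoted above (userRateLimitExceeded), a common general pattern is to retry with exponential backoff. Below is a minimal sketch of that pattern, not something the download script is known to do, and it will not help once the hard download quota is exhausted. retry_with_backoff and the is_retryable predicate are hypothetical names; the caller decides which exceptions count as transient.

```python
import random
import time


def retry_with_backoff(fn, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable exception wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or when the error is not considered transient.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))


def looks_like_rate_limit(exc):
    """Assumption: treat any error whose text mentions the rate-limit reason as transient."""
    return "userRateLimitExceeded" in str(exc)
```

A call such as retry_with_backoff(lambda: download_one(spec), looks_like_rate_limit) would wrap a single download, where download_one stands for whatever function performs it.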

Hi @royorel,
thanks for the explanation. I'm currently working on a solution: putting the dataset in an AWS bucket and sharing it over torrent, so that other users can avoid this quota limit.
I will report back once I have successfully backed up the dataset.

@woshixiaozhou I'm asking once again to use English.

@xuedue
Hi,

It seems that you have downloaded the dataset successfully. Could you please share one image (50958.png, 256x256 pixels) with me? Something is wrong with this image in my case (it is 0 bytes). Thanks a lot!

Hi @wangtingwei1993,

We cannot share images directly. However, the uncropped in-the-wild image can be found at:
https://drive.google.com/file/d/13T3T9oVe0KfjRdjcOmLMjsPSQtK8ZlbX/view?usp=sharing

After downloading it, you can apply the rest of the script to it.
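To find out whether other outputs have the same problem, a short sketch like the following could list every zero-byte PNG under the output folder. find_empty_files is a hypothetical helper, and the folder path is whatever directory the script wrote to.

```python
from pathlib import Path


def find_empty_files(root, pattern="*.png"):
    """Return all files under root matching pattern whose size is 0 bytes, sorted by path."""
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    for path in find_empty_files("."):
        print(path)
```

Any file it reports would need its in-the-wild source downloaded and processed again.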