yuesongtian/AlphaGAN

Cannot install requirements.txt on Google Colab


Using "! pip install -r requirements.txt" on Google Colab gives the error:

"ERROR: Invalid requirement: '_libgcc_mutex=0.1=main' (from line 4 of requirements.txt)
Hint: = is not a valid operator. Did you mean == ?"

Please advise on how to resolve this issue.

Hi, the requirements.txt was exported from my conda environment, and ‘_libgcc_mutex’ is not a Python library. Could you comment out that line and retry?
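One way to do this in a single Colab cell, assuming the file is in conda's ‘name=version=build’ export format: the sed call rewrites each pin into pip's ‘name==version’ form, and the grep drops a few system-level conda packages (names guessed from typical conda exports) that have no PyPI equivalent. This is only a sketch; the output file name ‘requirements_pip.txt’ is arbitrary, and some remaining packages may still fail to install and need removing by hand.

# Hypothetical cleanup cell: convert conda-style pins to pip-style, drop system packages
! sed -E 's/^([^=]+)=([^=]+)=.*$/\1==\2/' requirements.txt | grep -vE '^(_libgcc_mutex|_openmp_mutex|libgcc-ng|libstdcxx-ng|ld_impl_linux-64)' > requirements_pip.txt
! pip install -r requirements_pip.txt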

Thank you.

@VinulaUthsara Were you able to execute this repo on Google Colab? Could you help me out if you have?

Hi @Brym-Gyimah,

The following is the link to the Google Colab notebook I worked on to get AlphaGAN running:
https://colab.research.google.com/drive/1FP16LD34CKzOf9FDQptOmrHg78M3k79E?usp=sharing

The notebook includes several commands that install the dependencies which I believe are required to run AlphaGAN on Colab. It then clones this repo with ‘! git clone’, changes the directory to ‘/content/AlphaGAN’, and creates two directories, ‘logs’ and ‘fid_stat’, as sketched below.
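Concretely, those setup cells look roughly like this (the repo URL is inferred from this issue tracker; ‘%cd’ is used rather than ‘! cd’ so that the working-directory change persists across cells):

! git clone https://github.com/yuesongtian/AlphaGAN.git
%cd /content/AlphaGAN
! mkdir logs fid_stat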
It is also mandatory to upload an FID stat ‘.npz’ file to the ‘fid_stat’ directory for the implementation to run. For convenience, the ‘cifar10’ and ‘stl10’ FID stat files can be found in the following Google Drive link (extracted from https://github.com/chengaopro/AdversarialNAS):

https://drive.google.com/drive/folders/1dfzptTN-sc-hfq9mLx2gOZCOsRLrqqTX?usp=sharing
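One way to get the file into place is to upload it from a Colab cell. This is only a sketch: the ‘.npz’ file name below is taken from the AdversarialNAS repo and may differ from the file you actually download, so adjust it accordingly.

from google.colab import files
files.upload()  # choose the downloaded FID stat .npz in the file dialog
! mv fid_stats_cifar10_train.npz fid_stat/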

After uploading the ‘cifar10’ FID stat file (the one I used in my experiments), you can run the following command:

! CUDA_VISIBLE_DEVICES=0 python search.py --gen Network_gen_Auto --dis Discriminator --gf_dim 256 --df_dim 128 --fix_alphas_epochs -1 --only_update_w_g --gen_normal_opr PRIMITIVES_NORMAL_GEN_wo_skip_none_sep --inner_steps 20 --worst_steps 20 --outter_steps 20 --exp_name search_test --eval_every 4 --batch_size 32 --dataset cifar10

This will start running the code as follows:

WARNING:tensorflow:From /content/AlphaGAN/utils/inception_score.py:23: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
Experiment dir : logs/search_test-EXP-20220407-154647
04/07 03:46:47 PM gpu device = 0
04/07 03:46:47 PM args = Namespace(arch_learning_rate=0.0003, arch_weight_decay=0.001, batch_size=32, beta1=0.0, beta2=0.9, bottom_width=4, channels=3, cutout=False, cutout_length=16, d_lr=0.0002, d_spectral_norm=True, data='./data', dataset='cifar10', df_dim=128, dis='Discriminator', dis_batch_size=64, dis_normal_opr='PRIMITIVES_NORMAL_DIS', dis_with_bn=False, drop_path_prob=0.3, epochs=100, eval=True, eval_batch_size=100, eval_every=4, exp_name='search_test', fix_alphas_epochs=-1, g_lr=0.0002, gen='Network_gen_Auto', gen_batch_size=64, gen_normal_opr='PRIMITIVES_NORMAL_GEN_wo_skip_none_sep', gen_up_opr='PRIMITIVES_UP', gen_with_bn=False, gf_dim=256, gpu=0, grad_clip=5, grow=False, grow_epoch=[20, 60, 80], img_size=32, init_channels=16, inner_steps=20, lamina=1.0, latent_dim=128, layers=8, learning_rate=0.025, learning_rate_min=0.001, load_path='', model_path='saved_models', momentum=0.9, n_critic=5, num_classes=10, num_eval_imgs=50000, only_update_alpha=False, only_update_alpha_g=False, only_update_w=False, only_update_w_g=True, outter_steps=20, parallel=False, report_freq=50, restrict_dis_grow=False, save='logs/search_test-EXP-20220407-154647', seed=2, start_epoch=0, t=0.5, train_portion=0.5, unrolled=False, update_alphas=True, use_gumbel=False, use_train_val=False, weight_decay=0.0003, with_dis_worst=True, worst_steps=20)
WARNING:tensorflow:From /content/AlphaGAN/utils/inception_score.py:82: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating: Use tf.gfile.GFile.
04/07 03:46:49 PM From /content/AlphaGAN/utils/inception_score.py:82: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version. Instructions for updating: Use tf.gfile.GFile.
WARNING:tensorflow:From /content/AlphaGAN/utils/inception_score.py:83: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.
04/07 03:46:49 PM From /content/AlphaGAN/utils/inception_score.py:83: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.
WARNING:tensorflow:From /content/AlphaGAN/utils/inception_score.py:87: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
04/07 03:46:49 PM From /content/AlphaGAN/utils/inception_score.py:87: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
debug@: model_file is /tmp/classify_image_graph_def.pb
debug@: length of skip_up_op is 3
debug@: length of skip_up_op is 3
debug@: length of skip_up_op is 3
04/07 03:46:59 PM generator param size = 27.469571MB
04/07 03:46:59 PM discriminator param size = 1.053824MB
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
170499072it [00:04, 38227816.78it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
04/07 03:47:07 PM length of train_queue is 391
04/07 03:47:07 PM length of valid_queue is 391
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:715: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  "please use `get_last_lr()`.", UserWarning)
04/07 03:47:07 PM epoch 1 lr 2.003947e-04
04/07 03:47:07 PM epoch 1 gen_lr 2.000000e-04
04/07 03:47:07 PM epoch 1 dis_lr 2.000000e-04
04/07 03:47:07 PM gen_genotype = Genotype_gen(normal={'1': [('conv_5x5', 0), ('sep_conv_7x7', 1), ('conv_1x1', 2)], '2': [('sep_conv_7x7', 0), ('conv_1x1', 1), ('conv_3x3', 2)], '3': [('conv_5x5', 0), ('sep_conv_3x3', 1), ('conv_3x3', 2)]}, up={'1': [('deconv', 0), ('nearest', 1)], '2': [('deconv', 0), ('deconv', 1)], '3': [('deconv', 0), ('nearest', 1)]}, skip_2=[('bilinear', 0)], skip_3=[('bilinear', 0), ('bilinear', 1)])
up_1: tensor([[0.3335, 0.3333, 0.3333], [0.3333, 0.3335, 0.3332]], device='cuda:0', grad_fn=<SoftmaxBackward0>)
up_2: tensor([[0.3338, 0.3331, 0.3331], [0.3338, 0.3330, 0.3333]], device='cuda:0', grad_fn=<SoftmaxBackward0>)
up_3: tensor([[0.3336, 0.3331, 0.3333], [0.3332, 0.3338, 0.3330]], device='cuda:0', grad_fn=<SoftmaxBackward0>)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3635: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode)
/content/AlphaGAN/functions.py:536: UserWarning: This overload of add_ is deprecated: add_(Number alpha, Tensor other) Consider using one of the following signatures instead: add_(Tensor other, *, Number alpha) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:1050.)
  avg_p.mul_(0.999).add_(0.001, p.data)
04/07 03:47:21 PM step 0: g_loss is -20.683176040649414, d_loss is 20.62350082397461
04/07 03:47:32 PM step 5: g_loss is 9.945480346679688, d_loss is 4.540972709655762
04/07 03:47:43 PM step 10: g_loss is 1.6088637113571167, d_loss is 4.476414203643799
04/07 03:47:53 PM step 15: g_loss is 8.172449111938477, d_loss is 2.17665958404541
04/07 03:48:04 PM step 20: g_loss is -2.778402805328369, d_loss is 4.770327568054199
debug@: length of skip_up_op is 3
debug@: length of skip_up_op is 3
debug@: length of skip_up_op is 3
5% 21/391 [00:30<08:49, 1.43s/it]
5% 21/391 [01:25<25:04, 4.07s/it]

However, the run still encounters the following RuntimeError:

RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 11.17 GiB total capacity; 10.53 GiB already allocated; 12.81 MiB free; 10.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I believe this is because AlphaGAN requires 24 GB of video memory, according to ‘https://github.com/yuesongtian/AlphaGAN/issues/6’, and Google Colab cannot provide that much. To work around it I reduced the batch size to 32 via the ‘--batch_size 32’ flag, but that still did not resolve the out-of-memory error.
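For what it is worth, the error message itself suggests two knobs, neither of which I can confirm will fit AlphaGAN into Colab's roughly 11 GiB of GPU memory: capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation, and lowering the batch size further (16 below is an arbitrary choice). A sketch of the same search command with both applied:

! PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 CUDA_VISIBLE_DEVICES=0 python search.py --gen Network_gen_Auto --dis Discriminator --gf_dim 256 --df_dim 128 --fix_alphas_epochs -1 --only_update_w_g --gen_normal_opr PRIMITIVES_NORMAL_GEN_wo_skip_none_sep --inner_steps 20 --worst_steps 20 --outter_steps 20 --exp_name search_test --eval_every 4 --batch_size 16 --dataset cifar10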

I hope this helps you find a solution from here. Please do get back if you manage to reach a positive result.

@VinulaUthsara When I executed the ‘!pip install -r requirements.txt’ cell, I got this error:
"ERROR: Invalid requirement: '_libgcc_mutex=0.1=main' (from line 4 of requirements.txt)
Hint: = is not a valid operator. Did you mean == ?".

Can you tell me how you resolved it?

Hi @Brym-Gyimah,

It is not necessary to resolve that error; you may encounter several such errors in the notebook. Just continue running the commands and follow the instructions in my previous reply to get the AlphaGAN code running.