fjxmlzn/DoppelGANger

module 'zmq.backend.cython.socket' has no attribute 'get'

dstan11 opened this issue · 15 comments

I ran into some errors when running scheduler.start(). It says module 'zmq.backend.cython.socket' has no attribute 'get'
and Can't get attribute 'get' on <module 'zmq.backend.cython.socket' from 'E:\\Users\\shand\\anaconda3\\envs\\DoppelGANger\\lib\\site-packages\\zmq\\backend\\cython\\socket.cp35-win_amd64.pyd'>
and Can't pickle <cyfunction Socket.get at 0x000001FCDFCC71B8>: it's not found as zmq.backend.cython.socket.get

I am not sure why you see these errors.

Could you please post here:

  1. The complete error log
  2. How you install the Python environment and the packages
  3. The list of the installed Python packages and versions

That way I can reproduce these errors and debug them.
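
For items 2 and 3, a minimal sketch of one way to gather the interpreter version and the installed packages into a single report that can be pasted into the issue. The helper names `installed_packages` and `environment_report` are hypothetical; the code uses `importlib.metadata` where it is available (Python 3.8+) and falls back to `pkg_resources` from setuptools on older interpreters.

```python
import platform
import sys


def installed_packages():
    """Return {name: version} for every distribution visible to this interpreter."""
    try:
        from importlib.metadata import distributions  # Python 3.8+
    except ImportError:
        import pkg_resources  # provided by setuptools on older Pythons
        return {d.project_name: d.version for d in pkg_resources.working_set}

    found = {}
    for dist in distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.version
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        if name:
            found[name] = version
    return found


def environment_report():
    """Build a plain-text report: interpreter details first, then name==version lines."""
    lines = [
        "Python {} on {}".format(platform.python_version(), platform.platform()),
        "Executable: {}".format(sys.executable),
    ]
    packages = installed_packages()
    for name in sorted(packages, key=str.lower):
        lines.append("{}=={}".format(name, packages[name]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running this with the same interpreter that runs main.py matters, since a different conda environment would report different packages.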

Thanks!

I created a notebook with the same content as main.py in the DoppelGANger/DoppelGANger/example_training folder.

if __name__ == "__main__":
    from gan_task import GANTask
    from config import config
    from gpu_task_scheduler.gpu_task_scheduler import GPUTaskScheduler
    scheduler = GPUTaskScheduler(config=config, gpu_task_class=GANTask)
    scheduler.start() 
  1. Error log: error.txt
  2. Python version: 3.5.2
  3. Installed packages: packages.txt

Thanks. Can you try executing it directly instead of from a Jupyter notebook?

Yes. I ran python main.py in the DoppelGANger/DoppelGANger/example_training folder from a terminal, and no error came up. However, the program is still running after 3 hours, and I have no idea how long it is supposed to take. By the way, GPU usage did not change after I started the program.

Thanks.

You can look at worker.log in the subfolders of the results folder to follow the training progress.
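
A minimal sketch for checking progress, assuming the results folder is at `../results` relative to where the script is run and that each run's subfolder contains a `worker.log`. The nesting depth is not stated here, so the search is recursive; `tail_worker_logs` is a hypothetical helper, not part of DoppelGANger.

```python
from collections import deque
from pathlib import Path


def tail_worker_logs(results_root="../results", n_lines=5):
    """Map each worker.log found under results_root to its last n_lines lines."""
    root = Path(results_root)
    if not root.is_dir():
        return {}  # nothing to report if the results folder does not exist yet

    tails = {}
    for log_path in sorted(root.rglob("worker.log")):
        with open(str(log_path), "r", errors="replace") as handle:
            # A bounded deque keeps only the final n_lines lines in memory.
            last_lines = deque(handle, maxlen=n_lines)
        tails[str(log_path)] = [line.rstrip("\n") for line in last_lines]
    return tails


if __name__ == "__main__":
    for path, lines in tail_worker_logs().items():
        print("==", path)
        for line in lines:
            print("  ", line)
```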

If the code isn't using the GPU:

  1. Make sure that you installed tensorflow-gpu instead of tensorflow.
  2. Check worker.log for error messages about loading the CUDA libraries.
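
Both checks can be sketched as below. `device_lib.list_local_devices()` is TensorFlow's own device listing; the keyword list used to scan worker.log is only a guess at what CUDA loading messages contain, and both function names are hypothetical.

```python
def visible_gpus():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


def cuda_related_lines(log_path, keywords=("cuda", "cudnn", "cublas", "cudart")):
    """Return (line_number, text) for log lines that mention any keyword, ignoring case."""
    hits = []
    with open(log_path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

An empty list from `visible_gpus()` would mean TensorFlow is installed but sees no GPU, which points to the installation or the CUDA setup rather than to the training code.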

Sorry to bother you again. I couldn't find the results folder. Can you tell me where it is?

Thanks!

It should be on the same level as the example_training folder. It is configured in config.py: "result_root_folder": "../results/"
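
To see where that relative path lands, here is a small sketch. It assumes the path is resolved against the directory the script is launched from (example_training), which the reply above implies but does not state. The checkout location `C:\work` is hypothetical, and `ntpath` is used so that Windows path rules apply on any operating system.

```python
import ntpath  # Windows path rules, importable on any platform

# Hypothetical checkout location; only the last two components matter here.
launch_dir = r"C:\work\DoppelGANger\example_training"
result_root_folder = "../results/"  # value from config.py

# Join the two, then collapse the ".." and normalize the separators.
resolved = ntpath.normpath(ntpath.join(launch_dir, result_root_folder))
print(resolved)  # C:\work\DoppelGANger\results
```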

Thank you for the reply! I updated Python to version 3.7 and tensorflow-gpu to version 1.1.4. Now the program works.

Great!!

Now there is a new error message.

Traceback (most recent call last):
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\Scripts\start_gpu_task-script.py", line 33, in <module>
    sys.exit(load_entry_point('GPUTaskScheduler', 'console_scripts', 'start_gpu_task')())
  File "f:\github clone folder\gputask\gputaskscheduler\gpu_task_scheduler\start_gpu_task.py", line 23, in main
    worker.main()
  File "F:\Github clone folder\DoppelGANger\DoppelGANger\example_training\gan_task.py", line 124, in main
    gan.train(restore=restore)
  File "..\gan\doppelganger.py", line 918, in train
    self.visualize(epoch_id, batch_id, global_id)
  File "..\gan\doppelganger.py", line 801, in visualize
    sub1(features, attributes, lengths, None, None, None, "free")
  File "..\gan\doppelganger.py", line 749, in sub1
    ground_truth_lengths=ground_truth_lengths)
  File "<__array_function__ internals>", line 6, in savez
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 645, in savez
    _savez(file, args, kwds, False)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 743, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 119, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '../results/aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,\\sample\\epoch_id-0,batch_id-199,global_id-199,type-free,samples.npz'
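
When opening a file for writing raises FileNotFoundError, some part of the path could not be used. A minimal diagnostic sketch follows; the helper `diagnose_output_path` is hypothetical, and neither check is confirmed as the cause of the error above. It reports whether the parent directory exists and how long the absolute path is, since many Windows APIs reject paths of about 260 characters or more unless long-path support is enabled.

```python
import os

WINDOWS_MAX_PATH = 260  # classic Windows limit, counting the terminating NUL


def diagnose_output_path(path):
    """Collect basic facts about a file path that failed to open for writing."""
    absolute = os.path.abspath(path)
    parent = os.path.dirname(absolute)
    return {
        "absolute_path": absolute,
        "absolute_length": len(absolute),
        "exceeds_windows_max_path": len(absolute) >= WINDOWS_MAX_PATH,
        "parent_exists": os.path.isdir(parent),
    }
```

Calling it from the same working directory as the failing run, with the path copied from the error message, shows which of the two conditions (if either) applies.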

Could you please try changing "result_root_folder": "../results/" in config.py to "result_root_folder": "..\\results\\"? You are on Windows, where the directory separator is \. Then delete the results folder and run again.

Let me know if it doesn't work.

It doesn't work. It has the same error message.

I think another potential problem is that Windows may have trouble with , in file names. You can replace the , by adding test_config_string_separator="-" (or another character) to the scheduler_config section of config.py (see https://github.com/fjxmlzn/GPUTaskScheduler for a detailed explanation).

But I just want to double-check if there are other issues: could you please show me the directory structure of F:\Github clone folder\DoppelGANger\DoppelGANger\ after this error happens?
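
The separator change suggested above might look like the following fragment of config.py. This is a sketch: only the two keys named in this thread are shown, any other keys already present in scheduler_config are assumed to stay as they are, and the comment on what the separator does is inferred from the folder name in the error message (see the GPUTaskScheduler README for the authoritative description).

```python
config = {
    "scheduler_config": {
        # ... other existing scheduler settings stay unchanged ...
        "result_root_folder": "../results/",
        # Assumed meaning: the character placed between key-value pairs
        # in the generated result folder names, replacing the ",".
        "test_config_string_separator": "-",
    },
    # ... the remaining sections of config.py stay unchanged ...
}
```

After changing it, deleting the old results folder avoids mixing folders created with the two naming schemes.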

F:\Github clone folder\DoppelGANger\DoppelGANger\
[image: folder]
F:\Github clone folder\DoppelGANger\DoppelGANger\results
[image: results]
F:\Github clone folder\DoppelGANger\DoppelGANger\results\aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,
[image: 3]

Thanks. Could you please email me the current code and worker.log so that I can check them? My address is zinanl AT andrew.cmu.edu