module 'zmq.backend.cython.socket' has no attribute 'get'

Question

dstan11 opened this issue 4 years ago · 15 comments

I ran into some problems when running scheduler.start(). It reports the following errors:

module 'zmq.backend.cython.socket' has no attribute 'get'
Can't get attribute 'get' on <module 'zmq.backend.cython.socket' from 'E:\\Users\\shand\\anaconda3\\envs\\DoppelGANger\\lib\\site-packages\\zmq\\backend\\cython\\socket.cp35-win_amd64.pyd'>
Can't pickle <cyfunction Socket.get at 0x000001FCDFCC71B8>: it's not found as zmq.backend.cython.socket.get

fjxmlzn commented 4 years ago

Great!!

Answer 1 · 2020-07-16T14:35:33.000Z

I am not sure why you see these errors.

Could you please post here:

The complete error log
How you install the Python environment and the packages
The list of the installed Python packages and versions

That way I can reproduce these errors and debug them.

Thanks!

Answer 2 · 2020-07-16T15:58:00.000Z

I created a notebook with the same content as main.py under the DoppelGANger/DoppelGANger/example_training folder.

if __name__ == "__main__":
    from gan_task import GANTask
    from config import config
    from gpu_task_scheduler.gpu_task_scheduler import GPUTaskScheduler
    scheduler = GPUTaskScheduler(config=config, gpu_task_class=GANTask)
    scheduler.start()

Error log: error.txt
Python version: 3.5.2
Package list: packages.txt

Answer 3 · 2020-07-16T16:24:53.000Z

Thanks. Can you try executing it directly instead of from a Jupyter notebook?

Answer 4 · 2020-07-17T01:34:50.000Z

Yes. I ran python main.py under the DoppelGANger/DoppelGANger/example_training folder from a terminal. No error came up, but the program is still running after 3 hours, and I have no idea how long it is supposed to take. By the way, GPU performance didn't change after I started the program.

Thanks.

Answer 5 · 2020-07-17T01:52:24.000Z

You can look at worker.log in the subfolders of the results folder to follow the training progress.

If the code isn't using the GPU, then:

Make sure that you installed tensorflow-gpu instead of tensorflow.
Check worker.log for any error messages about loading the CUDA library.
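A minimal sketch of that log check, in case it helps. It walks a results folder and prints the lines of every worker.log that look suspicious. The function name find_suspect_lines and the search pattern are assumptions made for illustration: the exact wording of CUDA loading errors depends on the TensorFlow version, so adjust the pattern to what you actually see.

```python
import os
import re

# Assumed pattern: matches lines mentioning CUDA, cuDNN, or "error".
# The real messages vary between TensorFlow versions.
SUSPECT = re.compile(r"cuda|cudnn|error", re.IGNORECASE)


def find_suspect_lines(result_root):
    """Return (path, line_number, text) for matching lines in every worker.log."""
    hits = []
    for folder, _subdirs, files in os.walk(result_root):
        if "worker.log" not in files:
            continue
        path = os.path.join(folder, "worker.log")
        with open(path, "r", errors="replace") as log:
            for number, line in enumerate(log, start=1):
                if SUSPECT.search(line):
                    hits.append((path, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # "../results/" is the result_root_folder value from config.py.
    for path, number, text in find_suspect_lines("../results/"):
        print("{}:{}: {}".format(path, number, text))
```

If nothing is printed, either no worker.log was found under that folder or none of its lines matched the pattern.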

Answer 6 · 2020-07-17T02:31:51.000Z

Sorry to disturb you again. I couldn't find the results folder. Can you show me where it is?

Thanks!

Answer 7 · 2020-07-17T03:59:42.000Z

It should be on the same level as the example_training folder. Its location is configured in config.py: "result_root_folder": "../results/"
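To see where that relative path ends up, here is a small sketch. It assumes the script is started from the example_training folder (the working directory below is taken from the paths in this thread) and uses ntpath so that Windows path rules apply on any OS.

```python
import ntpath  # Windows path rules, usable on any OS for illustration

# Assumption: main.py is launched from the example_training folder.
working_dir = r"F:\Github clone folder\DoppelGANger\DoppelGANger\example_training"
result_root_folder = "../results/"  # value from config.py

# Join the relative path onto the working directory and collapse the "..".
resolved = ntpath.normpath(ntpath.join(working_dir, result_root_folder))
print(resolved)
```

The resolved path is a results folder that sits next to example_training, one level up from where the script runs.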

Answer 8 · 2020-07-17T05:56:38.000Z

Thank you for the reply! I updated Python to version 3.7 and tensorflow-gpu to version 1.1.4. Now the program works.

Answer 9 · 2020-07-17T06:12:02.000Z

It has a new error message.

Traceback (most recent call last):
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\Scripts\start_gpu_task-script.py", line 33, in <module>
    sys.exit(load_entry_point('GPUTaskScheduler', 'console_scripts', 'start_gpu_task')())
  File "f:\github clone folder\gputask\gputaskscheduler\gpu_task_scheduler\start_gpu_task.py", line 23, in main
    worker.main()
  File "F:\Github clone folder\DoppelGANger\DoppelGANger\example_training\gan_task.py", line 124, in main
    gan.train(restore=restore)
  File "..\gan\doppelganger.py", line 918, in train
    self.visualize(epoch_id, batch_id, global_id)
  File "..\gan\doppelganger.py", line 801, in visualize
    sub1(features, attributes, lengths, None, None, None, "free")
  File "..\gan\doppelganger.py", line 749, in sub1
    ground_truth_lengths=ground_truth_lengths)
  File "<__array_function__ internals>", line 6, in savez
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 645, in savez
    _savez(file, args, kwds, False)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 743, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 119, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '../results/aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,\\sample\\epoch_id-0,batch_id-199,global_id-199,type-free,samples.npz'

Answer 10 · 2020-07-17T06:47:57.000Z

Could you please try changing "result_root_folder": "../results/" in config.py to "result_root_folder": "..\\results\\"? You are on Windows, where the directory separator is \. Then delete the results folder and run again.
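As an alternative to hard-coding the separator, the value could be built with os.path.join, which picks the separator of the OS running the script. This is only a sketch of that idea, not something the repository does. It addresses the separator alone, so it would not help if the error has another cause.

```python
import ntpath
import os
import posixpath

# Portable form: the trailing "" makes join append a final separator.
result_root_folder = os.path.join("..", "results", "")

# The same call under each platform's rules, shown explicitly for comparison.
windows_form = ntpath.join("..", "results", "")
posix_form = posixpath.join("..", "results", "")
print(repr(windows_form))
print(repr(posix_form))
```

On Windows, result_root_folder equals windows_form. On Linux or macOS it equals posix_form.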

Let me know if it doesn't work.

Answer 11 · 2020-07-17T08:10:41.000Z

It doesn't work. It has the same error message.

Answer 12 · 2020-07-17T14:48:04.000Z

I think another potential problem is that Windows does not allow , in file names. You can replace the , separator by adding test_config_string_separator="-" (or another character) to the scheduler_config section of config.py (see https://github.com/fjxmlzn/GPUTaskScheduler for a detailed explanation).
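In config.py, that change might look like the sketch below. Only the two keys mentioned in this thread are shown. Any other entries in your config.py should stay as they are, and the exact layout of your file may differ.

```python
# config.py (sketch): only the keys discussed in this thread are shown.
config = {
    "scheduler_config": {
        # Where result folders are created, relative to the working directory.
        "result_root_folder": "../results/",
        # Separator used in generated folder and file names instead of ",".
        "test_config_string_separator": "-",
    },
}
```

After changing the separator, delete the old results folder so that the folder names are regenerated.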

I also want to double-check whether there are other issues: could you please show me the directory structure of F:\Github clone folder\DoppelGANger\DoppelGANger\ after this error happens?

Answer 13 · 2020-07-17T15:41:27.000Z

F:\Github clone folder\DoppelGANger\DoppelGANger\

F:\Github clone folder\DoppelGANger\DoppelGANger\results

F:\Github clone folder\DoppelGANger\DoppelGANger\results\aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,

Answer 14 · 2020-07-17T16:00:08.000Z

Thanks. Could you please email me the current code and worker.log so that I can check them? My address is zinanl AT andrew.cmu.edu