shenwanxiang/bidd-aggmap

NameError: name 'data' is not defined when running aggmap on Windows

shenwanxiang opened this issue · 1 comment

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py", line 41, in _fuc
return _calculate(i1, i2)
File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py", line 23, in _calculate
x1 = data[:, i1]
NameError: name 'data' is not defined
"""

The above exception was the direct cause of the following exception:

NameError Traceback (most recent call last)
Cell In[1], line 11
8 dfy = pd.get_dummies(pd.Series(data.target))
10 # AggMap object definition, fitting, and saving
---> 11 mp = AggMap(dfx, metric = 'correlation')
12 mp.fit(cluster_channels=5, emb_method = 'umap', verbose=0)
13 mp.save('agg.mp')

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\map.py:176, in AggMap.__init__(self, dfx, metric, by_scipy, n_cpus, info_distance)
174 self.info_distance = D.clip(0, np.inf)
175 else:
--> 176 D = calculator.pairwise_distance(dfx.values, n_cpus=n_cpus, method=metric)
177 D = np.nan_to_num(D,copy=False)
178 D_ = squareform(D)

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py:67, in pairwise_distance(npydata, n_cpus, method)
65 N = data.shape[1]
66 lst = list(_yield_combinations(N))
---> 67 res = MultiProcessUnorderedBarRun(_fuc, lst, n_cpus=n_cpus)
68 dist_matrix = np.zeros(shape = (N,N))
69 for x,y,v in tqdm(res,ascii=True):

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\multiproc.py:111, in MultiProcessUnorderedBarRun(func, deal_list, n_cpus)
109 res_list = []
110 with pbar(total = len(deal_list), ascii=True) as pb:
--> 111 for res in p.imap_unordered(func, deal_list):
112 pb.update(1)
113 res_list.append(res)

File ~\anaconda3\envs\aggmap\lib\multiprocessing\pool.py:868, in IMapIterator.next(self, timeout)
866 if success:
867 return value
--> 868 raise value

NameError: name 'data' is not defined

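You shouldn't expect global variables that you set in the parent process to be propagated automatically to the child processes. Here, `_calculate` in aggmap/utils/calculator.py reads a module-level `data` (`x1 = data[:, i1]`). In the parent, `data` exists by the time `pairwise_distance` runs (line 65 reads `data.shape[1]` without error), but the pool workers never see that assignment.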

The code happens to work on Unix-like platforms such as Linux because there multiprocessing starts workers with fork() by default. Every child process gets a copy of the parent process's address space, including all global variables.
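A minimal, hypothetical reproduction of the failure pattern using only the standard library. It is not aggmap's code: `_fuc` and `pairwise_distance_via_global` are illustrative names, the distance is Euclidean rather than correlation, and the data is a list of rows instead of a NumPy array. As in the traceback, the worker reads a global `data` that is only ever assigned in the parent.

```python
import math
import multiprocessing as mp
from itertools import combinations

# Hypothetical stand-in for the pattern in aggmap/utils/calculator.py:
# `data` is NOT defined at module level; it is only assigned inside
# pairwise_distance_via_global(), i.e. only in the parent process.


def _fuc(pair):
    i1, i2 = pair
    # `data` is looked up in the module globals of whichever process runs this.
    x1 = [row[i1] for row in data]
    x2 = [row[i2] for row in data]
    return i1, i2, math.dist(x1, x2)


def pairwise_distance_via_global(rows, start_method, n_cpus=2):
    """Euclidean distance between every pair of columns of `rows`."""
    global data
    data = rows  # assigned in the parent only
    n_cols = len(rows[0])
    pairs = list(combinations(range(n_cols), 2))
    ctx = mp.get_context(start_method)
    # The pool is created after `data` is assigned, so forked workers inherit
    # it; spawned workers import this module fresh and never run the assignment.
    with ctx.Pool(n_cpus) as pool:
        return pool.map(_fuc, pairs)
```

With `start_method="fork"` this returns the distances. With `start_method="spawn"`, which is what Windows uses, `pool.map` re-raises `NameError: name 'data' is not defined`, the same error as in the traceback. Run the spawn case from a script file under an `if __name__ == "__main__":` guard, since spawned workers re-import the main module.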

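This isn't the case on Windows. There is no fork() there, so multiprocessing uses the spawn start method: each worker is a fresh Python interpreter that imports the needed modules from scratch. Module-level code runs again, but an assignment made while the parent was executing a function does not, so every variable the child needs has to be passed down explicitly (for example as a task argument or through the Pool initializer) or placed in shared memory.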
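One way to pass the data down explicitly, sketched under the same hypothetical setup (illustrative names, Euclidean distance, plain lists): hand the rows to each worker once through the `initializer`/`initargs` arguments of `multiprocessing.Pool`, so the worker-side global is filled in inside every worker process regardless of the start method.

```python
import math
import multiprocessing as mp
from itertools import combinations

# Per-worker copy of the data. It is set by _init_worker() inside every
# worker process, so it exists under fork, spawn and forkserver alike.
_worker_data = None


def _init_worker(rows):
    global _worker_data
    _worker_data = rows


def _column_distance(pair):
    i1, i2 = pair
    x1 = [row[i1] for row in _worker_data]
    x2 = [row[i2] for row in _worker_data]
    return i1, i2, math.dist(x1, x2)


def pairwise_distance(rows, n_cpus=2, start_method=None):
    """Euclidean distance between every pair of columns of `rows`.

    `rows` is shipped to each worker once via the Pool initializer instead
    of being read from a global that was only set in the parent.
    """
    n_cols = len(rows[0])
    pairs = list(combinations(range(n_cols), 2))
    ctx = mp.get_context(start_method)  # None selects the platform default
    with ctx.Pool(n_cpus, initializer=_init_worker, initargs=(rows,)) as pool:
        return pool.map(_column_distance, pairs)
```

On Windows, call it from under an `if __name__ == "__main__":` guard, e.g. `pairwise_distance(rows, n_cpus=4)`; passing `start_method="spawn"` on Linux is a convenient way to check the Windows behaviour. The initializer sends the data once per worker rather than once per task, which matters when there are many column pairs.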

Once you do this, your code will work on both Unix and Windows.
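Shared memory, the other option named in the explanation, can be sketched with `multiprocessing.Array` under the same hypothetical setup. The parent copies the values into a shared ctypes buffer once and passes the buffer to the workers through the Pool initializer; shared ctypes objects are meant to reach child processes this way (through inheritance) rather than as ordinary task arguments.

```python
import math
import multiprocessing as mp
from itertools import combinations

# Each worker keeps a handle to the shared buffer plus the matrix shape.
_shared = None
_shape = None


def _init_shared(buf, shape):
    global _shared, _shape
    _shared, _shape = buf, shape


def _column(i):
    n_rows, n_cols = _shape
    # Row-major layout: element (r, i) lives at flat index r * n_cols + i.
    return [_shared[r * n_cols + i] for r in range(n_rows)]


def _column_distance(pair):
    i1, i2 = pair
    return i1, i2, math.dist(_column(i1), _column(i2))


def pairwise_distance_shared(rows, n_cpus=2, start_method=None):
    """Euclidean distance between every pair of columns, data in shared memory."""
    n_rows, n_cols = len(rows), len(rows[0])
    ctx = mp.get_context(start_method)
    # lock=False: workers only read the buffer, so no synchronisation is needed.
    buf = ctx.Array("d", [x for row in rows for x in row], lock=False)
    pairs = list(combinations(range(n_cols), 2))
    with ctx.Pool(n_cpus, initializer=_init_shared,
                  initargs=(buf, (n_rows, n_cols))) as pool:
        return pool.map(_column_distance, pairs)
```

With NumPy data, `multiprocessing.shared_memory.SharedMemory` (Python 3.8+) is another standard-library route to the same idea.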

Ref: https://stackoverflow.com/questions/6596617/python-multiprocess-diff-between-windows-and-linux