Describe the bug
The bug occurs when loading the DUD-E and scPDB datasets.

To Reproduce
Steps to reproduce the behavior:

  1. Run the demo from your website:
    from tdc.generation import SBDD
    data = SBDD(name='dude')

Expected behavior
The data object is returned without errors. Instead, processing finishes and then saving the processed data fails:

Found local copy for 1/2 file...
Found local copy for 2/2 file...
Processing (this may take long)...
100%|██████████| 102/102 [07:33<00:00, 4.44s/it]
processing done, 0/40490 fails

ValueError Traceback (most recent call last)
/export/disk1/why/database/PL_interaciton_dataset/script/tmp.ipynb Cell 1 in ()
1 from tdc.generation import SBDD
----> 2 data = SBDD(name='dude')

File /export/disk3/why/software/Anaconda/conda/envs/RDKit/lib/python3.8/site-packages/tdc/generation/sbdd.py:44, in SBDD.__init__(self, name, path, print_stats, return_pocket, threshold, remove_protein_Hs, remove_ligand_Hs, keep_het, save)
42 protein, ligand = bi_distribution_dataset_load(name, path, multiple_molecule_dataset_names, return_pocket, threshold, remove_protein_Hs, remove_ligand_Hs, keep_het)
43 if save:
---> 44 np.savez(os.path.join(path, name + '.npz'),
45 protein_coord=protein['coord'],
46 protein_atom=protein['atom_type'],
47 ligand_coord=ligand['coord'],
48 ligand_atom=ligand['atom_type'],
49 )
50 self.save = save
52 self.ligand = ligand

File <array_function internals>:200, in savez(*args, **kwargs)

File /export/disk3/why/software/Anaconda/conda/envs/RDKit/lib/python3.8/site-packages/numpy/lib/npyio.py:615, in savez(file, *args, **kwds)
531 @array_function_dispatch(_savez_dispatcher)
532 def savez(file, *args, **kwds):
533 """Save several arrays into a single file in uncompressed .npz format.
535 Provide arrays as keyword arguments to store them under the
614 """
--> 615 _savez(file, args, kwds, False)

File /export/disk3/why/software/Anaconda/conda/envs/RDKit/lib/python3.8/site-packages/numpy/lib/npyio.py:716, in _savez(file, args, kwds, compress, allow_pickle, pickle_kwargs)
714 for key, val in namedict.items():
715 fname = key + '.npy'
--> 716 val = np.asanyarray(val)
717 # always force zip64, gh-10776
718 with zipf.open(fname, 'w', force_zip64=True) as fid:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (40592,) + inhomogeneous part.
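
The traceback points to a likely cause. np.savez calls np.asanyarray on each value, and the protein and ligand coordinate and atom-type entries appear to be lists of per-complex arrays with different atom counts (the error reports shape (40592,) plus an inhomogeneous part). NumPy 1.24 and later raise a ValueError for such ragged lists, whereas older versions only warned and built an object array. The snippet below is a minimal sketch under that assumption, not TDC code: ligand_coord and to_object_array are hypothetical stand-ins. It reproduces the error on synthetic data and shows one possible fix, which packs the ragged list into a 1-D object array before saving.

```python
import io

import numpy as np

# Hypothetical per-complex coordinate arrays with different atom counts,
# standing in for the ragged lists that SBDD passes to np.savez.
ligand_coord = [np.zeros((3, 3)), np.zeros((5, 3)), np.zeros((2, 3))]

# On NumPy >= 1.24 this raises the same ValueError as in the traceback;
# older versions emit a VisibleDeprecationWarning instead.
try:
    np.asanyarray(ligand_coord)
except ValueError as err:
    print("ValueError:", err)


def to_object_array(arrays):
    """Pack differently shaped arrays into a 1-D object array, one per slot."""
    out = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        out[i] = arr  # store each array as a single object element
    return out


# Save to an in-memory buffer instead of a real .npz path.
buf = io.BytesIO()
np.savez(buf, ligand_coord=to_object_array(ligand_coord))
buf.seek(0)

# Object arrays are pickled inside the .npz, so loading needs allow_pickle=True.
with np.load(buf, allow_pickle=True) as npz:
    restored = list(npz["ligand_coord"])

print([a.shape for a in restored])  # [(3, 3), (5, 3), (2, 3)]
```

Filling the object array element by element avoids a second pitfall: np.array(arrays, dtype=object) can still fail when the arrays share a first dimension but differ in later ones. As a temporary workaround, passing save=False to SBDD should skip the failing np.savez call, since the traceback shows it is guarded by "if save:", although the processed data would then not be cached to disk.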


  • OS: Linux
  • Python version: 3.8.13
  • TDC version: 0.3.8
  • Any other relevant information: None

Additional context

Thanks for raising this issue! @yuanqidu, could you take a look? Thanks!

@yuanqidu, can you help with this one?