lilab-bcb/pegasus

Error when running "pg.fle(data, rep=harmony_key)"


Hi there! We tried to run FLE on our data (160K cells × 20K genes).
There was an error:
cmd: pg.fle(data, rep=harmony_key)

2022-01-04 04:11:52,498 - pegasus.tools.graph_operations - INFO - Function 'construct_graph' finished in 7.93s.
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-15-f937e907024f> in <module>
----> 1 pg.fle(data, rep=harmony_key)

~/.conda/envs/pegasus/lib/python3.7/site-packages/pegasusio/decorators.py in wrapper_timer(*args, **kwargs)
     10                 def wrapper_timer(*args, **kwargs):
     11                         start = time.perf_counter()
---> 12                         result = func(*args, **kwargs)
     13                         end = time.perf_counter()
     14                         message = f"Function '{func.__name__}' finished in {{:.{precision}f}}s.".format(end - start)

~/.conda/envs/pegasus/lib/python3.7/site-packages/pegasus/tools/visualization.py in fle(data, file_name, n_jobs, rep, K, full_speed, target_change_per_node, target_steps, is3d, memory, random_state, out_basis)
    504         is3d,
    505         memory,
--> 506         random_state,
    507     )
    508 

~/.conda/envs/pegasus/lib/python3.7/site-packages/pegasus/tools/visualization.py in calc_force_directed_layout(W, file_name, n_jobs, target_change_per_node, target_steps, is3d, memory, random_state, init)
    215         memory=memory,
    216         random_state=random_state,
--> 217         init=init,
    218     )
    219 

~/.conda/envs/pegasus/lib/python3.7/site-packages/forceatlas2/__init__.py in forceatlas2(file_name, graph, n_jobs, target_change_per_node, target_steps, is3d, memory, random_state, init)
     64         command.extend(["--coords", init_coord_file])
     65 
---> 66     check_call(command)
     67 
     68     fle_coords = pd.read_csv(output_coord_file, header=0, index_col=0, sep="\t").values

~/.conda/envs/pegasus/lib/python3.7/subprocess.py in check_call(*popenargs, **kwargs)
    361         if cmd is None:
    362             cmd = popenargs[0]
--> 363         raise CalledProcessError(retcode, cmd)
    364     return 0
    365 

CalledProcessError: Command '['java', '-Djava.awt.headless=true', '-Xmx8g', '-cp', '/cluster/home/zfli/.conda/envs/pegasus/lib/python3.7/site-packages/forceatlas2/ext/forceatlas2.jar:/cluster/home/zfli/.conda/envs/pegasus/lib/python3.7/site-packages/forceatlas2/ext/gephi-toolkit-0.9.2-all.jar', 'kco.forceatlas2.Main', '--input', '/tmp/tmpcc78vlwq.net', '--output', '/tmp/tmpcc78vlwq.coords', '--nthreads', '20', '--seed', '0', '--targetChangePerNode', '2.0', '--targetSteps', '5000', '--2d']' returned non-zero exit status 1.

Can anyone help me?

Hi @Zifeng-L. Could you please provide the Java info on your machine? For example, the output of java --version.
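Also, CalledProcessError only reports the exit status; the actual Java-side error (often an OutOfMemoryError) goes to stderr. A quick sketch to surface it, re-running the exact command from the traceback above (the /tmp paths are per-run temporaries, so substitute the ones from your own failing run if these files no longer exist):

import subprocess

# The exact command from the CalledProcessError above. The /tmp files are
# per-run temporaries; replace them with the paths from your own failing run.
command = [
    "java", "-Djava.awt.headless=true", "-Xmx8g", "-cp",
    "/cluster/home/zfli/.conda/envs/pegasus/lib/python3.7/site-packages/forceatlas2/ext/forceatlas2.jar:"
    "/cluster/home/zfli/.conda/envs/pegasus/lib/python3.7/site-packages/forceatlas2/ext/gephi-toolkit-0.9.2-all.jar",
    "kco.forceatlas2.Main",
    "--input", "/tmp/tmpcc78vlwq.net",
    "--output", "/tmp/tmpcc78vlwq.coords",
    "--nthreads", "20", "--seed", "0",
    "--targetChangePerNode", "2.0", "--targetSteps", "5000", "--2d",
]

# Capture stderr so the Java message is visible even from inside a notebook.
result = subprocess.run(command, capture_output=True, text=True)
print("exit status:", result.returncode)
print(result.stderr)  # e.g. java.lang.OutOfMemoryError if the heap is too small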

Hi @yihming, I have checked the Java version:

java -version
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)

Also, when I used a smaller dataset (e.g. 10K cells × 30K genes), it worked well.

bli25 commented

@Zifeng-L , is it possible that you can share with us an example dataset that can trigger this error?


Sure, but the data is fairly large, and when I subset it, everything works fine. So how can I send you an example that triggers the error?

@Zifeng-L I suspect this failure is due to an out-of-memory issue.
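If it is OOM, one thing worth trying is to raise the Java heap via fle's memory parameter; the failing command above ran with -Xmx8g, so a value larger than 8 may help. A minimal sketch, assuming memory is given in GB:

import pegasus as pg

# Give the ForceAtlas2 Java process a larger heap. The value is assumed to be
# in GB, mirroring the -Xmx8g seen in the failing command above.
pg.fle(data, rep=harmony_key, memory=20)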

We have a public dataset for testing: https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_nonmix.zarr.zip, whose cell and gene counts are close to your data's.

To test FLE on the data, you can follow the steps in the Pegasus analysis tutorial from QC through FLE; a rough sketch is below. During the test, you may also want to use a tool like top or htop to track memory usage in real time.
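For reference, an outline of those tutorial steps (function names assume a recent Pegasus version; the parameter values are illustrative, not the tutorial's exact settings):

import pegasus as pg

# Load the test dataset (download the zarr.zip from the URL above first).
data = pg.read_input("MantonBM_nonmix.zarr.zip")

# QC: compute metrics, filter cells, and keep robust genes.
pg.qc_metrics(data, percent_mito=10)
pg.filter_data(data)
pg.identify_robust_genes(data)

# Normalization and feature selection.
pg.log_norm(data)
pg.highly_variable_features(data)

# PCA and the kNN graph, then the force-directed layout that failed above.
pg.pca(data)
pg.neighbors(data)
pg.fle(data)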

When I have time, I'll also test on my side to see how much memory the analysis of this dataset requires.

Sincerely,
Yiming

@yihming Thanks for your help! I will try running it on an HPC node with more memory. Hopefully it will work. Thanks again!