error when running grnboost

Question

Closed this issue 6 years ago · 2 comments

Hello,

I am running the pySCENIC pipeline and ran into a problem with GRNBoost2 (from the arboreto package). I followed the instructions and wrote code similar to this:

```python
import pandas as pd
from arboreto.utils import load_tf_names
from arboreto.algo import grnboost2

# The __main__ guard is needed because Dask starts separate worker processes.
if __name__ == '__main__':
    # load the data (actual paths elided in the original post)
    ex_matrix = pd.read_csv('<ex_path>', sep='\t')
    tf_names = load_tf_names('<tf_path>')
    network = grnboost2(expression_data=ex_matrix, tf_names=tf_names)
```
pySCENIC works fine with a small data set of 250 genes; however, with the bigger data sets I am testing (~2000 genes or more), I get the following warning:

```
UserWarning: Large object of size 1.17 MB detected in task graph:
  (["('from-delayed-7f2fea60c7dfbbfb0ec7f83dc75b83af ... af', 19972)"],)
Consider scattering large objects ahead of time
with client.scatter to reduce scheduler burden and
keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good

  % (format_bytes(len(b)), s))
```

The program got stuck at this point and never finished when I ran it on a MacBook Pro (2.6 GHz i7). I also tried the command-line version, `pyscenic grnboost -o OUTPUT @grn_args.txt`, where grn_args.txt contains the names of the expression matrix file and the known-TF file; the expression matrix has cell IDs as rows and genes as columns.
What do you think the issue is?
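The `@grn_args.txt` form matches the convention of Python's argparse `fromfile_prefix_chars='@'` option, which (assuming pySCENIC's CLI is built that way) reads each line of the file as one separate command-line argument, so the expression-matrix path and the TF-file path should each sit on their own line. A minimal sketch with a hypothetical parser and hypothetical file names (`expr_mat.tsv`, `tfs.txt`); it mimics only the two inputs described above and is not pySCENIC's actual option set:

```python
import argparse
import os
import tempfile

# Hypothetical parser: mimics only the two inputs described above
# (expression matrix and TF list); it is NOT pySCENIC's real CLI.
parser = argparse.ArgumentParser(prog="grn-demo", fromfile_prefix_chars="@")
parser.add_argument("expression_mtx_fname")
parser.add_argument("tfs_fname")
parser.add_argument("-o", "--output")

# Write an args file with one argument per line (hypothetical names).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("expr_mat.tsv\n")
    fh.write("tfs.txt\n")
    args_file = fh.name

try:
    # "@path" is expanded into the lines of the file before parsing.
    args = parser.parse_args(["-o", "OUTPUT", "@" + args_file])
finally:
    os.remove(args_file)

print(args.expression_mtx_fname, args.tfs_fname, args.output)
```

If both paths were written on a single line, argparse would treat them as one argument containing a space, which is an easy mistake to make with args files.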

Thank you,
Diep

Answer 1 · 2018-06-09T09:53:52.000Z

Hello,

Note that GRN inference is a very computationally intensive step, which can take hours to days on a laptop. GRNBoost2 was designed to run on one or more big machines (e.g. a dual 12-core Xeon CPU with 128 GB RAM); on a laptop you may run into memory problems and very long execution times.
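GRNBoost2 trains one regression model per target gene, with the TFs present in the expression matrix as candidate regulators, so the number of models grows linearly with the number of genes, and the cost of each model grows roughly with the number of cells and candidate TFs. That is one reason a jump from 250 to ~2000 genes is felt so strongly. A minimal sketch of a hypothetical helper (`grn_workload` is not part of arboreto) that summarizes these drivers from the matrix dimensions; the cell count and gene names in the usage lines are illustrative only:

```python
from typing import Dict, Iterable, Sequence


def grn_workload(n_cells: int, gene_names: Sequence[str],
                 tf_names: Iterable[str]) -> Dict[str, int]:
    """Summarize the inputs that drive GRNBoost2's run time.

    Hypothetical helper, not part of arboreto. Assumes an expression
    matrix with cells as rows and genes as columns.
    """
    genes = list(gene_names)
    gene_set = set(genes)
    # Only TFs that are also columns of the matrix can act as regulators.
    candidate_tfs = {tf for tf in tf_names if tf in gene_set}
    return {
        "cells": n_cells,
        "target_models": len(genes),  # one regression model per gene
        "candidate_tfs": len(candidate_tfs),
    }


# Illustrative names and cell count only; not real data.
tfs = ["g0", "g1", "TF_absent"]
small = grn_workload(500, [f"g{i}" for i in range(250)], tfs)
large = grn_workload(500, [f"g{i}" for i in range(2000)], tfs)
print(small)
print(large["target_models"] / small["target_models"])  # 8.0
```

With a real `pandas` DataFrame, `ex_matrix.shape[0]` and `ex_matrix.columns` would supply the first two arguments.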

In some cases, increasing the worker memory limit helps:

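```python
from dask.distributed import Client, LocalCluster
client = Client(LocalCluster(memory_limit=8e9))  # memory_limit is in bytes, per worker
```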
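In Dask, `LocalCluster`'s `memory_limit` applies to each worker rather than to the cluster as a whole, so on a laptop it can help to choose the number of workers and the per-worker limit together, then hand the client to `grnboost2` through its `client_or_address` parameter. A sketch under those assumptions; the helper names (`per_worker_memory_limit`, `run_grnboost2`), the 4-worker default and the 12 GB total budget are illustrative, not recommendations from arboreto:

```python
def per_worker_memory_limit(total_bytes: float, n_workers: int) -> int:
    """Split a total memory budget evenly across Dask workers.

    LocalCluster applies memory_limit to *each* worker, so
    memory_limit=8e9 with 4 workers allows up to 32 GB in total.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return int(total_bytes // n_workers)


def run_grnboost2(ex_path: str, tf_path: str, n_workers: int = 4,
                  total_memory_bytes: float = 12e9):
    """Run GRNBoost2 on an explicitly configured local Dask cluster."""
    # Local imports: the helper above works without these packages installed.
    import pandas as pd
    from arboreto.algo import grnboost2
    from arboreto.utils import load_tf_names
    from dask.distributed import Client, LocalCluster

    ex_matrix = pd.read_csv(ex_path, sep='\t')
    tf_names = load_tf_names(tf_path)

    cluster = LocalCluster(
        n_workers=n_workers,
        memory_limit=per_worker_memory_limit(total_memory_bytes, n_workers),
    )
    client = Client(cluster)
    try:
        # Pass our client so grnboost2 does not create its own default cluster.
        return grnboost2(expression_data=ex_matrix, tf_names=tf_names,
                         client_or_address=client)
    finally:
        client.close()
        cluster.close()
```

Call `run_grnboost2` from inside an `if __name__ == '__main__':` guard, as in the original script, because `LocalCluster` starts worker processes.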

On a Mac you can use Activity Monitor to see what is happening; on Linux we typically use htop.

kind regards,
Thomas

Answer 2 · 2018-06-15T03:24:09.000Z

Hi Thomas,

I can get the results from grnboost; the output file has three columns. How can I import this result back into R and continue the SCENIC pipeline? I could not find the code for this step in the tutorial.

Best,
Peng