Murali-group/Beeline

Some problems reproducing Fig. 5 results

JaneJiayiDong opened this issue · 1 comments

Hello, sorry to bother you. I am having trouble reproducing the results in Fig. 5 of the paper. I downloaded the data (BEELINE-data and Networks) from Zenodo and used generateExpInputs.py.

  1. I used the mESC expression data and the network Non-Specific-ChIP-seq-network.csv, leaving the other parameters at their defaults. The error is as follows:
Traceback (most recent call last):
  File "generateExpInputs_raw.py", line 171, in <module>
    print("\n#TFs: %d, #Genes: %d, #Edges: %d, Density: %.3f" % (nTFs,nGenes,netDF.shape[0],netDF.shape[0]/((nTFs*nGenes)-nTFs)))
ZeroDivisionError: division by zero

I found that the gene names in Non-Specific-ChIP-seq-network.csv are uppercase, unlike those in ExpressionData.csv, so I added
expr_df.index = expr_df.index.to_series().apply(lambda x:x.upper())
before
expr_df.to_csv(opts.outPrefix+'-ExpressionData.csv')
The result is:
#TFs: 27, #Genes: 144, #Edges: 264, Density: 0.068

  2. After reading issue #65, I tried to reproduce the results for the hESC dataset using the STRING ground-truth network, and the result is:
    #TFs: 28, #Genes: 82, #Edges: 112, Density: 0.049
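
For reference, here is a minimal sketch of how a gene-name case mismatch can drive the denominator nTFs*nGenes - nTFs from the traceback to zero. It is not the code from generateExpInputs.py: the helper `density_for`, the column names `Gene1` and `Gene2`, and the toy gene lists are assumptions made for illustration only.

```python
import pandas as pd

def density_for(expr_genes, net_df, tf_col="Gene1", target_col="Gene2"):
    """Return (n_tfs, n_genes, n_edges, density) for the edges whose two
    endpoints are both present in expr_genes. Density is None when the
    denominator nTFs*nGenes - nTFs is zero."""
    genes = set(expr_genes)
    # Keep only edges whose source and target both appear in the expression data.
    kept = net_df[net_df[tf_col].isin(genes) & net_df[target_col].isin(genes)]
    n_tfs = kept[tf_col].nunique()
    n_genes = pd.concat([kept[tf_col], kept[target_col]]).nunique()
    denom = n_tfs * n_genes - n_tfs
    # Guard the division instead of raising ZeroDivisionError.
    density = kept.shape[0] / denom if denom else None
    return n_tfs, n_genes, kept.shape[0], density

# Toy network with uppercase names (hypothetical data).
net = pd.DataFrame({"Gene1": ["POU5F1", "POU5F1", "NANOG"],
                    "Gene2": ["NANOG", "SOX2", "SOX2"]})
# Toy expression gene names in mixed case (hypothetical data).
mixed_case = ["Pou5f1", "Nanog", "Sox2"]

no_match = density_for(mixed_case, net)                      # nothing overlaps
matched = density_for([g.upper() for g in mixed_case], net)  # names normalized
```

With the mixed-case names no edge survives the filter, so both counts are zero and the unguarded division would fail. After uppercasing, all three edges are kept.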

I need some help with these problems. Maybe I have missed some data preprocessing steps; please give me some advice.

Thank you
Best wishes
Jiayi Dong

After checking again, I found that it was just a simple error. With the following modifications, we get the same results as in Fig. 5.

print("\nReading %s" % (expr_file))
expr_df = pd.read_csv(expr_file, header=0, index_col=0)
expr_df.index = expr_df.index.to_series().apply(lambda x:x.upper())
print("\nReading %s" % (gene_ordering_file))
gene_df = pd.read_csv(gene_ordering_file, header=0, index_col=0)
gene_df.index = gene_df.index.to_series().apply(lambda x:x.upper())
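
The uppercasing above can also be written with the pandas string accessor on the index. The sketch below shows that variant on a toy table and adds a check for names that collide after uppercasing. The helper `uppercase_index` and the toy data are hypothetical and not part of generateExpInputs.py.

```python
import pandas as pd

def uppercase_index(df):
    """Return a copy of df whose index labels are uppercased strings."""
    out = df.copy()
    out.index = out.index.astype(str).str.upper()
    return out

# Toy expression table with mixed-case gene names (hypothetical data).
expr_df = pd.DataFrame({"cell1": [1.0, 2.0], "cell2": [0.5, 0.0]},
                       index=["Pou5f1", "Nanog"])

expr_upper = uppercase_index(expr_df)

# Names that differ only by case would collide after uppercasing,
# so it may be worth checking for duplicates before writing the output.
dupes = expr_upper.index[expr_upper.index.duplicated()]
```

Working on a copy leaves the original table untouched, which makes it easy to compare the two sets of names when debugging a mismatch.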