xiaoyeye/CNNC

Error while training a new model

adyprat opened this issue · 6 comments

Hi,
I'm trying to follow your documentation on predicting edge scores on your example data. I ran into the following issues:

  • I get the following error when I run the command KERAS_BACKEND=theano python predict_no_y.py 9 NEPDF_data/ trained_models/KEGG_keras_cnn_trained_model_shallow.h5

The output is:
Using Theano backend.
select <class 'type'>
(309, 64, 32, 1) x_test samples
Traceback (most recent call last):
File "predict_no_y.py", line 81, in <module>
model.load_weights(model_path)
File "anaconda3/envs/cnnc/lib/python3.7/site-packages/keras/engine/saving.py", line 458, in load_wrapper
return load_function(*args, **kwargs)
File "anaconda3/envs/cnnc/lib/python3.7/site-packages/keras/engine/network.py", line 1208, in load_weights
with h5py.File(filepath, mode='r') as f:
File "anaconda3/envs/cnnc/lib/python3.7/site-packages/h5py/_hl/files.py", line 394, in __init__
swmr=swmr)
File "anaconda3/envs/cnnc/lib/python3.7/site-packages/h5py/_hl/files.py", line 170, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 8650752, sblock->base_addr = 0, stored_eof = 8661656)

I'm on Linux and installed the packages with Anaconda. The .h5 files were obtained by extracting the .rar file containing the models. Could you provide the original .h5 files without .rar compression?
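The OSError above already reports how incomplete the file is: `eof` is the size found on disk and `stored_eof` is the size recorded in the HDF5 header. Below is a minimal sketch that pulls both numbers out of the message and reports the shortfall. The helper name `missing_bytes` is hypothetical, and the sketch only confirms the truncation; it does not show whether the .rar extraction or the download caused it.

```python
import re

# Error text copied from the traceback in this issue.
ERROR = ("Unable to open file (truncated file: eof = 8650752, "
         "sblock->base_addr = 0, stored_eof = 8661656)")


def missing_bytes(message):
    """Return how many bytes short the file is, or None if the fields are absent."""
    match = re.search(r"\beof = (\d+).*stored_eof = (\d+)", message)
    if match is None:
        return None
    actual_size = int(match.group(1))    # bytes actually present on disk
    expected_size = int(match.group(2))  # bytes the HDF5 header expects
    return expected_size - actual_size


print(missing_bytes(ERROR))
```

A positive result means the .h5 file on disk is shorter than its header expects, which fits an incomplete extraction or download rather than a Keras or Theano problem.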

  • So I tried training my own model with the command KERAS_BACKEND=theano python train_new_model/train_with_labels_wholedatax.py 9 NEPDF_data/ 3 and got another error:

Using Theano backend.
0 12
1 12
2 144
3 48
4 15
5 18
6 3
7 12
8 45
Traceback (most recent call last):
File "train_new_model/train_with_labels_wholedatax.py", line 63, in <module>
(x_train, y_train,count_set_train) = load_data_TF2(whole_data_TF,data_path)
File "train_new_model/train_with_labels_wholedatax.py", line 51, in load_data_TF2
yydata_x = yydata_array.astype('int')
ValueError: invalid literal for int() with base 10: 'olfr1136\tgnal'

I'm not sure how to fix this.
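The ValueError suggests that a tab-separated gene pair ('olfr1136\tgnal') reached a step that expects integer labels. A minimal diagnostic sketch, assuming the labels arrive as a list of strings: the helper `find_non_integer_fields` and the sample values are hypothetical and not taken from the repository. It lists which entries `int()` cannot parse, which can help tell whether the wrong file or the wrong column is being read.

```python
def find_non_integer_fields(lines):
    """Return (line_number, text) for every entry that int() cannot parse."""
    bad = []
    for number, text in enumerate(lines, start=1):
        try:
            int(text)
        except ValueError:
            bad.append((number, text))
    return bad


# Hypothetical sample: three numeric labels and one gene-pair line.
sample = ["12", "144", "olfr1136\tgnal", "48"]
print(find_non_integer_fields(sample))
```

If every entry in a file is reported, the script is probably being pointed at the gene-pair list rather than the label data.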

  • Also, the usage shown in the documentation for get_xy_label_data_cnn_combine_from_database.py and predict_no_y differs from the usage in the actual scripts. Which is the correct command?

I'd appreciate it if you could help me with this or update your documentation accordingly.

Thank you,
-Aditya

Another reminder:
all command lines are for demo purposes only. To run the whole real data set, please replace "mmukegg_new_new_unique_rand_labelx_num_sy.txt" with "mmukegg_new_new_unique_rand_labelx_num.txt", and replace every "9" with "3057", which is the real number of separations.

@xiaoyeye Thank you for looking into this.

I have another question about line 128 in get_xy_label_data_cnn_combine_from_database.py

128: HT_bulk = (log10(H_bulk / 43261 + 10 ** -4) + 4)/4

I understand that line 134 divides by 43261 because that is the number of cells in the sc data. Why divide the bulk data by 43261 as well, instead of 249?

Thanks,
Aditya

Also, the range [0,1] may not be necessary: as long as all training and test data are normalized in the same way, the network should work well.

Sure, I'll go ahead and use sample size for normalization.
I was just curious: to get a range of (0,1], I thought we should divide the bulk data by 249, since dividing the bulk data's histogram values by 43261 gives a range of at most (0,249/43261], which is approximately (0,0.006].
Thank you for taking the time to address my comments.

-Aditya
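To make the range question concrete, here is a small sketch that evaluates the transform from line 128 at its extremes. It assumes, as the thread implies, that 249 is the number of bulk samples and 43261 the number of single cells, and that a single histogram bin can hold at most all the samples. The function name `normalize` is hypothetical.

```python
from math import log10


def normalize(count, total):
    """Transform quoted from line 128: (log10(count / total + 10**-4) + 4) / 4."""
    return (log10(count / total + 10 ** -4) + 4) / 4


SC_CELLS = 43261     # single-cell sample count quoted in the thread
BULK_SAMPLES = 249   # bulk sample count quoted in the thread (assumed meaning)

empty_bin = normalize(0, SC_CELLS)                          # bin with no samples
bulk_max_over_sc = normalize(BULK_SAMPLES, SC_CELLS)        # full bulk bin, divided by 43261
bulk_max_over_bulk = normalize(BULK_SAMPLES, BULK_SAMPLES)  # full bulk bin, divided by 249

print(empty_bin, bulk_max_over_sc, bulk_max_over_bulk)
```

Under these assumptions the raw ratio 249/43261 stays below 0.006, but after the log transform a full bulk bin maps to roughly 0.44 when divided by 43261 and to about 1 when divided by 249. Dividing by 43261 therefore compresses the bulk histogram into the lower part of the range rather than into (0,0.006].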