Goosang-Yu/genet

DeepPrimeGuideRNA: input sequences in uppercase or lowercase changes the output!

francoiskroll opened this issue · 6 comments

Is this the right repo to raise the issue, or would you prefer the DeepPrime repo? I'll put it here, as this is where I got the example from the documentation.

Input sequences in uppercase or lowercase give different outputs, which surely is not right...

Here is the example taken from the documentation:

from genet.predict import DeepPrimeGuideRNA

# UPPERCASE #
target    = 'ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG'
pbs       = 'GGCAAGGGTGT'
rtt       = 'CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAA'
edit_len  = 1
edit_pos  = 34
edit_type = 'sub'

pegrna = DeepPrimeGuideRNA('pegRNA_test', target=target, pbs=pbs, rtt=rtt, 
                           edit_len=edit_len, edit_pos=edit_pos, edit_type=edit_type)

pe2max_score = pegrna.predict('PE2max')
print(pe2max_score) # gives 3.768321990966797

# LOWERCASE #
target    = 'ataaaagacaacacccttgccttgtggagttttcaaagctcccagaaactgagaagaactataacctgcaaatg'
pbs       = 'ggcaagggtgt'
rtt       = 'cgtctcagtttctgggagctttgaaaactccacaa'
edit_len  = 1
edit_pos  = 34
edit_type = 'sub'

pegrna = DeepPrimeGuideRNA('pegRNA_test', target=target, pbs=pbs, rtt=rtt, 
                           edit_len=edit_len, edit_pos=edit_pos, edit_type=edit_type)

pe2max_score = pegrna.predict('PE2max')
print(pe2max_score) # gives 0.10642468929290771

From what I can tell, this issue does not seem to occur with DeepPrime(...), i.e. whether wt_seq & mut_seq are lowercase or uppercase gives the same dataframe after pegrna.predict(...).

Oh... Thank you for uncovering this critical issue.

It seems the problem is that the code never converts the input sequence with .upper().
When the input sequence contains lowercase letters, the GC count features are miscounted, which results in unexpected outputs.

# genet/predict/PrimeEditor.py/DeepPrimeGuideRNA
'GC_count_PBS'               : [pbs.count('G') + pbs.count('C')],
'GC_count_RTT'               : [rtt.count('G') + rtt.count('C')],
'GC_count_RT-PBS'            : [self.rtpbs.count('G') + self.rtpbs.count('C')],
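
To illustrate why these features break, here is a minimal sketch using the PBS sequence from the report. `str.count` is case-sensitive, so the lowercase input yields a GC count of 0. The helper `gc_count` is a hypothetical name used only for this illustration; it is not part of genet and not necessarily how the fix was implemented.

```python
def gc_count(seq: str) -> int:
    """Count G and C bases regardless of letter case (hypothetical helper)."""
    seq = seq.upper()  # normalize case before counting
    return seq.count("G") + seq.count("C")


pbs_upper = "GGCAAGGGTGT"
pbs_lower = pbs_upper.lower()

# Case-sensitive counting, as in the feature lines above
naive_upper = pbs_upper.count("G") + pbs_upper.count("C")
naive_lower = pbs_lower.count("G") + pbs_lower.count("C")

print(naive_upper, naive_lower)                      # 7 0
print(gc_count(pbs_upper), gc_count(pbs_lower))      # 7 7
```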

For now, as a workaround, entering all sequences in uppercase should yield the correct DeepPrime score.
Could you please keep this issue open? I will work on a hot fix release as soon as possible to address the bug.

Thank you for finding and reporting this important bug!

P.S. I prefer this repo for discussing GenET! Thank you.

I've fixed the bugs you reported and included the fixes in this version update. Thanks for bringing up these important matters!

A few additional points:

  • I've made it so that it doesn't matter whether you input PBS / RTT sequences as DNA (T) or RNA (U).
  • However, the protospacer sequence must always be given in the 5' to 3' direction.
  • These changes will be applied from genet version 0.15.0 onwards.
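
As a rough sketch of what accepting both DNA (T) and RNA (U) input could look like on the caller's side, the snippet below upper-cases a sequence and maps U to T before it is used. The function name `normalize_seq` is hypothetical and this is an assumption about one way to do it, not genet's actual implementation.

```python
def normalize_seq(seq: str) -> str:
    """Return an uppercase DNA version of a DNA or RNA sequence (hypothetical helper)."""
    # Strip surrounding whitespace, unify case, then treat RNA 'U' as DNA 'T'
    return seq.strip().upper().replace("U", "T")


# The PBS from the report, written as lowercase RNA
print(normalize_seq("ggcaagggugu"))  # GGCAAGGGTGT
```

On genet versions before 0.15.0, applying a normalization like this to the inputs before constructing DeepPrimeGuideRNA may serve as a stopgap.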

If you have plans to test, please go ahead and let me know the results. I would appreciate it. If you don't have plans or if there are no issues even after checking, I'll close this issue.

PS. With this update, there have been significant changes in the input format of DeepPrime (not DeepPrimeGuideRNA). Please keep this in mind for future use!

Amazing! That input for genet 0.15.0 looks great.

I gave it a try but I am having some sort of installation issue, sorry! I am guessing it is related to #86. Here is what I tried:

In Terminal:

conda create -n deepprime2
conda activate deepprime2
conda config --env --set subdir osx-64
conda install python=3.10
pip install genet

It said it worked.

Then in a Jupyter Notebook:

from genet.predict import DeepPrime

This runs forever. A simple print('hey') works, so I do not think it is a Python/Jupyter issue. I do not think it is the same as #84 either, as here it gets stuck at the import phase. Is it still missing some dependency?

I apologize for the inconvenience you’ve been experiencing.
Let’s test the import of some key packages individually to identify any problems.

Can you please check each one separately to see if there are any issues?

import Bio
import RNA
import torch
import tensorflow
import silence_tensorflow
import genet.predict.PrimeEditor
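
Because a hanging import blocks the whole notebook, one option is to run each import in a separate interpreter with a timeout. The sketch below uses only the standard library; the function names `try_import` and `check_imports` and the 120-second default are assumptions chosen for illustration.

```python
import subprocess
import sys
import time


def try_import(module: str, timeout: float = 120.0) -> str:
    """Import `module` in a fresh interpreter; return 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            capture_output=True,
            timeout=timeout,  # the child process is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def check_imports(modules, timeout: float = 120.0) -> dict:
    """Map each module name to (status, seconds taken)."""
    results = {}
    for name in modules:
        start = time.perf_counter()
        status = try_import(name, timeout)
        results[name] = (status, time.perf_counter() - start)
    return results
```

Calling `check_imports(["Bio", "RNA", "torch", "tensorflow", "silence_tensorflow", "genet.predict.PrimeEditor"])` would then report which imports finish, fail, or time out, along with how long each took.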

Thank you for your patience! 🙏

No problem, happy to help.

import tensorflow and import genet.predict.PrimeEditor keep going forever (well, I killed them after 2 min). All the other imports work fine (import torch took 30 sec but did finish).

Here is output from conda list, if that's helpful:

# packages in environment at /Users/francoiskroll/miniconda3/envs/deepprime2:
#
# Name                    Version                   Build  Channel
absl-py                   2.1.0                    pypi_0    pypi
appnope                   0.1.4              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3                    pypi_0    pypi
biopython                 1.83                     pypi_0    pypi
bzip2                     1.0.8                h6c40b1e_5  
ca-certificates           2024.3.11            hecd8cb5_0  
cachetools                5.3.3                    pypi_0    pypi
certifi                   2024.2.2                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
comm                      0.2.2              pyhd8ed1ab_0    conda-forge
cramjam                   2.8.3                    pypi_0    pypi
debugpy                   1.8.1           py310h5daac23_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
editdistance              0.8.1                    pypi_0    pypi
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
fastparquet               2024.2.0                 pypi_0    pypi
filelock                  3.13.4                   pypi_0    pypi
flatbuffers               1.12                     pypi_0    pypi
fsspec                    2024.3.1                 pypi_0    pypi
gast                      0.4.0                    pypi_0    pypi
genet                     0.15.0                   pypi_0    pypi
google-auth               2.29.0                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.62.2                   pypi_0    pypi
h5py                      3.11.0                   pypi_0    pypi
idna                      3.7                      pypi_0    pypi
importlib-metadata        7.1.0              pyha770c72_0    conda-forge
importlib_metadata        7.1.0                hd8ed1ab_0    conda-forge
ipykernel                 6.29.3             pyh3cd1d5f_0    conda-forge
ipython                   8.22.2             pyh707e725_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.3                    pypi_0    pypi
jupyter_client            8.6.1              pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.2           py310h2ec42d9_0    conda-forge
keras                     2.9.0                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
libclang                  18.1.1                   pypi_0    pypi
libcxx                    16.0.6               hd57cbcb_0    conda-forge
libffi                    3.4.4                hecd8cb5_0  
libsodium                 1.0.18               hbcb3906_1    conda-forge
markdown                  3.6                      pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
mpmath                    1.3.0                    pypi_0    pypi
ncurses                   6.4                  hcec6c5f_0  
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
networkx                  3.3                      pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openssl                   3.2.1                hd75f5a5_1    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pandas                    2.2.2                    pypi_0    pypi
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pip                       23.3.1          py310hecd8cb5_0  
platformdirs              4.2.1              pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.42             pyha770c72_0    conda-forge
protobuf                  3.19.6                   pypi_0    pypi
psutil                    5.9.8           py310hb372a2b_0    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyarrow                   16.0.0                   pypi_0    pypi
pyasn1                    0.6.0                    pypi_0    pypi
pyasn1-modules            0.4.0                    pypi_0    pypi
pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
python                    3.10.14              h5ee71fb_0  
python-dateutil           2.9.0.post0              pypi_0    pypi
python_abi                3.10                    2_cp310    conda-forge
pytz                      2024.1                   pypi_0    pypi
pyzmq                     26.0.2          py310hdd8d2da_0    conda-forge
readline                  8.2                  hca72f7f_0  
regex                     2024.4.16                pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         2.0.0                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
setuptools                68.2.2          py310hecd8cb5_0  
silence-tensorflow        1.2.1                    pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.41.2               h6c40b1e_0  
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
support-developer         1.0.5                    pypi_0    pypi
sympy                     1.12                     pypi_0    pypi
tensorboard               2.9.1                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorflow                2.9.3                    pypi_0    pypi
tensorflow-estimator      2.9.0                    pypi_0    pypi
tensorflow-io-gcs-filesystem 0.36.0                   pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
tk                        8.6.12               h5d9f67b_0  
torch                     2.2.2                    pypi_0    pypi
tornado                   6.4             py310hb372a2b_0    conda-forge
tqdm                      4.66.2                   pypi_0    pypi
traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
typing_extensions         4.11.0             pyha770c72_0    conda-forge
tzdata                    2024.1                   pypi_0    pypi
urllib3                   2.2.1                    pypi_0    pypi
viennarna                 2.6.4                    pypi_0    pypi
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.2                    pypi_0    pypi
wheel                     0.41.2          py310hecd8cb5_0  
wrapt                     1.16.0                   pypi_0    pypi
xz                        5.4.6                h6c40b1e_0  
zeromq                    4.3.5                h93d8f39_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h4dc903c_0  

I did not change anything in that environment after my reply above with conda create -n deepprime2 etc.

The environment where I have genet 0.14.1 installed & working looks like:

# packages in environment at /Users/francoiskroll/miniconda3/envs/deepprime:
#
# Name                    Version                   Build  Channel
abseil-cpp                20210324.2           h23ab428_0  
absl-py                   0.15.0             pyhd3eb1b0_0  
aiohttp                   3.9.3            py38h6c40b1e_0  
aiosignal                 1.2.0              pyhd3eb1b0_0  
appnope                   0.1.4              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3                      py_0  
async-timeout             4.0.3            py38hecd8cb5_0  
attrs                     23.1.0           py38hecd8cb5_0  
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
biopython                 1.83                     pypi_0    pypi
blas                      1.0                         mkl  
blinker                   1.6.2            py38hecd8cb5_0  
brotli-python             1.0.9            py38he9d5cce_7  
c-ares                    1.19.1               h6c40b1e_0  
ca-certificates           2024.3.11            hecd8cb5_0  
cached-property           1.5.2                      py_0  
cachetools                4.2.2              pyhd3eb1b0_0  
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0           py38h6c40b1e_0  
charset-normalizer        2.0.4              pyhd3eb1b0_0  
click                     8.1.7            py38hecd8cb5_0  
comm                      0.2.2              pyhd8ed1ab_0    conda-forge
cramjam                   2.8.3                    pypi_0    pypi
cryptography              41.0.3           py38ha2381d6_0  
debugpy                   1.8.1            py38h1f5f77c_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
editdistance              0.8.1                    pypi_0    pypi
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
fastparquet               0.8.3                    pypi_0    pypi
flatbuffers               1.12                     pypi_0    pypi
frozenlist                1.4.0            py38h6c40b1e_0  
fsspec                    2024.3.1                 pypi_0    pypi
gast                      0.4.0              pyhd3eb1b0_0  
genet                     0.14.1                   pypi_0    pypi
giflib                    5.2.1                h6c40b1e_3  
google-auth               2.6.0              pyhd3eb1b0_0  
google-auth-oauthlib      0.4.4              pyhd3eb1b0_0  
google-pasta              0.2.0              pyhd3eb1b0_0  
grpc-cpp                  1.39.1               h3acd2d4_1    conda-forge
grpcio                    1.39.0           py38h4924b5d_0    conda-forge
h5py                      3.1.0           nompi_py38h5142359_100    conda-forge
hdf5                      1.10.6               h10fe05b_1  
icu                       68.1                 h23ab428_0  
idna                      3.4              py38hecd8cb5_0  
importlib-metadata        7.0.1            py38hecd8cb5_0  
importlib_metadata        7.0.1                hd8ed1ab_0    conda-forge
intel-openmp              2023.1.0         ha357a0b_43548  
ipykernel                 6.29.3             pyh3cd1d5f_0    conda-forge
ipython                   8.12.2             pyhd1c38e8_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jpeg                      9e                   h6c40b1e_1  
jupyter_client            8.6.1              pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.2            py38h50d1736_0    conda-forge
keras                     2.6.0              pyhd3eb1b0_0  
keras-preprocessing       1.1.2              pyhd3eb1b0_0  
krb5                      1.20.1               hdba6334_1  
libcurl                   8.2.1                ha585b31_0  
libcxx                    16.0.6               hd57cbcb_0    conda-forge
libedit                   3.1.20230828         h6c40b1e_0  
libev                     4.33                 h9ed2024_1  
libffi                    3.4.4                hecd8cb5_0  
libgfortran               5.0.0           11_3_0_hecd8cb5_28  
libgfortran5              11.3.0              h9dfd629_28  
libnghttp2                1.52.0               h1c88b7d_1  
libpng                    1.6.39               h6c40b1e_0  
libprotobuf               3.16.0               hcf210ce_0    conda-forge
libsodium                 1.0.18               hbcb3906_1    conda-forge
libssh2                   1.10.0               hdb2fb19_2  
libzlib                   1.2.13               h8a1eda9_5    conda-forge
llvm-openmp               18.1.3               hb6ac08f_0    conda-forge
markdown                  3.4.1            py38hecd8cb5_0  
markupsafe                2.1.3            py38h6c40b1e_0  
matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
mkl                       2023.1.0         h8e150cf_43560  
mkl-service               2.4.0            py38h6c40b1e_1  
mkl_fft                   1.3.8            py38h6c40b1e_0  
mkl_random                1.2.4            py38ha357a0b_0  
multidict                 6.0.4            py38h6c40b1e_0  
ncurses                   6.4                  hcec6c5f_0  
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
numpy                     1.19.5           py38h3cdbb29_5  
numpy-base                1.19.5           py38hff596df_5  
oauthlib                  3.2.2            py38hecd8cb5_0  
openssl                   1.1.1w               h8a1eda9_0    conda-forge
opt_einsum                3.3.0              pyhd3eb1b0_1  
packaging                 23.2             py38hecd8cb5_0  
pandas                    1.4.4                    pypi_0    pypi
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
perl                      5.32.1          0_h435f0c2_perl5  
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pip                       23.3.1           py38hecd8cb5_0  
platformdirs              3.10.0           py38hecd8cb5_0  
pooch                     1.7.0            py38hecd8cb5_0  
prompt-toolkit            3.0.42             pyha770c72_0    conda-forge
prompt_toolkit            3.0.42               hd8ed1ab_0    conda-forge
protobuf                  3.16.0           py38ha048514_0    conda-forge
psutil                    5.9.8            py38hae2e43d_0    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyarrow                   15.0.2                   pypi_0    pypi
pyasn1                    0.4.8              pyhd3eb1b0_0  
pyasn1-modules            0.2.8                      py_0  
pycparser                 2.21               pyhd3eb1b0_0  
pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
pyjwt                     2.4.0            py38hecd8cb5_0  
pyopenssl                 23.2.0           py38hecd8cb5_0  
pysocks                   1.7.1                    py38_1  
python                    3.8.18               h218abb5_0  
python-dateutil           2.9.0.post0              pypi_0    pypi
python_abi                3.8                      2_cp38    conda-forge
pytz                      2024.1                   pypi_0    pypi
pyzmq                     26.0.0           py38hf69f452_0    conda-forge
re2                       2021.09.01           he49afe7_0    conda-forge
readline                  8.2                  hca72f7f_0  
regex                     2024.4.16                pypi_0    pypi
requests                  2.31.0           py38hecd8cb5_1  
requests-oauthlib         1.3.0                      py_0  
rsa                       4.7.2              pyhd3eb1b0_1  
scipy                     1.10.1           py38hf241641_1  
setuptools                68.2.2           py38hecd8cb5_0  
silence-tensorflow        1.2.1                    pypi_0    pypi
six                       1.15.0             pyhd3eb1b0_0  
snappy                    1.1.10               hcec6c5f_1  
sqlite                    3.41.2               h6c40b1e_0  
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
support-developer         1.0.5                    pypi_0    pypi
tbb                       2021.8.0             ha357a0b_0  
tensorboard               2.11.0                   py38_0  
tensorboard-data-server   0.6.1            py38h7242b5c_0  
tensorboard-plugin-wit    1.6.0                      py_0  
tensorflow                2.6.0            py38h52b2510_1    conda-forge
tensorflow-base           2.6.0            py38h1615122_1    conda-forge
tensorflow-estimator      2.6.0            py38h02c4698_1    conda-forge
termcolor                 1.1.0            py38hecd8cb5_1  
tk                        8.6.12               h5d9f67b_0  
torch                     1.11.0                   pypi_0    pypi
tornado                   6.4              py38hae2e43d_0    conda-forge
tqdm                      4.66.2                   pypi_0    pypi
traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
typing_extensions         3.7.4.3            pyha847dfd_0  
urllib3                   2.1.0            py38hecd8cb5_1  
viennarna                 2.6.4           py38pl5321hda9a618_0    bioconda
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
werkzeug                  2.3.8            py38hecd8cb5_0  
wheel                     0.41.2           py38hecd8cb5_0  
wrapt                     1.12.1           py38haf1e3a3_1  
xz                        5.4.6                h6c40b1e_0  
yarl                      1.9.3            py38h6c40b1e_0  
zeromq                    4.3.5                h93d8f39_0    conda-forge
zipp                      3.17.0           py38hecd8cb5_0  
zlib                      1.2.13               h8a1eda9_5    conda-forge

So I would guess tensorflow 2.9.3 (cannot import properly) vs. tensorflow 2.6.0 (works with genet 0.14.1) has something to do with it... Note that Python is different too.
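
Comparing two long `conda list` outputs by eye is error-prone, so here is a small sketch that parses both listings and reports packages whose versions differ. The function names are hypothetical, and the sample text is a short excerpt of the two listings above rather than the full output.

```python
def parse_conda_list(text: str) -> dict[str, str]:
    """Parse `conda list` output into a {package: version} mapping."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def version_diff(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {package: (old_version, new_version)} for shared packages that differ."""
    shared = sorted(set(old) & set(new))
    return {name: (old[name], new[name]) for name in shared if old[name] != new[name]}


working = """
# Name                    Version                   Build  Channel
numpy                     1.19.5           py38h3cdbb29_5
python                    3.8.18               h218abb5_0
tensorflow                2.6.0            py38h52b2510_1    conda-forge
viennarna                 2.6.4           py38pl5321hda9a618_0    bioconda
"""

broken = """
# Name                    Version                   Build  Channel
numpy                     1.26.4                   pypi_0    pypi
python                    3.10.14              h5ee71fb_0
tensorflow                2.9.3                    pypi_0    pypi
viennarna                 2.6.4                    pypi_0    pypi
"""

diff = version_diff(parse_conda_list(working), parse_conda_list(broken))
for name, (old, new) in diff.items():
    print(f"{name}: {old} -> {new}")
```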

Does that help?

This seems to be a separate bug, so I've opened a new issue.
Can we continue the discussion in #92?

Thank you!