CalabreseLab/seekr

seekr_domain_pearson percentile function overwrites data in rows with identical names

Opened this issue · 6 comments

Leaving a note for a future fix: the percentile output of seekr_domain_pearson overwrites data in rows with identical names, whereas the r value output does not overwrite data in rows with identical names.
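For anyone reading later, here is a minimal sketch of one common way this kind of symptom can arise. It is only an illustration, not seekr's actual code: the variable names and r values are made up, and the real cause in seekr_domain_pearson has not been confirmed in this thread. The idea is that any container keyed by sequence name keeps a single entry per name, while a positional container keeps every row.

```python
# Hypothetical illustration only; this is NOT taken from seekr's source.
# The names mimic the duplicated headers in the test data; r values are made up.
names = ["5p_tarSL_1_407_407"] * 3
r_values = [0.12, -0.05, 0.31]

# Keyed by name: each later row silently replaces the earlier one.
by_name = {}
for name, r in zip(names, r_values):
    by_name[name] = r

# Keyed by position: every row survives, even with identical names.
by_position = list(zip(names, r_values))

print(len(by_name))      # only one entry is left
print(len(by_position))  # all three rows are kept
```

If the percentile code path turns out to do something along these lines, keying by row position (or by a name plus an index) would likely avoid the overwrite.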

@jmcalabr Would it be easy for you to upload a small dataset that triggers this bug? Also, which function (or command-line call) was run? seekr has a few percentile functions: percentileofscore, calc_percentiles and calc_internal_percentiles.

Hey Jessime, thanks! Absolutely nothing critical, but I'm pasting data to use below. For the percentile to work in seekr_domain_pearson, I think you would need to pass an additional large set of sequences, like all GENCODE lncRNAs. Also, I am not aware of (or have forgotten about) those other percentile functions. Are they documented in the help file/description for the seekr command line? If not, that would be another tiny item to update.
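To make the role of the extra sequence set concrete, here is a rough sketch of a percentile computed against a background distribution. The helper name `percentile_of` and all numbers are assumptions for illustration; this is not how seekr implements its percentile functions, just the general idea of ranking each r value against reference values while keeping one output row per input row.

```python
from bisect import bisect_right

def percentile_of(reference_sorted, score):
    """Percent of reference values that are <= score (hypothetical helper)."""
    return 100.0 * bisect_right(reference_sorted, score) / len(reference_sorted)

# Made-up background r values standing in for a large reference set.
reference = sorted([-0.2, -0.1, 0.0, 0.1, 0.3])

# Two rows with identical names but different r values (made-up numbers).
rows = [("5p_tarSL_1_407_407", 0.05), ("5p_tarSL_1_407_407", 0.25)]

# Build the output positionally so identical names cannot collide.
percentiles = [(name, percentile_of(reference, r)) for name, r in rows]
print(percentiles)
```

Because the result is a list built in row order, both rows keep their own percentile even though they share a name.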

seekr_domain_pearson testdata.fa testseq.fa mean4.npy std4.npy -rp gencode.vM25.lncRNA_transcripts.fa -k 4 -r r_values.csv -p percentiles.csv -s 40 -w 400

testdata.fa

>5p_tarSL_1_407_407
CTTCAGAGTGCGCGAACATGAAGCACAGAACCCACCAGGGCATAGAGACTCAAAACTCCGGAGTGCGTAATACGCCCTCCCGCACGTGCGTTTGGCAAATTATCATTGGATATTAGAGAGCCCCACGCATAACAAGTTACCCACCAACGTCCCTGGTCCACTTAAATCATGACGATGTGTCGGGCAACGTTAGAATGGAATGGTATGTCGGATGCCCGCGAAGACGGGGGGATTAGGGTTAATGTCAGATGCTTACCCGACGTGCATCCATGTCGGTTGCGTACCTGAAAGCGGGTCGTCAGGAATTGAGAATCAGGCCCAAAGGATGATATCCAGGATCCACCGATATGGCTTACCGGTGGTTATTGTTAGTCGCCATCTGGCCTTGGGCCATGAGGTAGCTCGCA
>5p_tarSL_1_407_407
CTGCGTCGAATCAGGTTTTCTACGGCACGAAGGTAGGTTGTTAAGCACGGTGTGGGCGAGTGGAGACTTAGGTACGACAAGGGACAGCAGCCAAAATGCACGTGTCACCGTCGGTACAACTACCTATCACGTGGTAACGCTTCAACTAAGCCATTTACTTAAAGAAGAAGAATCCCTTTCTTGTTTCGTAGTTCGTCTCATGTCTCCGTACGGTCGAAGGCTGCAAGTGAACTGACACTTACATAATGAGCAAAATCGTGTTATGCGACAGCGATACCTTAGGAAAGTAAGGTCACAAAATGAATAGATGCATGTGTGGGGGACATATGACAAGCACTGTTGATGATTCAGCCTCGCAGCAGAATGTTGGTGGCGATGTCTCGTACCGCTAATTGTCCTTCGACATT
>5p_tarSL_1_407_407
ATTGACGGAACTCTGGTGTGAAATCCAGGACGAAACCACCGTTAGGCCCGTACAATTCTGGAGGCAGGCTCTTACTGAATCGGCTAACGTAGTCGAAACTAAAGTCACGCACTATTCCAAAGGGATCTCATAAAGCATGAAACATAACCTTCGGCGACATTGGGCGCAGTTACGCATACGTATAGAAAGTCCTCTCTGGCTATGCGTTCGTCTTGAGGGATAGGCTGAAAGTCCCCATCTTCAGTAAAAAATCTAGGTTTAGAAGAGTTCCGACGGGCATGGGCCGAGTACCGACCGCTGCACGGTGGCTGCAAGCTGGCCCCTAGTTGGATCGTGCGCTCTCTCTAGTGCTGGTGACCGCAAATATATGAGAAATAGCTAGCTCCGCGGAACCATGACTGTACGGT
>5p_tarSL_1_407_407
TGCGAAAAACTAAGTTGAGTCTTTCTTGGTGGGAATGGACTCTTGCACTGAATGGGAAACGTAGATACCTTGGGTACCACCCAGGGGAGGTACTAAGCCTGTGCGTGGAACCTTAACGAAGATAGGAAAGTTATTCGTGTGTGATGTCATGGACCCGGAAAAATAGACCTCCAAGCATGGTATCCTAGGTGGTTATCCTTCCTATTCTGTGCTGCATAGGAGTCCATTGCCATAAGGGAAAGAAATAGCAGGGTTAGCGTCGATGGGATAAGAAGACCGCGCAGTACCGGAGTTCAGAGAGTCAGGAACAAACATCCGCTGGGATTGAGCCGGGGGAAAGGTCGGGACGAAGATAAGCCATGGGAGGAGGGGAATCACTTCACCTGGGGAAGCCAAGCAAACACTCG
>5p_tarSL_1_407_407
GGAAAACGGACGTGATGAAAGGGAGAGAGGAGTGAACAGAGTCACGCTAGGCATCAGCAGCTAGTCGCCGGGCCCCGCACTTCGAATCAGGAGTGGCTTACGCGGGATTGGATATCCGTTGCAATGGTCACTATGAAGATTATATCTCGAGACGGTCGACATCAAAAACAGAGCATTTAACCCGTACAGTGCGTGCTACGTAGACGCCGAACCTATCCTCAATAGGACTATGGTTGTTGCGCCGAAGAAATCTAGCGGAGGGGTAAAGTGTAAGTAGCAAAGATGAGCGCTCAAATTGTGCCATTTACCGGAACTTATTCGCGCTGGTGTCCGATGTACTCGCCTAGGTACTTTGATAGCTGGCTCCCTTGAGGATGCATCTCGGGCATAGCATGCAAATTTGCGGG

testseq.fa

>full_length_1_9173_9173
GGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTCAAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGTAAAGCCAGAGGAGATCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCGTCGGTATTAAGCGGGGGAGAATTAGATAAATGGGAAAAAATTCGGTTAAGGCCAGGGGGAAAGAAACAATATAAACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTTTTAGAGACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAATACAATAGCAGTCCTCTATTGTGTGCATCAAAGGATAGATGTAAAAGACACCAAGGAAGCCTTAGATAAGATAGAGGAAGAGCAAAACAAAAGTAAGAAAAAGGCACAGCAAGCAGCAGCTGACACAGGAAACAACAGCCAGGTCAGCCAAAATTACCCTATAGTGCAGAACCTCCAGGGGCAAATGGTACATCAGGCCATATCACCTAGAACTTTAAATGCATGGGTAAAAGTAGTAGAAGAGAAGGCTTTCAGCCCAGAAGTAATACCCATGTTTTCAGCATTATCAGAAGGAGCCACCCCACAAGATTTAAATACCATGCTAAACACAGTGGGGGGACATCAAGCAGCCATGCAAATGTTAAAAGAGACCATCAATGAGGAAGCTGCAGAATGGGATAGATTGCATCCAGTGCATGCAGGGCCTATTGCACCAGGCCAGATGAGAGAACCAAGGGGAAGTGACATAGCAGGAACTACTAGTACCCTTCAGGAACAAATAGGATGGATGACACATAATCCACCTATCCCAGTAGGAGAAATCTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGTAAGAATGTATAGCCCTACCAGCATTCTGGACATAAGACAAGGACCAAAGGAACCCTTTAGAGACTATGTAGACCGATTCTATAAAACTCTAAGAGCCGAGCAAGCTTCACAAGAGGTAAAAAATTGGATGACAGAAACCTTGTTGGTCCAAAATGCGAACCCAGATTGTAAGACTATTTTAAAAGCATTGGGACCAGGAGCGACACTAGAAGAAATGATGACAGCATGTCAGGGAGTGGGGGGACCCGGCCATAAAGCAAGAGTTTTGGCTGAAGCAATGAGCCAAGTAACAAATCCAGCTACCATAATGATACAGAAAGGCAATTTTAGGAACCAAAGAAAGACTGTTAAGTGTTTCAATTGTGGCAAAGAAGGGCACATAGCCAAAAATTGCAGGGCCCCTAGGAAAAAGGGCTGTTGGAAATGTGGAAAGGAAGGACACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAAGATCTGGCCTTCCCACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGTTTGGGGAAGAGACAACAACTCCCTCTCAGAAGCAGGAGCCGATAGACAAGGAACTGTATCCTTTAGCTTCCCTCAGATCACTCTTTGGCAGCGACCCCTCGTCACAATAAAGATAGGGGGGCAATTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGCG
GACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGCTGCACTTTAAATTTTCCCATTAGTCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAAATGGAAAAGGAAGGAAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGATTTCTGGGAAGTTCAATTAGGAATACCACATCCTGCAGGGTTAAAACAGAAAAAATCAGTAACAGTACTGGATGTGG
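Until the bug is fixed, a quick check like the one below could flag inputs that would trigger it. This is a sketch under the assumption that the files are standard FASTA with header lines starting with `>`; the function name `duplicate_headers` and the short sequences are made up for illustration.

```python
from collections import Counter

def duplicate_headers(fasta_text):
    """Return {header: count} for headers that appear more than once."""
    headers = [
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]
    return {h: n for h, n in Counter(headers).items() if n > 1}

# Shortened stand-in for the test data above (sequences are placeholders).
example = """>5p_tarSL_1_407_407
ACGT
>5p_tarSL_1_407_407
TTGA
>full_length_1_9173_9173
GGTC
"""

print(duplicate_headers(example))
```

For a real file, the text could be read with `open(path).read()` and passed to the same function.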

Mauro, I don't get notifications whenever issues are opened up here, unfortunately. Did Jessime ever get to this bug? I'd be happy to take a looksie

Hey Dan,

  1. Nope, haven't gotten around to fixing this.
  2. If you do want to get email notifications (no pressure to), you can click the Watch button on the top right.

lmao how many years of using git and I never knew that. I could give this a look and pass along to you for final approval/style editing :)

Sure; I'd be happy to review a PR if you put one up