lmdu/pyfastx

Segmentation fault using .composition on each contig

nextgenusfs opened this issue · 2 comments

Thanks @lmdu for the great tool. I'm getting a segmentation fault when accessing the .composition property on individual contigs. By comparison, .gc_content works fine in the same loop, and, stranger still, .composition works on the whole file (the Fasta object). I've seen this behavior with several multi-FASTA files. There is nothing remarkable about these genomes, although they do contain soft-masked (lowercase) regions.

Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:01:00) 
[Clang 13.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyfastx
>>> pyfastx.__version__
'1.1.0'
>>> fa = pyfastx.Fasta('CBS-122913.masked.fasta')
>>> len(fa)
115
>>> fa.keys()
<FastaKeys> contains 115 keys
>>> fa.composition
{'A': 5728672, 'C': 5224569, 'G': 5230753, 'N': 397, 'T': 5735224}
>>> for f in fa.keys():
...     contig = fa[f]
...     print(contig.name, contig.gc_content)
... 
scaffold_1 48.69263458251953
scaffold_2 48.33203887939453
scaffold_3 47.254817962646484
scaffold_4 48.95073699951172
scaffold_5 48.34445571899414
scaffold_6 48.24189376831055
scaffold_7 48.847633361816406
scaffold_8 49.396209716796875
scaffold_9 49.21478271484375
scaffold_10 47.9789924621582
scaffold_11 48.8577766418457
scaffold_12 47.94032287597656
scaffold_13 48.50809860229492
scaffold_14 49.025367736816406
scaffold_15 47.12371063232422
scaffold_16 48.617897033691406
scaffold_17 48.09880447387695
scaffold_18 47.51862716674805
scaffold_19 48.00046157836914
scaffold_20 48.72396469116211
scaffold_21 48.74768829345703
scaffold_22 47.80195617675781
scaffold_23 48.13816452026367
scaffold_24 47.15768814086914
scaffold_25 47.371437072753906
scaffold_26 43.535911560058594
scaffold_27 47.332523345947266
scaffold_28 48.37013626098633
scaffold_29 46.33742904663086
scaffold_30 47.933868408203125
scaffold_31 49.608646392822266
scaffold_32 48.665618896484375
scaffold_33 49.433197021484375
scaffold_34 48.782493591308594
scaffold_35 48.834434509277344
scaffold_36 43.52204895019531
scaffold_37 49.82286834716797
scaffold_38 47.51904296875
scaffold_39 49.28739929199219
scaffold_40 45.94437789916992
scaffold_41 48.9925537109375
scaffold_42 48.93977737426758
scaffold_43 47.75377655029297
scaffold_44 36.31761169433594
scaffold_45 49.05009078979492
scaffold_46 47.420738220214844
scaffold_47 47.54635238647461
scaffold_48 48.67685317993164
scaffold_49 47.861183166503906
scaffold_50 47.46112060546875
scaffold_51 33.48347091674805
scaffold_52 49.78142547607422
scaffold_53 33.104854583740234
scaffold_54 41.71234130859375
scaffold_55 45.80345916748047
scaffold_56 52.756893157958984
scaffold_57 46.658477783203125
scaffold_58 55.15912628173828
scaffold_59 51.0510368347168
scaffold_60 39.74724197387695
scaffold_61 43.557743072509766
scaffold_62 34.11146545410156
scaffold_63 38.85691452026367
scaffold_64 35.173377990722656
scaffold_65 36.18036651611328
scaffold_66 34.77259826660156
scaffold_67 48.49277877807617
scaffold_68 36.5165901184082
scaffold_69 36.67202377319336
scaffold_70 36.398834228515625
scaffold_71 36.51321792602539
scaffold_72 35.740966796875
scaffold_73 34.044681549072266
scaffold_74 32.13038635253906
scaffold_75 33.51669692993164
scaffold_76 34.7519416809082
scaffold_77 34.39479064941406
scaffold_78 35.233097076416016
scaffold_79 42.18278121948242
scaffold_80 29.303022384643555
scaffold_81 33.31261444091797
scaffold_82 47.63681411743164
scaffold_83 38.0119514465332
scaffold_84 35.35012435913086
scaffold_85 35.09886932373047
scaffold_86 34.42961502075195
scaffold_87 59.021995544433594
scaffold_88 36.12348175048828
scaffold_89 34.69807434082031
scaffold_90 35.4184455871582
scaffold_91 33.65752029418945
scaffold_92 32.3078727722168
scaffold_93 33.73524856567383
scaffold_94 37.416481018066406
scaffold_95 38.65781784057617
scaffold_96 45.47086715698242
scaffold_97 34.915313720703125
scaffold_98 36.34886169433594
scaffold_99 35.82434844970703
scaffold_100 34.24390411376953
scaffold_101 32.18368148803711
scaffold_102 47.853271484375
scaffold_103 61.5595703125
scaffold_104 38.80778503417969
scaffold_105 45.805625915527344
scaffold_106 35.68025588989258
scaffold_107 48.58888626098633
scaffold_108 33.502689361572266
scaffold_109 34.21965408325195
scaffold_110 35.85944366455078
scaffold_111 36.659664154052734
scaffold_112 36.92753601074219
scaffold_113 37.22358703613281
scaffold_114 20.990991592407227
scaffold_115 52.25225067138672
>>> for f in fa.keys():
...     contig = fa[f]
...     print(contig.name, contig.composition)
... 
Segmentation fault: 11
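Until an upgrade is possible, per-contig composition can be computed in pure Python. This is a minimal sketch, assuming each contig's sequence can be loaded as a string (for example via pyfastx's `Sequence.seq`). The helpers `base_composition` and `per_contig_composition` are hypothetical and not part of pyfastx. Folding lowercase soft-masked bases to uppercase is also an assumption, chosen to mirror the uppercase-only keys in the whole-file `fa.composition` output above.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def base_composition(seq: str) -> Dict[str, int]:
    """Count each character in seq, folding soft-masked (lowercase) bases to uppercase."""
    # Assumption: 'a' and 'A' should be counted together, as in the
    # uppercase-only keys of the whole-file composition.
    return dict(sorted(Counter(seq.upper()).items()))


def per_contig_composition(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, int]]:
    """Map each (hypothetical) contig name to its base composition."""
    return {name: base_composition(seq) for name, seq in records}


if __name__ == "__main__":
    # Toy records standing in for real contigs.
    demo = [("scaffold_x", "ACGTacgtNN"), ("scaffold_y", "ggccAT")]
    for name, comp in per_contig_composition(demo).items():
        print(name, comp)
```

With a pyfastx.Fasta object like the one in the session, the records can be built as `((name, fa[name].seq) for name in fa.keys())`. This loads each contig fully into memory and is slower than a C implementation, so it is only a stopgap.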
lmdu commented

Thank you for reporting this issue. I will fix it in the next version.

lmdu commented

Try the new version 2.0.0. We have added support for more characters in sequences.
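The thread attributes the fix to broader character support but does not say which characters 1.1.0 failed on. To see whether a file contains anything beyond plain A/C/G/T/N (for example IUPAC ambiguity codes such as R or Y, or gap characters), a hypothetical diagnostic helper like the one below can list such characters per contig. The choice of "standard" set is an assumption, and the sketch does not by itself confirm the cause of the crash.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumption: only A, C, G, T and N (in either case) count as "standard".
STANDARD = set("ACGTNacgtn")


def unusual_characters(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, int]]:
    """Return per-contig counts of characters outside STANDARD.

    Contigs that contain only standard characters are omitted from the report.
    """
    report: Dict[str, Dict[str, int]] = {}
    for name, seq in records:
        odd = Counter(ch for ch in seq if ch not in STANDARD)
        if odd:
            report[name] = dict(odd)
    return report


if __name__ == "__main__":
    demo = [("clean", "ACGTn"), ("ambiguous", "ACRYk-")]
    print(unusual_characters(demo))
```

Used with the Fasta object from the session, this would be `unusual_characters((name, fa[name].seq) for name in fa.keys())`.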