baoilleach/deepsmiles

incorrect closure decoding when N>=100

Closed this issue · 2 comments

The following meaningless DeepSMILES:

Bbbbbb2522222522534b52522534bbb25222225225342522534b52b2522222522534b52522534bbb25222225225342522534b5252b52bbbb2522222522534b5252b5b6

converts to:

Bb28%11b%13%14%16%19b%12%21b1345679%10%20b123456789%10%11%12%13%15%17%18%23%29%32%36%39b%14%15%16%17%18%19%20%21%34%41%42b%33%40%45%51%54b%22%24%25%26%27%28%30%31%35%37%38%56%57%59%62b%22%23%24%25%26%27%28%29%30%31%32%33%34%35%36%37%38%39%40%41%43%55%64b%42%43%44%46%47%48%49%50%52%53%63b%44%45%46%47%48%49%50%51%52%53%54%55%56%58%60%61%66%72%75%79%82b%57%58%59%60%61%62%63%64%77%84%85%87b%76%83%89b%65%67%68%69%70%71%73%74%78%80%81b%65%66%67%68%69%70%71%72%73%74%75%76%77%78%79%80%81%82%83%84%86%88b%85%86%87%88%90b%89%90%92%98%101b%103%104%106b%102%108%109b%91%93%94%95%96%97%99%100b%91%92%93%94%95%96%97%98%99%100%101%102%103%105%107b%104%105%106%107b%108b%109

This shows that the decoder doesn't handle %(NNN) closures correctly when NNN>=100. For example, the last four characters, "%109", should be "%(109)"; as written, a SMILES parser reads "%109" as closure 10 followed by closure 9.

This is likely because of this line in decode.py:

           smi_bcsymbol = "%d" % digit if digit < 10 else "%%%d" % digit

which should likely be something more like:

           if digit < 10:
               smi_bcsymbol = str(digit)
           elif digit < 100:
               smi_bcsymbol = "%" + str(digit)
           else:
               smi_bcsymbol = "%(" + str(digit) + ")"
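
For anyone who wants to check the three cases in isolation, here is a minimal standalone sketch of the same branching. The function name `ring_closure_symbol` is hypothetical and does not come from decode.py; the rejection of negative numbers is also my own assumption, not something the decoder is known to do.

```python
def ring_closure_symbol(digit: int) -> str:
    """Format a ring-closure number as a SMILES ring-closure symbol.

    Hypothetical helper for illustration only; not part of decode.py.
    """
    if digit < 0:
        # Assumption: negative closure numbers are never valid input.
        raise ValueError("ring-closure number must be non-negative")
    if digit < 10:
        # Single digit: written bare, e.g. "7".
        return str(digit)
    if digit < 100:
        # Two digits: prefixed with "%", e.g. "%42".
        return "%" + str(digit)
    # Three or more digits: parenthesised form, e.g. "%(109)".
    return "%(" + str(digit) + ")"


if __name__ == "__main__":
    for n in (9, 10, 99, 100, 109):
        print(n, ring_closure_symbol(n))
```

With this, `ring_closure_symbol(109)` gives "%(109)" rather than the ambiguous "%109".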

Will do. Thanks.

Done.