milaboratory/mixcr

Custom Reference error parsing fasta: Unknown letter '>'

Closed · 1 comment

I am trying to generate my own reference library by running the following command:
buildLibrary --debug -f --v-genes-from-fasta IGHV.fasta --v-gene-feature VRegion --j-genes-from-fasta IGHJ.fasta --d-genes-from-fasta IGHD.fasta --c-genes-from-fasta IGHC.fasta --chain IGH --taxon-id 29078 --species efuscus efuscus-IGH.json.gz
I exported the reference FASTA directly from Geneious Prime, checked that there are no spurious line breaks, and copied the contents into a new file in a text editor. Every time, the software has trouble parsing the FASTA. If I remove the entry named in the error, the error moves to the next entry. If I rearrange the genes, it "picks" a new problem entry, and if I shorten the file, it picks yet another one.
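
For anyone hitting the same error, here is a minimal sketch of a check for the problems that are hard to see in a text editor: carriage-return line endings, a missing newline at the end of the file, and a `>` that ends up in the middle of a line. The function name `diagnose_fasta` is made up for illustration, and this is only a guess at what can trip a FASTA parser, not a description of how MiXCR reads the file.

```python
from pathlib import Path


def diagnose_fasta(path):
    """Return a list of human-readable problems found in a FASTA file.

    Hypothetical helper: it only looks for formatting issues that are
    easy to miss by eye, it does not validate the sequences themselves.
    """
    data = Path(path).read_bytes()
    problems = []

    # Windows (CRLF) or old Mac (CR) line endings.
    if b"\r" in data:
        problems.append("contains carriage returns (CR or CRLF line endings)")

    # The last record is not terminated by a newline.
    if data and not data.endswith(b"\n"):
        problems.append("last line has no trailing newline")

    # A '>' anywhere except the first column usually means two records
    # were joined on one line.
    text = data.decode("utf-8", errors="replace")
    for number, line in enumerate(text.split("\n"), start=1):
        if ">" in line[1:]:
            problems.append(f"line {number}: '>' appears after the first column")

    return problems
```

Calling `diagnose_fasta("IGHV.fasta")` returns an empty list when none of these three issues is present.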

Here is the information on the error:

Version: 4.6.0; built=Sat Dec 09 11:48:42 PST 2023; rev=c9fafa41fe; lib=repseqio.v4.0
OS: Linux
Java: 11.0.13
 
picocli.CommandLine$ExecutionException: Error while running command buildLibrary java.lang.IllegalArgumentException: Can't get feature [[L2EndFR1Begin, VEnd]] from Chr24_pIGHV1-2a
        at com.milaboratory.mixcr.cli.Main.registerExceptionHandlers$lambda-12(SourceFile:395)
        at picocli.CommandLine.execute(CommandLine.java:2088)
        at com.milaboratory.mixcr.cli.Main.main(SourceFile:101)
Caused by: java.lang.IllegalArgumentException: Can't get feature [[L2EndFR1Begin, VEnd]] from Chr24_pIGHV1-2a
        at com.milaboratory.o.Gh.a(SourceFile:49)
        at com.milaboratory.o.Gh.getFeature(SourceFile:44)
        at io.repseq.core.VDJCGene.getFeature(SourceFile)
        at io.repseq.cli.InferAnchorPointsAction.go(SourceFile:158)
        at io.repseq.cli.InferAnchorPointsAction.go(SourceFile:61)
        at io.repseq.cli.jcomander.JCommanderBasedMain.main(SourceFile:157)
        at io.repseq.cli.Main.main(SourceFile:94)
        at com.milaboratory.mixcr.cli.CommandBuildLibrary.fromFasta(SourceFile:367)
        at com.milaboratory.mixcr.cli.CommandBuildLibrary.mkLibraryForGeneType(SourceFile:312)
        at com.milaboratory.mixcr.cli.CommandBuildLibrary.run1(SourceFile:381)
        at com.milaboratory.mixcr.cli.MiXCRCommandWithOutputs.run0(SourceFile:69)
        at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-27(SourceFile:514)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        ... 1 more
Caused by: java.lang.IllegalArgumentException: Unknown letter '>'
        at com.milaboratory.core.sequence.Alphabet.symbolToCodeWithException(SourceFile:266)
        at com.milaboratory.o.bg.a(SourceFile:139)
        at com.milaboratory.o.bg.a(SourceFile:35)
        at com.milaboratory.o.bh.getRegion(SourceFile:107)
        at com.milaboratory.o.ca$a.getRegion(SourceFile:124)
        at com.milaboratory.o.bX.a(SourceFile:119)
        at com.milaboratory.o.bX.getRegion(SourceFile:167)
        at com.milaboratory.o.FX.getSequence(SourceFile:1039)
        at com.milaboratory.o.FK.getFeature(SourceFile:48)
        at io.repseq.core.PartitionedSequenceCached.getFeature$lambda-0(SourceFile:32)
        at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1134)
        at io.repseq.core.PartitionedSequenceCached.getFeature(SourceFile:31)
        at com.milaboratory.o.Gf.getFeature(SourceFile:1000)
        at com.milaboratory.o.Gh.a(SourceFile:47)
        ... 21 more

I solved this problem. I had to do three things: remove all of the pseudogenes, add an empty line at the end of each of my FASTA files, AND remove the .mifdx files before EACH new attempt. The -f (--force) flag is NOT enough: it overwrites the final library but does not rebuild the index.
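
The three steps can be sketched as a small Python helper, in case it saves someone the manual work. Everything here is an assumption layered on top of what I did by hand: the function names are made up, the pseudogene test is a predicate you supply yourself (I am not assuming any naming convention), and the directory holding the .mifdx files is whatever you pass in.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    # read_text() opens in text mode, so CRLF line endings are normalised.
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_clean_fasta(src, dst, is_pseudogene):
    """Copy src to dst, skipping records for which is_pseudogene(header) is true.

    Every record is written with LF line endings and the file always ends
    with a newline. Returns the number of records kept.
    """
    kept = 0
    with open(dst, "w", newline="\n") as out:
        for header, seq in read_fasta(src):
            if is_pseudogene(header):
                continue
            out.write(f">{header}\n{seq}\n")
            kept += 1
    return kept


def remove_stale_indexes(directory, suffix=".mifdx"):
    """Delete index files left over from a previous attempt.

    Assumption: the index files sit directly in `directory` and end with
    `suffix`. Returns the names of the files that were removed.
    """
    removed = []
    for path in sorted(Path(directory).glob(f"*{suffix}")):
        path.unlink()
        removed.append(path.name)
    return removed
```

Run `write_clean_fasta` once per gene file (V, D, J, C) with your own pseudogene test, then call `remove_stale_indexes` on the folder containing the index files before every new buildLibrary run.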