immunomind/immunarch

Extraction of clonal count from IMGT airr output

AnnaSurace opened this issue · 2 comments

❓ Questions and Help

Hi Immunarch team,

I have IMGT output data from bulk BCR sequencing in the AIRR format, which overall imports nicely with your package. However, it doesn't pick up the clonal count (the number of reads per clone). IMGT has the number "hidden" in the sequence ID (for example, M0148812000000000K54C3111182268314671__38_0_0_0_0), meaning there are 38 reads for this particular clone.
Have I missed anything in your documentation, or does your package not recognize this at the moment?

Thank you for your help.
Best wishes,
Anna

Hi Anna,

Thank you for using Immunarch! We don't recognize this, and I don't think we will. The AIRR ecosystem is mature enough, and there is a fantastic AIRR Data Standard format specification, along with Python and R packages to write and read it. The creators of software tools for analyzing raw sequencing data should support the AIRR data format. I think this is important for both the ecosystem and productive research. What you can do is:

  1. Write to IMGT developers to fully support the AIRR format;
  2. Write a script to extract clonal count, update the IMGT files, and read them into Immunarch.
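For option 2, a minimal sketch in Python could look like the one below. It is not part of Immunarch and rests on several assumptions: the input is a tab-separated AIRR file with a `sequence_id` column, the read count is the first integer after the double underscore in the ID (as in the example above, `...__38_0_0_0_0`), and the count belongs in the AIRR `duplicate_count` field. The names `extract_count` and `add_duplicate_count` are made up for illustration. Please check which count column your Immunarch version actually reads before relying on this.

```python
import csv
import re
import sys

# Assumption: the read count is the first integer after the double underscore
# in the IMGT sequence ID, e.g. "...__38_0_0_0_0" -> 38.
COUNT_PATTERN = re.compile(r"__(\d+)(?:_\d+)*$")


def extract_count(sequence_id):
    """Return the read count embedded in an IMGT sequence ID, or None."""
    match = COUNT_PATTERN.search(sequence_id)
    return int(match.group(1)) if match else None


def add_duplicate_count(src, dst):
    """Copy an AIRR TSV from src to dst, filling in duplicate_count.

    src and dst are open text file objects. Any existing duplicate_count
    value is overwritten.
    """
    reader = csv.DictReader(src, delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    if "duplicate_count" not in fieldnames:
        fieldnames.append("duplicate_count")
    writer = csv.DictWriter(
        dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        count = extract_count(row["sequence_id"])
        # Leave the field empty when the ID does not match the assumed layout.
        row["duplicate_count"] = "" if count is None else count
        writer.writerow(row)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_counts.py input.tsv output.tsv
    with open(sys.argv[1], newline="") as src, \
            open(sys.argv[2], "w", newline="") as dst:
        add_duplicate_count(src, dst)
```

Rows whose ID does not match the assumed pattern get an empty count rather than a guessed value, so they are easy to spot and check by hand before loading the updated files.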

I'm sorry for this inconvenience, but we either fix data input/output problems, which are the responsibilities of upstream analysis tool developers, or we focus on moving Immunarch and downstream analytics forward. In the future, Immunarch will support the AIRR standard data format only.

More information on the future of Immunarch is here: https://b-t.cr/t/immunarch-will-significantly-evolve-but-it-will-break-things-and-we-need-your-help/1123