Extraction of clonal count from IMGT airr output

Question

AnnaSurace opened this issue a year ago · 2 comments


Hi Immunarch team,

I have IMGT output data from bulk BCR sequencing in the AIRR format, which generally imports nicely with your package. However, it doesn't recognize the clone counts. IMGT has the count "hidden" in the sequence ID (for example, M0148812000000000K54C3111182268314671__38_0_0_0_0), meaning there are 38 reads for this particular clone.
Have I missed something in your documentation, or does your package not currently recognize this?
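Given the ID format described above, the count can be pulled out with a few lines of R. This is a minimal sketch, assuming (from this single example) that the count is the first integer after the double underscore `__`. The thread does not say what the trailing `_0_0_0_0` fields mean, so they are ignored. The function name `imgt_read_count` is hypothetical.

```r
# Hypothetical helper: extract the read count IMGT appends to a sequence ID.
# Assumption: the count is the first integer after "__" (e.g. "...__38_0_0_0_0" -> 38).
imgt_read_count <- function(sequence_id) {
  # regexec() captures the digits right after "__"; regmatches() returns,
  # per ID, c(full_match, captured_digits), or character(0) if there is no match
  m <- regmatches(sequence_id, regexec("__([0-9]+)", sequence_id))
  # Return NA for IDs without a "__<count>" part
  vapply(m, function(x) if (length(x) == 2) as.integer(x[2]) else NA_integer_,
         integer(1))
}

imgt_read_count("M0148812000000000K54C3111182268314671__38_0_0_0_0")
# [1] 38
```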

Thank you for your help.
Best wishes,
Anna

Answer 1 · 2023-10-23T18:47:22.000Z

Hi Anna,

Thank you for using Immunarch! Immunarch doesn't recognize this, and I don't think it will. The AIRR ecosystem is mature enough: there is a fantastic AIRR Data Standard format specification, along with Python and R packages to read and write it. The creators of software tools for analyzing raw sequencing data should support the AIRR data format; I think that is important both for the ecosystem and for productive research. What you can do is:

1. Write to the IMGT developers asking them to fully support the AIRR format;
2. Write a script to extract the clonal count, update the IMGT files, and read them into Immunarch.
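The second option could be scripted roughly as in the R sketch below. It is not part of Immunarch. It assumes the IMGT export is a tab-separated AIRR rearrangement file with a `sequence_id` column, and that the count is the first integer after `__` in each ID. The parsed count is written into `duplicate_count`, the AIRR Rearrangement field for copy number. Whether Immunarch's AIRR reader takes its clone counts from `duplicate_count` is an assumption worth checking in its documentation. The function name and file paths are hypothetical.

```r
# Hypothetical sketch (not part of Immunarch): write IMGT's per-ID read counts
# into the AIRR `duplicate_count` column of a tab-separated rearrangement file.
# Assumptions: the file has a `sequence_id` column, and the count is the first
# integer after "__" in each ID (e.g. "...__38_0_0_0_0" -> 38).
add_imgt_counts <- function(in_file, out_file) {
  # Read every column as character so other values are written back unchanged
  airr <- read.delim(in_file, colClasses = "character", check.names = FALSE,
                     na.strings = character(0))
  ids <- airr$sequence_id
  # Locate the first "__<digits>" in each ID (-1 where there is none)
  m <- regexpr("__[0-9]+", ids)
  counts <- rep(NA_integer_, length(ids))
  # regmatches() returns only the matched substrings; drop the leading "__"
  counts[m > 0] <- as.integer(substring(regmatches(ids, m), 3))
  airr$duplicate_count <- counts
  # AIRR TSV convention: no quoting, missing values as empty fields
  write.table(airr, out_file, sep = "\t", quote = FALSE,
              row.names = FALSE, na = "")
  invisible(airr)
}

# Example usage (hypothetical file names):
# add_imgt_counts("sample1_imgt_airr.tsv", "sample1_airr_counts.tsv")
```

The updated files can then be loaded with Immunarch's `repLoad()`. The `airr` R package's `read_rearrangement()` and `write_rearrangement()` may be preferable to base `read.delim()` and `write.table()` if the AIRR schema's column types should be applied.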

I'm sorry for the inconvenience, but we can either fix data input/output problems, which are the responsibility of upstream analysis tool developers, or focus on moving Immunarch and downstream analytics forward. In the future, Immunarch will support only the AIRR standard data format.

More information on the future of Immunarch is here: https://b-t.cr/t/immunarch-will-significantly-evolve-but-it-will-break-things-and-we-need-your-help/1123

Answer 2 · 2023-10-24T08:22:42.000Z

Hi Vadim,

Thank you for getting back to me and for providing Immunarch to the community. Your vision for it is great, and I am looking forward to its future developments. Of course, that's perfectly fine; I'll just extract the counts myself. It is best to use the AIRR Data Standard; however, I wasn't sure whether you wanted IMGT, as the gold standard of annotation, to work with the pipeline as it is.

While I am at it, may I point you to a bug in the germline.R code? Your function for generating germline sequences,

calculate_germlines_parallel <- function(data, threads, sample_name) {
  if (threads == 1) {
    cluster <- NA
  } else {
    cluster <- makeCluster(threads)
    clusterExport(cluster,
      c("generate_germline_sequence", "align_and_find_j_start", "sample_name"),
      envir = environment()
    )
  }
  # ... (rest of the function)
}

currently doesn't work with more than one thread: when it creates the cluster, it reports that the object generate_germline_sequence is missing, and it will not run.

Best wishes,
Anna

From: Vadim I. Nazarov
Sent: Monday, October 23, 2023 7:48 PM
To: immunomind/immunarch
Cc: Anna Surace
Subject: Re: [immunomind/immunarch] Extraction of clonal count from IMGT airr output (Issue #382)