ambiguity in clonotype assignment

Question

lincoln-harris opened this issue 6 years ago · 7 comments

Hi
@sdarmanis and I have been digging into the sequence-level alignment of cells that are assigned to the same clonogroup and have been noticing some strange things. It seems that the CDR3 sequence doesn't have to be a perfect match for two cells to be assigned to the same clonogroup. For example, in this figure all of these cells have been assigned to the same clonogroup, yet the bottom 3 have very different CDR3 sequences from the rest (apologies for the poor resolution):
https://user-images.githubusercontent.com/33501625/49316061-94057680-f4a4-11e8-800b-a0e41eb09ee3.png

Why is Tracer assigning these to the same clonogroup?
Thanks
Lincoln

Answer 1 · 2018-11-30T21:41:14.000Z

Hi Lincoln, Please could you send me the tracer output directories for these cells? Thanks, Mike

Answer 2 · 2018-11-30T22:10:04.000Z

Yep, the output folder is here:
https://github.com/czbiohub/sclung_adeno/tree/master/TCR_analysis/filtered_TCRAB_summary
It's a private repo, but I just gave you read access, I think.

Answer 3 · 2018-11-30T22:14:47.000Z

Yep, got it. Thanks. What are the names of the cells in the alignment you sent originally? If you send me those, I can dig into the summary output and see if anything obvious stands out. If not, I will probably need the per-cell TraCeR output directories so that I can look at the intermediate outputs from the alignment, assembly and parsing stages. I'll let you know if that's the case. Cheers, Mike

Answer 4 · 2018-11-30T23:00:26.000Z

Yep, it's clonogroup 1, so the cell names are:

https://user-images.githubusercontent.com/33501625/49319152-527ac880-f4b0-11e8-9e05-18fd5360ab10.png

The odd ones out are K20_B003659, E22_B003659 and G21_B000883. I'm noticing that even the flanking V and J regions are different within these 'odd' cells.
Thanks a lot!

Answer 5 · 2018-12-03T11:06:27.000Z

Hi,

I've had a look at this, and it helps to look at the `clonotype_network_with_identifiers.pdf` network graph (https://github.com/czbiohub/sclung_adeno/blob/master/TCR_analysis/filtered_TCRAB_summary/clonotype_network_with_identifiers.pdf) to work out what's going on. Here you can see how the cells connect to each other within the clonogroup 1 subgraph.

Of the 'odd' cells, E22_B003659 and K20_B003659 look to be genuinely clonally related to each other and share both an alpha and a beta sequence. However, E22_B003659 doesn't share sequences with *any* other cell in that subgraph. It gets connected to the rest because K20_B003659 shares another beta sequence with K22_B003659, which itself shares a *different* beta sequence with a lot of the rest of the graph (it's really easier to look at this in the PDF than to try to explain it!). It is, of course, hard to say whether this is genuine sharing or low-level experimental contamination (or some other artefact), so interpret it with caution.

G21_B000883 has the TRBV27 beta sequence that's common throughout this subgraph but also has another TRBV sequence that isn't seen in any of the other cells. It also has two alpha sequences that aren't seen anywhere else. This could be explained biologically: beta recombines first during T cell development, followed by rounds of proliferation before alpha recombines in the progeny. This means that it is not unexpected to see cells with the same beta but different alphas.

More generally, this is an example of TraCeR's permissive rules around grouping cells into clonotypes, where any shared sequences will suck cells into a subgraph. When we wrote this, experiments were of a scale where it was tractable to inspect the network graphs to check for things like this, although it's now apparent that this is not so easy with larger experiments.

I'd be happy to accept any pull requests that add options for improving these representations. If you're interested in doing that, have a look at the `Summariser` class (https://github.com/Teichlab/tracer/blob/master/tracerlib/tasks.py#L643), which calls `tracer_func.draw_network_from_cells` (https://github.com/Teichlab/tracer/blob/master/tracerlib/tracer_func.py#L813). This constructs the graphs using NetworkX (v1, https://networkx.github.io/documentation/networkx-1.1/), with each node being an object of class `Cell` (https://github.com/Teichlab/tracer/blob/master/tracerlib/core.py#L10). To make clonotype definitions more stringent, you'd probably want to change the rules about how edges are added to the graphs.

Hope that's helpful. Let me know if you want to discuss anything else.

All the best,
Mike
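
To illustrate the permissive grouping described above, here is a minimal sketch in plain Python. It is not TraCeR's code: the cell names, the sequence identifiers and the `permissive_groups` function are all hypothetical, and each cell is reduced to a set of sequence IDs. It only shows how "any shared sequence adds an edge" lets two cells with nothing in common end up in the same connected subgraph.

```python
from itertools import combinations

# Hypothetical toy data (not from the real experiment):
# cell name -> set of recombinant sequence identifiers detected in that cell.
cells = {
    "cell_A": {"alpha_1", "beta_1"},
    "cell_B": {"beta_1", "beta_2"},
    "cell_C": {"beta_2", "beta_3"},
    "cell_D": {"beta_3", "alpha_9"},
}

def permissive_groups(cells):
    """Group cells so that ANY shared sequence links two cells.

    Groups are the connected components of the resulting graph.
    """
    # Add an edge between every pair of cells sharing at least one sequence.
    neighbours = {name: set() for name in cells}
    for a, b in combinations(cells, 2):
        if cells[a] & cells[b]:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect connected components with a simple depth-first search.
    groups, seen = [], set()
    for start in cells:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(component)
    return groups

groups = permissive_groups(cells)
print(groups)
print(cells["cell_A"] & cells["cell_D"])  # empty: no sequence shared directly
```

In this toy example `cell_A` and `cell_D` share no sequence, yet they land in one group because of the chain through `cell_B` and `cell_C`. That is the same kind of indirect link that connects E22_B003659 to the rest of clonogroup 1.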

Answer 6 · 2018-12-03T21:32:28.000Z

Thanks a lot. So maybe a worthwhile modification is to define a clonogroup as only those cells that share both an A and a B? Or otherwise accept that when dealing with clonogroups of this size, you're going to see some messiness.

Answer 7 · 2018-12-04T10:42:52.000Z

No problem.

I think that the best way to do it would be to have an option that sets the level of stringency required for assigning cells to clonotypes as they are reported in the summary.
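
As a rough sketch of what such a stringency option could look like (this is not an existing TraCeR option; `group_cells`, `require_both_chains` and the toy cells are hypothetical names made up for illustration), the edge rule can be made a parameter: either any shared chain links two cells, or they must share both an alpha and a beta.

```python
from itertools import combinations

# Hypothetical toy cells: sets of alpha ("A") and beta ("B") sequence IDs.
cells = {
    "c1": {"A": {"a1"}, "B": {"b1"}},
    "c2": {"A": {"a1"}, "B": {"b1", "b2"}},
    "c3": {"A": {"a3"}, "B": {"b2", "b3"}},
    "c4": {"A": {"a4"}, "B": {"b3"}},
}

def linked(x, y, require_both_chains):
    """Decide whether two cells get an edge under the chosen stringency."""
    shares_a = bool(x["A"] & y["A"])
    shares_b = bool(x["B"] & y["B"])
    if require_both_chains:
        return shares_a and shares_b  # strict: shared alpha AND shared beta
    return shares_a or shares_b       # permissive: any shared chain

def group_cells(cells, require_both_chains=False):
    """Return clonotype groups as sorted lists of cell names."""
    neighbours = {name: set() for name in cells}
    for a, b in combinations(cells, 2):
        if linked(cells[a], cells[b], require_both_chains):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components via depth-first search.
    groups, seen = [], set()
    for start in cells:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups

permissive = group_cells(cells)
strict = group_cells(cells, require_both_chains=True)
print(permissive)
print(strict)
```

With the permissive rule all four toy cells form one group. With the strict rule only `c1` and `c2`, which share both chains, stay together. The strict rule would also split cells that share a beta but have different alphas, which can be biologically genuine, so a configurable level seems safer than a fixed one.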

Ideally this would be coupled to better reporting and visualisation to make it easier to assess the structure of the clonotypes and how tenuously cells are connected to them.
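
One simple form such reporting could take, sketched under the assumption that each cell is just a set of sequence IDs (the `direct_link_report` function and the toy names are hypothetical, not part of TraCeR): for every cell in a group, count how many other members it shares a sequence with directly. Cells with very few direct links are the tenuously attached ones.

```python
# Hypothetical toy group: cell name -> set of sequence identifiers.
cells = {
    "core_1": {"beta_x", "alpha_1"},
    "core_2": {"beta_x", "alpha_1"},
    "core_3": {"beta_x"},
    "bridge": {"beta_x", "beta_y"},
    "fringe": {"beta_y", "alpha_7"},
}

def direct_link_report(cells, group):
    """Map each cell to (direct partners, other cells in the group).

    A direct partner is another group member sharing at least one sequence.
    """
    report = {}
    for name in group:
        partners = [
            other for other in group
            if other != name and cells[name] & cells[other]
        ]
        report[name] = (len(partners), len(group) - 1)
    return report

report = direct_link_report(cells, list(cells))
for name, (direct, others) in report.items():
    print(f"{name}: directly linked to {direct} of {others} other cells")
```

Here `fringe` is directly linked to only one of the four other cells (via `bridge`), which flags it for a closer look before treating it as part of the clonotype.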