Determine which receptor arms to define clones by
Is your feature request related to a problem?
I'm working on a plasma cell pathology where we expect lots of the clones to have orphan VJ chains, and we'd like to use BCRseq to trace expanded clones. Unfortunately, Dandelion misses these because they don't have a functional BCR.
Describe the solution you'd like
I'd like another parameter for define_clones, e.g. receptor_arms="both", which can take the values "VJ", "VDJ", or "both" (the latter being the default).
I'm still trying to parse through the source of tl.define_clones but I think all the necessary parts should lie there.
Describe alternatives you've considered
I tried removing all VDJ chains from the Dandelion object but that fails.
I tried clustering VJ chains from vdjx.data based on changeo_clone_id but they don't get clone_id assigned.
Additional context
No response
Hi @Ngort, happy to support.
Just want to clarify a few things:
I suppose you mean tl.find_clones? define_clones just points to changeo's DefineClones.py.
At the moment there is a single check at dandelion/dandelion/tools/_tools.py (line 157 in 159d716), which basically asserts that the .data must contain VDJ; otherwise it would not perform the clustering.
I'm thinking I can add an if/else at the end of this entire chunk, and "duplicate" this section (as a separate function to reduce code bloat): dandelion/dandelion/tools/_tools.py, lines 346 to 590 in 159d716.
So then I envisage that the VJ clone_id in this mode would just look like NoVDJ_1_2_3 and, vice versa, in the VDJ mode it would be 1_2_3_NoVJ, or something of the like.
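To make the proposed naming concrete, here is a minimal sketch of how such IDs could be assembled. The helper name format_clone_id and its arguments are hypothetical and are not part of Dandelion; the per-arm parts such as "1_2_3" are treated as opaque strings, and the joined form for cells with both arms is an assumption.

```python
def format_clone_id(vdj_part=None, vj_part=None):
    """Build a clone_id from per-arm cluster labels (hypothetical helper).

    A missing arm is replaced by the placeholder suggested in this thread:
    "NoVDJ" as a prefix, or "NoVJ" as a suffix.
    """
    if vdj_part is None and vj_part is None:
        raise ValueError("at least one receptor arm is required")
    if vdj_part is None:
        # Orphan VJ chain: no VDJ chain in the cell.
        return f"NoVDJ_{vj_part}"
    if vj_part is None:
        # Orphan VDJ chain: no VJ chain in the cell.
        return f"{vdj_part}_NoVJ"
    # Both arms present (assumed layout: VDJ part first, then VJ part).
    return f"{vdj_part}_{vj_part}"


vj_only = format_clone_id(vj_part="1_2_3")
vdj_only = format_clone_id(vdj_part="1_2_3")
```

With these inputs, vj_only and vdj_only reproduce the two example IDs above.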
Does this look like something you are after?
I think that would fit!
OK! Please bear with me as I try to implement it. Do you have a deadline you need this by?
I'm trying to submit an abstract by Sunday, but obviously I don't mean to rush your generous work! In the meantime, do you think there's a faster makeshift way for me to access the light chain clusters presumably generated during the tl.find_clones workflow?
Sorry, I stand corrected. I did mean to point to define_clones, the wrapper of DefineClones.py, although the same statement can be made about find_clones. My understanding is that DefineClones.py simply aligns sequences of any locus, regardless of whether they're paired to each other in a barcoded cell, and define_clones creates the cell clusters (clone_id). Thus, I'd assume there's a way to use changeo's wrapper with just VJ data. Is this accurate?
Based on the documentation:
Clone definition is based on the following criterion:
(1) Identical V- and J-gene usage in the VDJ chain (IGH/TRB/TRD).
(2) Identical CDR3 junctional/CDR3 sequence length in the VDJ chain.
(3) VDJ chain junctional/CDR3 sequences attains a minimum of % sequence similarity, based on hamming distance. The similarity cut-off is tunable (default is 85%; change to 100% if analyzing TCR data).
(4) VJ chain (IGK/IGL/TRA/TRG) usage. If cells within clones use different VJ chains, the clone will be splitted following the same conditions for VDJ chains in (1-3) as above.
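As a rough illustration of criteria (1) to (3) applied to a single receptor arm (for example VJ chains only), here is a standalone sketch. It is not Dandelion's or changeo's implementation: the function names, the single-linkage merging, the "group_subgroup" labels and the toy records are all assumptions. The keys v_call, j_call and junction are borrowed from AIRR-style column names.

```python
from collections import defaultdict


def hamming_similarity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_chains(chains, threshold=0.85):
    """Assign a label to each cell from one chain per cell (hypothetical).

    Step 1-2: group by V gene, J gene and junction length.
    Step 3: within each group, merge chains whose junctions reach the
    similarity threshold (single linkage; an assumption of this sketch).
    """
    groups = defaultdict(list)
    for chain in chains:
        key = (chain["v_call"], chain["j_call"], len(chain["junction"]))
        groups[key].append(chain)

    labels = {}
    for group_idx, (_, members) in enumerate(sorted(groups.items()), start=1):
        clusters = []
        for m in members:
            hits, rest = [], []
            for cl in clusters:
                similar = any(
                    hamming_similarity(m["junction"], o["junction"]) >= threshold
                    for o in cl
                )
                (hits if similar else rest).append(cl)
            # Merge the new chain with every cluster it is similar to.
            clusters = rest + [[m] + [o for cl in hits for o in cl]]
        for sub_idx, cl in enumerate(clusters, start=1):
            for m in cl:
                labels[m["cell"]] = f"{group_idx}_{sub_idx}"
    return labels


# Toy VJ-only records (made-up junctions, 20 nt each).
toy = [
    {"cell": "c1", "v_call": "IGKV1-5", "j_call": "IGKJ1", "junction": "A" * 20},
    {"cell": "c2", "v_call": "IGKV1-5", "j_call": "IGKJ1", "junction": "A" * 19 + "C"},
    {"cell": "c3", "v_call": "IGKV1-5", "j_call": "IGKJ1", "junction": "C" * 20},
    {"cell": "c4", "v_call": "IGKV3-20", "j_call": "IGKJ1", "junction": "A" * 20},
]
labels = cluster_chains(toy)
```

In this toy input, c1 and c2 differ at one position out of 20 and share a label, while c3 (dissimilar junction) and c4 (different V gene) each get their own.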
I guess I could try giving all the barcodes the same VDJ data and see what happens?
Ah, I’m not sure… probably? If you change back to the previous release, the code base for define_clones is still there and you could try to run it.
Sorry, as this code is implemented by the changeo people, I can only offer limited assistance (I don’t really know what it does).