zktuong/dandelion

Kernel dies while generating clone network

asadsaba opened this issue · 33 comments

Description of the question

Hi Kelvin,
Thanks for your previous replies and help. I am having trouble generating the clone network with more than 20k cells.
Could you please advise which parameters I should consider? Thanks.

Minimal example

No response

Any error message produced by the code above

No response

OS information

No response

Version information

No response

Additional context

No response

Hi Saba, perhaps you are running into memory issues? Try increasing the memory to ~80 GB if you can (I typically set this as the limit in my jobs). You can also consider setting layout_method = 'mod_fr' so that everything stays within Python. I have encountered some weird behaviour with the sfdp mode recently (#219).

Thanks Kelvin. I will try increasing the memory.

I have a couple of questions, please.
1: Is there a way to plot multiple barplots in a grid? Currently I am only able to plot a single barplot.
2: How can I look for clones in a specific leiden/louvain cluster?

Thanks heaps

1: Is there a way to plot multiple barplots in a grid? Currently I am only able to plot a single barplot.

You would have to play with matplotlib/pandas plotting, unfortunately.

2: How can I look for clones in a specific leiden/louvain cluster?

Subset to the cluster of choice in the anndata object (after transfer) and then look at the clone_id column.
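A minimal sketch of that subsetting step, using a small pandas table as a stand-in for adata.obs. The cluster column name 'leiden', the cluster labels and the clone labels are all made up for illustration; only the clone_id column name comes from the reply above.

```python
import pandas as pd

# Toy stand-in for adata.obs after the VDJ information has been transferred.
# Column "leiden" and all values here are hypothetical.
obs = pd.DataFrame(
    {
        "leiden": ["0", "3", "3", "3", "1"],
        "clone_id": ["clone_a", "clone_b", "clone_b", "clone_c", "clone_a"],
    },
    index=["cell1", "cell2", "cell3", "cell4", "cell5"],
)

# Keep only the cells that belong to the cluster of interest.
in_cluster = obs[obs["leiden"] == "3"]

# Count how many cells in that cluster carry each clone_id.
clone_counts = in_cluster["clone_id"].value_counts()
print(clone_counts)
```

With a real AnnData object the same boolean mask can be used to subset the object itself, e.g. adata[adata.obs['leiden'] == '3'], before inspecting its clone_id column.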

Great, thanks a lot.

Also, could you please help me understand this criterion for defining clones:
The VDJ chain junctional/CDR3 sequences attain a minimum of % sequence similarity, based on Hamming distance. The similarity cut-off is tunable (default is 85%; change to 100% if analyzing TCR data).

Why is it 85% for BCRs? Is it conventional?

Yes, it's just an anecdotally derived cut-off. Different papers seem to choose anywhere between 80 and 90%.
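To make the cut-off concrete, here is a rough sketch of a Hamming-based similarity check between two junction sequences of equal length. It is not Dandelion's implementation; the function names and the example sequences are invented for illustration, and only the similarity criterion quoted above is modelled.

```python
def hamming_similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance is only defined for equal lengths")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def passes_cutoff(seq_a: str, seq_b: str, identity: float = 0.85) -> bool:
    """True if the two junctions reach the minimum similarity (hypothetical helper)."""
    if len(seq_a) != len(seq_b):
        return False
    return hamming_similarity(seq_a, seq_b) >= identity


# Made-up junction sequences, 14 residues each.
junction_1 = "CARDYYGSSYFDYW"
junction_2 = "CARDYYGSAYFDYW"  # 1 mismatch  -> 13/14 similarity
junction_3 = "CARDWWGSAYFDYW"  # 3 mismatches -> 11/14 similarity

print(passes_cutoff(junction_1, junction_2))                # passes at 85%
print(passes_cutoff(junction_1, junction_2, identity=1.0))  # fails at 100%
print(passes_cutoff(junction_1, junction_3))                # fails at 85%
```

At 85% a single mismatch in a 14-residue junction is tolerated, which leaves room for somatic hypermutation in BCRs; at 100%, the setting suggested for TCR data, only identical junctions pass.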

Thanks Kelvin.

Sorry, I have one more question.
Can I use the UMAP embeddings from a Seurat object? I have demultiplexed and clustered our data in Seurat, and I just want to use the same embeddings and cluster information to keep things consistent.

You can - generate_network itself is only for visualisation purposes.

You can in fact choose not to compute the layout (which I believe will circumvent the problem in this issue) by adding compute_layout = False. The embedding information will not be generated but the graph (connectivities) will still be populated.

You can then also use scirpy to visualise the clones:
https://sc-dandelion.readthedocs.io/en/latest/notebooks/1c_dandelion_scirpy.html

Thanks.
If I am understanding it right, it means I can initialise the Seurat metadata with the network generated by tl.transfer?

the interaction with Seurat could be patchy - i have not used it in a while but there is a preliminary tutorial on how to achieve it:
https://sc-dandelion.readthedocs.io/en/latest/notebooks/6_dandelion_running_from_R-10x_data.html

Yes, you are right.
Is there a way that I can demultiplex in scanpy and then generate the clusters?

What do you mean by demultiplex?

The scanpy workflow is more or less a clone of the Seurat workflow, so you will be able to achieve similar results.

Our data is hashtagged, so I used the Seurat function HTODemux() to assign single cells back to their sample of origin and to remove doublets, keeping only high-quality cells.

How can I do this in scanpy?

ah i see.

Well you can always do that in Seurat first, then convert to h5ad (https://github.com/cellgeni/sceasy) and proceed from there on?

there's also this: https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.pp.hashsolo.html

also check out this issue scverse/scanpy#351

Thanks heaps Kelvin!! Will look into these docs.

Hi Kelvin,
below is what my anndata looks like. Can I filter on multiple columns at the same time? For example, I want to see the distribution of 'clone_id_size' in 'samples' for a specific value in 'hash.ID'.

When I try to filter, I get an error saying that anndata doesn't accept the & operator.
[Screenshot: Screen Shot 2022-12-15 at 3 04 35 pm]

You need to do something like:

adata[(adata.obs['clone_id_size'] == some_value) & (adata.obs['sample_id'] == some_other_value)]

For more than one match:

adata[(adata.obs['clone_id_size'].isin([val1, val2, val3])) & (adata.obs['sample_id'].isin([valx, valy]))]
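A small runnable sketch of why the parentheses matter, using a plain pandas table as a stand-in for adata.obs. The column names follow the snippets above, but the values are invented. In Python, & binds more tightly than == and >=, so each comparison has to be wrapped in its own parentheses before the masks are combined.

```python
import pandas as pd

# Toy stand-in for adata.obs; all values are hypothetical.
obs = pd.DataFrame(
    {
        "clone_id_size": [1, 2, 54, 1, 3],
        "sample_id": ["s1", "s1", "s2", "s2", "s3"],
    },
    index=["c1", "c2", "c3", "c4", "c5"],
)

# Each condition is parenthesised, then the two boolean masks are combined
# element-wise with &.
mask = (obs["clone_id_size"] >= 2) & (obs["sample_id"].isin(["s1", "s2"]))
selected = obs[mask]
print(selected)

# Using the keyword `and` instead of & does not work on whole columns:
# pandas raises a ValueError because the truth value of a Series is ambiguous.
try:
    (obs["clone_id_size"] >= 2) and (obs["sample_id"] == "s1")
    used_and_ok = True
except ValueError:
    used_and_ok = False
```

The same boolean mask can be passed straight to the AnnData object, as in the snippets above, to subset the cells rather than just the table.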

Thanks for the support, Kelvin. I have a question, please. Can I open the anndata object initialised with the network in R? It's just because I want to keep the UMAP visualisations consistent.

Also, could you please help me understand each column of the check_contig output, e.g. v_call_genotyped_VDJ? It will enable me to analyse the results properly. Thanks heaps.

Thanks for the support, Kelvin. I have a question, please. Can I open the anndata object initialised with the network in R? It's just because I want to keep the UMAP visualisations consistent.

I think you just need to transfer the .obsm slots to the relevant reduced dim slots in the R object. I'm not sure where Seurat/SCE stores the neighbourhood graphs, so I can't help with plotting the edges, unfortunately.

Also, could you please help me understand each column of the check_contig output, e.g. v_call_genotyped_VDJ? It will enable me to analyse the results properly. Thanks heaps.

Other than the specific columns I've described in the tutorial, the rest are basically the VDJ info collapsed to that cell. So, v_call_genotyped_VDJ is the v_call annotation for the VDJ chain (IGH) (genotyped just means it's post-TIgGER). VJ is for the light chain.

Thanks Kelvin.
I have one more question, please. What do the numbers in the clone ID mean, and if this ID has size 54, what does that mean? How can I filter the shared and unique clones?
e.g
B_106_3_2_49_2_2.

Also, I have increased the memory to ~80 GB to run tl.generate_network; it runs but doesn't show any output.

Hi Asad,

Thanks Kelvin.
I have one more question, please. What do the numbers in the clone ID mean, and if this ID has size 54, what does that mean? How can I filter the shared and unique clones?
e.g
B_106_3_2_49_2_2.

This is covered in extreme detail in my tutorial. https://sc-dandelion.readthedocs.io/en/latest/notebooks/3_dandelion_findingclones-10x_data.html

Also, I have increased the memory to ~80 GB to run tl.generate_network; it runs but doesn't show any output.

You need to continue with the tutorial to reach the visualisations https://sc-dandelion.readthedocs.io/en/latest/notebooks/4_dandelion_visualization-10x_data.html

Thanks Kelvin. From the tutorial, 106 indicates whether the contigs use the same V and J genes in the VDJ chain. I just want to clarify what the number 106 represents. Does it mean the number of contigs with the same V and J genes?

Thanks a lot, Kelvin.

Is the lack of output related to #219?

If so, maybe use the other layout method (slower, but the original method).

Hi Kelvin, I have tried layout_method = 'mod_fr' to generate the network, but it doesn't show any output.

[Screenshot: Screen Shot 2022-12-22 at 11 32 55 am]

I have increased the memory to 90 GB.

You’ve called it wrongly.

It should just be:

ddl.tl.generate_network(vdj, …)

As it modifies the object in place, you don’t need to add vdj =
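The general pitfall can be illustrated with a toy example that does not use Dandelion at all. ToyVDJ and generate_network_inplace are invented names, and the sketch assumes, as the reply above implies, a function that updates its argument and returns nothing.

```python
class ToyVDJ:
    """Hypothetical stand-in for an object that holds a clone network."""

    def __init__(self):
        self.graph = None


def generate_network_inplace(vdj):
    """Populate vdj.graph in place; like many in-place functions, returns None."""
    vdj.graph = "populated"


# Correct usage: call the function and keep using the same object.
vdj = ToyVDJ()
generate_network_inplace(vdj)
print(vdj.graph)

# Problematic usage: assigning the return value replaces the object with None,
# so the populated network is no longer reachable through that name.
overwritten = ToyVDJ()
overwritten = generate_network_inplace(overwritten)
print(overwritten)
```

After the second call the name overwritten no longer refers to the object, which is the kind of situation that can look like a function that runs but produces no output.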

Thanks a lot, Kelvin. It worked after increasing the memory and calling ddl.tl.generate_network(vdj).

Great!