Kernel dies while generating clone network

Question

asadsaba opened this issue 2 years ago · 33 comments

Description of the question

Hi Kelvin,
Thanks for your previous replies and help. I am having a problem generating the clone network with more than 20k cells.
Could you please advise which parameters to consider? Thanks

Minimal example

No response

Any error message produced by the code above

No response

OS information

No response

Version information

No response

Additional context

No response

zktuong commented 2 years ago

great!

Answer 1 · 2022-12-14T14:18:25.000Z

Hi Saba, perhaps you are running into memory issues? Try increasing to ~80gb if you can (I typically set this as the limit in my jobs). You can also consider setting layout_method = 'mod_fr' so everything stays within python. I have encountered some weird behaviors with sfdp mode (#219) recently.

Answer 2 · 2022-12-14T14:25:07.000Z

Thanks Kelvin. I will try increasing the memory.

I have a couple of questions please.
1: Is there a way to plot multiple barplots in a grid? Currently I am only able to plot a single barplot.
2: How can I look for clones in a specific leiden/louvain cluster?

Thanks heaps

Answer 3 · 2022-12-14T14:30:56.000Z

1: Is there a way to plot multiple barplots in a grid? Currently I am only able to plot a single barplot.

you would have to play with matplotlib/pandas plotting unfortunately

2: How can I look for clones in a specific leiden/louvain cluster?

subset to the cluster of choice in the anndata object (after transfer) and then look at the clone_id column
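
A minimal sketch of that subsetting step, using a small pandas DataFrame as a stand-in for adata.obs. The column name 'leiden' and all values are assumptions made up for illustration; with a real AnnData object the same boolean mask can be applied as adata[in_cluster].

```python
import pandas as pd

# Toy stand-in for adata.obs after the VDJ metadata has been transferred.
# The 'leiden' column name and every value below are made up for illustration.
obs = pd.DataFrame(
    {
        "leiden": ["0", "1", "1", "2", "1"],
        "clone_id": ["A", "B", "B", "C", "D"],
    },
    index=["cell1", "cell2", "cell3", "cell4", "cell5"],
)

# Keep only the cells that belong to the cluster of interest.
in_cluster = obs["leiden"] == "1"
cluster_obs = obs[in_cluster]

# Count how many cells in that cluster carry each clone_id.
clone_counts = cluster_obs["clone_id"].value_counts()
print(clone_counts)
```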

Answer 4 · 2022-12-14T14:35:16.000Z

Great, thanks a lot.

Also
Could you please help me understand this criterion, which is used when defining clones:
The VDJ chain junctional/CDR3 sequences attains a minimum of % sequence similarity, based on hamming distance. The similarity cut-off is tunable (default is 85%; change to 100% if analyzing TCR data).

Why is it 85% for BCRs? Is that conventional?

Answer 5 · 2022-12-14T14:55:26.000Z

yes it's just an anecdotally derived cut-off. different papers seem to choose anywhere between 80 and 90%.
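
To make the cut-off concrete, here is a small sketch of a Hamming-distance-based similarity check between two junction sequences of equal length. It is only an illustration of the idea, not dandelion's implementation; the function names and the example sequences are made up.

```python
def hamming_similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance needs sequences of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 1 - mismatches / len(seq_a)


def passes_cutoff(seq_a: str, seq_b: str, cutoff: float = 0.85) -> bool:
    """True if the two junctions are at least `cutoff` similar."""
    return hamming_similarity(seq_a, seq_b) >= cutoff


# Made-up junction sequences, 16 residues each.
junction_a = "CARDYYGSSYWYFDVW"
junction_b = "CARDYYGSSYWYFDLW"  # 1 mismatch  -> 15/16 similar
junction_c = "CARDYYGSSYWAAALW"  # 4 mismatches -> 12/16 similar

print(passes_cutoff(junction_a, junction_b))              # 85% cut-off
print(passes_cutoff(junction_a, junction_b, cutoff=1.0))  # 100% cut-off
print(passes_cutoff(junction_a, junction_c))
```

With a 100% cut-off only identical sequences pass, which is why that setting is suggested for TCR data.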

Answer 6 · 2022-12-14T14:57:05.000Z

Thanks Kelvin.

Sorry, I have one more question.
Can I use the UMAP embeddings from a Seurat object, as I have demultiplexed and clustered our data in Seurat? I just want to use the same embeddings or cluster information to keep it consistent.

Answer 7 · 2022-12-14T15:01:29.000Z

You can - generate_network itself is only for visualisation purposes.

You can in fact choose not to compute the layout (which I believe will circumvent the problem in this issue) by adding compute_layout = False. The embedding information will not be generated but the graph (connectivities) will still be populated.

You can then also use scirpy to visualise the clones:
https://sc-dandelion.readthedocs.io/en/latest/notebooks/1c_dandelion_scirpy.html
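
A sketch of the two options mentioned in this thread, wrapped in a small helper. The helper name build_clone_network and its with_layout flag are hypothetical and not part of dandelion; only the compute_layout and layout_method arguments quoted in the replies are used, and vdj is assumed to be an existing Dandelion object. dandelion is imported inside the function so the helper can be defined even where the package is not installed.

```python
def build_clone_network(vdj, with_layout=False):
    """Populate the clone graph on `vdj` in place (hypothetical helper)."""
    import dandelion as ddl  # lazy import; needs dandelion installed at call time

    if with_layout:
        # Layout computed within python, as suggested earlier in the thread.
        ddl.tl.generate_network(vdj, layout_method="mod_fr")
    else:
        # Skip the layout: no embedding is produced, but the graph is still built.
        ddl.tl.generate_network(vdj, compute_layout=False)
```

Skipping the layout is the cheaper option when the embedding will come from elsewhere, e.g. an existing UMAP.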

Answer 8 · 2022-12-14T15:10:00.000Z

Thanks.
If I am understanding it right, it means I can initialise the Seurat metadata with the network generated by tl.transfer?

Answer 9 · 2022-12-14T15:11:45.000Z

the interaction with Seurat could be patchy - i have not used it in a while but there is a preliminary tutorial on how to achieve it:
https://sc-dandelion.readthedocs.io/en/latest/notebooks/6_dandelion_running_from_R-10x_data.html

Answer 10 · 2022-12-14T15:14:01.000Z

Yes, you are right.
Is there a way that I can demultiplex in scanpy and then generate the clusters?

Answer 11 · 2022-12-14T15:15:59.000Z

what do you mean demultiplex?

The scanpy workflow is more or less a clone of the Seurat workflow so you will be able to achieve similar results.

Answer 12 · 2022-12-14T15:20:13.000Z

Our data is hashtagged, so I used the Seurat function HTODemux() to assign single cells back to their sample origins and remove the doublets, keeping only high-quality cells.

Answer 13 · 2022-12-14T15:22:14.000Z

How can I do this in scanpy?

Answer 14 · 2022-12-14T15:24:14.000Z

ah i see.

Well you can always do that in Seurat first, then convert to h5ad (https://github.com/cellgeni/sceasy) and proceed from there on?

there's also this: https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.pp.hashsolo.html

Answer 15 · 2022-12-14T15:26:16.000Z

also check out this issue scverse/scanpy#351

Answer 16 · 2022-12-14T15:26:26.000Z

Thanks heaps Kelvin!! Will look into these docs.

Answer 17 · 2022-12-15T04:08:43.000Z

Hi Kelvin,
Below is what my anndata looks like. Can I filter on multiple columns at the same time? E.g. I want to see the distribution of 'clone_id_size' in 'samples' with a specific value in 'hash.ID'.

While trying to filter, it gives an error that anndata doesn't accept the & operator.

Answer 18 · 2022-12-15T10:37:14.000Z

you need to do something like:

adata[(adata.obs['clone_id_size'] == some_value) & (adata.obs['sample_id'] == some_other_value)]

more than 1 match:

adata[(adata.obs['clone_id_size'].isin([val1, val2, val3])) & (adata.obs['sample_id'].isin([valx, valy]))]
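
A small runnable sketch of the same pattern, using a pandas DataFrame as a stand-in for adata.obs. The column values are made up for illustration; with a real AnnData object the mask can be passed as adata[mask].

```python
import pandas as pd

# Toy stand-in for adata.obs; all values are made up for illustration.
obs = pd.DataFrame(
    {
        "sample_id": ["s1", "s1", "s2", "s2", "s1"],
        "hash.ID": ["HTO1", "HTO2", "HTO1", "HTO1", "HTO1"],
        "clone_id_size": [1, 3, 1, 2, 3],
    }
)

# Each comparison needs its own parentheses because `&` binds more tightly
# than `==`; without them the expression is parsed differently and fails.
mask = (obs["sample_id"] == "s1") & (obs["hash.ID"] == "HTO1")

# Distribution of clone sizes among the selected cells.
size_distribution = obs.loc[mask, "clone_id_size"].value_counts().sort_index()
print(size_distribution)
```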

Answer 19 · 2022-12-20T00:38:56.000Z

Thanks for the support Kelvin. I have a question please. Can I open the anndata object initialised with the network in R? It's just because I want to keep the UMAP visualisations consistent.

Answer 20 · 2022-12-20T00:44:34.000Z

Also, could you please help me understand each column of the check_contig output, e.g. v_call_genotyped_VDJ? It will enable me to analyse the results properly. Thanks heaps

Answer 21 · 2022-12-20T20:02:23.000Z

Thanks for the support Kelvin. I have a question please. Can I open the anndata object initialised with the network in R? It's just because I want to keep the UMAP visualisations consistent.

I think you just need to transfer the .obsm slots to the relevant reduced dim slots in the R object. Not sure where seurat/sce stores the neighborhood graphs so I can't help with plotting the edges unfortunately.

Also, could you please help me understand each column of the check_contig output, e.g. v_call_genotyped_VDJ? It will enable me to analyse the results properly. Thanks heaps

Other than the specific columns I've described in the tutorial, the rest are basically the vdj info collapsed to that cell. So, v_call_genotyped_VDJ is the v_call annotation for the VDJ chain (IGH) (genotyped just means it's post tigger). VJ is for the light chain.

Answer 22 · 2022-12-20T21:48:19.000Z

Thanks Kelvin.
I have one more question please. What do these numbers in the clone-id mean, and if this id has size 54, what does that mean? How can I filter the shared and unique clones?
e.g.
B_106_3_2_49_2_2.

Answer 23 · 2022-12-20T21:54:42.000Z

Also, I have increased the memory to ~80gb to run tl.generate_network, but unfortunately it runs without showing any output.

Answer 24 · 2022-12-20T22:53:10.000Z

Hi Asad,

Thanks Kelvin.
I have one more question please. What do these numbers in the clone-id mean, and if this id has size 54, what does that mean? How can I filter the shared and unique clones?
e.g.
B_106_3_2_49_2_2.

This is covered in extreme detail in my tutorial. https://sc-dandelion.readthedocs.io/en/latest/notebooks/3_dandelion_findingclones-10x_data.html

Also, I have increased the memory to ~80gb to run tl.generate_network, but unfortunately it runs without showing any output.

You need to continue with the tutorial to reach the visualisations https://sc-dandelion.readthedocs.io/en/latest/notebooks/4_dandelion_visualization-10x_data.html

Answer 25 · 2022-12-20T23:10:12.000Z

Thanks Kelvin. From the tutorial, 106_ indicates whether the contigs use the same V and J genes in the VDJ chain. I just want to clarify what the number 106 represents. Does it mean the number of contigs with the same V and J genes?

Answer 26 · 2022-12-20T23:18:16.000Z

Nope, it’s just a number. Like a barcode.


Answer 27 · 2022-12-21T01:11:39.000Z

Thanks a lot Kelvin

Answer 28 · 2022-12-21T06:43:09.000Z

Is the lack of output related to #219 ?

if so, maybe use the other layout method (slower but original method)

Answer 29 · 2022-12-22T00:34:31.000Z

Hi Kelvin, I have tried layout_method = 'mod_fr' to generate the network but it doesn't show any output.

Answer 30 · 2022-12-22T00:35:17.000Z

I have increased the memory to 90GB

Answer 31 · 2022-12-22T06:52:25.000Z

You’ve called it wrongly.

should just be:

ddl.tl.generate_network(vdj, …)

as it modifies in place, you don’t need to add vdj =
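
To illustrate why the assignment gets in the way, here is a toy sketch of an in-place function. ToyVDJ and toy_generate_network are made-up stand-ins, not dandelion code, and the sketch assumes the in-place function returns None, as is common for functions that modify their argument.

```python
class ToyVDJ:
    """Toy stand-in for a Dandelion object; not the real class."""

    def __init__(self):
        self.graph = None


def toy_generate_network(vdj):
    """Mimics an in-place function: fills vdj.graph and returns nothing."""
    vdj.graph = "graph populated"


vdj = ToyVDJ()

# Call it without assignment and keep using the same object afterwards.
toy_generate_network(vdj)
print(vdj.graph)

# Capturing the return value of an in-place function yields None, so writing
# `vdj = toy_generate_network(vdj)` would replace the object with None.
result = toy_generate_network(vdj)
print(result)
```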

Answer 32 · 2022-12-22T13:31:28.000Z

Thanks a lot Kelvin. It worked after increasing the memory and calling ddl.tl.generate_network(vdj).