[QUESTION] Cannot combine multiple samples via ddl.read_10x_vdj

Question

sbenjamaporn opened this issue 2 years ago · 8 comments

I am trying to combine multiple samples with ddl.read_10x_vdj by following your tutorial, as follows:

first we read in the 2 bcr files

samples = ['A', 'B']
bcr_files = []
for sample in samples:
    folder_location = sample
    bcr_files.append(ddl.read_10x_vdj(folder_location, filename_prefix='filtered', verbose = True))
bcr = bcr_files[0].append(bcr_files[1:])
bcr.reset_index(inplace = True, drop = True)
bcr

The result is an error: 'Dandelion' object has no attribute 'append'.
Is there a way to combine multiple samples using ddl.read_10x_vdj?

Answer 1 · 2022-06-24T07:20:24.000Z

hi @sbenjamaporn,

the issue you are facing is because ddl.read_10x_vdj returns a Dandelion object, which has no append method. So your options are:

samples = ['A', 'B']
bcr_files = []
for sample in samples:
    folder_location = sample
    bcr_files.append(ddl.read_10x_vdj(folder_location, filename_prefix='filtered', verbose = True))
    # bcr_files is now a list of Dandelion objects

# so from here, either concatenate directly:
bcr = ddl.concat(bcr_files) # this will add __0 and __1 to the end of the sequence_ids if they are not unique, which isn't ideal

# or adjust the sequence_ids and cell_ids first before concatenating:
bcr_files[0].data['cell_id'] = ['A_' + x for x in bcr_files[0].data['cell_id']]
bcr_files[1].data['cell_id'] = ['B_' + x for x in bcr_files[1].data['cell_id']]
bcr_files[0].data['sequence_id'] = ['A_' + x for x in bcr_files[0].data['sequence_id']]
bcr_files[1].data['sequence_id'] = ['B_' + x for x in bcr_files[1].data['sequence_id']]

bcr2 = ddl.concat(bcr_files) # this should concatenate ok now.

Alternatively, you can edit the actual file (make a copy of it first) before calling ddl.read_10x_vdj, and it should read in ok, as long as the barcodes match how you have them in your AnnData.
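For the file-editing route, here is a minimal sketch using only the Python standard library. It is not part of dandelion: write_prefixed_copy is a hypothetical helper, and the paths and column names in the usage comment are assumptions. The snippet above uses cell_id and sequence_id on the loaded data; a raw 10x contig annotation CSV is assumed here to use barcode and contig_id instead, so check the header of your own file first.

```python
import csv
from pathlib import Path


def write_prefixed_copy(src, dst, prefix, columns, delimiter=","):
    """Write a copy of a delimited table with `prefix` prepended to `columns`.

    Hypothetical helper for illustration; the original file is left untouched.
    """
    src, dst = Path(src), Path(dst)
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter=delimiter)
        fieldnames = reader.fieldnames or []
        # Fail early if a requested column is not in the header.
        missing = [c for c in columns if c not in fieldnames]
        if missing:
            raise KeyError(f"columns not found in {src}: {missing}")
        writer = csv.DictWriter(
            fout, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            for col in columns:
                row[col] = prefix + row[col]
            writer.writerow(row)
    return dst


# Example call (paths and column names are assumptions, adjust to your data):
# write_prefixed_copy("A/filtered_contig_annotations.csv",
#                     "A_prefixed/filtered_contig_annotations.csv",
#                     "A_", ["barcode", "contig_id"])
```

Whatever prefix you pick has to match the barcodes in your AnnData, and any companion file that refers to the same ids would need the same change.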

Answer 2 · 2022-06-28T06:44:07.000Z

Thank you so much, it works now

Answer 3 · 2022-06-29T08:38:44.000Z

no worries! just a heads up, i will be releasing a new version today because i found some formatting bugs in the .metadata slot. In case you are wanting to try it now, you can do pip install git+https://www.github.com/zktuong/dandelion.git@v0.2.4 -U

Answer 4 · 2022-07-23T12:41:21.000Z

Dear @zktuong ,

Thank you so much for this new version (v0.2.4). I am trying it now, and it works!

I have a question about the pre-processing step. I currently use the output called "filtered_contig_igblast_db-pass_genotyped.tsv" (the original output from dandelion v0.1.11 with singularity). I have read your documentation. For v0.2.4, is it necessary to repeat the pre-processing step to get the file "filtered_contig_dandelion.tsv"? Or is the original file also compatible with this new version?

I have attached a screenshot taken after pre-processing with singularity (v0.1.11), which I ran a few months ago. Does your new version resolve this warning?

Best

Answer 5 · 2022-07-24T15:15:24.000Z

hi @sbenjamaporn, yes the data should be compatible as it's essentially just a file name change. the other changes to the preprocessing steps are just cut offs and should not affect the bulk of the data.

The warnings are from R, and unfortunately would not clear - it doesn't impact on the output though.

Answer 6 · 2022-07-26T14:55:48.000Z

Dear @zktuong

Thank you so much, I will try it next time 😄

Answer 7 · 2022-07-31T16:39:27.000Z

Dear @zktuong,

I am following your instructions now! Something on this page about find_clones (https://sc-dandelion.readthedocs.io/en/latest/notebooks/3_dandelion_findingclones-10x_data.html) confuses me. If I understand the clone_id format (A_B_C_D_E_F) correctly, should B define clones based on the VDJ chain? (Your description says VJ chain.) If I am misunderstanding, feel free to let me know.

Best

Answer 8 · 2022-07-31T17:22:27.000Z

oops yes. You are right. It's VDJ. just a typo on the docs. i will fix that up.