zktuong/dandelion

[QUESTION] Cannot combine multiple samples via ddl.read_10x_vdj

sbenjamaporn opened this issue · 8 comments

Dear @zktuong,

I am trying to combine multiple samples read with ddl.read_10x_vdj by adapting your tutorial as follows:

# first we read in the 2 bcr files

samples = ['A', 'B']
bcr_files = []
for sample in samples:
    folder_location = sample
    bcr_files.append(ddl.read_10x_vdj(folder_location, filename_prefix='filtered', verbose = True))
bcr = bcr_files[0].append(bcr_files[1:])
bcr.reset_index(inplace = True, drop = True)
bcr

The result shows the error: 'Dandelion' object has no attribute 'append'.
Is there a way to combine multiple samples using ddl.read_10x_vdj?

hi @sbenjamaporn,

the issue you are facing is because ddl.read_10x_vdj returns a Dandelion object, not a pandas DataFrame, so it has no .append method. So your options are:

samples = ['A', 'B']
bcr_files = []
for sample in samples:
    folder_location = sample
    bcr_files.append(ddl.read_10x_vdj(folder_location, filename_prefix='filtered', verbose = True))
    # bcr_files is now a list of Dandelion objects

# so from here, either concatenate directly:
bcr = ddl.concat(bcr_files) # this will add __0 and __1 to the end of the sequence_ids if they are not unique, which isn't ideal

# or adjust the sequence_ids and cell ids first before concatenating:
bcr_files[0].data['cell_id'] = ['A_' + x for x in bcr_files[0].data['cell_id']]
bcr_files[1].data['cell_id'] = ['B_' + x for x in bcr_files[1].data['cell_id']]
bcr_files[0].data['sequence_id'] = ['A_' + x for x in bcr_files[0].data['sequence_id']]
bcr_files[1].data['sequence_id'] = ['B_' + x for x in bcr_files[1].data['sequence_id']]

bcr2 = ddl.concat(bcr_files) # this should form ok now.
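
For more than two samples, the same prefixing can be written as a loop. The sketch below is only an illustration: add_sample_prefix is a hypothetical helper (not part of dandelion), and the SimpleNamespace objects are stand-ins for the Dandelion objects returned by ddl.read_10x_vdj so that the snippet runs on its own. It assumes, as in the lines above, that the IDs live in the cell_id and sequence_id columns of .data.

```python
from types import SimpleNamespace


def add_sample_prefix(vdj, sample, columns=("cell_id", "sequence_id")):
    """Prepend '<sample>_' to the ID columns of vdj.data (modifies vdj in place)."""
    for col in columns:
        vdj.data[col] = [f"{sample}_{x}" for x in vdj.data[col]]
    return vdj


# Stand-ins for Dandelion objects (hypothetical data); both samples share
# the same barcode, which is the situation that causes clashing IDs.
samples = ["A", "B"]
bcr_files = [
    SimpleNamespace(data={"cell_id": ["AAAC-1"], "sequence_id": ["AAAC-1_contig_1"]}),
    SimpleNamespace(data={"cell_id": ["AAAC-1"], "sequence_id": ["AAAC-1_contig_1"]}),
]

for sample, vdj in zip(samples, bcr_files):
    add_sample_prefix(vdj, sample)

# After prefixing, the sequence_ids are unique across samples.
all_ids = [x for vdj in bcr_files for x in vdj.data["sequence_id"]]
assert len(all_ids) == len(set(all_ids))

# With real Dandelion objects, the next step would be:
# bcr2 = ddl.concat(bcr_files)
```

Whatever prefix is used, it has to match how the barcodes are named in the AnnData object.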

Alternatively, you can edit the input file itself (make a copy of it first) before calling ddl.read_10x_vdj, and it should read in ok, as long as the barcodes match how you have them in your AnnData.
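
A rough sketch of the file-editing route, using only the Python standard library. It assumes the input is the 10x filtered_contig_annotations.csv with barcode and contig_id columns; write_prefixed_copy, the A_prefixed folder and the tiny example file are made up for illustration. If a companion file such as a FASTA is read from the same folder, its contig names would presumably need the same prefix, which this sketch does not handle.

```python
import csv
import tempfile
from pathlib import Path


def write_prefixed_copy(src, dst, sample, columns=("barcode", "contig_id")):
    """Write a copy of a contig annotation CSV with '<sample>_' prepended to the ID columns."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for col in columns:
                row[col] = f"{sample}_{row[col]}"
            writer.writerow(row)


# Tiny made-up input file so the sketch runs on its own.
tmp = Path(tempfile.mkdtemp())
src = tmp / "A" / "filtered_contig_annotations.csv"
src.parent.mkdir()
src.write_text("barcode,contig_id,chain\nAAAC-1,AAAC-1_contig_1,IGH\n")

# The copy keeps the original file name but lives in a separate folder,
# so the original file is left untouched.
dst = tmp / "A_prefixed" / "filtered_contig_annotations.csv"
dst.parent.mkdir()
write_prefixed_copy(src, dst, "A")

# The copied folder would then be passed to ddl.read_10x_vdj instead of the original.
```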

Thank you so much, it works now

no worries! just a heads up, i will be releasing a new version today because i found some formatting bugs in the .metadata slot. If you want to try it now, you can do pip install git+https://www.github.com/zktuong/dandelion.git@v0.2.4 -U

Dear @zktuong ,

Thank you so much for this new version (v0.2.4). I am trying it now, and it works!

I have a question about the pre-processing step. I am currently using the output called "filtered_contig_igblast_db-pass_genotyped.tsv" (the original output from dandelion v0.1.11 with singularity). I have read your documentation. For v0.2.4, is it necessary to repeat the pre-processing step to get the file "filtered_contig_dandelion.tsv", or is the original file also compatible with this new version?

I have attached a screenshot from the pre-processing with singularity (v0.1.11) that I ran some months ago. Does your new version resolve this warning?

Best

hi @sbenjamaporn, yes the data should be compatible as it's essentially just a file name change. the other changes to the preprocessing steps are just cut-offs and should not affect the bulk of the data.

The warnings are from R and unfortunately will not go away. They don't affect the output though.

Dear @zktuong

Thank you so much, I will try it next time 😄

Dear @zktuong,

I am following your instructions now! Something on this page about find_clones (https://sc-dandelion.readthedocs.io/en/latest/notebooks/3_dandelion_findingclones-10x_data.html) confuses me. If I understand the clone_id format (A_B_C_D_E_F) correctly, should B define clones based on the VDJ chain? (Your description says VJ chain.) If I am misunderstanding, feel free to let me know.

Best

oops yes. You are right. It's VDJ. just a typo on the docs. i will fix that up.