phbradley/conga

Question: barcodes for cells

s2hui opened this issue · 4 comments

s2hui commented

Hello,
I would like to get the barcodes associated with the Conga clusters (in the image below, my understanding is that there are 41 cells and 26 clonotypes associated with the conga cluster 2/10). Is there an output file I could look at to get this info OR a place in a python data structure that points to this info? Thanks for your help,
@s2hui

image

Great question! The graph-vs-graph analysis should produce a tab-separated text file named <outfile_prefix>_graph_vs_graph.tsv. That file has columns gex_cluster, tcr_cluster, and clone_index (among others). Look for the lines with 2 in the gex_cluster column and 10 in the tcr_cluster column, and take the clone_index values for those lines.

Those clone_index numbers are indices into the final anndata object (if you still have it in a jupyter notebook) or, if you ran conga from the command line, into the saved anndata object and the saved tab-separated text file called <outfile_prefix>_final_obs.tsv. Looking at those (0-indexed) rows in either anndata (adata.obs) or the final_obs.tsv spreadsheet will give you the barcodes for the representative cells, along with their TCR sequences.

To get all the cell barcodes, you could then look for those TCR sequences in the input TCR information, for example the filtered_contigs file (or the conga clones file and barcode mapping files). As a cross-check, the graph_vs_graph.tsv file also has TCR amino acid information, so you can confirm you are getting the right cells. Let me know if that's not clear!
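The lookup described above might be sketched roughly as follows. This is a minimal illustration on made-up toy data, not CoNGA's own code: the column names gex_cluster, tcr_cluster, and clone_index come from the description above, while the barcodes, the `cdr3b` column, and all values are hypothetical stand-ins for the real graph_vs_graph.tsv and adata.obs contents.

```python
import io

import pandas as pd

# Toy stand-in for <outfile_prefix>_graph_vs_graph.tsv (all values made up).
gvg_tsv = io.StringIO(
    "gex_cluster\ttcr_cluster\tclone_index\n"
    "2\t10\t0\n"
    "2\t10\t3\n"
    "1\t4\t2\n"
)
gvg_hits = pd.read_csv(gvg_tsv, sep="\t")

# Toy stand-in for adata.obs: the index holds cell barcodes (hypothetical),
# and 'cdr3b' is an assumed TCR column used only for illustration.
obs = pd.DataFrame(
    {"cdr3b": ["CASSA", "CASSB", "CASSC", "CASSD"]},
    index=["AAAC-1", "AAAG-1", "AAAT-1", "AACA-1"],
)

# Keep the hits for GEX cluster 2 / TCR cluster 10.
in_cluster = (gvg_hits["gex_cluster"] == 2) & (gvg_hits["tcr_cluster"] == 10)
clone_indices = gvg_hits.loc[in_cluster, "clone_index"].unique()

# clone_index is a 0-based row position, so use positional indexing.
representative = obs.iloc[clone_indices]
rep_barcodes = representative.index.tolist()
print(rep_barcodes)
```

With real output, `obs` would be `adata.obs` from the final anndata object (or the final_obs.tsv file read with pandas), and `gvg_hits` would be read from the graph_vs_graph.tsv file on disk.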

Here's a bit of code that does what Phil suggested above:

import pandas as pd
import scanpy as sc

# Open the AnnData object with the conga results
adata_file = 'your_path/some_prefix_conga.h5ad'
adata = sc.read_h5ad(adata_file)

# Open the gvg results
gvg_hits_file = 'your_path/some_prefix_graph_vs_graph.tsv'
gvg_hits = pd.read_csv(gvg_hits_file, sep='\t')

# clone_index in the gvg hits df gives the (0-based) row position in adata.obs,
# whose index contains the cell barcodes.
# We can append these to the gvg results with this:
gvg_hits['barcode'] = adata.obs.index[gvg_hits['clone_index'].to_numpy()]

# Resave (this overwrites the original gvg file)
gvg_hits.to_csv(gvg_hits_file, sep='\t', index=False)
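
The barcodes appended this way belong to the representative cell of each clonotype. To recover every cell in those clonotypes, as Phil suggests, one option is to match the representative TCR sequences against the input contig table. The sketch below runs on made-up toy data and makes several assumptions: the contig table is taken to have 10x-style `barcode`, `chain`, and `cdr3` columns, the representative TCRs are taken to be available as `cdr3a` and `cdr3b` amino acid columns, and matching on CDR3 amino acids alone is a simplification (CoNGA's own clonotype definition may also use V/J genes and nucleotide sequences).

```python
import pandas as pd

# Toy stand-in for a 10x-style filtered contig table (all values made up).
contigs = pd.DataFrame({
    "barcode": ["AAAC-1", "AAAC-1", "CCCA-1", "CCCA-1", "GGGT-1"],
    "chain":   ["TRA",    "TRB",    "TRA",    "TRB",    "TRB"],
    "cdr3":    ["CAVRA",  "CASSA",  "CAVRA",  "CASSA",  "CASSZ"],
})

# Hypothetical TCR sequences of the representative cells from the gvg hits.
rep_tcrs = pd.DataFrame({"cdr3a": ["CAVRA"], "cdr3b": ["CASSA"]})

# Reshape to one row per barcode with its alpha and beta CDR3.
# aggfunc="first" keeps a single chain per cell, which is a simplification.
per_cell = (
    contigs.pivot_table(index="barcode", columns="chain",
                        values="cdr3", aggfunc="first")
    .rename(columns={"TRA": "cdr3a", "TRB": "cdr3b"})
    .reset_index()
)

# Keep every cell whose alpha/beta CDR3 pair matches a representative TCR.
all_cells = per_cell.merge(rep_tcrs, on=["cdr3a", "cdr3b"], how="inner")
all_barcodes = sorted(all_cells["barcode"])
print(all_barcodes)
```

The number of barcodes recovered per cluster can then be compared with the cell count shown in the CoNGA plot as a sanity check.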

s2hui commented

Hi, Thank you for your help! I ran the code provided by @sschattgen and with the following modifications it appears to have worked.

adata_file = 'your_path/some_prefix_final.h5ad'
adata = sc.read_h5ad(adata_file)

@s2hui

Sorry for the typo but glad it worked!