ShahriyariLab/TumorDecon

Error with custom signature matrix

Hi,

I followed the full_tutorial and tried to use an h5ad file I had processed earlier to generate the signature matrix.

However, I always get an empty signature matrix, as this output shows:

['Plasma cells' 'Macrophages' 'Endothelial cells' 'T cells' 'B cells'
'Fibroblasts' 'ILC' 'Myelocytes' 'Epithelial cells' 'Mast cells' 'pDC'
'DC']
Reading batch-corrected dataset file pre_sig_celltypist_cell_label_coarse_high.txt...
Generating signature matrix from clustered batch-corrected datasets...
Saving signature matrix to kmeans_signature_matrix_qval_celltypist_cell_label_coarse_high.txt...
finish sig matrix
Signature Matrix:
Empty DataFrame
Columns: [Plasma cells_subtype_1, Plasma cells_subtype_2, Macrophages_subtype_1, Macrophages_subtype_2, Endothelial cells_subtype_1, Endothelial cells_subtype_2, T cells_subtype_1, T cells_subtype_2, B cells_subtype_1, B cells_subtype_2, Fibroblasts_subtype_1, Fibroblasts_subtype_2, ILC_subtype_1, ILC_subtype_2, Myelocytes_subtype_1, Myelocytes_subtype_2, Epithelial cells_subtype_1, Epithelial cells_subtype_2, Mast cells_subtype_1, Mast cells_subtype_2, pDC_subtype_1, pDC_subtype_2, DC_subtype_1, DC_subtype_2, DC_subtype_3, DC_subtype_4]
Index: []
Running CiberSort...
BR197R
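
The log above shows a signature matrix with column names but no gene rows (`Index: []`), and CiberSort is started anyway. A minimal sketch of a guard that stops before deconvolution, assuming the signature matrix is a pandas DataFrame with genes as rows; the helper name `require_nonempty_signature` is hypothetical and not part of TumorDecon:

```python
import pandas as pd


def require_nonempty_signature(sig: pd.DataFrame) -> pd.DataFrame:
    """Raise early if the matrix has no gene rows or no cell-type columns.

    Hypothetical helper for illustration; not a TumorDecon function.
    """
    n_genes, n_types = sig.shape
    if n_genes == 0 or n_types == 0:
        raise ValueError(
            f"signature matrix is empty: {n_genes} genes x {n_types} cell types"
        )
    return sig


# Toy stand-in for the matrix printed in the log: columns present, no rows.
empty_sig = pd.DataFrame(columns=["T cells_subtype_1", "T cells_subtype_2"])
try:
    require_nonempty_signature(empty_sig)
    guard_triggered = False
except ValueError as err:
    guard_triggered = True
    print(err)
```

Calling such a check right after reading the signature file would make the failure surface at the signature step rather than inside the deconvolution run.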

Could you please let me know how to provide the input in the correct format? My code is below:

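import numpy as np
import pandas as pd
import scanpy as sc
import TumorDecon as td

# Bulk expression table, indexed by gene
bulk_gene = pd.read_csv('14085_Gencode24_protein_coding_passedExpFilter_20Ctl_206UC.txt', sep='\t', index_col='Gene')
#bulk_gene = pd.read_csv('first_two_columns.txt', sep='\t', index_col = 'Gene')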

single_ref = sc.read_h5ad('h5ad_files_2000HVG/biopsy_RNA.h5ad')
bulk_sc_genes = np.intersect1d(bulk_gene.index, single_ref.var_names)
bulk_gene = bulk_gene.loc[bulk_sc_genes, :].copy()
single_ref_r = single_ref.copy()

single_ref_r = single_ref_r[:, bulk_sc_genes].copy()
cell_subsets_r = single_ref_r.obs['celltypist_cell_label_coarse_high'].to_list()
sc.pp.highly_variable_genes(single_ref_r, flavor='seurat', inplace=True, n_top_genes=3000,
                            batch_key='celltypist_cell_label_coarse_high')
single_ref_r_hvg = single_ref_r[:, single_ref_r.var.highly_variable > 0]
single_ref_r_hvg

sc.tl.rank_genes_groups(single_ref_r_hvg, groupby = 'celltypist_cell_label_coarse_high', method='t-test', key_added = "t-test")
single_gene_expression_df = pd.DataFrame.sparse.from_spmatrix(single_ref_r_hvg.layers['counts'].T,
                                                              index=single_ref_r_hvg.var.index,
                                                              columns=single_ref_r_hvg.obs['celltypist_cell_label_coarse_high'].to_list())

dense_array = single_gene_expression_df.values
gene_expression_df = pd.DataFrame(dense_array, index=single_gene_expression_df.index,
                                  columns=single_gene_expression_df.columns)
gene_expression_df = gene_expression_df.rename_axis('Ensembl_Gene_ID')
cell_type_series = pd.Series(single_gene_expression_df.columns)

cell_type = cell_type_series.unique()
print(cell_type)
gene_expression_df.to_csv('pre_sig_celltypist_cell_label_coarse_high.txt', sep='\t', index = True)
td.create_signature_matrix("pre_sig_celltypist_cell_label_coarse_high.txt", cell_type, clustered=True,
                           outfile="kmeans_signature_matrix_qval_celltypist_cell_label_coarse_high.txt")
print('finish sig matrix')

sig = td.read_sig_file("kmeans_signature_matrix_qval_celltypist_cell_label_coarse_high.txt", geneID='Ensembl_Gene_ID')
print("Signature Matrix:")
print(sig)
ciber_freqs = td.tumor_deconvolve(bulk_gene, 'cibersort',
                                  patient_IDs='ALL',
                                  sig_matrix=sig,
                                  args={'nu':'best', 'scaling':'None', 'print_progress':True})

ciber_freqs.to_csv('Biopsy_coarse_annotations_MK_ciber_freqs.csv')
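
For anyone reproducing this, the intermediate file written above is a genes x cells table whose column names are per-cell labels, so the same label appears many times. A minimal sketch of how that table could be inspected on toy data, using only pandas and NumPy. The helper `summarize_expression_table` and the toy gene IDs are hypothetical, and whether any of these properties is what `td.create_signature_matrix` trips over is not established here:

```python
import io

import numpy as np
import pandas as pd

# Toy genes x cells table shaped like the one written above:
# rows are gene IDs, columns are per-cell labels, so labels repeat.
toy = pd.DataFrame(
    np.array([[5, 0, 2], [0, 0, 0], [1, 3, 4]]),
    index=pd.Index(["ENSG_A", "ENSG_B", "ENSG_C"], name="Ensembl_Gene_ID"),
    columns=["T cells", "T cells", "B cells"],
)


def summarize_expression_table(df: pd.DataFrame) -> dict:
    """Collect basic facts about the table (hypothetical helper, not TumorDecon API)."""
    return {
        "shape": df.shape,
        "cells_per_label": pd.Series(df.columns).value_counts().to_dict(),
        "all_zero_genes": int((df.sum(axis=1) == 0).sum()),
        "missing_values": int(df.isna().sum().sum()),
    }


summary = summarize_expression_table(toy)
print(summary)

# Round-trip through a tab-separated buffer, mirroring to_csv(..., sep='\t').
buffer = io.StringIO()
toy.to_csv(buffer, sep="\t", index=True)
buffer.seek(0)
reloaded = pd.read_csv(buffer, sep="\t", index_col=0)

# With default settings, pandas renames repeated column labels on read,
# so the reloaded labels no longer match the original cell-type names exactly.
print(list(reloaded.columns))
```

Comparing the reloaded column labels with the `cell_type` array passed to the signature step is one way to see whether the labels still line up after the file is written and read back.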

I would really appreciate your help!