cortes-ciriano-lab/SComatic

How to convert the output to regular VCF genotypes such as 0/0, 0/1, 1/1

linfanxiao opened this issue · 1 comment

Dear authors,

I have been working with mouse models using scRNA-seq datasets and have obtained genotype information per cell, as shown below:

#CHROM	Start	End	REF	ALT_expected	Cell_type_expected	Num_cells_expected	CB	Cell_type_observed	Base_observed	Num_reads
GL456216.1	15723	15723	A	G,G	Astrocyte,Neuron	19,24	ACACAGTCACGTAACT	Fibroblast	G	4
GL456216.1	15729	15729	G	A,A	Astrocyte,Neuron	20,27	ACACAGTCACGTAACT	Fibroblast	A	4
GL456216.1	15909	15909	T	C,C	Astrocyte,Neuron	54,44	ATGGGTTCATCCTTGC	Fibroblast	C	2
GL456216.1	34825	34825	T	A	Neuron	7	TTCGGTCTCTTTCCAA	Fibroblast	T	1
GL456216.1	34825	34825	T	A	Neuron	7	CCACCATTCGTGGAAG	Fibroblast	T	1
GL456216.1	34825	34825	T	A	Neuron	7	CGAGGCTCAGTATGAA	Fibroblast	T	1
GL456216.1	34825	34825	T	A	Neuron	7	CATCGCTCAGTCACGC	Fibroblast	T	1
GL456216.1	34825	34825	T	A	Neuron	7	TGGTGATAGTACCCTA	Fibroblast	T	1
JH584304.1	53336	53336	C	A	Neuron	4	TCCATGCGTCAGGTAG	Fibroblast	C	1

I have a couple of questions regarding this output. Firstly, I understand that the Cell_type_expected column indicates the cell type in which the somatic mutation is expected, which in this example should be "Fibroblast". Should we filter out cells whose observed cell type does not match the expected one?

Secondly, I need to convert the genotype information, represented by letters such as "G" in the second line, into the regular genotype format such as "0/0", "0/1" or "1/1". I need this for lineage tracing and to generate a regular VCF v4 file for phylogenetic analysis.

Thank you for your time and I look forward to your response.

Best regards,
Eric

Dear user,
I'm sorry for not getting back to you sooner. We do not provide a tool to create a VCF file, but we have created an FAQ section to help interpret the output of this functionality.
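
For anyone who needs a starting point, the conversion asked about above could be sketched roughly as follows. This is not part of SComatic and not an official method. It assumes that each row of the table reports one base observed in one cell at one site, so that a cell with reads supporting both alleles appears on two rows. The function names (read_rows, call_genotypes, to_vcf_lines) are hypothetical. Note also that with 1 to 4 reads per cell, "0/0" and "1/1" only mean that a single allele was observed; because of allelic dropout in scRNA-seq they should not be read as true homozygous calls.

```python
import csv
from collections import defaultdict


def read_rows(handle):
    """Read the tab-separated per-cell table into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))


def call_genotypes(rows):
    """Return {(chrom, pos, ref, alts): {barcode: genotype}}.

    Assumption: one row per (site, cell, observed base).
    """
    observed = defaultdict(lambda: defaultdict(set))
    for row in rows:
        # ALT_expected repeats the allele once per cell type ("G,G"):
        # deduplicate while keeping the original order.
        alts = tuple(dict.fromkeys(row["ALT_expected"].split(",")))
        site = (row["#CHROM"], int(row["Start"]), row["REF"], alts)
        alleles = (row["REF"],) + alts
        calls = observed[site][row["CB"]]
        base = row["Base_observed"]
        if base in alleles:
            calls.add(alleles.index(base))  # 0 = REF, 1.. = ALT

    genotypes = {}
    for site, cells in observed.items():
        genotypes[site] = {}
        for barcode, indices in cells.items():
            ordered = sorted(indices)
            if len(ordered) == 1:
                gt = f"{ordered[0]}/{ordered[0]}"  # only one allele seen
            elif len(ordered) == 2:
                gt = f"{ordered[0]}/{ordered[1]}"  # two alleles seen
            else:
                gt = "./."  # no usable base, or more than two alleles
            genotypes[site][barcode] = gt
    return genotypes


def to_vcf_lines(genotypes):
    """Format the calls as minimal VCF text lines, one sample per barcode."""
    barcodes = sorted({cb for cells in genotypes.values() for cb in cells})
    columns = ["#CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT"]
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(columns + barcodes),
    ]
    # Sites are sorted by contig name and position (plain string order).
    for (chrom, pos, ref, alts), cells in sorted(genotypes.items()):
        fields = [chrom, str(pos), ".", ref, ",".join(alts),
                  ".", ".", ".", "GT"]
        # Cells with no row at this site get a missing genotype.
        fields += [cells.get(cb, "./.") for cb in barcodes]
        lines.append("\t".join(fields))
    return lines
```

Reading the table with read_rows(open(path)) and passing the result through call_genotypes and to_vcf_lines would give one VCF record per site and one sample column per cell barcode. Cells that were never observed at a site are written as "./.", which most phylogenetic tools treat as missing data.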

Thanks for your patience,
Fran