Search all .vcf.gz files of the selected dataset
Don-Isdale opened this issue · 1 comment
Part of #383
Observable outcomes:
This enables the user to search for the given SNP names across all chromosomes of the selected dataset, and to request the lookup of genotype values for those SNP names.
Measure with:
Run the script either from the command line or by adding parameters to a request sent by the application, then confirm from the trace that the correct lookup command is performed, and that the results are correct, include all chromosomes of the dataset, and apply to the requested SNP names.
Task Sequence:
- Implement changes in the lookup request execution, i.e. in calling bcftools.
  This is complete, and is confirmed by the test plan and execution in the comment below.
- Connect that change into the call flow in the frontend and backend, i.e. add / change the parameters passed.
  This is in progress, as indicated by the completed items in the following design breakdown.
- [4-8H] vcfGenotypeLookup: in this use case, don't pass the scope (chromosome) parameter, i.e. it is an optional parameter of the API endpoint; instead, request a list of the non-soft-link .vcf.gz files and search each of them via an individual API request. The datasetId parameter is unchanged: it identifies the directory in which the .vcf.gz files reside.
- [2-3H/6H/0H] vcfGenotypeReceiveResult(): take a dataset param instead of a block, and determine the block from the #CHROM column.
  To enable loading of results from multiple chromosomes, this includes:
  - pass to addFeaturesJson(): dataset as an alternative to block
  - also request the %CHROM column
  - configure genotype-search .selectedSamples as manageGenotype.vcfGenotypeSamplesSelected
  - establish dialogMode in manageGenotype: {component : 'genotype-search', datasetId }
  - VCF header text, for the genotype-search case: look up the dataset from vcfDatasetId and pass requestOptions.datasetVcfFile from the dataset's vcfFiles
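The idea of determining the block from the #CHROM column can be sketched as follows: once %CHROM is requested as the first output column, the result rows can be grouped per chromosome, and each group loaded into the dataset's block for that chromosome. This is a minimal Python sketch, not the Pretzel frontend code; it assumes %CHROM is the first tab-separated column and that header lines start with '#', and the function name group_rows_by_chrom is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_rows_by_chrom(text: str) -> Dict[str, List[List[str]]]:
    """Hypothetical helper: group tab-separated result rows by their first
    column, assumed to be %CHROM. Header lines ('#...') and blank lines are skipped."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, *rest = line.split("\t")
        # The remaining columns (ID, POS, ..., GT values) are kept per chromosome.
        groups[chrom].append(rest)
    return dict(groups)
```

Each key of the returned mapping can then be matched against the scope of the dataset's blocks, so a single result can populate several blocks.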
Test
This facility is tested using the prototype 'Genotype Search' panel / dialog, which provides the parameters: dataset Id, sample names, SNP names.
Test Case
From the dataset Id, the list of non-link .vcf.gz files is requested.
Results
Server log extract
Confirming that the list of .vcf.gz file names excluding soft-links is requested.
childProcess vcfGenotypeLookup.bash 0 false undefined 0 lb3app/scripts /media/don/Linux0/home/don/new/projects/agribio/markerMapViewer/pretzel.A3/lb4app
+ scope=noLinks
+ cd tmp/vcf/201028_40K_DAS5_samples_XT_exomeIDs
+ '[' noLinks = noLinks ']'
::ffff:127.0.0.1 - - [06/Jun/2024:18:26:52 +0000] "GET /api/Datasets/vcfGenotypeFeaturesCountsStatus?id=201028_40K_DAS5_samples_XT_exomeIDs HTTP/1.1" 200 605 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
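The listing step (scope=noLinks in the trace) can be sketched in Python as below. This is a minimal sketch, not the code in vcfGenotypeLookup.bash: it assumes the .vcf.gz files sit directly in a directory named by the datasetId under a root directory (the trace shows the script changing into tmp/vcf/<datasetId>), and the function name list_vcf_files_no_links is hypothetical.

```python
import os
from pathlib import Path
from typing import List


def list_vcf_files_no_links(vcf_root: str, dataset_id: str) -> List[str]:
    """Hypothetical helper: sorted names of the regular (non-symlink)
    .vcf.gz files in <vcf_root>/<dataset_id>."""
    dataset_dir = Path(vcf_root) / dataset_id
    names = []
    with os.scandir(dataset_dir) as entries:
        for entry in entries:
            # Index files such as 1A.vcf.gz.csi do not end in .vcf.gz, so they are skipped.
            # is_file(follow_symlinks=False) is False for a soft-link, so links are skipped too.
            if entry.name.endswith(".vcf.gz") and entry.is_file(follow_symlinks=False):
                names.append(entry.name)
    return sorted(names)
```

For example, list_vcf_files_no_links("tmp/vcf", "201028_40K_DAS5_samples_XT_exomeIDs"). In a shell script, find DIR -maxdepth 1 -type f -name '*.vcf.gz' gives a similar result, since without -L the -type f test does not match symlinks.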
API Request
http://localhost:3000/api/Datasets/vcfGenotypeFeaturesCountsStatus?id=201028_40K_DAS5_samples_XT_exomeIDs
params :
id=201028_40K_DAS5_samples_XT_exomeIDs
API Result
(replace-string "\\n" "\n")
{"text":"scope=
3170787 Jun 6 21:15 1A_copy.MAF.vcf.gz
162731 Jun 6 21:15 1A_copy.MAF.vcf.gz.csi
2671431 Aug 9 2022 1A_copy.vcf.gz
165069 Jun 6 21:15 1A_copy.vcf.gz.csi
463961 Jan 29 12:12 1A.MAF.SNPList.vcf.gz
149577 Jan 29 12:12 1A.MAF.SNPList.vcf.gz.csi
3164247 Jan 29 12:10 1A.MAF.vcf.gz
159238 Jan 29 12:10 1A.MAF.vcf.gz.csi
159118 Jan 29 12:07 1A.vcf.gz.csi
463961 Jan 29 16:04 1B.MAF.SNPList.vcf.gz
149577 Jan 29 16:04 1B.MAF.SNPList.vcf.gz.csi
3164248 Jan 29 16:04 1B.MAF.vcf.gz
159236 Jan 29 16:04 1B.MAF.vcf.gz.csi
159118 Jan 29 16:04 1B.vcf.gz.csi
"}
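On the client side, the request URL and the file names in its result can be handled as sketched below. This assumes the "text" field holds a "scope=" line followed by ls-style lines whose last whitespace-separated field is the file name (so names must not contain spaces); the function names status_url and vcf_names_from_listing are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode

API_BASE = "http://localhost:3000/api"  # server address as in the API Request above


def status_url(dataset_id: str) -> str:
    """Hypothetical helper: URL of the vcfGenotypeFeaturesCountsStatus request."""
    return f"{API_BASE}/Datasets/vcfGenotypeFeaturesCountsStatus?" + urlencode({"id": dataset_id})


def vcf_names_from_listing(response_body: str) -> List[str]:
    """Hypothetical helper: .vcf.gz names (not .csi index files) from the API result."""
    text = json.loads(response_body)["text"]
    names = []
    for line in text.splitlines():
        fields = line.split()
        # ls-style line: size, month, day, time or year, file name
        if len(fields) >= 5 and fields[-1].endswith(".vcf.gz"):
            names.append(fields[-1])
    return names
```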
Test Case
In this case there is 1 non-link .vcf.gz file, and its name is passed as a parameter in the following request.
Results
Server log extract
Confirming that the correct .vcf.gz is used.
::ffff:127.0.0.1 - - [06/Jun/2024:12:24:49 +0000] "POST /api/Blocks/vcfGenotypeLookupPost HTTP/1.1" 200 - "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
The request processing time is 50.705 ms. for /vcfGenotypeLookupPost
vcfGenotypeLookup 201028_40K_DAS5_samples_XT_exomeIDs undefined Numerical 74 [
'query',
'201028_40K_DAS5_samples_XT_exomeIDs',
'1A_copy.vcf.gz',
'',
'',
'',
'',
'-queryStart',
'-H',
'-f',
'%ID\t%POS\t%REF\t%ALT\t%INFO[\t%GT]\n',
'-queryEnd'
]
childProcess vcfGenotypeLookup.bash 0 false undefined 0 lb3app/scripts /media/don/Linux0/home/don/new/projects/agribio/markerMapViewer/pretzel.A3/lb4app
+ bcftoolsCommand query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz '' '' -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317
+ vcfGz=201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz
+ echo isecDatasetIdsArray : 0 , vcfGzs 0 , snpNames 1 scaffold38755_1235130 scaffold38755_1337276
+ bcftools query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317 -H -f '%ID %POS %REF %ALT %INFO[ %GT]
' -i ' ID="scaffold38755_1235130" || ID="scaffold38755_1337276" '
cbWrap null #[1]ID [2]POS [3]REF [4]ALT [5](null) [6]ExomeCapture-DAS5-001803:GT [7]ExomeCapture-DAS5-001365:GT [8]ExomeCapture-DAS5-002317:GT
scaffold38755_1235130 1235130 C T F_MISSING=0.0259067;NS=564;AN=1128; undefined
::ffff:127.0.0.1 - - [06/Jun/2024:12:24:51 +0000] "POST /api/Blocks/vcfGenotypeLookupPost HTTP/1.1" 200 387 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
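The bcftools command in the trace can be reproduced by the following sketch, which builds the argument list from the newline-separated samples and snpNames values of the request: the samples are joined with commas for -s, and each SNP name becomes an ID="name" term of the -i include expression, joined with ||. This is a Python sketch of what the trace shows, not the code of vcfGenotypeLookup.bash; the function name is hypothetical, and because names are inserted into the expression unescaped, it assumes SNP names contain no double quotes.

```python
from typing import List

# Output format as in the request above: ID, POS, REF, ALT, INFO, then GT per sample.
QUERY_FORMAT = "%ID\t%POS\t%REF\t%ALT\t%INFO[\t%GT]\n"


def build_query_args(vcf_gz: str, samples: str, snp_names: str) -> List[str]:
    """Hypothetical helper: argv for `bcftools query`, restricted to the
    given samples and SNP IDs (both newline-separated, as in the POST data)."""
    sample_list = [s for s in samples.split("\n") if s]
    snp_list = [s for s in snp_names.split("\n") if s]
    # e.g. ID="scaffold38755_1235130" || ID="scaffold38755_1337276"
    include_expr = " || ".join(f'ID="{name}"' for name in snp_list)
    return [
        "bcftools", "query", vcf_gz,
        "-s", ",".join(sample_list),
        "-H",
        "-f", QUERY_FORMAT,
        "-i", include_expr,
    ]
```

The list can be passed directly to subprocess.run(args, capture_output=True, text=True), which avoids the shell quoting visible in the trace.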
API Request
http://localhost:3000/api/Blocks/vcfGenotypeLookupPost
POST Data
(replace-string "," ",\n")
{"datasetId":"201028_40K_DAS5_samples_XT_exomeIDs",
"preArgs":{
"samples":"ExomeCapture-DAS5-001803\nExomeCapture-DAS5-001365\nExomeCapture-DAS5-002317",
"requestInfo":false,
"requestFormat":"Numerical",
"requestSamplesAll":false,
"snpPolymorphismFilter":false,
"mafThreshold":0,
"mafUpper":false,
"featureCallRateThreshold":0,
"datasetVcfFile":"1A_copy.vcf.gz",
"snpNames":"scaffold38755_1235130\nscaffold38755_1337276"},
"nLines":100,
"options":{}}
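Searching all the files of the dataset then amounts to sending one such request per non-link .vcf.gz file, varying only preArgs.datasetVcfFile. The following sketch builds those request bodies; the other preArgs values are simply copied from the POST data above as defaults, the function name is hypothetical, and sending each body to /api/Blocks/vcfGenotypeLookupPost is left out.

```python
from typing import Any, Dict, List


def build_lookup_bodies(dataset_id: str, vcf_files: List[str],
                        samples: List[str], snp_names: List[str],
                        n_lines: int = 100) -> List[Dict[str, Any]]:
    """Hypothetical helper: one vcfGenotypeLookupPost body per .vcf.gz file."""
    bodies = []
    for vcf_file in vcf_files:
        pre_args = {
            "samples": "\n".join(samples),
            "requestInfo": False,
            "requestFormat": "Numerical",
            "requestSamplesAll": False,
            "snpPolymorphismFilter": False,
            "mafThreshold": 0,
            "mafUpper": False,
            "featureCallRateThreshold": 0,
            "datasetVcfFile": vcf_file,  # the only field that differs between requests
            "snpNames": "\n".join(snp_names),
        }
        bodies.append({"datasetId": dataset_id, "preArgs": pre_args,
                       "nLines": n_lines, "options": {}})
    return bodies
```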
API Result
{"text":"#[1]ID\t[2]POS\t[3]REF\t[4]ALT\t[5](null)\t[6]ExomeCapture-DAS5-001803:GT\t[7]ExomeCapture-DAS5-001365:GT\t[8]ExomeCapture-DAS5-002317:GT
scaffold38755_1235130\t1235130\tC\tT\tF_MISSING=0.0259067;NS=564;AN=1128;MAF=0.150709;AC=170;AC_Het=12\t0/0\t0/0\t0/0
scaffold38755_1337276\t1337276\tG\tC\tF_MISSING=0.0138169;NS=571;AN=1142;MAF=0.400175;AC=457;AC_Het=1\t0/0\t0/0\t1/1
"}
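The result text is tab-separated, with a header line whose column names carry a leading '#' and '[n]' prefixes. A minimal Python sketch for turning it into one record per SNP, assuming exactly that layout (the function name parse_query_result is hypothetical):

```python
import re
from typing import Dict, List


def parse_query_result(text: str) -> List[Dict[str, str]]:
    """Hypothetical helper: map each data row to {column name: value},
    taking the column names from the header line."""
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return []
    # "#[1]ID" -> "ID", "[6]ExomeCapture-DAS5-001803:GT" -> "ExomeCapture-DAS5-001803:GT"
    header = [re.sub(r"^\[\d+\]", "", col) for col in lines[0].lstrip("#").split("\t")]
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]
```

Applied to the result above, the second record maps "ExomeCapture-DAS5-002317:GT" to "1/1".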