plantinformatics/pretzel

Search all .vcf.gz files of the selected dataset

Don-Isdale opened this issue · 1 comments

Part of #383


Observable outcomes :

This enables the user to search for given SNP names across all chromosomes of the selected dataset, and to request the lookup of genotype values for those SNP names.

Measure with :

Run the script either from the command line or by adding parameters to a request sent by the application. Confirm from the trace that the correct lookup command is performed, and that the results are correct, include all chromosomes of the dataset, and apply to the requested SNP names.


Task Sequence :

  • Implement changes in the lookup request execution, in calling bcftools
    This is complete, as confirmed by the test plan and execution in the comment below.

  • Connect that change in the call flow in frontend and backend, i.e. add / change parameters passed
    This is in progress, as indicated by the completed items in the following design breakdown.


  • [4-8H] vcfGenotypeLookup : in this use case, don't pass the scope (chromosome) parameter, which is an optional parameter of the API endpoint; instead, request a list of non-soft-link .vcf.gz files and search those via individual API requests. The datasetId parameter is unchanged; it identifies the directory in which the .vcf.gz files reside.

  • [2-3H/6H/0H] vcfGenotypeReceiveResult() : dataset param instead of block; determine block from #CHROM column
    To enable loading of results from multiple chromosomes, this includes :

    • pass to addFeaturesJson() : dataset as an alternative to block
    • also request %CHROM column
    • configure genotype-search .selectedSamples as manageGenotype.vcfGenotypeSamplesSelected
    • establish dialogMode in manageGenotype : {component : 'genotype-search', datasetId }
    • VCF header text, for the genotype-search case : lookup dataset from vcfDatasetId and pass requestOptions.datasetVcfFile from dataset vcfFiles
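
The "determine block from #CHROM column" item can be illustrated with a small sketch. This is Python written for illustration only, not Pretzel's code. It assumes that %CHROM is requested as the first column of the bcftools query format, and `group_rows_by_chrom` is a hypothetical helper name; the sample rows are invented.

```python
from collections import defaultdict


def group_rows_by_chrom(text):
    """Group tab-separated result rows by their first column (assumed to be %CHROM)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and the '#'-prefixed header row written by `bcftools query -H`.
        if not line or line.startswith("#"):
            continue
        chrom, *rest = line.split("\t")
        groups[chrom].append(rest)
    return dict(groups)


# Hypothetical rows; chromosome names, SNP names and positions are illustrative only.
sample = "#[1]CHROM\t[2]ID\t[3]POS\n1A\tsnp1\t100\n1B\tsnp2\t200\n1A\tsnp3\t300\n"
rows_by_chrom = group_rows_by_chrom(sample)
```

Each key of `rows_by_chrom` would then correspond to one chromosome, which is the information needed to choose a block when only the dataset is known.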

Test

This facility is tested using the prototype 'Genotype Search' panel / dialog, which provides parameters : dataset Id, Sample names, SNP names.

Test Case

From the dataset Id, the list of non-link .vcf.gz files is requested.

Results

Server log extract

Confirming that the list of .vcf.gz file names excluding soft-links is requested.

childProcess vcfGenotypeLookup.bash 0 false undefined 0 lb3app/scripts /media/don/Linux0/home/don/new/projects/agribio/markerMapViewer/pretzel.A3/lb4app

+ scope=noLinks
+ cd tmp/vcf/201028_40K_DAS5_samples_XT_exomeIDs
+ '[' noLinks = noLinks ']'

::ffff:127.0.0.1 - - [06/Jun/2024:18:26:52 +0000] "GET /api/Datasets/vcfGenotypeFeaturesCountsStatus?id=201028_40K_DAS5_samples_XT_exomeIDs HTTP/1.1" 200 605 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"


API Request

http://localhost:3000/api/Datasets/vcfGenotypeFeaturesCountsStatus?id=201028_40K_DAS5_samples_XT_exomeIDs

params :
id=201028_40K_DAS5_samples_XT_exomeIDs

API Result

(Formatted for readability with : replace-string "\\n" "\n")

{"text":"scope=
 3170787 Jun  6 21:15 1A_copy.MAF.vcf.gz
  162731 Jun  6 21:15 1A_copy.MAF.vcf.gz.csi
 2671431 Aug  9  2022 1A_copy.vcf.gz
  165069 Jun  6 21:15 1A_copy.vcf.gz.csi
  463961 Jan 29 12:12 1A.MAF.SNPList.vcf.gz
  149577 Jan 29 12:12 1A.MAF.SNPList.vcf.gz.csi
 3164247 Jan 29 12:10 1A.MAF.vcf.gz
  159238 Jan 29 12:10 1A.MAF.vcf.gz.csi
  159118 Jan 29 12:07 1A.vcf.gz.csi
  463961 Jan 29 16:04 1B.MAF.SNPList.vcf.gz
  149577 Jan 29 16:04 1B.MAF.SNPList.vcf.gz.csi
 3164248 Jan 29 16:04 1B.MAF.vcf.gz
  159236 Jan 29 16:04 1B.MAF.vcf.gz.csi
  159118 Jan 29 16:04 1B.vcf.gz.csi
"}

Test Case

In this case there is 1 non-link .vcf.gz file, and its file name is included as a parameter in the following request.
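
The two steps exercised by these test cases (reduce the listing to .vcf.gz names, then issue one lookup request per file) can be sketched as follows. This is a Python illustration under stated assumptions, not Pretzel's implementation. The helper names `vcf_gz_names` and `lookup_request_bodies` are hypothetical, each listing row is assumed to have five whitespace-separated fields with no spaces in the file name, and only a subset of the preArgs fields from the POST Data shown later in this comment is filled in.

```python
def vcf_gz_names(listing_text):
    """Return the .vcf.gz file names from a size/date/name listing, skipping .csi indexes."""
    names = []
    for line in listing_text.splitlines():
        fields = line.split()
        # Assumed row layout: size, month, day, time-or-year, file name.
        if len(fields) == 5 and fields[4].endswith(".vcf.gz"):
            names.append(fields[4])
    return names


def lookup_request_bodies(dataset_id, vcf_files, samples, snp_names, n_lines=100):
    """Build one request body per .vcf.gz file; no scope (chromosome) is passed."""
    return [
        {
            "datasetId": dataset_id,
            "preArgs": {
                "samples": "\n".join(samples),
                "requestFormat": "Numerical",
                "datasetVcfFile": vcf_file,
                "snpNames": "\n".join(snp_names),
            },
            "nLines": n_lines,
            "options": {},
        }
        for vcf_file in vcf_files
    ]


# Rows copied from the API Result above (abridged to four rows).
listing = """scope=
 3170787 Jun  6 21:15 1A_copy.MAF.vcf.gz
  162731 Jun  6 21:15 1A_copy.MAF.vcf.gz.csi
 2671431 Aug  9  2022 1A_copy.vcf.gz
  159118 Jan 29 12:07 1A.vcf.gz.csi
"""
files = vcf_gz_names(listing)
bodies = lookup_request_bodies(
    "201028_40K_DAS5_samples_XT_exomeIDs",
    files,
    ["ExomeCapture-DAS5-001803", "ExomeCapture-DAS5-001365"],
    ["scaffold38755_1235130", "scaffold38755_1337276"],
)
```

Each element of `bodies` would be sent as a separate POST to the lookup endpoint, so results from different files (and hence chromosomes) arrive independently.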

Results

Server log extract

Confirming that the correct .vcf.gz is used.


::ffff:127.0.0.1 - - [06/Jun/2024:12:24:49 +0000] "POST /api/Blocks/vcfGenotypeLookupPost HTTP/1.1" 200 - "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
The request processing time is 50.705 ms. for /vcfGenotypeLookupPost
vcfGenotypeLookup 201028_40K_DAS5_samples_XT_exomeIDs undefined Numerical 74 [
  'query',
  '201028_40K_DAS5_samples_XT_exomeIDs',
  '1A_copy.vcf.gz',
  '',
  '',
  '',
  '',
  '-queryStart',
  '-H',
  '-f',
  '%ID\t%POS\t%REF\t%ALT\t%INFO[\t%GT]\n',
  '-queryEnd'
]
childProcess vcfGenotypeLookup.bash 0 false undefined 0 lb3app/scripts /media/don/Linux0/home/don/new/projects/agribio/markerMapViewer/pretzel.A3/lb4app

+ bcftoolsCommand query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz '' '' -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317


+ vcfGz=201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz

+ echo isecDatasetIdsArray : 0 , vcfGzs 0 , snpNames 1 scaffold38755_1235130 scaffold38755_1337276

+ bcftools query 201028_40K_DAS5_samples_XT_exomeIDs/1A_copy.MAF.vcf.gz -s ExomeCapture-DAS5-001803,ExomeCapture-DAS5-001365,ExomeCapture-DAS5-002317 -H -f '%ID	%POS	%REF	%ALT	%INFO[	%GT]
' -i ' ID="scaffold38755_1235130" || ID="scaffold38755_1337276" '

cbWrap null #[1]ID	[2]POS	[3]REF	[4]ALT	[5](null)	[6]ExomeCapture-DAS5-001803:GT	[7]ExomeCapture-DAS5-001365:GT	[8]ExomeCapture-DAS5-002317:GT
scaffold38755_1235130	1235130	C	T	F_MISSING=0.0259067;NS=564;AN=1128; undefined

::ffff:127.0.0.1 - - [06/Jun/2024:12:24:51 +0000] "POST /api/Blocks/vcfGenotypeLookupPost HTTP/1.1" 200 387 "http://localhost:4200/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
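
For checking results like the one traced above, the tab-separated text produced by `bcftools query -H` can be turned into records keyed by column name. The following Python sketch is one way to do that and is not taken from Pretzel. `parse_query_result` is a hypothetical name, and the input rows are copied from the API Result shown later in this comment. The `(null)` column name is kept exactly as it appears in the trace.

```python
import re


def parse_query_result(text):
    """Parse `bcftools query -H` style text into a list of dicts keyed by column name."""
    lines = [line for line in text.split("\n") if line]
    if not lines:
        return []
    header = lines[0].lstrip("# ").split("\t")
    # Header cells look like '[6]SampleName:GT'; drop the '[n]' index and ':GT' suffix.
    columns = [re.sub(r":GT$", "", re.sub(r"^\[\d+\]", "", cell)) for cell in header]
    return [dict(zip(columns, line.split("\t"))) for line in lines[1:]]


result_text = (
    "#[1]ID\t[2]POS\t[3]REF\t[4]ALT\t[5](null)"
    "\t[6]ExomeCapture-DAS5-001803:GT\t[7]ExomeCapture-DAS5-001365:GT"
    "\t[8]ExomeCapture-DAS5-002317:GT\n"
    "scaffold38755_1235130\t1235130\tC\tT"
    "\tF_MISSING=0.0259067;NS=564;AN=1128;MAF=0.150709;AC=170;AC_Het=12"
    "\t0/0\t0/0\t0/0\n"
    "scaffold38755_1337276\t1337276\tG\tC"
    "\tF_MISSING=0.0138169;NS=571;AN=1142;MAF=0.400175;AC=457;AC_Het=1"
    "\t0/0\t0/0\t1/1\n"
)
records = parse_query_result(result_text)
```

With records in this form, a test can assert directly that every requested SNP name is present and that each requested sample has a genotype value.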




API Request

http://localhost:3000/api/Blocks/vcfGenotypeLookupPost

POST Data

(Formatted for readability with : replace-string "," ",\n")

{"datasetId":"201028_40K_DAS5_samples_XT_exomeIDs",
  "preArgs":{
    "samples":"ExomeCapture-DAS5-001803\nExomeCapture-DAS5-001365\nExomeCapture-DAS5-002317",
    "requestInfo":false,
    "requestFormat":"Numerical",
    "requestSamplesAll":false,
    "snpPolymorphismFilter":false,
    "mafThreshold":0,
    "mafUpper":false,
    "featureCallRateThreshold":0,
    "datasetVcfFile":"1A_copy.vcf.gz",
    "snpNames":"scaffold38755_1235130\nscaffold38755_1337276"},
  "nLines":100,
  "options":{}}

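The newline-separated snpNames value in the POST Data corresponds to the `-i` filter expression seen in the bcftools trace above. The Python sketch below reproduces that expression from the same input. It is an illustration only: `snp_names_filter` is a hypothetical name, and the sketch does not validate or escape the names, which a real implementation would need to do before passing them to a shell command.

```python
def snp_names_filter(snp_names_text):
    """Build a bcftools -i expression matching any of the newline-separated SNP names."""
    names = [name for name in snp_names_text.split("\n") if name]
    # Join one ID="..." test per name with the logical OR operator used in the trace.
    return " " + " || ".join(f'ID="{name}"' for name in names) + " "


expr = snp_names_filter("scaffold38755_1235130\nscaffold38755_1337276")
```

The resulting string matches the expression in the trace, including the leading and trailing spaces.
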
API Result



{"text":"#[1]ID\t[2]POS\t[3]REF\t[4]ALT\t[5](null)\t[6]ExomeCapture-DAS5-001803:GT\t[7]ExomeCapture-DAS5-001365:GT\t[8]ExomeCapture-DAS5-002317:GT
scaffold38755_1235130\t1235130\tC\tT\tF_MISSING=0.0259067;NS=564;AN=1128;MAF=0.150709;AC=170;AC_Het=12\t0/0\t0/0\t0/0
scaffold38755_1337276\t1337276\tG\tC\tF_MISSING=0.0138169;NS=571;AN=1142;MAF=0.400175;AC=457;AC_Het=1\t0/0\t0/0\t1/1
"}