poseidon-framework/poseidon-hs

Getting more information out of trident list

Opened this issue · 3 comments

After talking with Stephan today, we thought it would be nice to have the option of getting more information about each package out of trident list. This would streamline integration into data analysis and into downstream processes (like thetis).

Below is an example of a TSV with information contained in the POSEIDON.yml of 2 packages as a reference:

poseidon_version	package_dir	title	description	contributors	package_version	last_modified	genotype_format	geno_file	geno_file_chksum	snp_file	snp_file_chksum	ind_file	ind_file_chksum	snp_set	janno_file	janno_file_chksum	sequencing_source_file	sequencing_source_file_chksum	bib_file	bib_file_chksum	readme_file	changelog_file
2.5.0	/Users/lamnidis/poseidon_packages/community-archive/2018_OlaldeNature	2018_OlaldeNature	Ancient genomes from the Bell Beaker period in Europe. Originally AADR v42.4.	Ayshin Ghalichi (ghalichi@shh.mpg.de)	2.1.1	2023-07-11	PLINK	2018_OlaldeNature.bed	e11e8a7ef0b74e964732db0cbe5046f4	2018_OlaldeNature.bim	7a7ef4d4f9c78a0bba32a329b6162dbd	2018_OlaldeNature.fam	95f51d4ef3797b556e6c0154bf8d443d	1240K	2018_OlaldeNature.janno								
2.5.0	/Users/lamnidis/poseidon_packages/community-archive/2018_Lamnidis_Fennoscandia	2018_Lamnidis_Fennoscandia	Ancient genomes from Finland and Russia.	Thiseas Lamnidis (lamnidisi@shh.mpg.de)	2.1.0	2023-07-04	PLINK	2018_Lamnidis_Fennoscandia.bed	74d8d52d45a0d2f6ed1212af5d2f4268	2018_Lamnidis_Fennoscandia.bim	10fe736b07171086524ec92dc5e06a22	2018_Lamnidis_Fennoscandia.fam	90c1b106d15bceccc1e25c34d3060d75	1240K	2018_Lamnidis_Fennoscandia.janno								

trident list --remote --packages --raw already shows some of this information, so adding more columns to the output with a dedicated flag would do the trick.
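
To make the request concrete, here is a rough Python sketch of how one such row could be assembled from an already-parsed POSEIDON.yml. This is not how trident does it (trident is written in Haskell). The helper name package_row and the nested key names (genotypeData, genoFile, and so on) are assumptions made for this example, and only a few of the columns from the reference TSV above are shown.

```python
import os

# A shortened column list; the order follows the reference TSV.
COLUMNS = ["poseidon_version", "package_dir", "title", "package_version",
           "genotype_format", "geno_file", "janno_file"]

def package_row(yml_path, yml):
    """Flatten one parsed POSEIDON.yml (a dict) into a single TSV line.

    The camelCase keys are assumed; missing entries become empty cells.
    """
    geno = yml.get("genotypeData", {})
    values = {
        "poseidon_version": yml.get("poseidonVersion", ""),
        # Not stored in the YAML: derived from where the file lives.
        "package_dir": os.path.dirname(os.path.abspath(yml_path)),
        "title": yml.get("title", ""),
        "package_version": yml.get("packageVersion", ""),
        "genotype_format": geno.get("format", ""),
        "geno_file": geno.get("genoFile", ""),
        "janno_file": yml.get("jannoFile", ""),
    }
    return "\t".join(str(values[c]) for c in COLUMNS)
```

Fields that a package does not define simply come out as empty cells, which matches the trailing empty columns in the example rows.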

Sounds like a good idea to me 👍

What would be a solid interface for this? Just a --verbose (?) flag that adds all of these columns to the output? Or a more sophisticated argument to request specific columns?

IMO spitting out all the info in the YAML file with one flag is enough. It's easy enough to select a subset of columns downstream if need be.
The main thing for my use case here is the package_dir column, which is not in the YAML, but implicitly known (as the path to the file).
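
As a quick illustration of the downstream selection, a minimal Python sketch using only the standard library. The function name select_columns and the path in the example data are made up for this sketch, and the input is assumed to be tab-separated with a header row, like the reference TSV.

```python
import csv
import io

def select_columns(tsv_text, wanted):
    """Return the rows of a tab-separated table reduced to the wanted columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [[row[c] for c in wanted] for row in reader]

# Hypothetical input; real output would carry all the columns.
example = (
    "title\tpackage_dir\tpackage_version\n"
    "2018_OlaldeNature\t/path/to/2018_OlaldeNature\t2.1.1\n"
)

for title, package_dir in select_columns(example, ["title", "package_dir"]):
    print(title, package_dir)
```

Since picking columns by name is this cheap, a single flag that emits everything seems sufficient.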

This depends on an update to the server API: list also needs to support --remote, so any listing we perform here must also be possible from the server. That makes this issue somewhat contingent on #272 and #273, which I'm working on.