Correct the mapping on external resources links in Supplementary Information page
We have some conflicting mappings on the bulk Supplementary Information page, regarding experiment type and ArrayExpress.
In the case of E-PROT-39, the experiment type is RNASEQ_MRNA_DIFFERENTIAL, so the external resources are grouped under ENA. It also contains an ArrayExpress link, which is invalid as well.
https://www.ebi.ac.uk/gxa/experiments/E-PROT-39/Supplementary%20Information
As agreed on Slack, the accession-to-link resolution should be made independent of experiment type and rely only on the accession style itself.
Here are the accession-to-resource mappings:
ArrayExpress accessions
E-MTAB<> -> ArrayExpress
E-ERAD<> -> ArrayExpress
E-GEUV<> -> ArrayExpress
ProteomeXchange accessions - can be viewed in PRIDE (and elsewhere)
PXD<> -> PRIDE
GEO accessions
GSE<> -> GEO
GDS<> -> GEO
INSDC consortium project accessions - can be viewed in ENA (and elsewhere)
ERP<> -> ENA
SRP<> -> ENA
DRP<> -> ENA
BioProject INSDC consortium accessions - can be viewed in ENA (and elsewhere)
PRJEB<> -> ENA
PRJNA<> -> ENA
PRJDB<> -> ENA
EGA accessions
EGAS<> -> EGA
EGAD<> -> EGA
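A minimal Python sketch of the style-only resolution described above. It assumes the `<>` placeholder stands for a run of digits and that ArrayExpress accessions keep the hyphen before the number (e.g. E-MTAB-1913). `resolve_resource` and `ACCESSION_PATTERNS` are hypothetical names, not existing Atlas code, and the sketch returns a resource name only, not a URL.

```python
import re
from typing import Optional

# Hypothetical table transcribed from the mapping list in this issue.
# Assumption: "<>" means one or more digits; ProteomeXchange accessions use PXD.
ACCESSION_PATTERNS = [
    (re.compile(r"^E-(MTAB|ERAD|GEUV)-\d+$"), "ArrayExpress"),
    (re.compile(r"^PXD\d+$"), "PRIDE"),
    (re.compile(r"^(GSE|GDS)\d+$"), "GEO"),
    (re.compile(r"^(ERP|SRP|DRP)\d+$"), "ENA"),       # INSDC project accessions
    (re.compile(r"^PRJ(EB|NA|DB)\d+$"), "ENA"),       # BioProject accessions
    (re.compile(r"^EGA[SD]\d+$"), "EGA"),
]


def resolve_resource(accession: str) -> Optional[str]:
    """Return the resource an accession links to, judged by its style alone.

    The experiment type is deliberately not consulted. Returns None when the
    accession matches no known style, so no (possibly invalid) link is built.
    """
    for pattern, resource in ACCESSION_PATTERNS:
        if pattern.match(accession.strip()):
            return resource
    return None
```

For example, `resolve_resource("ERP003983")` gives `"ENA"`, while `resolve_resource("E-PROT-39")` gives `None`, so no ArrayExpress link would be generated for it.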
Some E-HCAD experiments (which would be in SCEA only, not bulk) may have a 'bundle ID' in the secondary accession field in the idf, but I am not sure whether that could be used to search for and point to a project in the HCA Data Portal.
I've added EGA accession mapping to the list above.
Following discussions on Slack and during the sprint meeting, I suggest we drop the existing display hierarchy, as it could accidentally remove valid multiple entries (e.g. for some CURD datasets where more than one experiment has been combined into one), and instead display all sources by default. I believe the logic to check for truly synonymous entries may be quite complicated and not worth the effort right now. If we discover cases where displaying all sources creates problems for users, we can reevaluate.
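A rough sketch of the "display all sources by default" suggestion, assuming an accession-style resolver is passed in as `resolve` (a hypothetical parameter; any function returning a resource name or `None` works). `group_all_sources` is a hypothetical name, and skipping unrecognised accessions instead of showing a link is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def group_all_sources(
    accessions: Iterable[str],
    resolve: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Group every resolvable accession under its resource, with no hierarchy.

    No resource "wins" over another, so e.g. a combined dataset listing two
    ENA projects keeps both. Only a literally repeated accession string is
    collapsed; no attempt is made to detect synonymous entries.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for accession in accessions:
        resource = resolve(accession)
        if resource is None:
            continue  # assumption: unknown style means no link rather than a bad one
        if accession not in grouped[resource]:
            grouped[resource].append(accession)
    return dict(grouped)
```

With a toy resolver such as `lambda a: "ENA" if a.startswith(("ERP", "PRJEB")) else None`, the input `["ERP003983", "PRJEB1234", "ERP003983"]` yields `{"ENA": ["ERP003983", "PRJEB1234"]}`: both ENA projects stay visible and only the literal repeat is dropped.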
Hi @sfexova, I have implemented the EGA, ENA and GEO resource links, but ArrayExpress is a bit different. For example, in the idf file of experiment E-MTAB-1913 there is only one secondaryAccession, ERP003983, which points to ENA, and nothing points to ArrayExpress except the experiment accession itself.
So should ArrayExpress links be resolved from the experiment accession, the secondary accession, or both?
Ah, good point!
Yes, for experiments from ArrayExpress it needs to be a bit different: for experiments with an ArrayExpress accession E-MTAB-XX, we should look at the experiment accession only and ignore the [secondary accession] pointing to ENA, because there we know they are synonymous.
Okay, thanks for the clarification. What about the others?
E-ERAD<> -> ArrayExpress
E-GEUV<> -> ArrayExpress
Do these refer to the experiment accession or the [secondary accession]?
Thanks.
Yes, the same rules apply to E-ERAD and E-GEUV as to the E-MTAB AE accessions: for these, ignore the [secondary accession] and use the experiment accession to link to ArrayExpress.
The mapping rules above were all meant for the [secondary accession], i.e. for cases where these different accession codes appear in the [secondary accession] field in the idf.
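Putting the two rules together, a minimal sketch, assuming an accession-style resolver (mapping a secondary accession to a resource name, or `None`) is passed in as `resolve`. `links_for_experiment` and `ARRAYEXPRESS_EXPERIMENT` are hypothetical names. ArrayExpress-style experiment accessions short-circuit to a single ArrayExpress link, and every other experiment gets links only from its [secondary accession] entries.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical pattern for experiments whose own accession is an AE accession
# (E-MTAB, E-ERAD, E-GEUV); their secondary accessions are known synonyms.
ARRAYEXPRESS_EXPERIMENT = re.compile(r"^E-(MTAB|ERAD|GEUV)-\d+$")


def links_for_experiment(
    experiment_accession: str,
    secondary_accessions: List[str],
    resolve: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """Return (resource, accession) pairs to show on the Supplementary Information page."""
    if ARRAYEXPRESS_EXPERIMENT.match(experiment_accession):
        # AE experiment: link via the experiment accession, ignore secondaries.
        return [("ArrayExpress", experiment_accession)]
    links = []
    for accession in secondary_accessions:
        # Non-AE experiment: the mapping rules apply to secondary accessions only.
        resource = resolve(accession)
        if resource is not None:
            links.append((resource, accession))
    return links
```

For E-MTAB-1913 with secondary accession ERP003983 this returns only `[("ArrayExpress", "E-MTAB-1913")]`; the ENA entry is dropped because it is known to be synonymous.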