saezlab/OmnipathR

Clarification about commercial licenses

Opened this issue · 3 comments

Hi,

There are a few data sources that seem to have a non-commercial license:

  • SIGNOR is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. https://signor.uniroma2.it/documentation/
  • LRdb is a curation effort drawing on public databases including FANTOM5, HPRD, HPMR, Guide to Pharmacology and UniProt. HPRD requires a commercial license.
  • NCI-PID is a database consisting of curated BioCarta and Reactome data. BioCarta requires a commercial license.
  • PhosphoPoint integrates three phosphoprotein databases: Phospho.ELM (http://phospho.elm.eu.org/dumps/Phospho.Elm_AcademicLicense.pdf), HPRD (commercial license required) and SwissProt.
  • The ProtMapper software is available under a BSD2 license, but it uses data from PhosphoSitePlus (which requires a commercial license: https://www.phosphosite.org/staticLicensing) and UniProt.
  • SignaLink contains data from ACSN, TheBioGrid, ComPPI, Human Protein Reference Database, InnateDB, IntAct, lncRinter, miR2Disease, miRDeathDB, miRSponge, NPinter, OmniPath, PhosphoSite, PTMCode2, Ramirez et al. 2009, Reactome, Signor, SignaFish, StarBase and TarBase, some of which require commercial licenses.

I noticed that these data sources appear in the output table even when selecting the "commercial" option.
I am not sure whether OmniPath uses a version of such databases that could be "public" (SIGNOR, for instance?), but I believe that resources such as LRdb, PhosphoPoint and others should be treated as requiring a commercial license.

If this is correct, would it be possible to update the package so that these resources are removed when they are indeed not licensed for commercial use?
Thank you.

Hi,

This is a good point, I'll try to implement it.

The license parameter currently only makes sure that each record is supported by at least one piece of evidence that fits the chosen licensing requirements. All other columns (sources, references) remain unchanged, so resources that do not fit the license can still be listed there. Recently we introduced a feature that can reconstruct the data frames based on any subset of resources. Currently it doesn't take the license into account, but that would be easy to fix.
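
To make the gap concrete, here is a minimal sketch in base R of what stripping non-permitted resources from the sources column could look like. It is not OmnipathR's actual implementation: the toy data frame, the `allowed` vector, the `strip_sources()` helper and the name `ResourceOK` are all hypothetical, and the semicolon-separated format of `sources` is an assumption.

```r
# Toy interactions table; `sources` is assumed to be a semicolon-separated
# list of resource names supporting each record.
interactions <- data.frame(
  source = c("A", "B", "C"),
  target = c("X", "Y", "Z"),
  sources = c("SIGNOR;ResourceOK", "SIGNOR;LRdb", "ResourceOK"),
  stringsAsFactors = FALSE
)

# Hypothetical set of resources permitted under the chosen license.
allowed <- c("ResourceOK")

# Keep only the permitted resources in each record's `sources` string.
strip_sources <- function(sources, allowed) {
  vapply(
    strsplit(sources, ";", fixed = TRUE),
    function(res) paste(res[res %in% allowed], collapse = ";"),
    character(1)
  )
}

interactions$sources <- strip_sources(interactions$sources, allowed)

# Drop records that no longer have any supporting resource.
filtered <- interactions[nzchar(interactions$sources), , drop = FALSE]
```

In this toy case the second record is dropped because neither of its resources is in `allowed`, while the first one is kept with SIGNOR removed from its sources.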

Thank you!

Hello @deeenes and @InesdeSantiago, do you have any update on this ticket?
I am using the following queries:

options(omnipath.license = 'commercial')
omnipath_show_db()
as.data.frame(get_annotation_resources())
as.data.frame(get_resources(query_type, datasets = NULL, generic_categories = NULL))

but the result is the same as if I run this:

options(omnipath.license = 'academic')
omnipath_show_db()
as.data.frame(get_annotation_resources())
as.data.frame(get_resources(query_type, datasets = NULL, generic_categories = NULL))

Ideally, I would expect that the sources that cannot be used won't appear, right?
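
One way to check this is to diff the resource names returned under each license setting. A minimal sketch, assuming the two results have already been fetched and reduced to character vectors; the vectors below are made-up placeholders, not real OmnipathR output:

```r
# Placeholder vectors standing in for the resource names obtained under each
# license setting (made-up values, not real output).
res_academic   <- c("SIGNOR", "LRdb", "ResourceOK")
res_commercial <- c("SIGNOR", "LRdb", "ResourceOK")

# Resources present under 'academic' but absent under 'commercial'.
only_academic <- setdiff(res_academic, res_commercial)

if (length(only_academic) == 0) {
  message("No difference: the license option did not filter out any resource.")
} else {
  message("Excluded for commercial use: ", paste(only_academic, collapse = ", "))
}
```

An empty `only_academic` corresponds to what I observe: both settings list the same resources.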

Later, looking through your nicely and very well commented code, I found the function resource_info(), which lists the resources in a JSON-like structure:

library(OmnipathR)
library(tidyverse)  # provides the %>% pipe
resource_info_json <- resource_info()
# To view the result as JSON:
jsonlite::toJSON(resource_info_json, auto_unbox = TRUE) %>% jsonlite::prettify()
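
From a nested list like this, the license details could be flattened into a small table for review. The sketch below uses a toy list rather than the real output, and the `license$name` and `license$purpose` fields are my assumption about the structure; the actual field names may differ. `license_field()` is a hypothetical helper.

```r
# Toy stand-in for the nested list; the `license` element and its `name` and
# `purpose` fields are assumptions and may not match the real output.
info <- list(
  ResourceA = list(license = list(name = "CC BY-NC 4.0", purpose = "academic")),
  ResourceB = list(license = list(name = "CC BY 4.0", purpose = "commercial")),
  ResourceC = list()
)

# Pull one license field from an entry, returning NA when it is missing.
license_field <- function(entry, field) {
  value <- entry[["license"]][[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

licenses <- data.frame(
  resource = names(info),
  license = vapply(info, license_field, character(1), field = "name"),
  purpose = vapply(info, license_field, character(1), field = "purpose"),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Resources whose license purpose is not 'commercial', or is unknown.
not_commercial <- licenses$resource[
  is.na(licenses$purpose) | licenses$purpose != "commercial"
]
```

Treating a missing license as "not commercial" is a deliberately conservative choice in this sketch.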

My suggestions:

  1. Would it be possible to include a link to the original URL of each resource? (This may help to ping the page and keep an eye on changes in the licensing.)
  2. Could the documentation also explain what it means when we see something like this:
    image

(I mean a resource joined to another resource with an underscore.) Does it mean that first_resource_name is used by second_resource_name, and that it is the only one to be used for the "Queries" described by the resource_info() object?
image

I understand it is a huge project, and I am not sure I am experienced enough to push changes into it. But I could try, at least with the documentation for now, if that helps.
Many thanks for this great effort!