Lighter dependency installation for reading versions

The dependencies are a bit heavy:

Lines 45 to 60 in 5440e5a

    
    install_requires =
        requests
        requests_ftp
        beautifulsoup4
        cachier<=1.5.0
        pystow>=0.1.0
        click
        click_default_group
        dataclasses; python_version < "3.7"
        dataclasses_json
        tabulate
        more_click
        pyyaml
        tqdm
        bioregistry>=0.2.6
        lxml

I imagine this is because these dependencies are required to generate docs/_data/versions.yml.

For related-sciences/ensembl-genes#1, I just want to get the latest version of a resource like:

import bioversions

ensembl_version = bioversions.get_version("ensembl")

Would it make sense to have a dependency set that just supports reading versions that have already been aggregated into versions.yml?

Users can just get this by URL, but is the format stable? Is there also a JSON version that would avoid having to install a yaml parser?

I just made a note in related-sciences/ensembl-genes#1 (comment) on how to do this by directly getting some JSON. The format is stable so I'd say you can depend on it looking like this. Maybe I will add an additional metadata field or two from the bioregistry for convenience in the future, but that wouldn't break anything.

It's true that there are a lot of dependencies, but most of them are small utilities that I'd expect most environments to have if they're installing any other common stuff. For things like pystow and the bioregistry, I have been careful to keep them as lean as possible so they don't pull in a lot of transitive dependencies. I would be hesitant to remove dependencies like pyyaml, lxml, beautifulsoup4, and requests_ftp, because users shouldn't have to know which ones each getter uses. I think it would be pretty confusing to have a lean version of bioversions that just supports looking stuff up in the JSON when most usage of this package directly is to interact with the sites on demand.

> most usage of this package directly is to interact with the sites on demand

I see. I like having your CI do the interaction while we consume the output.

The JSON approach is simple enough for us:

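import requests

# Aggregated versions file from the bioversions repository
url = "https://raw.githubusercontent.com/biopragmatics/bioversions/main/src/bioversions/resources/versions.json"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail on HTTP errors instead of parsing an error page
res_json = response.json()
# Map each prefix to the version of its last listed release
versions = {
    entry["prefix"]: entry["releases"][-1]["version"]
    for entry in res_json["database"]
    if "prefix" in entry
}
ensembl_version = versions["ensembl"]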
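
If even requests is more than we want to install, the same lookup can be sketched with only the standard library. This is a minimal sketch, not part of bioversions: the helper names `latest_versions` and `fetch_latest_versions` are hypothetical, and the payload layout (a top-level "database" list whose entries carry "prefix" and "releases") is assumed from the snippet above.

```python
import json
from urllib.request import urlopen

# URL taken from the snippet above; the payload layout is assumed to match it.
VERSIONS_URL = (
    "https://raw.githubusercontent.com/biopragmatics/bioversions/"
    "main/src/bioversions/resources/versions.json"
)


def latest_versions(payload):
    """Map each prefix to the version of its last listed release (hypothetical helper)."""
    versions = {}
    for entry in payload.get("database", []):
        releases = entry.get("releases") or []
        # Skip entries that have no prefix or no recorded releases.
        if "prefix" in entry and releases:
            versions[entry["prefix"]] = releases[-1]["version"]
    return versions


def fetch_latest_versions(url=VERSIONS_URL, timeout=30):
    """Download versions.json and reduce it to a prefix -> version dict."""
    with urlopen(url, timeout=timeout) as response:
        return latest_versions(json.load(response))
```

Calling `fetch_latest_versions()["ensembl"]` would then stand in for the requests-based lookup. Keeping the parsing in `latest_versions` separate from the download also means any extra metadata fields added later are simply ignored, and the parsing can be checked without network access.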

> I'd expect most environments to have if they're installing any other common stuff

I noticed because lots of packages were added to poetry.lock in related-sciences/ensembl-genes@8f3ac75.
