Registry of data portals, catalogs, data repositories, etc.
This is a transitional repository for building a registry of all existing open data portals and repositories.
This registry is the first pillar of the open search engine project. The project's pillars are:
- registry of all catalogs (this one)
- datasets raw metadata database
- unified dataset search index and search engine
- datasets backup and file cache
Please take a look at the project mindmap to see its goals and structure.
This registry includes descriptions of the following types of data catalogs:
- Open data portals
- Geoportals
- Scientific data repositories
- Indicators catalogs
- Microdata catalogs
- Machine learning catalogs
- Data search engines
- API Catalogs
- Data marketplaces
- Other
This project is inspired by the Re3Data and FAIRsharing projects. The key difference is its focus on open data as a broad topic, not just open research data.
The final version of this repository will be reorganized as a database with a publicly available open API and bulk data dumps.
Warning: this description is temporary and subject to change.
Data catalog descriptions are YAML files in the data/entities folder. Files are grouped into country/territory folders, and each country folder contains category folders such as scientific, opendata, microdata, geo, search, marketplace, and other.
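The folder layout described above can be walked with the standard library alone. The following is a minimal sketch, assuming the path pattern data/entities/<country>/<category>/<file>.yaml. The `.yaml` extension, the helper name `group_catalog_files`, and the demo folder and file names are illustrative assumptions, not part of this repository.

```python
from collections import defaultdict
from pathlib import Path
import tempfile


def group_catalog_files(root):
    """Group YAML file names by their (country, category) folder names.

    Assumes the layout <root>/<country>/<category>/<name>.yaml.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).glob("*/*/*.yaml")):
        country, category = path.parts[-3], path.parts[-2]
        groups[(country, category)].append(path.name)
    return dict(groups)


def demo():
    """Build a throwaway tree that mimics the layout and group its files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data" / "entities"
        # Hypothetical folder and file names, used only for this demo.
        for rel in ("US/opendata/catalogdatagov.yaml",
                    "US/geo/example_geoportal.yaml"):
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("id: placeholder\n")
        return group_catalog_files(root)


if __name__ == "__main__":
    for (country, category), names in demo().items():
        print(country, category, names)
```

To try it on a real checkout, call `group_catalog_files("data/entities")` from the repository root, adjusting the glob pattern if the actual nesting or file extension differs.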
Data.gov YAML file
access_mode:
- open
api: true
api_status: active
catalog_type: Open data portal
content_types:
- dataset
coverage:
- location:
    country:
      id: US
      name: United States
    level: 1
endpoints:
- type: ckanapi
  url: https://catalog.data.gov/api/3
export_standard: CKAN API
id: catalogdatagov
identifiers:
- id: wikidata
  url: https://www.wikidata.org/wiki/Q5227102
  value: Q5227102
- id: re3data
  url: https://www.re3data.org/repository/r3d100010078
  value: r3d100010078
- id: fairsharing
  url: https://fairsharing.org/FAIRsharing.6069e1
  value: FAIRsharing.6069e1
langs:
- EN
link: https://catalog.data.gov
name: NETL Energy Data eXchange
owner:
  location:
    country:
      id: US
      name: United States
    level: 1
  name: U.S. Department of Energy
  type: Central government
software: CKAN
status: active
tags:
- government
- has_api
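Once such a file has been parsed into a Python dictionary (for instance with a YAML library), its nested lists can be turned into simple lookups. The following is a minimal sketch that uses a hand-copied excerpt of the record above instead of parsing the file. The helper names `identifier_map` and `endpoint_urls` are hypothetical and not part of this repository.

```python
# Hand-copied excerpt of the record above, as it might look after YAML parsing.
record = {
    "id": "catalogdatagov",
    "link": "https://catalog.data.gov",
    "endpoints": [
        {"type": "ckanapi", "url": "https://catalog.data.gov/api/3"},
    ],
    "identifiers": [
        {"id": "wikidata",
         "url": "https://www.wikidata.org/wiki/Q5227102",
         "value": "Q5227102"},
        {"id": "re3data",
         "url": "https://www.re3data.org/repository/r3d100010078",
         "value": "r3d100010078"},
    ],
}


def identifier_map(rec):
    """Map each external registry name to the identifier value it assigns."""
    return {item["id"]: item["value"] for item in rec.get("identifiers", [])}


def endpoint_urls(rec, endpoint_type):
    """Return the URLs of all endpoints of the given type."""
    return [e["url"] for e in rec.get("endpoints", [])
            if e.get("type") == endpoint_type]


print(identifier_map(record))
print(endpoint_urls(record, "ckanapi"))
```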
Datasets are kept in the data/datasets folder. Right now this is a single catalogs.jsonl file generated by the builder.py script in the scripts folder.
Run python builder.py build
in the scripts folder to regenerate the catalogs.jsonl file from the YAML files.
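The generated file can then be read line by line. The following is a minimal sketch, assuming that each line of catalogs.jsonl is one JSON object carrying the same fields as the YAML records (for example `catalog_type`). That correspondence and the helper name `count_by_type` are assumptions, not documented behaviour of builder.py.

```python
import io
import json
from collections import Counter


def count_by_type(lines):
    """Count records per catalog_type in an iterable of JSON Lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get("catalog_type", "unknown")] += 1
    return counts


# Stand-in for an open file handle; both records are made up for illustration.
sample = io.StringIO(
    '{"id": "catalogdatagov", "catalog_type": "Open data portal"}\n'
    '{"id": "example", "catalog_type": "Geoportal"}\n'
)
print(count_by_type(sample))
```

For the real file, pass an open file handle instead of the sample, for example `open("data/datasets/catalogs.jsonl", encoding="utf-8")`, assuming the location described above.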
If you find a mistake or have an additional data catalog to add, please open a pull request or file an issue.
The following data sources are used:
- STAC Catalogs https://stacindex.org/catalogs - done
- Dataverse Installations https://iqss.github.io/dataverse-installations/data/data.json - done
- Open Data Inception https://data.opendatasoft.com/explore/dataset/open-data-sources%40public/information/ - done
- CKAN Portals across the world https://datashades.info/ - done
- Geonetwork Showcase https://github.com/geonetwork/doc/blob/develop/source/annexes/gallery/gallery-urls.csv - done
- PxWeb examples https://www.scb.se/en/services/statistical-programs-for-px-files/px-web/pxweb-examples/ - done
- DKAN Community https://getdkan.org/community - done
- Junar Clients https://junar.com/customers/ - done
- Datashades data portals list https://datashades.info/api/portal/list - done
- OpenSDG installations https://open-sdg.org/community - done
- MyCore Installations https://www.mycore.de/site/applications/list/ - done
- Elsevier Pure installations https://www.elsevier.com/solutions/pure/pure-in-action - done
- CoreTrustSeal Repositories https://amt.coretrustseal.org/certificates - done
- geOrchestra installations https://www.georchestra.org/community.html - done
- EUDAT Repositories https://b2find.eudat.eu/organization/
- data.europa.eu catalogues https://data.europa.eu/data/catalogues?locale=en
- Re3Data https://www.re3data.org/
- RISources https://risources.dfg.de
- Spanish opendata initiatives https://datos.gob.es/en/accessible-initiatives
- INSPIRE Country catalogs https://inspire-geoportal.ec.europa.eu/overview.html?view=thematicEuOverview&theme=none
- Socrata OpenDataNetwork https://www.opendatanetwork.com/search?q= - done
- ArcGIS Hub search https://hub.arcgis.com/ - done
- Brazilian Catalogs of geodata metadata https://inde.gov.br/Estatisticas/CatalogosMetadados
- Open Data Monitor (outdated, but useful) https://www.opendatamonitor.eu
- List of French open data catalogs https://airtable.com/shrWxHPi2XjLu9xtM/tblwklJPsyayeH5lX
- Brazilian local government (state and municipal) open data portals https://github.com/augusto-herrmann/transparencia-dados-abertos-brasil/blob/main/data/valid/brazilian-transparency-and-open-data-portals.csv
- Russian and CIS countries data catalogs https://datacatalogs.ru
- EntryScape customers (Sweden) https://entryscape.com/en/customers/ - done
- Geolode, catalog of open geodata websites https://geolode.org
- Web Data Commons dataset subset http://webdatacommons.org/structureddata/2022-12/stats/schema_org_subsets.html
- Major Smart Cities with Open Data (updated 2019) https://rlist.io/l/major-smart-cities-with-open-data-portals
- Registry of Open Access Repositories http://roar.eprints.org
- IPT: Integrated Publishing Toolkit installations https://www.gbif.org/ipt
- GeoBlacklight showcase https://geoblacklight.org/showcase/ - done
Source code is licensed under the MIT license. Data is licensed under the CC-BY 4.0 license.