E3SM-Project/esgf_metrics

[Feature]: Reach out to CMCC to see if there is a way to access the ESGF publication/download stats database.


Source: https://acme-climate.atlassian.net/wiki/spaces/IPD/pages/3974791234/2023.Q4%3A+Finish+phase+2+new+data+publication+features?focusedCommentId=3988095036

Yeah, this task is for download and publication stats from ESGF.

ESGF/CMCC does have a database that stores download and publication stats. However, the public ESGF dashboards that display these stats are not granular enough for our needs. The ESGF API can be queried for publication stats but not for download stats.
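For reference, a publication-count query against the ESGF search API might be built as below. This is a minimal sketch. The index node URL (`esgf-node.llnl.gov/esg-search/search`) and the helper name `build_publication_count_url` are assumptions for illustration. Setting `limit=0` asks the API for hit counts only, with no individual dataset records.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Assumed search endpoint of an ESGF index node; swap in whichever node you query.
ESGF_SEARCH_URL = "https://esgf-node.llnl.gov/esg-search/search"


def build_publication_count_url(project: str, facets: Optional[Sequence[str]] = None) -> str:
    """Build a search URL that returns the number of published datasets for a project.

    Hypothetical helper: limit=0 requests no dataset records, so the response
    carries only the total hit count plus any requested facet counts.
    """
    params = {
        "project": project,
        "type": "Dataset",
        "limit": 0,
        "format": "application/solr+json",
    }
    if facets:
        # Facet counts break the total down by, e.g., experiment or realm.
        params["facets"] = ",".join(facets)
    return f"{ESGF_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_publication_count_url("E3SM", ["experiment"]))
```

In the Solr JSON response, the total count sits under `response.numFound`, and the per-facet counts sit under `facet_counts`. This only covers publication stats. The API has no equivalent for downloads, which is why the database access matters.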

The esgf_metrics package collects more granular download stats for E3SM data in native and CMIP6 formats, but only for the LLNL node.

I’ll need to reach out to CMCC again to see if there is a way to access their database. This could let us collect more comprehensive stats across nodes. We could also simplify esgf_metrics to query this database directly instead of collecting and parsing logs at the LLNL node.

I’ll also need to check whether CMCC stores individual HTTP request information, such as IP addresses.

In esgf_metrics, I store the parsed logs, including IP addresses, in a PostgreSQL database.
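Here is a rough illustration of the per-request fields at stake when comparing with what CMCC stores. This is a minimal sketch, not the actual esgf_metrics parser. It assumes the node's access logs use the Apache/nginx combined log format. The `/E3SM/` and `/CMIP6/` path markers in `classify` are hypothetical conventions for telling native data from CMIP6 data.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed combined log format: IP, identd, user, [time], "request", status, size, ...
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


@dataclass
class DownloadRecord:
    ip: str
    timestamp: datetime
    method: str
    path: str
    status: int
    size: Optional[int]  # None when the log records "-" for the response size


def parse_line(line: str) -> Optional[DownloadRecord]:
    """Parse one access-log line; return None if it does not match the assumed format."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return DownloadRecord(
        ip=m["ip"],
        timestamp=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=m["method"],
        path=m["path"],
        status=int(m["status"]),
        size=None if m["size"] == "-" else int(m["size"]),
    )


def classify(path: str) -> Optional[str]:
    """Hypothetical rule: label a request path as CMIP6 or native E3SM data."""
    if "/CMIP6/" in path:  # checked first, since CMIP6 paths may also mention E3SM
        return "CMIP6"
    if "/E3SM/" in path:
        return "native"
    return None
```

Each parsed record could then be inserted as a row into PostgreSQL, for example with a DB-API driver's `executemany`. The actual table layout used by esgf_metrics is not shown here.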