Our main goal is to provide clean, verified, cross-checked and normalized datasets on GHG emissions.
- Domain knowledge
  - Tasks:
    - Build a knowledge base on GHG emissions concepts, phenomena, and methods...
    - Share and explain the main concepts to non-specialists
  - Outputs:
    - A report synthesizing the concepts related to GHG emissions data
    - A wiki defining the main concepts used
- Data cartography
  - Tasks:
    - Identify and document GHG emissions data sources
  - Outputs:
    - A cartography of relevant data sources on GHG emissions, with documentation
    - A data catalog referencing the identified sources
- Data exploration reports
  - Tasks:
    - Analyse raw data and provide insights
    - Compare the various sources
- Normalized GHG emissions database
  - Tasks:
    - Combine and clean data sources
    - Model and maintain an aggregated database
- Normalized GHG emissions data sharing
  - Tasks:
    - Develop an API to query the normalized data
    - Develop dashboards to explore it
There are various ways to contribute to this project depending on your skills and motivation.
Data are information collected through observation or measurement to capture and understand some phenomenon. Before exploring open GHG emissions data, we need to:
- Understand the phenomenon behind it: carbon cycle, greenhouse gases, carbon budget, carbon sinks...
- Know how data is collected: carbon inventories...
- Know why data is collected: carbon budget, carbon pricing...
This knowledge may be shared on the wiki page.
- Identifying and referencing new data sources:
You can find on this page a list of identified GHG emissions data sources. If you identify a new data source, please add it to the list, specifying some metadata (provider, spatial coverage, format...). If you think the content of a data source needs to be investigated, please add an issue to the project backlog.
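For example, a new entry could look like the following (the dataset name and metadata values are purely illustrative, not an actual catalog item):
- Example dataset (provider: name of the publishing organisation; spatial coverage: e.g. global or country level; format: e.g. CSV or JSON; access: e.g. API, HTTP, or bulk download)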
There are several ways to collect the data. Since we do not have a data storage infrastructure, it would be preferable to use data provided by the sources directly:
- API access: If the data source can be queried through an existing API, please document the data collection method in the associated readme file.
- HTTP access: If the data is stored, for example, in a git repo, please mention the HTTP address used to collect it.
- Bulk download: If the data source is only accessible as a download, we can use Kaggle datasets to store and document the raw data. Please provide a sample script for accessing the data through the Kaggle API (see the sketch after this list).
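As an illustration, here is a minimal sketch of such a sample script, assuming the official `kaggle` Python package is installed and configured; the dataset identifier and output path below are placeholders, not actual project resources.

```python
# Minimal sketch: download raw data stored as a Kaggle dataset.
# Assumes the `kaggle` package is installed and a valid API token is
# available in ~/.kaggle/kaggle.json. The dataset identifier below is a
# placeholder, not a real project dataset.
from kaggle.api.kaggle_api_extended import KaggleApi


def download_raw_data(dataset: str = "some-user/ghg-emissions-raw",
                      out_dir: str = "data/raw") -> None:
    """Download and unzip a Kaggle dataset into the local raw data folder."""
    api = KaggleApi()
    api.authenticate()  # reads credentials from ~/.kaggle/kaggle.json
    api.dataset_download_files(dataset, path=out_dir, unzip=True)


if __name__ == "__main__":
    download_raw_data()
```

The same download can also be done from the command line with `kaggle datasets download -d <owner>/<dataset> -p data/raw --unzip`.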
- API: Develop an API for querying the normalized data (a minimal sketch is given after this list).
- Viz: Develop dashboards for exploring its contents.
Folder structure options and naming conventions for software projects
project_name/
├── README.md             # overview of the project
├── data/                 # data files used in the project
│   ├── README.md         # describes where data came from
│   └── sub-folder/       # subdirectories by data source
├── notebooks/            # notebooks for exploring raw data
├── processed_data/       # intermediate files from the analysis
├── manuscript/           # manuscript describing the results
├── app/                  # app to visualize the main dashboards
├── results/              # results of the analysis (data, tables, figures)
├── src/                  # contains all code in the project
│   ├── LICENSE           # license for your code
│   ├── requirements.txt  # software requirements and dependencies
│   └── ...
└── doc/                  # documentation for your project
    ├── index
    └── ...