Download the XCO2 data from the NASA website. In the search box, type the Level 2 (L2) data product name OCO2_Lite_V11r.
Download the data in NetCDF-4 (.nc4) format and move the files into the project's "/nc4_data" folder.
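The downloaded files can be catalogued before processing. The sketch below lists the .nc4 files in the folder and pairs each one with its observation date, sorted chronologically. It assumes Lite file names of the form oco2_LtCO2_YYMMDD_<build>_<timestamp>s.nc4. That naming pattern and the helper names granule_date and list_granules are assumptions for illustration, not part of this project's code.

```python
import re
from datetime import date
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed Lite file naming: oco2_LtCO2_YYMMDD_<build>_<timestamp>s.nc4
# Adjust this regex if your downloaded files are named differently.
LITE_NAME = re.compile(r"oco2_LtCO2_(\d{2})(\d{2})(\d{2})_")


def granule_date(path: Path) -> Optional[date]:
    """Return the observation date encoded in a Lite file name, or None if it does not match."""
    match = LITE_NAME.search(path.name)
    if match is None:
        return None
    yy, mm, dd = (int(g) for g in match.groups())
    return date(2000 + yy, mm, dd)


def list_granules(folder: str = "nc4_data") -> List[Tuple[date, Path]]:
    """List the .nc4 files in `folder` with their dates, oldest first."""
    found = []
    for path in Path(folder).glob("*.nc4"):
        day = granule_date(path)
        if day is not None:  # ignore files that do not follow the assumed pattern
            found.append((day, path))
    return sorted(found)
```

Reading the variables themselves (for example with the netCDF4 or xarray packages) is left out, since which fields to extract depends on the product version and on the analysis.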
The World Cities Database offers a simple, accurate and up-to-date database of the world's cities and towns under a Creative Commons license. Commercial use is allowed. It is built from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.
The database is:
- Up-to-date: It was last refreshed in March 2022.
- Comprehensive: Over 4 million unique cities and towns from every country in the world.
- Accurate: Cleaned and aggregated from official sources.
- Includes latitude and longitude coordinates.
The database is downloaded as a single CSV file and saved into the "/City_data" folder.
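Because the city file includes latitude and longitude, satellite observations can be associated with nearby cities. The sketch below finds the nearest city to a point using the haversine great-circle distance. The column names city, lat and lng are assumptions about the CSV header, and the helper functions are hypothetical, not taken from this project.

```python
import csv
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing `a` slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def load_cities(path):
    """Read (city, lat, lng) tuples; the header names are assumptions about the file."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [(row["city"], float(row["lat"]), float(row["lng"])) for row in csv.DictReader(fh)]


def nearest_city(cities, lat, lon):
    """Return (name, distance_km) of the closest city, using a simple linear scan."""
    name, clat, clon = min(cities, key=lambda c: haversine_km(lat, lon, c[1], c[2]))
    return name, haversine_km(lat, lon, clat, clon)
```

A linear scan is fine for a few points; for millions of cities a spatial index, such as scikit-learn's BallTree with the haversine metric, would be considerably faster.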
The cement dataset was downloaded, and the relevant data was extracted and saved to the "/Cement_data" directory.
The world fire dataset can be found in NASA's Fire Information for Resource Management System (FIRMS).
Download the CSV files year by year and save them to the "/fire_data" directory.
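Since the fire data arrives as one CSV per year, it is often convenient to stack the files into a single table. The sketch below concatenates every CSV in the folder, writing the header once and refusing files whose columns differ. The function name and the output file name are hypothetical, and it assumes all yearly files share the same columns.

```python
import csv
from pathlib import Path


def combine_yearly_csvs(folder="fire_data", output="fire_all_years.csv"):
    """Concatenate all CSVs in `folder` into `output`, keeping a single header.

    Raises ValueError if a file's header differs from the first one.
    Returns the number of data rows written.
    """
    files = sorted(Path(folder).glob("*.csv"))  # collected before `output` is created
    header = None
    rows_written = 0
    with open(output, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in files:
            with open(path, newline="", encoding="utf-8") as fh:
                reader = csv.reader(fh)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path.name} has different columns")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Checking the header of each file catches the case where a yearly export was made with different column options.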
WorldPop Hub provides a collection of global population datasets, organized by year.
The dataset is available for download in GeoTIFF and ASCII XYZ formats at a resolution of 30 arc-seconds (approximately 1 km at the equator). The projection is the geographic coordinate system WGS84. The units are number of people per pixel. The mapping approach is Random Forest-based dasymetric redistribution.
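The "approximately 1 km" figure follows from the pixel size: 30 arc-seconds is 1/120 of a degree, and one degree along the equator is about 40075.017 / 360 ≈ 111.32 km, giving roughly 0.93 km. The sketch below reproduces that arithmetic under a spherical approximation, in which the east-west width shrinks with the cosine of latitude and the north-south height is treated as constant. The function name is hypothetical.

```python
import math

EQUATOR_KM = 40075.017  # equatorial circumference of the WGS84 ellipsoid, in km
KM_PER_DEGREE = EQUATOR_KM / 360.0  # about 111.32 km per degree at the equator


def pixel_size_km(arc_seconds=30.0, latitude_deg=0.0):
    """Approximate (east-west, north-south) size in km of a square pixel of `arc_seconds`.

    Spherical approximation: the east-west width scales with cos(latitude),
    the north-south height is taken as constant.
    """
    degrees = arc_seconds / 3600.0
    north_south = degrees * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south
```

For example, pixel_size_km() gives about 0.93 km in both directions at the equator, while at 60° latitude the east-west width drops to about half of that. This is why "people per pixel" values are not directly comparable across latitudes without accounting for pixel area.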
The dataset was downloaded from WorldPop Hub year by year, combined into a single CSV file, and saved into the "/Population" folder.
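Combining year-wise files into one CSV requires recording which year each row came from. The sketch below merges ASCII XYZ files into a single CSV with a year column. It assumes each input line holds X (longitude), Y (latitude) and Z (people per pixel), comma- or space-separated, with an optional non-numeric header line. The function name, the files_by_year mapping and the output column names are hypothetical.

```python
import csv


def combine_population(files_by_year, output="population_all_years.csv"):
    """Merge year-wise XYZ files into one CSV with columns year, longitude, latitude, population.

    `files_by_year` maps a year (int) to the path of that year's XYZ file.
    Returns the number of data rows written.
    """
    rows = 0
    with open(output, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["year", "longitude", "latitude", "population"])
        for year in sorted(files_by_year):
            with open(files_by_year[year], encoding="utf-8") as fh:
                for line in fh:
                    parts = line.replace(",", " ").split()
                    if len(parts) != 3:
                        continue  # blank or malformed line
                    try:
                        x, y, z = (float(p) for p in parts)
                    except ValueError:
                        continue  # non-numeric header line such as "X,Y,Z"
                    writer.writerow([year, x, y, z])
                    rows += 1
    return rows
```

Keeping the data in long format (one row per year and pixel) makes it easy to filter by year or join against other yearly datasets later.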
The Global Power Plant Database is a comprehensive, open source database of power plants around the world. It centralizes power plant data to make it easier to navigate, compare and draw insights for one’s own analysis. The database covers approximately 35,000 power plants from 167 countries and includes thermal plants (e.g. coal, gas, oil, nuclear, biomass, waste, geothermal) and renewables (e.g. hydro, wind, solar). Each power plant is geolocated and entries contain information on plant capacity, generation, ownership, and fuel type.
The dataset is downloaded and saved into the "/PowerPlant" folder.
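Because the database records fuel type, capacity and location for each plant, it can be narrowed to the plants most relevant to CO2 emissions. The sketch below keeps only fossil-fuel plants above a capacity threshold. It assumes the CSV has the columns name, capacity_mw, latitude, longitude and primary_fuel. The set of fuels treated as fossil and the function name are illustrative choices, not this project's actual filtering rules.

```python
import csv

# Fuel categories treated as fossil here; an illustrative choice, adjust as needed.
FOSSIL_FUELS = {"Coal", "Gas", "Oil", "Petcoke"}


def fossil_plants(path, min_capacity_mw=0.0):
    """Return (name, latitude, longitude, capacity_mw, primary_fuel) for fossil-fuel plants.

    Plants whose capacity is below `min_capacity_mw` are dropped.
    """
    plants = []
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            if row["primary_fuel"] not in FOSSIL_FUELS:
                continue
            capacity = float(row["capacity_mw"])
            if capacity < min_capacity_mw:
                continue
            plants.append(
                (row["name"], float(row["latitude"]), float(row["longitude"]), capacity, row["primary_fuel"])
            )
    return plants
```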
All dependencies are listed in the "/requirements.txt" file.
They can be installed with: pip install -r requirements.txt
Run "/main.py" to extract the final data. The datasets are processed, filtered and merged, and the final dataset is saved to "artifact/date__time/data_ingestion/feature_store".
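The date__time part of the output path is a per-run timestamp, which keeps the results of different runs separate. The sketch below shows one way such a path could be built. The format string "%Y_%m_%d__%H_%M_%S" is an assumption chosen to mirror the date__time pattern, and the function name is hypothetical; the project's actual format may differ.

```python
from datetime import datetime
from pathlib import Path


def feature_store_dir(root="artifact", when=None, fmt="%Y_%m_%d__%H_%M_%S"):
    """Build <root>/<timestamp>/data_ingestion/feature_store for one pipeline run.

    The timestamp format is an assumption, not necessarily the project's own.
    """
    when = when or datetime.now()
    return Path(root) / when.strftime(fmt) / "data_ingestion" / "feature_store"
```

Before writing output, the directory can be created with feature_store_dir().mkdir(parents=True, exist_ok=True). Passing `when` explicitly lets every stage of one run share the same timestamp.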
All pipeline activity is logged, and the logs are stored in the "/logs" folder.
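A common way to produce such logs with Python's standard logging module is to send every record to a timestamped file. The sketch below configures the root logger that way. The function name, file-name format and log-line format are assumptions for illustration, not the project's actual logging setup.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir="logs"):
    """Send log records to a new timestamped file in `log_dir` and return the file path."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    log_file = Path(log_dir) / f"{datetime.now():%Y_%m_%d__%H_%M_%S}.log"
    logging.basicConfig(
        filename=log_file,
        format="[%(asctime)s] %(levelname)s %(name)s - %(message)s",
        level=logging.INFO,
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return log_file
```

After calling setup_logging() once at startup, each module can call logging.getLogger(__name__).info(...), and all messages end up in the same file for that run.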