Harryx etl

Harryx is a service that pulls data from NASA exoplanets to deliver to another service in my ecosystem.

Basically, we just want to know if a planet is of high severity or not. If yes, we get the data and make some transformations to be used. If not, we just skip it.

How to start

First, you need to download the CSV file with all the data, or you can exchange the fetch method in the extraction stage src/stages/extraction to fetchDataFromApi.

However, I regret to inform you that there are some issues with this fetch method in this version. The issue is with the stream, and I will be able to solve it later.

The command to download the CSV file with a large scale of data to be processed is:

wget "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select
+*+from+ps&format=csv" -O "confirmed_planets.csv"

After downloading, move the file to src/storage, so that the application can extract and transform the data inside it.

You can also define the amount of data you want to retrieve.

const planetsDataChunk = await fetchCSVDataInMemoryStream(40) // Quantity of planets to verify

This file you can be found at: File

After completing all of these steps, run the following command.

npm install

# run the script
npm run start:pipeline

Features

Pulls data from exoplanets into NASA
Transforms the data to be loaded into our database
Creates different types of data fetchers based on the variety of ways the NASA platform can get the same data
Creates a database to load the data
Configures Prisma to connect to the database
Configures a bash command to check if the script has already been run

Antonio-Gabriel/harryx-nasa-exoplanet-etl

Harryx etl

How to start

Features