Have your first Meltano pipeline run within 5 minutes using this repository, even if you have never touched Meltano before.
No install needed, just a GitHub account (and a few spare Codespaces minutes, which you get for free anyway).
Let's get started!
If you opened this from our homepage, you can go straight to Step 1.
Click "Open on Codespaces" to launch this project in a ready-to-use web version of VS Code with everything preloaded.
Make sure to open up the README.md inside Codespaces as well.
Notes on Codespaces:
- If at any point you get the error "The user denied permission to use Service Worker", you need to enable third-party cookies. It's a Codespaces-related problem.
- In our experience, Codespaces works best in Chrome or Firefox, and not so well in Safari.
- Files in Codespaces autosave! No need to save anything.
There's a CSV file, customers.csv, with customer names, e-mail addresses and IP addresses.
You're going to extract this CSV and load it into an SQL database, like, right now!
Go ahead, just run
meltano run tap-csv hide-ips target-duckdb
And boom, you're done. Don't believe us? You can use a helper function to check the SQL database:
./meltano_tut select_db
A few fun things you can notice:
- There are no IP addresses inside the database, right? Check customers.csv, they were there.
- That's because the command above included a "mapper" called "hide-ips", which is completely customizable and in this case hashes the IPs.
- In the console output, Meltano told you at the beginning of the log: "Schema 'raw' does not exist."
- That is because Meltano has a lot of helper functions. For example, it creates schemas and tables should they not already exist.
Feel free to explore the project, or dive right into building it yourself!
Inside the terminal (bottom window) run:
./meltano_tut init
This runs a wrapped "meltano init", adding demo data for you to have fun with.
You can take a look around:
- There is a file "data/customers.csv"; it is the one you will be loading into a data warehouse.
- There are now a bunch of Meltano project files, including the important "meltano.yml".
Add your first extractor to get data from the CSV. Do so by running inside the terminal:
meltano add extractor tap-csv
Then open up the file meltano.yml, copy the config below, and paste it below pip_url (that should be line 14).
    config:
      files:
      - entity: raw_customers
        path: data/customers.csv
        keys: [id]
Your config for tap-csv in meltano.yml should look like this:
plugins:
  extractors:
  - name: tap-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
    config:
      files:
      - entity: raw_customers
        path: data/customers.csv
        keys: [id]
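As we understand the tap-csv settings, `keys` names the column (or columns) that uniquely identify each row, so it can be worth checking the file before the first run. Below is a minimal Python sketch of such a check. The sample rows stand in for data/customers.csv and the `check_keys` helper is hypothetical; neither is part of Meltano or tap-csv.

```python
import csv
import io

# Hypothetical rows standing in for data/customers.csv.
SAMPLE_CSV = """id,first_name,ip_address
1,Ada,192.0.2.10
2,Grace,192.0.2.11
"""


def check_keys(text, keys):
    """Return a list of problems found with the key columns of a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    # Every key must exist as a column in the header row.
    missing = [key for key in keys if key not in (reader.fieldnames or [])]
    if missing:
        return [f"missing key column: {name}" for name in missing]
    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        value = tuple(row[key] for key in keys)
        if value in seen:
            problems.append(f"duplicate key {value} on line {line_number}")
        seen.add(value)
    return problems


print(check_keys(SAMPLE_CSV, ["id"]))
```

An empty list means the key column exists and no value repeats.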
Let's test the tap by running:
meltano invoke tap-csv
If everything works as expected, Meltano should extract the CSV and dump it as a "stream" onto standard output inside the terminal.
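To give a feel for what such a stream looks like, here is a minimal Python sketch that turns CSV rows into Singer-style SCHEMA and RECORD messages, one JSON object per line. The sample rows and the `csv_to_singer_messages` helper are hypothetical, and the real tap-csv output may contain additional message types and fields.

```python
import csv
import io
import json

# Hypothetical sample data standing in for data/customers.csv.
SAMPLE_CSV = """id,first_name,email,ip_address
1,Ada,ada@example.com,192.0.2.10
2,Grace,grace@example.com,192.0.2.11
"""


def csv_to_singer_messages(text, stream, keys):
    """Yield Singer-style messages (as dicts) for the rows in a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    # A SCHEMA message describes the stream before any records are sent.
    # Every column is treated as a string here, which is a simplification.
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {name: {"type": "string"} for name in reader.fieldnames},
        },
        "key_properties": keys,
    }
    # Each CSV row becomes one RECORD message.
    for row in reader:
        yield {"type": "RECORD", "stream": stream, "record": row}


messages = list(csv_to_singer_messages(SAMPLE_CSV, "raw_customers", ["id"]))
for message in messages:
    print(json.dumps(message))
```

A loader reads these lines from standard input, which is why a tap and a target can be chained without knowing anything about each other.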
Next, add a loader to load our data into a local DuckDB database:
meltano add loader target-duckdb
Copy the configuration below and paste it below the pip_url (into line 23) for target-duckdb in the meltano.yml file.
    config:
      filepath: output/my.duckdb
      default_target_schema: raw
The config in meltano.yml for target-duckdb should look like this:
  loaders:
  - name: target-duckdb
    variant: jwills
    pip_url: target-duckdb~=0.4
    config:
      filepath: output/my.duckdb
      default_target_schema: raw
Now you can do your first complete EL run by calling meltano run!
meltano run tap-csv target-duckdb
Perfect!
To view your data you can use our little helper:
./meltano_tut select_db
This will run a SELECT * FROM public.raw_customers on your DuckDB instance and write the output to the terminal.
Great! You've completed your first extract and load run. 🥳
Notice that the data you just viewed had plain IP addresses inside of it? Let's quickly get rid of those!
Add a "mapper" to make slight modifications to the data we're sourcing here.
meltano add mapper transform-field
Then paste the following config below the pip_url (into line 30) for the transform-field mapper in your meltano.yml file.
    mappings:
    - name: hide-ips
      config:
        transformations:
        - field_id: "ip_address"
          tap_stream_name: "raw_customers"
          type: "HASH"
The full configuration for the mapper transform-field should look like this:
  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:
    - name: hide-ips
      config:
        transformations:
        - field_id: "ip_address"
          tap_stream_name: "raw_customers"
          type: "HASH"
Now let's re-run our pipeline but this time with the mapper. You run it by calling:
meltano run tap-csv hide-ips target-duckdb
To view the data again, run the helper again:
./meltano_tut select_db
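Conceptually, the hide-ips mapping rewrites one field in every record of the raw_customers stream before it reaches the loader. The Python sketch below mimics that step. It assumes the HASH transformation produces a SHA-256 hex digest, and the `hide_ips` function and sample record are hypothetical illustrations rather than the plugin's actual code.

```python
import hashlib


def hide_ips(message, stream="raw_customers", field="ip_address"):
    """Return a copy of a RECORD message with one field replaced by its hash."""
    # Leave other message types and other streams untouched.
    if message.get("type") != "RECORD" or message.get("stream") != stream:
        return message
    record = dict(message["record"])
    value = record.get(field)
    if value is not None:
        # Assumption: HASH means a SHA-256 hex digest of the UTF-8 text.
        record[field] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return {**message, "record": record}


# Hypothetical record, shaped like one row of the customers data.
original = {
    "type": "RECORD",
    "stream": "raw_customers",
    "record": {"id": "1", "first_name": "Ada", "ip_address": "192.0.2.10"},
}
masked = hide_ips(original)
print(masked["record"]["ip_address"])
```

Because the same input always hashes to the same value, rows can still be grouped or joined by IP without exposing the address itself.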
That was fun and quick! Now try to run
meltano dragon
just for the fun of it! 🐉
More things you can explore inside this codespace:
- Meltano VS Code Extension
  Do you see this little dragon on the left-hand side? That's the Meltano VS Code extension. It allows you to view and add all possible taps & targets we currently have on Meltano Hub. Take a look at them!
- Add another target
  Why don't you try to add a second output? Try to add target-jsonl and do a meltano run tap-csv target-jsonl.
- Add another tap
  Next, try to add another tap, for instance tap-carbon-intensity, play around with it and push the data into either target.
Once you're done, head over to the docs and check out our great getting started tutorial for more details. From there you can add a job and schedule to easily orchestrate your extract & load processes, and deploy your project to production.
- Explore different replication methods to run incremental loads instead of full syncs
- Explore deploying to GitHub Actions
- Explore using environments to change configuration at runtime
- Explore running dbt and other tools with Meltano