Codespaces Meltano CLI Starter

Have your first Meltano pipeline running within 5 minutes using this repository, even if you've never touched Meltano before.

No install needed, just a GitHub account (and a few of the spare Codespaces minutes you get for free anyway).

Let's get started!

Step 0 - Open Codespaces

If you opened this from our homepage, you can go straight to Step 1.

Click "Open on Codespaces", to launch this project into a ready to use web VS-Code version with everything preloaded.

Open Codespaces

Make sure to open up the README.md inside Codespaces as well.

Notes on Codespaces:

  • If at any point you get the error "The user denied permission to use Service Worker", you need to enable third-party cookies. It's a Codespaces-related problem.
  • In our experience, Codespaces works best in Chrome or Firefox, and not so well in Safari.
  • Files in Codespaces autosave! No need to save anything.

Step 1 - What you're building - let's take a sneak peek

There's a CSV file customers.csv with

  • customer names, email addresses, and IPs
  • you're going to extract this CSV and load it into a SQL database, like, right now!
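In case you're wondering what's inside: it's a plain comma-separated file. A hypothetical excerpt (id and ip_address are the two columns the configs below rely on; the other column names and all values here are invented, your file will differ):

    id,name,email,ip_address
    1,Jane Doe,jane@example.com,203.0.113.42
    2,John Roe,john@example.com,198.51.100.7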

Go ahead, just run

meltano run tap-csv hide-ips target-duckdb

And boom, you're done. Don't believe us? You can use a helper function to check the SQL database:

./meltano_tut select_db

A few fun things you can notice:

  1. There are no IP addresses inside the database, right? Check customers.csv: they were there.
  2. That's because you added the mapper "hide-ips" above; it's completely customizable and in this case hashes the IPs.
  3. In the console output, Meltano told you at the beginning of the log ... "Schema 'raw' does not exist."
  4. That's because Meltano handles a lot of housekeeping for you, e.g. creating schemas and tables should they not already exist.

Feel free to explore the project, or dive right into building it yourself!

Let's go ahead and build it ourselves within 5 minutes

Step 2 - Initialize Meltano Project

Inside the terminal (bottom window) run:

./meltano_tut init

This runs a wrapped "meltano init", adding demo data for you to have fun with.
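If you're curious what's behind the wrapper: the unwrapped command is plain meltano init. Roughly, it amounts to the following (a sketch; my-project is a placeholder name, and the demo-data copying actually lives in the ./meltano_tut script):

meltano init my-project   # scaffolds meltano.yml plus the standard project folders
cd my-project             # the wrapper additionally drops data/customers.csv in place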

You can take a look around:

  • there is a file "data/customers.csv"; it's the one you will be loading into a data warehouse.
  • there are now a bunch of Meltano project files, including the important "meltano.yml"

Step 3 - Add your first extractor

Add your first extractor to get data from the CSV. Do so by running inside the terminal:

meltano add extractor tap-csv

Then open up the file meltano.yml, copy the config below, and paste it below pip_url (that should be line 14).

    config:
      files:
      - entity: raw_customers
        path: data/customers.csv
        keys: [id]

Your config for tap-csv in meltano.yml should look like this:

plugins:
  extractors:
  - name: tap-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
    config:
      files:
      - entity: raw_customers
        path: data/customers.csv
        keys: [id]
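If you want to double-check that Meltano picked up your edits, the standard Meltano CLI can print a plugin's resolved configuration:

meltano config tap-csv

It should echo the files entry you just pasted back at you as JSON.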

Step 4 - Test run your tap

Let's test the tap by running:

meltano invoke tap-csv

If everything works as expected, Meltano should extract the CSV and dump it as a "stream" onto standard output inside the terminal.
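That "stream" follows the Singer specification: one JSON message per line. Expect output along these lines (abridged and hand-written here; your exact schema and values will differ):

    {"type": "SCHEMA", "stream": "raw_customers", "schema": {...}, "key_properties": ["id"]}
    {"type": "RECORD", "stream": "raw_customers", "record": {"id": "1", "name": "...", "ip_address": "..."}}
    {"type": "STATE", "value": {...}}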

Step 5 - Add a loader

Next, add a loader to load our data into a local DuckDB database:

meltano add loader target-duckdb

Copy the configuration below and paste it below the pip_url (into line 23) for target-duckdb in the meltano.yml file.

    config:
      filepath: output/my.duckdb
      default_target_schema: raw

The config in meltano.yml for target-duckdb should look like this:

  loaders:
  - name: target-duckdb
    variant: jwills
    pip_url: target-duckdb~=0.4
    config:
      filepath: output/my.duckdb
      default_target_schema: raw

Step 6 - Run your EL pipeline

Now you can do your first complete EL run by calling meltano run!

meltano run tap-csv target-duckdb

Perfect!
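Under the hood, meltano run wires the tap's standard output into the target's standard input, with state bookkeeping and logging layered on top. Minus that bookkeeping, it's conceptually equivalent to:

meltano invoke tap-csv | meltano invoke target-duckdb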

Step 7 - View loaded data

To view your data you can use our little helper:

./meltano_tut select_db

This will run a SELECT * FROM raw.raw_customers against your DuckDB instance and write the output to the terminal.
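If you'd rather poke at the database yourself, you can query the file directly, for example with the duckdb Python package (a sketch; Meltano installs plugins into their own virtualenvs, so you may need a pip install duckdb first):

python -c "import duckdb; print(duckdb.connect('output/my.duckdb').execute('SELECT * FROM raw.raw_customers').fetchall())"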

Great! You've completed your first extract and load run. 🥳

Step 8 - Remove plain-text IP addresses

Notice that the data you just viewed had plain-text IP addresses inside of it? Let's quickly get rid of those!

Add a "mapper" to make slight modifications to the data we're sourcing:

meltano add mapper transform-field

Then paste the following config below the pip_url (into line 30) for the transform-field mapper in your meltano.yml file.

    mappings:
    - name: hide-ips
      config:
        transformations:
        - field_id: "ip_address"
          tap_stream_name: "raw_customers"
          type: "HASH"

The full configuration for the mapper transform-field should look like this:

  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:
    - name: hide-ips
      config:
        transformations:
        - field_id: "ip_address"
          tap_stream_name: "raw_customers"
          type: "HASH"

Now let's re-run our pipeline, this time with the mapper:

meltano run tap-csv hide-ips target-duckdb

To view the data, run the helper again:

./meltano_tut select_db
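By the way, HASH is only one of the transformation types pipelinewise-transform-field supports; SET-NULL is another. As a sketch (not part of this tutorial), a hypothetical second mapping that blanks out the email column instead would sit right next to hide-ips:

    mappings:
    - name: hide-ips
      ...
    - name: drop-emails            # hypothetical extra mapping
      config:
        transformations:
        - field_id: "email"        # assumes the CSV column is actually named "email"
          tap_stream_name: "raw_customers"
          type: "SET-NULL"         # blank the value entirely instead of hashing it

You would then run it with meltano run tap-csv drop-emails target-duckdb.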

Step 9 - Celebrate your success 🎉

That was fun and quick! Now try to run

meltano dragon

just for the fun of it! 🐉

Next Steps

More things you can explore inside this codespace:

  • Meltano VS Code Extension

    Do you see this little dragon on the left-hand side?

    Dragon

    That's the Meltano VS Code extension. It lets you view and add all of the taps & targets currently available on Meltano Hub. Take a look at them!

  • Add another target

    Why don't you try adding a second output? Add target-jsonl and do a meltano run tap-csv target-jsonl (see the command sketch after this list).

  • Add another tap

    Next, try adding another tap, for instance tap-carbon-intensity; play around with it and push the data into either target.
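For those two ideas, the commands would look roughly like this (a sketch; target-jsonl typically writes .jsonl files into an output folder):

meltano add loader target-jsonl
meltano run tap-csv hide-ips target-jsonl    # same pipeline, different destination

meltano add extractor tap-carbon-intensity
meltano run tap-carbon-intensity target-jsonl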

Once you're done, head over to the docs and check out our great getting started tutorial for more details, add a job and schedule to easily orchestrate your extract & load processes, and deploy it to production.

(Coming Soon 🏗️) Advanced Tutorial