This repo collects the code from our data engineering tutorials on the dlt REST API source, the easiest way to build pipelines that load data from REST APIs into various destination systems.
This verified source lets us build a source connector from a single Python dictionary. Because the configuration is plain Python rather than JSON or YAML, we keep Python's full flexibility, and the code stays easy to read and write.
The video tutorials cover topics such as:
- Basic Walkthrough: Configuring endpoints with parent-child relationships
- Authentication: Bearer Token and HTTP Basic Auth
- Incremental Loading
In pokemon_pipeline.py, we cover:
- Importing endpoints
- Adding custom query parameters
- Specifying a parent-child relationship between endpoints, e.g. /berry/{berry_name}
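- Installing dlt and the rest_api source
- Inspecting the loaded data

To install the dependencies, add the rest_api source with DuckDB as the destination, and run the pipeline: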
poetry install
dlt init rest_api duckdb
python pokemon_pipeline.py
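The core of pokemon_pipeline.py is a single configuration dictionary. Below is a minimal sketch of what it could look like, modeled on the PokeAPI example in the dlt REST API source documentation; the exact resources, the `limit` value, and the `run_pipeline` helper are assumptions rather than a copy of the repo's file. The `berry_details` resource fills `{berry_name}` from each record of its parent `berry` resource, which is what produces the `berry_details` tables queried further down.

```python
from typing import Any, Dict

# Declarative config for the rest_api source. The structure follows the dlt
# REST API source docs; the concrete values are illustrative.
pokemon_config: Dict[str, Any] = {
    "client": {
        "base_url": "https://pokeapi.co/api/v2/",
    },
    # Defaults applied to every resource: a custom query parameter.
    "resource_defaults": {
        "endpoint": {
            "params": {"limit": 1000},
        },
    },
    "resources": [
        # Plain endpoints: the resource name doubles as the URL path.
        "pokemon",
        "berry",
        "location",
        # Child resource: {berry_name} is resolved from the "name" field of
        # each "berry" record, e.g. GET /berry/cheri, GET /berry/chesto, ...
        {
            "name": "berry_details",
            "endpoint": {
                "path": "berry/{berry_name}",
                "params": {
                    "berry_name": {
                        "type": "resolve",
                        "resource": "berry",
                        "field": "name",
                    },
                },
            },
        },
    ],
}


def run_pipeline() -> None:
    """Hypothetical helper: load all resources into DuckDB.

    Needs dlt plus the rest_api source created by `dlt init rest_api duckdb`;
    newer dlt versions expose it as `dlt.sources.rest_api` instead.
    """
    import dlt
    from rest_api import rest_api_source

    pipeline = dlt.pipeline(
        pipeline_name="pokemon_pipeline",
        destination="duckdb",
        dataset_name="pokemon",
    )
    load_info = pipeline.run(rest_api_source(pokemon_config))
    print(load_info)
```

Defining the dictionary at module level keeps it easy to inspect or test on its own; only `run_pipeline()` touches dlt and the network.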
To explore the data in the Streamlit app in your web browser:
dlt pipeline pokemon_pipeline show
Alternatively, use the DuckDB CLI:
duckdb pokemon_pipeline.duckdb
use pokemon;
select
berry_details.name,
berry_details.size,
berry_details__flavors.flavor__name,
berry_details__flavors.potency
from berry_details
join berry_details__flavors on berry_details._dlt_id = berry_details__flavors._dlt_parent_id;
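The join works because dlt normalizes nested data into child tables: each element of a berry's `flavors` list becomes a row in `berry_details__flavors`, linked to its parent row through `_dlt_parent_id`, and nested objects such as `flavor: {name: ...}` are flattened into columns like `flavor__name`. The stdlib-only sketch below mimics that behavior on a hand-written record shaped like a PokeAPI `/berry` response. It is a simplified illustration, not dlt's actual normalizer: real dlt generates its IDs differently and adds further columns such as `_dlt_list_idx`.

```python
import uuid
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def flatten(obj: Dict[str, Any], prefix: str = "") -> Row:
    """Flatten nested dicts into double-underscore column names."""
    row: Row = {}
    for key, value in obj.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{column}__"))
        else:
            row[column] = value
    return row


def normalize(table: str, record: Dict[str, Any]) -> Dict[str, List[Row]]:
    """Split one record into a parent row plus child-table rows (one level deep)."""
    parent_id = uuid.uuid4().hex
    tables: Dict[str, List[Row]] = {table: []}
    parent: Row = {"_dlt_id": parent_id}
    for key, value in record.items():
        if isinstance(value, list):
            # Each list element becomes a row in "<table>__<key>".
            tables[f"{table}__{key}"] = [
                {
                    **(flatten(item) if isinstance(item, dict) else {"value": item}),
                    "_dlt_id": uuid.uuid4().hex,
                    "_dlt_parent_id": parent_id,
                }
                for item in value
            ]
        elif isinstance(value, dict):
            parent.update(flatten(value, prefix=f"{key}__"))
        else:
            parent[key] = value
    tables[table].append(parent)
    return tables


def join_flavors(tables: Dict[str, List[Row]]) -> List[Tuple[Any, ...]]:
    """Python equivalent of the SQL join on _dlt_id = _dlt_parent_id."""
    parents = {row["_dlt_id"]: row for row in tables["berry_details"]}
    return [
        (
            parents[child["_dlt_parent_id"]]["name"],
            parents[child["_dlt_parent_id"]]["size"],
            child["flavor__name"],
            child["potency"],
        )
        for child in tables["berry_details__flavors"]
        if child["_dlt_parent_id"] in parents
    ]


berry = {
    "name": "cheri",
    "size": 20,
    "flavors": [
        {"flavor": {"name": "spicy"}, "potency": 10},
        {"flavor": {"name": "dry"}, "potency": 0},
    ],
}
tables = normalize("berry_details", berry)
for result in join_flavors(tables):
    print(result)
# ('cheri', 20, 'spicy', 10)
# ('cheri', 20, 'dry', 0)
```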
For authentication, we cover:
- simple bearer token authentication in github_pipeline.py
- more complex HTTP Basic authentication in freshdesk_pipeline.py
We add the secrets to the hidden .dlt/secrets.toml file. For production use cases, we recommend using environment variables provided by a secrets manager.
More info on dlt secrets and configs
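In the config dictionary, authentication lives under the `client` key's `auth` entry. Here is a minimal sketch of both variants, assuming the `type`-based auth shorthand described in the dlt REST API source documentation. The GitHub repository path, the Freshdesk subdomain, and the environment variable names (`GITHUB_TOKEN`, `FRESHDESK_API_KEY`) are hypothetical placeholders; in the actual pipelines the values would come from .dlt/secrets.toml (for example via `dlt.secrets`) rather than `os.environ`. Freshdesk's API expects the API key as the username with a dummy password such as `X`.

```python
import os
from typing import Any, Dict

# Bearer token auth (GitHub). Hypothetical env var; the repo reads the token
# from .dlt/secrets.toml instead.
github_client: Dict[str, Any] = {
    "base_url": "https://api.github.com/repos/dlt-hub/dlt/",  # hypothetical repo
    "auth": {
        "type": "bearer",
        "token": os.environ.get("GITHUB_TOKEN", ""),
    },
}

# HTTP Basic auth (Freshdesk): the API key is the username, the password is a dummy.
freshdesk_client: Dict[str, Any] = {
    "base_url": "https://yourdomain.freshdesk.com/api/v2/",  # hypothetical subdomain
    "auth": {
        "type": "http_basic",
        "username": os.environ.get("FRESHDESK_API_KEY", ""),
        "password": "X",
    },
}
```

Either dictionary slots into the `client` key of a full config, e.g. `{"client": github_client, "resources": ["issues"]}`, so switching auth methods does not touch the resource definitions.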
We load GitHub issues incrementally in github_pipeline.py.
As explained in the video, if you want to load a resource incrementally that was previously loaded with write_disposition set to append or replace,
you first need to reset the pipeline state. You can use:
dlt pipeline github_pipeline drop issues
This prevents dlt from attempting an ALTER TABLE statement that adds a constraint on the ID field, which DuckDB does not support at the moment (v0.9.2).
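For reference, an incremental `issues` resource could be declared along the following lines. The structure follows the incremental example in the dlt REST API source documentation, while the initial value and the extra query parameters are illustrative assumptions. On each run, the `since` parameter is filled from the highest `updated_at` value loaded so far, and `write_disposition="merge"` with `primary_key="id"` updates existing rows instead of appending duplicates. That primary key on `id` is presumably the constraint the ALTER TABLE statement above would try to add.

```python
from typing import Any, Dict

github_issues_resource: Dict[str, Any] = {
    "name": "issues",
    "endpoint": {
        "path": "issues",
        "params": {
            "sort": "updated",
            "direction": "desc",
            "state": "open",
            # Incremental cursor: on each run, `since` is set to the highest
            # `updated_at` value loaded so far (initial_value on the first run).
            "since": {
                "type": "incremental",
                "cursor_path": "updated_at",
                "initial_value": "2024-01-25T11:21:28Z",  # illustrative start date
            },
        },
    },
    # Merge on `id` so updated issues replace their previous version.
    "primary_key": "id",
    "write_disposition": "merge",
}
```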
In zoom.py, we implement a connector to the Zoom API that loads meeting and webinar information. We implement Zoom's specific variant of OAuth 2.0 authentication. We also implement response actions, such as ignoring certain error messages or HTTP status codes.
See the tutorial blog post: How To Create A dlt Source With A Custom Authentication Method
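To make the two Zoom-specific pieces more concrete, here is a hedged, stdlib-only sketch. It assumes a Zoom Server-to-Server OAuth app, whose token request is a POST to `https://zoom.us/oauth/token` with `grant_type=account_credentials` and the account ID as query parameters, and the client ID and secret in an HTTP Basic header. The helper name `build_zoom_token_request` is hypothetical and only builds the request without sending it. The `response_actions` list uses the format described in the dlt REST API source documentation; the status code, message text, and `users/me/meetings` path are assumptions, not necessarily what zoom.py ignores.

```python
import base64
from typing import Any, Dict
from urllib.parse import urlencode

ZOOM_TOKEN_URL = "https://zoom.us/oauth/token"


def build_zoom_token_request(client_id: str, client_secret: str, account_id: str) -> Dict[str, Any]:
    """Hypothetical helper: build (but do not send) a Server-to-Server OAuth token request."""
    # Client credentials go into an HTTP Basic Authorization header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    # Grant type and account ID are passed as query parameters.
    query = urlencode({"grant_type": "account_credentials", "account_id": account_id})
    return {
        "method": "POST",
        "url": f"{ZOOM_TOKEN_URL}?{query}",
        "headers": {"Authorization": f"Basic {basic}"},
    }


# Response actions: skip responses that fail in known, harmless ways instead of
# aborting the whole load (values are illustrative).
meetings_endpoint: Dict[str, Any] = {
    "path": "users/me/meetings",  # hypothetical path
    "response_actions": [
        {"status_code": 404, "action": "ignore"},
        {"content": "Meeting does not exist", "action": "ignore"},
    ],
}

request = build_zoom_token_request("my-client-id", "my-client-secret", "my-account-id")
print(request["url"])
# https://zoom.us/oauth/token?grant_type=account_credentials&account_id=my-account-id
```

In the real connector, the token request would be wrapped in an auth class that the REST client calls before each expired token; the blog post above walks through that approach.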