This project uses dbt to extract, transform, and load Airbnb data from public data sources. The goal is to create a reliable and maintainable data pipeline used by BI tools and other analysts.
- Airbnb Data : This s3 bucket contains all the files used in this project.
This project uses the following tools:
- Dbt (Data Build Tool): A popular open-source tool for creating data pipelines that is built on top of SQL
- Snowflake: A cloud data warehousing platform that is used to store and manage the data
- GitHub: A web-based hosting service for version control and collaboration
- Dbt-expectation library package
- Dbt_utils package
All your files and folders are presented as a tree in the file explorer. You can switch from one to another by clicking a file in the tree.
The project is structured as follows:
├── analyses ├── macros ├── models ├── packages.yml ├── README.md ├── seeds ├── snapshots └── tests
seed/
: Contains the raw data to be uploaded tomodels/
: Contains the SQL models for the data pipelineanalysis/
: Contains any analysis files or scriptsmacros/
: Contains any custom macros used in the modelspackages.yml
: Contains external dependencies added to dbttests
: Contains model tests for different models- `snapshot: contains snapshot data on the listings table
README.md
: The file you're currently reading!
- Extract: Raw Airbnb data is extracted from the amazone s3 puplic bucket and stored in the
models/src/
folder. - Transform: The raw data is transformed into a format that is more suitable for analysis and stored in the
models/dim
folder. - Load: The transformed data is loaded into the Snowflake data warehouse and stored in the
models/dim
folder. - Analyze: The data is analyzed using SQL queries and visualizations.
To run the pipeline, follow these steps:
- Clone the repository to your local machine.
- Set up a Snowflake account and create a database and schema.
- Set up a dbt profile to connect to your Snowflake account.
- Run
dbt seed
to load the raw data into thedata/raw
folder. - Run
dbt run
to execute the data pipeline. - Analyze the data using SQL queries or visualizations in the
analysis/
folder.
This project demonstrates how dbt can be used to create a reliable and maintainable data pipeline for Airbnb data. With the help of dbt, we were able to easily extract, transform, and load the data into Snowflake, making it easy to analyze and report on.