This project is a learning-by-doing data model built with dbt-core for an imaginary company that sells postcards.
The company sells both directly and through resellers in the majority of European countries.
This model is used by my other projects. The warehouse is organized in three layers:

| Layer | Description |
| --- | --- |
| raw | unrefined input data |
| staging | staging area |
| core | curated data |

The core layer exposes the following models:
- dim_channel
- dim_customer
- dim_date
- dim_geography
- dim_sales_agent
- fact_sales
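Once the models are built, the star schema can be queried by joining `fact_sales` to its dimensions. The example below is a hypothetical sketch: the join-key and measure column names (`channel_key`, `geography_key`, `total_amount`, and so on) are assumptions for illustration, not taken from the project.

```sql
-- Hypothetical revenue-by-channel-and-country query against the star schema.
-- Column names are illustrative; check the actual model definitions.
select
    c.channel_name,
    g.country,
    sum(f.total_amount) as revenue
from fact_sales f
join dim_channel   c on f.channel_key   = c.channel_key
join dim_geography g on f.geography_key = g.geography_key
group by c.channel_name, g.country
order by revenue desc;
```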
The data is generated as Parquet files by a Python script, `generator/generate.py`, using user-defined assets in `assets.py`. These may be adjusted as needed.
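For a feel of what such a generator does, here is a minimal sketch of synthetic row generation using only the standard library. The column names are illustrative assumptions, and the real `generate.py` writes Parquet files (e.g. via pyarrow) rather than printing rows.

```python
# Sketch of a synthetic sales-row generator; columns are hypothetical.
import random
from datetime import date, timedelta

random.seed(42)  # make the output reproducible

CHANNELS = ["direct", "reseller"]

def fake_sales(n):
    """Return n fake sales rows as dicts (illustrative schema)."""
    start = date(2023, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "sale_id": i + 1,
            "sale_date": (start + timedelta(days=random.randint(0, 364))).isoformat(),
            "channel": random.choice(CHANNELS),
            "quantity": random.randint(1, 20),
        })
    return rows

rows = fake_sales(5)
for r in rows:
    print(r)
```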
- Rename `.env.example` to `.env`. This file contains relative paths for the database file (`datamart.duckdb`) and the Parquet input files.
- Rename `shared\db\datamart.duckdb.example` to `shared\db\datamart.duckdb`, or initialize an empty database there with the same name.
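For illustration, a `.env` file of this kind might look like the fragment below. The variable names are assumptions; use the shipped `.env.example` as the real template.

```shell
# Hypothetical .env contents -- actual variable names come from .env.example
DB_PATH=shared/db/datamart.duckdb
PARQUET_PATH=shared/parquet
```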
- Create a Python virtual environment (ensure at least Python 3.10 is installed)

  ```shell
  python3 -m venv .venv
  ```
- Add the environment variables to the virtual environment

  ```shell
  cat .env >> .venv/bin/activate
  ```
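Appending `.env` to the activate script means the variables are exported every time the venv is activated. The self-contained demo below shows the mechanism in a throwaway directory, using a hypothetical `DB_PATH` variable, so it is safe to run anywhere.

```shell
# Demonstrate how appending an env file to an activate script works.
tmp=$(mktemp -d)
echo 'export DB_PATH=shared/db/datamart.duckdb' > "$tmp/.env"
echo '# activate script stub' > "$tmp/activate"
cat "$tmp/.env" >> "$tmp/activate"   # same trick as in the setup step
. "$tmp/activate"                    # sourcing now exports DB_PATH
echo "$DB_PATH"
rm -rf "$tmp"
```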
- Activate the Python venv

  ```shell
  source .venv/bin/activate
  ```
- Change the working directory to `generator`

  ```shell
  cd generator
  ```
- Install the required packages

  ```shell
  pip install -r requirements.txt
  ```
- Generate the data

  ```shell
  python3 generate.py
  ```
The generated data will be under `shared/parquet`.
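A quick sanity check after generation is to list the Parquet files from the repository root. This standard-library snippet simply prints whatever it finds (an empty result means the generator has not been run from the expected location).

```python
# List generated Parquet files under shared/parquet (empty if none exist).
from pathlib import Path

parquet_dir = Path("shared/parquet")
files = sorted(parquet_dir.glob("*.parquet"))
print(f"{len(files)} parquet file(s) in {parquet_dir}")
for p in files:
    print(p.name)
```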
- Ensure the virtual environment is activated

  ```shell
  source .venv/bin/activate
  ```
- Change directory to `postcard_company`

  ```shell
  cd postcard_company
  ```
- Run `dbt deps` to install dependencies
- Run `dbt seed` to import the seed (static) data
- Run `dbt compile` to compile the project
- Run `dbt run` to run the models
- Run `dbt test` to run the tests
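Since each step only makes sense if the previous one succeeded, the sequence can also be chained in one line (assuming dbt is installed and the working directory is `postcard_company`):

```shell
dbt deps && dbt seed && dbt compile && dbt run && dbt test
```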
