# Sustainability score

This DAG orchestrates the ingestion and transformation of products from
Target's website to compute their sustainability score.

## Steps

* create_products_table: create the products table with its schema
* etl_pipeline: run the Apache Beam ETL process
* dbt_run: run `dbt run` to apply transformations
* dbt_test: run `dbt test` to check data quality
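
A minimal sketch of how these steps might be wired together; the operator
choices, module paths, and file locations below are illustrative assumptions,
not taken from this repository:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="sustainability_score",
    start_date=datetime(2023, 6, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Create the products table with its schema.
    create_products_table = PostgresOperator(
        task_id="create_products_table",
        sql="sql/create_products_table.sql",  # hypothetical path
    )

    # Run the Apache Beam ETL process over the CSV input.
    etl_pipeline = BashOperator(
        task_id="etl_pipeline",
        bash_command="python {{ params.beam_etl_path }} --input {{ params.input }}",
    )

    # Apply the dbt transformations, then check data quality.
    dbt_run = BashOperator(task_id="dbt_run", bash_command="dbt run", cwd="/opt/dbt")
    dbt_test = BashOperator(task_id="dbt_test", bash_command="dbt test", cwd="/opt/dbt")

    create_products_table >> etl_pipeline >> dbt_run >> dbt_test
```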

## Config

The following parameters are available:

* `input`: location of the CSV input file
* `beam_etl_path`: location of the Apache Beam pipeline
* `products_table`: name of the products table
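
These parameters would typically be declared on the DAG's `params` and read in
templated operator fields; a minimal sketch, with made-up default values:

```python
from datetime import datetime

from airflow import DAG

# Declaring the parameters with illustrative defaults; each can be overridden
# at trigger time (e.g. via `--conf` or the UI) and referenced in templated
# operator fields as {{ params.input }}, {{ params.beam_etl_path }}, etc.
with DAG(
    dag_id="sustainability_score",
    start_date=datetime(2023, 6, 1),
    schedule=None,
    catchup=False,
    params={
        "input": "/data/products.csv",        # location of the CSV input file
        "beam_etl_path": "/opt/beam/etl.py",  # location of the Beam pipeline
        "products_table": "products",         # products table name
    },
):
    pass  # tasks as sketched in the Steps section above
```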

I decided not to configure the rest of the table locations, because those make
more sense to be defined in dbt.

Ideally, I would parametrize the dbt path as well, but the `cwd` parameter of
`BashOperator` is not a templated field, and implementing that is not worth it
for such a minor improvement.
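
For reference, one workaround would be to change directories inside
`bash_command` (which is a templated field) rather than relying on `cwd`; the
path and the `dbt_path` param name here are hypothetical:

```python
from airflow.operators.bash import BashOperator

# `cwd` is not templated, so the dbt project path must be hard-coded:
dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command="dbt run",
    cwd="/opt/dbt",  # hypothetical hard-coded path
)

# `bash_command` is templated, so this variant could read the path from a
# hypothetical `dbt_path` param instead:
dbt_run_templated = BashOperator(
    task_id="dbt_run_templated",
    bash_command="cd {{ params.dbt_path }} && dbt run",
)
```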