# Sustainability score

This DAG orchestrates the ingestion and transformation of products from
Target's website to compute their sustainability score.

## Steps

* `create_products_table`: create the products table with its schema
* `etl_pipeline`: run the Apache Beam ETL process
* `dbt_run`: run `dbt run` to apply transformations
* `dbt_test`: run `dbt test` to check data quality
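
The ordering of these steps can be sketched with the standard library alone. This is an illustration, not the DAG's code: it assumes the four steps form a linear chain in the order listed, and the `dependencies` mapping and `run_order` helper are hypothetical names.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps it waits for.
# Assumption: a linear chain in the order the steps are listed.
dependencies = {
    "create_products_table": set(),
    "etl_pipeline": {"create_products_table"},
    "dbt_run": {"etl_pipeline"},
    "dbt_test": {"dbt_run"},
}


def run_order(deps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


print(run_order(dependencies))
```

In the DAG itself the same chain would be expressed as task dependencies between operators.
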

## Config

The following parameters are available:

* `input`: location of the CSV input file
* `beam_etl_path`: location of the Apache Beam pipeline
* `products_table`: name of the products table

I decided not to make the remaining table locations configurable, because it
makes more sense to define them in dbt.
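
As a rough illustration, the three parameters could be passed as a JSON payload when triggering the DAG manually. This is a sketch under assumptions: the values and the DAG id `sustainability_score` are placeholders, the `trigger_command` helper is hypothetical, and it presumes the DAG accepts these keys from the trigger configuration.

```python
import json
import shlex

# Placeholder values; only the three keys come from the parameter list.
conf = {
    "input": "gs://example-bucket/products.csv",
    "beam_etl_path": "/opt/airflow/dags/sustainability_score/etl/main.py",
    "products_table": "products",
}


def trigger_command(dag_id, conf):
    """Build an `airflow dags trigger` command that passes conf as JSON."""
    payload = shlex.quote(json.dumps(conf))  # quote the JSON for the shell
    return f"airflow dags trigger --conf {payload} {dag_id}"


# The DAG id is an assumption; use the id defined in the DAG file.
print(trigger_command("sustainability_score", conf))
```

The printed command can be pasted into a shell on a machine where the Airflow CLI is configured.
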

Ideally, I would parametrize the dbt path as well, but the `cwd` parameter of
`BashOperator` is not a templated field, and implementing that is not worth it
for such a minor improvement.
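
For reference, one possible workaround (not what this DAG does) relies on `bash_command` being a templated field of `BashOperator`: the path could be handed to dbt through its `--project-dir` flag instead of `cwd`. The sketch below only builds the command string. The `dbt_path` parameter and the `dbt_bash_command` helper are hypothetical, and depending on where `profiles.yml` lives, `--profiles-dir` might be needed as well.

```python
def dbt_bash_command(subcommand):
    """Return a bash_command string with a Jinja placeholder for the dbt path.

    `params.dbt_path` is a hypothetical DAG parameter; Airflow would render
    the placeholder at run time because `bash_command` is templated.
    """
    return "dbt " + subcommand + " --project-dir {{ params.dbt_path }}"


print(dbt_bash_command("run"))
print(dbt_bash_command("test"))
```

The resulting strings would be passed as `bash_command` to the `dbt_run` and `dbt_test` tasks.
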