# Sustainability score

This DAG orchestrates the ingestion and transformation of products from Target's website to compute their sustainability score.
## Steps

- `create_products_table`: create the `products` table with its schema
- `etl_pipeline`: run the Apache Beam ETL process
- `dbt_run`: run `dbt run` to apply transformations
- `dbt_test`: run `dbt test` to test the data quality
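The steps above could be wired together along these lines. This is a minimal sketch, not the actual DAG: the task ids and parameter names come from this README, but the `dag_id`, the default parameter values, and the choice of `BashOperator` for every task are assumptions.

```python
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="sustainability_score",  # hypothetical dag_id
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
    params={
        # Parameter names from the README; default values are made up.
        "input": "data/products.csv",
        "beam_etl_path": "etl/pipeline.py",
        "products_table": "products",
    },
) as dag:
    # Create the products table from its schema (shown here as a bash
    # placeholder; the real task may use a SQL operator instead).
    create_products_table = BashOperator(
        task_id="create_products_table",
        bash_command="psql -f sql/create_products_table.sql",
    )

    # Run the Apache Beam ETL process over the CSV input.
    etl_pipeline = BashOperator(
        task_id="etl_pipeline",
        bash_command=(
            "python {{ params.beam_etl_path }} --input {{ params.input }}"
        ),
    )

    # Apply dbt transformations, then test data quality.
    dbt_run = BashOperator(task_id="dbt_run", bash_command="dbt run")
    dbt_test = BashOperator(task_id="dbt_test", bash_command="dbt test")

    create_products_table >> etl_pipeline >> dbt_run >> dbt_test
```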
## Config

The following parameters are available:

- `input`: location of the CSV input file
- `beam_etl_path`: location of the Apache Beam pipeline
- `products_table`: name of the products table
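At run time, Airflow renders these parameters into templated operator fields via Jinja. The snippet below illustrates that substitution with a naive stdlib stand-in for Jinja; the file paths are hypothetical examples, not the project's real defaults.

```python
# Parameter names from the README; the values are made-up examples.
params = {
    "input": "data/products.csv",
    "beam_etl_path": "etl/pipeline.py",
    "products_table": "products",
}

# A templated command, as it might appear in a BashOperator's
# bash_command (which is a Jinja-templated field).
command_template = "python {{ params.beam_etl_path }} --input {{ params.input }}"

def render(template: str, params: dict) -> str:
    """Naive stand-in for Jinja rendering of {{ params.x }} placeholders."""
    for key, value in params.items():
        template = template.replace("{{ params.%s }}" % key, value)
    return template

print(render(command_template, params))
# → python etl/pipeline.py --input data/products.csv
```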
I decided not to configure the rest of the table locations, because those make more sense defined in dbt.
Ideally, I would parametrize the dbt path as well, but the `cwd` parameter of `BashOperator` is not a templated field, and implementing that is not worth it for such a minor improvement.
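If parametrizing the dbt path ever did become worthwhile, one possible workaround is to template the directory inside `bash_command` itself, since `bash_command` (unlike `cwd`) is a Jinja-templated field. A sketch, where `dbt_path` is a hypothetical parameter not defined in this project:

```python
from airflow.operators.bash import BashOperator

# Sketch only: `dbt_path` is a hypothetical param. Because
# bash_command is templated, the `cd` happens at run time with the
# rendered value, sidestepping the non-templated `cwd` argument.
dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command="cd {{ params.dbt_path }} && dbt run",
)
```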