Sustainability score

This DAG orchestrates the ingestion and transformation of products from Target's website to compute their sustainability scores.

Steps

  • create_products_table: create the products table with its schema
  • etl_pipeline: run the Apache Beam ETL process
  • dbt_run: execute dbt run to apply the transformations
  • dbt_test: execute dbt test to check data quality
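
The ordering of these steps can be sketched with the standard library alone. This is a minimal illustration, assuming the four tasks form a single linear chain in the order listed (the dependencies are not spelled out here); in the Airflow DAG itself this would typically be written as create_products_table >> etl_pipeline >> dbt_run >> dbt_test.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it waits for.
# Assumption: the four steps form one linear chain, in the order listed.
dependencies = {
    "create_products_table": set(),
    "etl_pipeline": {"create_products_table"},
    "dbt_run": {"etl_pipeline"},
    "dbt_test": {"dbt_run"},
}


def execution_order(deps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


print(" >> ".join(execution_order(dependencies)))
```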

Config

The following parameters are available:

  • input: location of the CSV input file
  • beam_etl_path: location of the Apache Beam pipeline
  • products_table: name of the products table
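
As a rough sketch of how these parameters might be supplied, the snippet below builds an airflow dags trigger command that passes them as a JSON run configuration. It assumes the parameters are exposed as DAG params that a run-level conf can override, and that the DAG id is sustainability_score; all three values are hypothetical placeholders.

```python
import json
import shlex

# Hypothetical values; replace them with the real locations and table name.
conf = {
    "input": "gs://example-bucket/products.csv",
    "beam_etl_path": "/opt/airflow/dags/sustainability_score/etl.py",
    "products_table": "example_dataset.products",
}


def trigger_command(dag_id, conf):
    """Build an `airflow dags trigger` command that passes conf as JSON."""
    conf_json = shlex.quote(json.dumps(conf))
    return f"airflow dags trigger {shlex.quote(dag_id)} --conf {conf_json}"


# The DAG id is an assumption based on the directory name.
print(trigger_command("sustainability_score", conf))
```

Quoting the JSON with shlex.quote keeps the command safe to paste into a shell even if a value contains spaces or quotes.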

I decided not to make the remaining table locations configurable, because it makes more sense to define them in dbt.

Ideally, I would parametrize the dbt path as well. However, the cwd parameter of BashOperator is not a templated field, and working around that is not worth it for such a minor improvement.
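
For reference, one possible workaround is to move the directory change into bash_command, which BashOperator does template. The helper below is only a sketch of that idea and is not part of this DAG; dbt_bash_command and the dbt_path parameter are hypothetical names.

```python
import shlex


def dbt_bash_command(subcommand, dbt_path_template="{{ params.dbt_path }}"):
    """Build a bash_command that changes into the dbt project before running dbt.

    The default path is a Jinja expression that Airflow would render at run
    time, because bash_command is a templated field. `dbt_path` is a
    hypothetical parameter name.
    """
    return f'cd "{dbt_path_template}" && dbt {shlex.quote(subcommand)}'


print(dbt_bash_command("run"))
print(dbt_bash_command("test"))
```

The resulting string would be passed as the bash_command of each dbt task, leaving cwd unset.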