Sustainability score

This DAG orchestrates the ingestion and transformation of products from Target's website to compute their sustainability score.

Steps

  • create_products_table: create the products table with its schema
  • etl_pipeline: run the Apache Beam ETL process
  • dbt_run: run dbt run to apply the transformations
  • dbt_test: run dbt test to check data quality

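A minimal sketch of how these tasks could be wired together, assuming Airflow 2.x with the TaskFlow API. The operator choices, file paths, and the use of PostgresOperator for the DDL step are illustrative assumptions, not the actual implementation:

```python
import pendulum
from airflow.decorators import dag
from airflow.operators.bash import BashOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator


@dag(schedule=None, start_date=pendulum.datetime(2023, 6, 1), catchup=False)
def sustainability_score():
    # Hypothetical DDL step; the real DAG may use a different SQL operator.
    create_products_table = PostgresOperator(
        task_id="create_products_table",
        sql="sql/create_products_table.sql",
    )

    # Runs the Apache Beam pipeline; invoking it as a plain Python script
    # and the path/flags used here are assumptions.
    etl_pipeline = BashOperator(
        task_id="etl_pipeline",
        bash_command="python /opt/beam_etl/main.py --input /data/products.csv",
    )

    # dbt transformations and data-quality tests; the dbt project path is assumed.
    dbt_run = BashOperator(task_id="dbt_run", bash_command="dbt run", cwd="/opt/dbt")
    dbt_test = BashOperator(task_id="dbt_test", bash_command="dbt test", cwd="/opt/dbt")

    create_products_table >> etl_pipeline >> dbt_run >> dbt_test


sustainability_score()
```
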
Config

The following parameters are available:

  • input: location of the CSV input file
  • beam_etl_path: location of the Apache Beam pipeline
  • products_table: name of the products table

I decided not to make the rest of the table locations configurable, since those are more naturally defined in dbt.
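
A sketch of how these parameters might be declared and consumed through Jinja templating; the default values and the command-line flags of the Beam pipeline are assumptions:

```python
import pendulum
from airflow.decorators import dag
from airflow.operators.bash import BashOperator


@dag(
    schedule=None,
    start_date=pendulum.datetime(2023, 6, 1),
    catchup=False,
    params={
        "input": "/data/products.csv",     # CSV input file (default is an assumption)
        "beam_etl_path": "/opt/beam_etl",  # Apache Beam pipeline location
        "products_table": "products",      # products table name
    },
)
def sustainability_score():
    # bash_command is a templated field, so params are rendered at run time.
    etl_pipeline = BashOperator(
        task_id="etl_pipeline",
        bash_command=(
            "python {{ params.beam_etl_path }}/main.py "
            "--input {{ params.input }} "
            "--products-table {{ params.products_table }}"
        ),
    )


sustainability_score()
```

Individual runs can then override the defaults, e.g. airflow dags trigger sustainability_score --conf '{"input": "/data/other.csv"}' (the DAG id here is assumed).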

Ideally, I would parametrize the dbt path as well, but BashOperator's cwd parameter is not a templated field, and implementing that is not worth it for such a minor improvement.
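
For context, a short sketch of why templating the path fails and what the workaround would look like; TemplatedCwdBashOperator is a hypothetical name, not something used in this project:

```python
from airflow.operators.bash import BashOperator

# cwd is not among BashOperator's template_fields (as of the Airflow
# version used here), so a Jinja string there is used verbatim as a
# directory name and never rendered:
#
#   BashOperator(task_id="dbt_run", bash_command="dbt run",
#                cwd="{{ params.dbt_path }}")  # fails: no such directory
#
# The workaround would be a subclass that marks cwd as templated:
class TemplatedCwdBashOperator(BashOperator):
    template_fields = (*BashOperator.template_fields, "cwd")
```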