dayrize-usecase/etl
Ricard Illa b28ddc350d doc: added readme to etl 2023-06-26 12:09:50 +02:00

README.md

This is the ETL pipeline that reads elements from a CSV file, parses and cleans them, and inserts them into a PostgreSQL database. It has been tested only with DirectRunner, but it should be straightforward to move it to Dataflow.
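The parse/clean step can be pictured with a small standard-library sketch. This is an illustration, not the code in src: the function names clean_row and read_clean_rows and the column names are hypothetical, and the insert into PostgreSQL is left out.

```python
import csv
import io


def clean_row(row: dict) -> dict:
    """Strip whitespace from keys and values; map empty strings to None."""
    cleaned = {}
    for key, value in row.items():
        if key is None:
            # csv.DictReader stores surplus fields under the key None; skip them.
            continue
        if isinstance(value, str):
            value = value.strip()
        cleaned[key.strip()] = value or None
    return cleaned


def read_clean_rows(csv_text: str) -> list:
    """Parse CSV text (with a header row) into a list of cleaned dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [clean_row(row) for row in reader]


if __name__ == "__main__":
    # Hypothetical columns, only for demonstration.
    sample = "name, price\n widget , 3.50\ngadget,\n"
    for row in read_clean_rows(sample):
        print(row)
```

Mapping empty strings to None lets them end up as NULL in the database, but whether the real pipeline does this is an assumption.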

Running

This is intended to be scheduled by Airflow, but if the necessary packages are installed it can also be run manually with:

python3 /etl/main.py \
    --runner=DirectRunner \
    --input="$CSV_INPUT_FILE" \
    --pg_hostname="$PG_HOSTNAME" \
    --pg_port="$PG_PORT" \
    --pg_username="$PG_USERNAME" \
    --pg_password="$PG_PASSWORD" \
    --pg_database="$PG_DATABASE" \
    --pg_table="$PG_TABLE"
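As a rough sketch of how a script could consume these flags, the example below parses the pipeline-specific options with argparse and leaves everything else (such as --runner) untouched, a pattern commonly seen in Apache Beam entry points. The function names, the required/optional choices, and the default port are assumptions; the actual main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the pipeline-specific flags shown in the command above."""
    parser = argparse.ArgumentParser(description="CSV to PostgreSQL ETL (sketch)")
    parser.add_argument("--input", required=True, help="path to the CSV input file")
    parser.add_argument("--pg_hostname", required=True)
    # Default port is an assumption (PostgreSQL's standard port).
    parser.add_argument("--pg_port", type=int, default=5432)
    parser.add_argument("--pg_username", required=True)
    parser.add_argument("--pg_password", required=True)
    parser.add_argument("--pg_database", required=True)
    parser.add_argument("--pg_table", required=True)
    return parser


def parse_args(argv=None):
    """Return (known, rest): parsed ETL flags and the unrecognised leftovers.

    Flags argparse does not know, e.g. --runner=DirectRunner, stay in `rest`
    so they can be forwarded to the pipeline runner's own option handling.
    """
    known, rest = build_parser().parse_known_args(argv)
    return known, rest


if __name__ == "__main__":
    known_args, runner_args = parse_args()
    print(known_args, runner_args)
```

Keeping the runner flags separate means the same script can switch runners without changes to the argument parser.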

Testing and linting

To help with development and testing, a Dockerfile, a Makefile, and a justfile are also provided.

The Makefile provides a mechanism to:

  • generate dev-requirements.txt and requirements.txt from pyproject.toml
  • create a Python virtual environment that contains the right Python version (installed by pyenv) and the packages defined in pyproject.toml
  • build an OCI image with the necessary dependencies

The provided Dockerfile is used to build an image with the necessary packages to run pytest and pylint.

The justfile provides the commands to run pytest and pylint from a container.

If just is installed, pytest and pylint can be run like so:

just test
just lint