This is the ETL pipeline that reads elements from a CSV file, parses/cleans them, and inserts them into a PostgreSQL database.

It has been tested only with DirectRunner, but it could easily be moved to run on Dataflow. A rough, illustrative sketch of the pipeline stages is included at the end of this README.

## Running

This is intended to be scheduled by Airflow, but if the necessary packages are available it can also be run manually with:

```sh
python3 /etl/main.py \
  --runner=DirectRunner \
  --input="$CSV_INPUT_FILE" \
  --pg_hostname="$PG_HOSTNAME" \
  --pg_port="$PG_PORT" \
  --pg_username="$PG_USERNAME" \
  --pg_password="$PG_PASSWORD" \
  --pg_database="$PG_DATABASE" \
  --pg_table="$PG_TABLE"
```

## Testing and linting

To help with development and testing, a `Dockerfile`, a `Makefile`, and a `justfile` are also provided.

The `Makefile` provides targets to:

* generate `dev-requirements.txt` and `requirements.txt` from `pyproject.toml`
* create a Python virtual environment with the right Python version (installed by pyenv) and the packages defined in `pyproject.toml`
* build an OCI image with the necessary dependencies

The provided `Dockerfile` is used to build an image with the packages needed to run `pytest` and `pylint`.

The provided `justfile` defines the commands to run `pytest` and `pylint` from a container. If [`just`](https://github.com/casey/just) is installed, they can be run like so:

```sh
just test
just lint
```
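## Pipeline overview

For orientation, the sketch below shows how a Beam pipeline of this shape is typically structured: read the CSV with `ReadFromText`, parse/clean each row in a `DoFn`, and insert the results into PostgreSQL from another `DoFn`. This is only an illustrative sketch, not the project's actual `etl/main.py`: it assumes the `apache_beam` and `psycopg2` packages are installed, and the class names, column names, and connection values are hypothetical.

```python
# Illustrative sketch only -- not the project's actual etl/main.py.
# Assumes apache_beam and psycopg2 are installed; columns and class names are hypothetical.
import csv

import apache_beam as beam
import psycopg2
from apache_beam.options.pipeline_options import PipelineOptions


class ParseCsvRow(beam.DoFn):
    """Parse one CSV line into a cleaned dict (hypothetical columns)."""

    def process(self, line):
        row = next(csv.reader([line]))
        yield {"id": int(row[0]), "name": row[1].strip()}


class WriteToPostgres(beam.DoFn):
    """Insert each element into a PostgreSQL table via psycopg2."""

    def __init__(self, hostname, port, username, password, database, table):
        self.conn_params = dict(
            host=hostname, port=port, user=username,
            password=password, dbname=database,
        )
        self.table = table

    def setup(self):
        # One connection per worker, opened when the DoFn is initialised.
        self.conn = psycopg2.connect(**self.conn_params)

    def process(self, element):
        with self.conn.cursor() as cur:
            cur.execute(
                f"INSERT INTO {self.table} (id, name) VALUES (%s, %s)",
                (element["id"], element["name"]),
            )
        self.conn.commit()

    def teardown(self):
        self.conn.close()


def run(argv=None):
    with beam.Pipeline(options=PipelineOptions(argv)) as pipeline:
        (
            pipeline
            | "Read CSV" >> beam.io.ReadFromText("input.csv", skip_header_lines=1)
            | "Parse rows" >> beam.ParDo(ParseCsvRow())
            | "Write to PostgreSQL" >> beam.ParDo(
                WriteToPostgres("localhost", 5432, "user", "password", "db", "my_table")
            )
        )


if __name__ == "__main__":
    run()
```

Because the write step is a plain `DoFn` rather than a runner-specific sink, the same sketch would run unchanged on DirectRunner or Dataflow, matching the note at the top of this README.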