doc: added readme to etl
parent 4ba29e7e1d
commit b28ddc350d

@@ -0,0 +1,47 @@

This is the ETL pipeline that reads elements from a CSV file, parses and cleans
them, and inserts them into a PostgreSQL database.
It has been tested only with DirectRunner, but it could easily be moved to run
on Dataflow.
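
As a rough illustration of the parse/clean step described above, the sketch below uses only the Python standard library. The function names `clean_row` and `parse_csv`, the cleaning rules, and the sample columns are all hypothetical; the actual transforms in this pipeline may differ.

```python
import csv
import io


def clean_row(row):
    """Strip whitespace from keys and values; turn empty values into None."""
    cleaned = {}
    for key, value in row.items():
        if key is None:
            # Extra fields beyond the header are ignored in this sketch.
            continue
        value = value.strip() if value is not None else ""
        cleaned[key.strip()] = value or None
    return cleaned


def parse_csv(text):
    """Parse CSV text with a header row and yield one cleaned dict per record."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield clean_row(row)


# Hypothetical sample data; the real input columns are not documented here.
sample = "id, name ,city\n1, Ada ,London\n2,Grace,\n"
rows = list(parse_csv(sample))
```

Each cleaned dictionary would then map naturally onto one row to insert into the target table.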

## Running

This is intended to be scheduled by Airflow, but if the necessary packages are
available it can also be run manually with:

```sh
python3 /etl/main.py \
  --runner=DirectRunner \
  --input="$CSV_INPUT_FILE" \
  --pg_hostname="$PG_HOSTNAME" \
  --pg_port="$PG_PORT" \
  --pg_username="$PG_USERNAME" \
  --pg_password="$PG_PASSWORD" \
  --pg_database="$PG_DATABASE" \
  --pg_table="$PG_TABLE"
```
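
For readers wondering how these flags might be consumed, here is a minimal sketch using `argparse`. It is an assumption about how `main.py` could be structured, not a copy of it: the `parse_args` helper, the required/optional choices, and the default port of 5432 (PostgreSQL's standard port) are illustrative. Flags the parser does not know, such as `--runner`, are returned separately so they can be passed on to the pipeline runner.

```python
import argparse


def parse_args(argv=None):
    """Split ETL-specific flags from the remaining (runner) flags."""
    parser = argparse.ArgumentParser(description="CSV to PostgreSQL ETL (sketch)")
    parser.add_argument("--input", required=True, help="Path to the CSV input file")
    parser.add_argument("--pg_hostname", required=True)
    # Assumed default: 5432 is the standard PostgreSQL port.
    parser.add_argument("--pg_port", type=int, default=5432)
    parser.add_argument("--pg_username", required=True)
    parser.add_argument("--pg_password", required=True)
    parser.add_argument("--pg_database", required=True)
    parser.add_argument("--pg_table", required=True)
    # parse_known_args returns (namespace, list_of_unrecognised_arguments).
    return parser.parse_known_args(argv)


# Example invocation with placeholder values.
known, pipeline_args = parse_args([
    "--runner=DirectRunner",
    "--input=/tmp/input.csv",
    "--pg_hostname=localhost",
    "--pg_port=5432",
    "--pg_username=etl",
    "--pg_password=secret",
    "--pg_database=warehouse",
    "--pg_table=events",
])
```

Here `known` holds the CSV and PostgreSQL settings, while `pipeline_args` keeps `--runner=DirectRunner` untouched for the runner.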

## Testing and linting

To help with development and testing, a `Dockerfile`, a `Makefile`, and a
`justfile` are also provided.

The `Makefile` provides a mechanism to:

* automate the generation of `dev-requirements.txt` and `requirements.txt` from
  `pyproject.toml`
* automate the creation of a Python virtual environment that contains the
  right Python version (installed by pyenv) and the packages defined in
  `pyproject.toml`
* automate the building of an OCI image with the necessary dependencies

The provided `Dockerfile` is used to build an image with the packages needed
to run `pytest` and `pylint`.

The `justfile` provides the commands to run `pytest` and `pylint` from a
container.

If [`just`](https://github.com/casey/just) is installed, `pytest` and `pylint` can be run like so:

```sh
just test
just lint
```