Notes after I asked what might be helpful for people in terms of an ETL testing framework

Josh Wilson - I can unit test processors and queries, but testing that they mesh together properly is almost impossible; type-checking doesn’t work very well

Might be useful to have both a unit testing solution and an integration testing solution

Taking the argo definition and backing it out to make sure it works properly is basically an exercise in hope

All of the argo definitions have --db=prod hard-coded into them. Idea: bazel run argo.test, ingest some code, kick up a manifest, and actually run an end-to-end ETL (parametrize the environment, provide some bazel targets to build the targets, test the argo components and the ETL)
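A minimal sketch of what "parametrize the environment" could look like, assuming each argo step boils down to a command line. StepConfig, step_args, and the step name "load_orders" are hypothetical names for illustration, not anything from our actual definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StepConfig:
    """Hypothetical description of one ETL step."""
    name: str
    db: str  # target environment, e.g. "prod" or "test"


def step_args(config: StepConfig) -> List[str]:
    """Build the step's command-line arguments from config,
    so the --db value is passed in rather than hard-coded."""
    return ["run", config.name, f"--db={config.db}"]


# The same definition can now be rendered for prod or for a test run.
prod_args = step_args(StepConfig(name="load_orders", db="prod"))
test_args = step_args(StepConfig(name="load_orders", db="test"))
```

With something like this, an end-to-end test target could render the same definition with db="test" instead of patching --db=prod by hand.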

Want to have two levels of solution - testing a new loader in isolation, and also testing that things all hang together.

It’s possible linting things could catch the obvious mistakes that we make (type checking would work, but click doesn’t support type checking)
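One possible shape for such a lint, as a rough sketch: compare the columns a producer says it emits against the columns the next step expects. The schema dictionaries and the schema_mismatches helper are assumptions made up for illustration; nothing like this exists in the codebase as far as these notes say.

```python
from typing import Dict, List


def schema_mismatches(produced: Dict[str, type],
                      expected: Dict[str, type]) -> List[str]:
    """Return a human-readable problem for every expected column
    that is missing from, or typed differently in, the produced schema."""
    problems = []
    for column, expected_type in expected.items():
        if column not in produced:
            problems.append(f"missing column: {column}")
        elif produced[column] is not expected_type:
            problems.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {produced[column].__name__}"
            )
    return problems


# Hypothetical schemas for a producer and the step that consumes it.
produced = {"id": int, "amount": str}
expected = {"id": int, "amount": float, "ts": str}
problems = schema_mismatches(produced, expected)
```

A check like this would run without touching any database, so it could sit alongside the unit tests.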

Redshift to Redshift makes it much easier to write mock tests: you can switch out a producer for a fake producer, and plugging in fake producers made writing tests a lot easier (maybe can try to do that for Kafka stuff)
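The fake-producer idea, sketched under the assumption that a producer is anything that yields rows as dictionaries. Producer, FakeProducer, and run_processor are hypothetical names, and the processor logic is invented purely to have something to test.

```python
from typing import Any, Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, Any]


class Producer(Protocol):
    """Anything that can yield rows; a real one would read from Redshift."""
    def rows(self) -> Iterator[Row]: ...


class FakeProducer:
    """In-memory stand-in for a real producer, for use in tests."""
    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def rows(self) -> Iterator[Row]:
        return iter(self._rows)


def run_processor(producer: Producer) -> List[Row]:
    """Hypothetical processor: drop non-positive amounts, convert to cents."""
    out = []
    for row in producer.rows():
        if row["amount"] > 0:
            out.append({"id": row["id"],
                        "amount_cents": int(round(row["amount"] * 100))})
    return out


fake = FakeProducer([{"id": 1, "amount": 2.5}, {"id": 2, "amount": -1.0}])
result = run_processor(fake)
```

Because the processor only depends on the rows() method, the same test shape might carry over to a Kafka-backed producer.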

This fits in the context of the broader Hypothesis testing note 202005141716

uid: 202005131031 tags: #theorem #meetings


Date
February 22, 2023