user
11/08/2022, 1:29 PMSven Balnojan
11/08/2022, 1:30 PMvisch
11/08/2022, 1:38 PMvisch
11/08/2022, 1:38 PMvisch
11/08/2022, 1:40 PMSven Balnojan
11/08/2022, 1:42 PMSven Balnojan
11/08/2022, 7:46 PMSven Balnojan
11/09/2022, 10:47 AMSven Balnojan
11/10/2022, 12:13 PMSven Balnojan
11/11/2022, 10:04 AMalexander_butler
11/15/2022, 5:05 AM4. Have assumptions on raw data and test them (on the raw ingested data), but don't let the ingestion fail if the assumptions fail! (Raw data is as it is, your assumptions might be wrong)@Sven Balnojan @visch I think this is an important question. Personally I think it makes more sense in most cases barring lots of noise / quantity to load all data and express the business logic that would've been a test as essentially a dead letter queue model during staging. Then you can test for any records in that model with just a
select *
yet you expressively declared business logic that says you dont want that stuff mucking up your downstream in the immediate-term but you DO want to act on it. You also dont want to take down downstream reporting (probably)
What do you think?alexander_butler
11/15/2022, 5:06 AMstg_stripe__payments
and a stg_stripe__payments_quarantine
or some suchSven Balnojan
11/15/2022, 7:16 AMvisch
11/15/2022, 1:23 PMpat_nadolny
11/15/2022, 2:50 PMalexander_butler
11/15/2022, 7:18 PMcase when
statements to just patch it in staging but push the original to quarantine
There could be a dbt test of some sort which looks at a models upstreams, appends _quarantine
, checks if the relation exists, then performs action pass or fail based on business specification on non-zero result