Hi everyone we use singer tap with log based replication and Meltano #best-practices

alexandre_brehelin

04/17/2022, 3:09 PM

Hi everyone, we use a Singer tap with log-based replication and dbt for our ELT. We are still hesitating between dbt tests and Great Expectations (GE) for adding quality tests to our process. But as far as we can tell, neither dbt tests nor GE lets us test the quality of the replication itself. Do you have any advice on monitoring and testing the EL step with log-based replication? Our first idea is to periodically compare the row count in the data source against the loaded data using GE.

taylor

04/18/2022, 2:48 PM

It depends a lot on the system you’re pulling from. If it’s from a database (which it sounds like it is 😄 ) then row counts can be good, but they’re prone to false positives due to timing. If there’s any sort of lag in the extract then row counts can be consistently off. Then you get into threshold setting, etc. One workaround is to do time-based row counts, but again that depends on what the upstream data source looks like (i.e. it wouldn’t capture row updates, just additions / deletions). Taking random samples of data can be good as well but you don’t want to bog down either system too much.
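A minimal sketch of the time-based row count idea above, not a definitive implementation. It assumes both systems expose a creation timestamp column and that the comparison window ends before the most recent data, so replication lag does not cause false alarms. The table name `orders`, the column `created_at`, and the helper functions are all hypothetical, and in-memory SQLite databases stand in for the real source database and warehouse.

```python
import sqlite3


def windowed_counts(conn, table, ts_column, start, end):
    """Return {day: row_count} for rows with start <= ts_column < end.

    Assumes ts_column holds ISO-8601 text (YYYY-MM-DD HH:MM:SS), so the
    first 10 characters are the day and string comparison orders by time.
    Table and column names are interpolated, so only pass trusted values.
    """
    query = (
        f"SELECT substr({ts_column}, 1, 10) AS day, COUNT(*) "
        f"FROM {table} WHERE {ts_column} >= ? AND {ts_column} < ? "
        "GROUP BY day"
    )
    return dict(conn.execute(query, (start, end)).fetchall())


def compare_counts(source_counts, target_counts, tolerance=0.0):
    """Return {day: (source, target)} where the relative gap exceeds tolerance."""
    mismatches = {}
    for day in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(day, 0)
        tgt = target_counts.get(day, 0)
        relative_gap = abs(src - tgt) / max(src, 1)
        if relative_gap > tolerance:
            mismatches[day] = (src, tgt)
    return mismatches


def make_demo_db(timestamps):
    """Build a throwaway in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany(
        "INSERT INTO orders (created_at) VALUES (?)", [(t,) for t in timestamps]
    )
    return conn


# Stand-in for the upstream database.
source = make_demo_db([
    "2022-04-15 09:00:00", "2022-04-15 10:30:00", "2022-04-15 18:45:00",
    "2022-04-16 08:15:00", "2022-04-16 21:00:00",
    "2022-04-17 07:00:00",  # too recent: outside the window checked below
])
# Stand-in for the loaded data; one row from 2022-04-16 is missing.
target = make_demo_db([
    "2022-04-15 09:00:00", "2022-04-15 10:30:00", "2022-04-15 18:45:00",
    "2022-04-16 08:15:00",
])

# Compare only closed days so lag in the extract is not reported as a mismatch.
window = ("2022-04-15", "2022-04-17")
mismatches = compare_counts(
    windowed_counts(source, "orders", "created_at", *window),
    windowed_counts(target, "orders", "created_at", *window),
)
print(mismatches)
```

As noted above, a check like this only sees additions and deletions inside the window. Rows that were updated in place still count the same on both sides, so it would need to be paired with something like sampled row comparisons to catch those. Setting `tolerance` above zero is where the threshold tuning comes in.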

alexandre_brehelin

04/19/2022, 11:06 AM

Thank you Taylor, we think your proposal is a good first approach. We are very surprised to learn that there is no tool or library that lets us industrialize the comparison of data between extraction and load.

alexandre_brehelin

04/19/2022, 11:07 AM

We think many side effects can occur with binlog replication.

alexandre_brehelin

04/19/2022, 11:07 AM

Thanks 🙂
