hey folks, i'm wondering what the community feels...
# best-practices
h
hey folks, i'm wondering what the community feels about this scenario:
• tap 1 is refreshed every day at midnight
• tap 2 is refreshed every 4 hours
• there are some downstream transformations depending on both tap 1 & tap 2's tables
when / how often do you trigger downstream transformations that depend on data from both taps? what questions do you ask (or what checks do you run) prior to triggering the downstream transformation that uses both tap 1 & tap 2? I'm expecting this question may generate some opinionated answers, and I definitely welcome any experience / anecdote you would be comfortable sharing ❤️ thanks in advance!
a
How long do each tap and the models take? If we assume tap1 is ok to run more often, I would consider the following:
tap1 -> models
tap2 -> tap1 -> models
h
Thanks Aaron!
I am wondering if anyone has built any strategies around data freshness, something like checking the freshness for tap 1 & tap 2 and then building the common models, and what the pros / cons of this approach might be.
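To make the freshness idea concrete, here is a minimal sketch of such a gate in Python. Everything in it is an assumption for illustration: the names `MAX_AGE`, `stale_sources`, and `should_build_common_models` are hypothetical, the thresholds are just the refresh intervals from the scenario plus an hour of slack, and the "last loaded" timestamps would have to come from wherever your loads record them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds: tap 1 refreshes daily, tap 2 every 4 hours,
# each with one hour of slack to allow for run time.
MAX_AGE = {
    "tap1": timedelta(hours=25),
    "tap2": timedelta(hours=5),
}


def stale_sources(last_loaded, now, max_age=MAX_AGE):
    """Return the names of sources whose latest load is missing or too old."""
    stale = []
    for name, limit in max_age.items():
        loaded_at = last_loaded.get(name)
        if loaded_at is None or now - loaded_at > limit:
            stale.append(name)
    return stale


def should_build_common_models(last_loaded, now):
    """Build the shared models only when every source is within its threshold."""
    return not stale_sources(last_loaded, now)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc)
    last_loaded = {
        "tap1": datetime(2024, 1, 2, 0, 5, tzinfo=timezone.utc),
        "tap2": datetime(2024, 1, 2, 4, 10, tzinfo=timezone.utc),
    }
    print(should_build_common_models(last_loaded, now))
```

One trade-off with a gate like this: it avoids building the common models on stale input, but a single late tap blocks everything downstream until it catches up.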
a
Isn’t this the use case for incremental dbt models? Alternatively, using pipeline stages as the check and conditionally triggering the models would work?
s
You could use `source freshness` in dbt to check if the source has been updated (by meltano or something else) and then go ahead with your scheduled pipeline
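A rough sketch of that gate as a small Python wrapper, under a few assumptions: `dbt` is on the PATH, the sources have freshness thresholds configured (`loaded_at_field`, `warn_after` / `error_after`), and `dbt source freshness` exits non-zero when a source is past its `error_after` threshold, which is how I understand it to behave. The function name `build_if_fresh` and the injectable `run` parameter are hypothetical, added only so the logic can be exercised without a dbt project.

```python
import subprocess


def build_if_fresh(run=subprocess.run):
    """Run `dbt source freshness`, then `dbt run` only if the check exits 0.

    `run` defaults to subprocess.run; it is a parameter purely so a fake
    runner can be substituted in tests (an assumption of this sketch).
    """
    check = run(["dbt", "source", "freshness"])
    if check.returncode != 0:
        # At least one source failed the freshness check: skip the build.
        return False
    build = run(["dbt", "run"])
    return build.returncode == 0
```

Calling `build_if_fresh()` from the scheduled pipeline would then skip the models whenever the freshness check fails, rather than building on stale data.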