# best-practices
h
Hi, we are building a reporting delivery app to query a warehouse database powered by Meltano. Meltano ELT jobs run every N intervals to update the warehouse database. Some of the reports require a guarantee of "fresh enough" data, up to the past N minutes. Curious what the common practices are to check data freshness or the "last successful ELT run timestamp" before executing a query against the warehouse. Is there a Meltano built-in metadata table in the warehouse database that we can query?
p
@huiming the system database has a runs table that you could probably get this information from. Or there's also the `state get` command to check on your job bookmarks if they're all running with incremental state, but the system database is probably a better bet.
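The runs-table idea above could be sketched roughly like this. This is a hypothetical simulation, not the real Meltano schema: the actual system database is typically Postgres or SQLite and its table and column names vary by Meltano version, so `runs`, `job_name`, `state`, and `started_at` here are stand-ins. The point is just the shape of the query: latest successful run, compared against a freshness window.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Simulate a system database with a runs table (names are assumptions,
# not the exact Meltano schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (job_name TEXT, state TEXT, started_at TEXT)")

now = datetime.now(timezone.utc)
conn.execute(
    "INSERT INTO runs VALUES (?, ?, ?)",
    ("elt-main", "SUCCESS", (now - timedelta(minutes=5)).isoformat()),
)

# Latest successful run for the job we care about.
(latest,) = conn.execute(
    "SELECT MAX(started_at) FROM runs WHERE job_name = ? AND state = 'SUCCESS'",
    ("elt-main",),
).fetchone()

last_success = datetime.fromisoformat(latest)
is_fresh = (now - last_success) <= timedelta(minutes=15)
```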
h
Our reporting delivery app currently has access to query the warehouse database only, not the Meltano system database. Is there a common alternative to get this information directly by querying the warehouse database?
p
One option is to enable the loader to add metadata columns (example for target-snowflake); then you can query the max value of `_sdc_extracted_at` or `_sdc_batched_at` in the warehouse tables to see when the most recent load happened.
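The metadata-column check described above might look like this. The `_sdc_*` column names come from the Singer SDK targets mentioned in the thread; the `orders` table and the 30-minute window are invented for illustration, and SQLite stands in for the real warehouse.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW = timedelta(minutes=30)  # assumed SLA, adjust per report

# Simulate a warehouse table that the loader populated with an
# _sdc_batched_at metadata column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, _sdc_batched_at TEXT)")

now = datetime.now(timezone.utc)
conn.execute(
    "INSERT INTO orders VALUES (1, ?)",
    ((now - timedelta(minutes=10)).isoformat(),),
)

# Freshness check: max metadata timestamp vs. the allowed window.
(latest,) = conn.execute("SELECT MAX(_sdc_batched_at) FROM orders").fetchone()
fresh = (now - datetime.fromisoformat(latest)) <= FRESHNESS_WINDOW
```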
h
ah okay, this option will let us check "source table" freshness, correct? is there a way to determine "mart model" freshness, or "overall warehouse" freshness? some context: we need to generate a few "high-accuracy" summary reports only when all mart models that the reports depend on are "fresh enough"; otherwise, the reporting delivery app will pause and check again in a few minutes.
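The "all dependencies fresh, else pause and retry" gate described here could be sketched as a simple all-or-nothing check. Everything below is hypothetical: the mart names, the window, and the idea that each mart's last-updated timestamp has already been fetched (e.g. via the metadata columns or a freshness table) are assumptions.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW = timedelta(minutes=30)  # assumed per-report SLA

def all_marts_fresh(last_updated_by_mart, now=None):
    """True only when *every* dependency was refreshed within the window."""
    now = now or datetime.now(timezone.utc)
    return all(
        now - ts <= FRESHNESS_WINDOW for ts in last_updated_by_mart.values()
    )

now = datetime.now(timezone.utc)
timestamps = {
    "mart_sales": now - timedelta(minutes=5),
    "mart_customers": now - timedelta(minutes=50),  # stale dependency
}

# One stale mart blocks the report; the app would sleep and re-check.
ready = all_marts_fresh(timestamps, now)
```

In the real app this check would sit in a loop: if `ready` is false, sleep a few minutes and query the timestamps again, which matches the idempotent "keep trying until it's ready" design discussed later in the thread.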
t
If you're using dbt you can use something like this to record when dbt models were run and then look at the run start/end timestamps to see if they're "recent enough". We use that approach to show when the source data in our reports was last updated.
m
We use https://hub.meltano.com/extractors/tap-meltano to pull Meltano run data into the warehouse where we can query it more easily
h
@thomas_briggs Do you reuse a specific dbt package to record the start/end timestamp?
I came across this post about maintaining a "freshness" table: https://bostata.com/staying-fresh-with-freshness-tables/ Do we think a model freshness check is a common must-have feature? It looks like everyone uses a slightly different solution to this problem currently.
a
New idea: `meltano run tap target newutility:insert`. Use the last step of the Meltano tasks to insert the metadata into a warehouse table. Feels the same as the tap-meltano solution. (We never imagined it being used for this!)
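The utility step proposed above could be little more than a script that writes one row per completed pipeline into a freshness table in the warehouse. This is a sketch under assumptions: `newutility` is the hypothetical plugin name from the command, and the `elt_freshness` table name and columns are invented.

```python
import sqlite3
from datetime import datetime, timezone

def record_run(conn, pipeline_name):
    """Record a completed pipeline run in a freshness table.

    Intended to be invoked as the final step of a pipeline, e.g. the
    `newutility:insert` task in `meltano run tap target newutility:insert`.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS elt_freshness (pipeline TEXT, finished_at TEXT)"
    )
    conn.execute(
        "INSERT INTO elt_freshness VALUES (?, ?)",
        (pipeline_name, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

# SQLite stands in for the warehouse here.
conn = sqlite3.connect(":memory:")
record_run(conn, "daily-elt")
count = conn.execute("SELECT COUNT(*) FROM elt_freshness").fetchone()[0]
```

The reporting app then only ever reads `elt_freshness`, so it never needs access to the Meltano system database, which was the original constraint.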
h
Do we think this could become an official turn-key feature in Meltano? Or should it be the transformer's responsibility?
a
Either way, it's a good requirement to raise an issue for. Seems like a good approach to reporting. I like the idempotent nature, keep trying until it's ready. I'd guess the more typical approach is pipelines with dependencies.
h
> typical approach is pipelines with dependencies.
@aaron_phethean do you refer to the reporting scenario? just wanna pick your brain on this: do you mean another common option is to trigger the report exports as the last step of an ELT job?
a
Yeah. Essentially a sequence of pipelines, or one big pipeline. I like your design a lot more!