# best-practices
Hey guys, I would be interested in how you do monitoring for your pipelines. We are considering using the metric messages to set up custom CloudWatch metrics on AWS, and also setting up alarms if a job fails. I also know that a monitoring dashboard for the Meltano UI (which we don't use) is on the roadmap, but it would be very interesting to hear what kind of implementations you have.
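Roughly the kind of thing we have in mind, as an untested sketch: it assumes the tap's Singer-style `METRIC:` log lines end up on the stderr of the `meltano elt` process and that boto3 credentials are already configured; the tap/target names, namespace, and dimensions are placeholders.

```python
import json
import re
import subprocess

import boto3  # assumes AWS credentials are configured in the environment

cloudwatch = boto3.client("cloudwatch")

# Run the pipeline; Singer taps log metrics as lines like
#   INFO METRIC: {"type": "counter", "metric": "record_count", "value": 1234, "tags": {...}}
proc = subprocess.Popen(
    ["meltano", "elt", "tap-example", "target-example"],  # placeholder plugin names
    stderr=subprocess.PIPE,
    text=True,
)

metric_line = re.compile(r"METRIC: (\{.*\})")

for line in proc.stderr:
    match = metric_line.search(line)
    if not match:
        continue
    metric = json.loads(match.group(1))
    cloudwatch.put_metric_data(
        Namespace="Meltano/Pipelines",  # placeholder namespace
        MetricData=[{
            "MetricName": metric["metric"],  # e.g. record_count, http_request_duration
            "Value": float(metric["value"]),
            "Dimensions": [{"Name": "tap", "Value": "tap-example"}],
        }],
    )

# Also push a failure indicator so a CloudWatch alarm can fire when a job fails.
returncode = proc.wait()
cloudwatch.put_metric_data(
    Namespace="Meltano/Pipelines",
    MetricData=[{
        "MetricName": "job_failed",
        "Value": 1.0 if returncode != 0 else 0.0,
        "Dimensions": [{"Name": "tap", "Value": "tap-example"}],
    }],
)
```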
Hey Ole 👋 This is 100% something we think Meltano should help out with, either by persisting metric messages directly or by supporting pluggable backends (to support common platforms like CloudWatch). Would love your input and a +1 on this issue. In terms of what can be done today, there are a few options:
• great_expectations is a great option in conjunction with the `_sdc_extracted_at` metadata column supported by some targets/loaders (and the SDK). You can set complex expectations around data latency, volumes, etc. and pipe alerts to various places like Slack or PagerDuty (a rough sketch of this kind of latency check is below).
• Airflow alerts for when scheduled runs fail. There is a great write-up on the Astronomer docsite.
• If you use dbt, tests are an alternative to great_expectations for checking data latency (also assuming the targets/loaders you are using add `_sdc_*` metadata columns). dbt is already supported natively in Meltano:
```
meltano invoke dbt:test
```
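To make the latency point concrete, here is a rough standalone version of the kind of check you would express as a great_expectations expectation or a dbt test; the connection string, the `raw.orders` table, and the 24-hour threshold are all placeholders.

```python
from datetime import datetime, timedelta, timezone

from sqlalchemy import create_engine, text

# Placeholder connection string and table -- point these at your warehouse.
# Assumes the target table already has rows loaded by a Singer target.
engine = create_engine("postgresql://user:password@warehouse:5432/analytics")
MAX_LAG = timedelta(hours=24)  # alert if nothing has been extracted within this window

with engine.connect() as conn:
    newest = conn.execute(
        text("SELECT MAX(_sdc_extracted_at) FROM raw.orders")
    ).scalar()

if newest.tzinfo is None:  # some warehouses return naive timestamps
    newest = newest.replace(tzinfo=timezone.utc)

lag = datetime.now(timezone.utc) - newest
if lag > MAX_LAG:
    # Wire this up to Slack/PagerDuty or a CloudWatch alarm instead of just raising.
    raise RuntimeError(f"Data is stale: last extract finished {lag} ago")
```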
@ole_bause I second the recommendation on Great Expectations if you're looking for an open source solution. Adding a TDS article here that covers additional ideas. Lastly, Databand.ai, where I work, offers a data observability platform that monitors data pipelines.