# best-practices
a
Hi Meltano community, we've been running our pipelines successfully at Gainy.app for almost 6 months now. Now that we're moving into production, we've been thinking about data monitoring. Does anyone have an opinion or suggested best practice on this topic? I.e., we would like to monitor and alert on data quality issues, e.g. if a change in the pipeline corrupts data in the resulting data set (say, the number of rows drops from millions to hundreds).
a
Meltano's recent move to structured logging should help with this. cc @florian.hines and @ken_payne
f
Yea, the structured logging piece makes it a bit easier to ingest the existing logs into places like Datadog/Loggly/etc. and to turn the logs we currently emit into alerts/metrics. We've also got additional issues, like https://gitlab.com/meltano/meltano/-/issues/2805 and https://gitlab.com/meltano/meltano/-/issues/3008, that we're thinking about or already have on the backlog.
That second issue references log shipping - but the convo is mostly around metrics.
a
Thanks for the answers. Yes, structured logs are indeed the logical first step, but I'd imagine those logs will have to be parsed into metric values, with alerting on those metrics handled in a separate monitoring system.
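To make that concrete, here's roughly what I have in mind: a minimal sketch that scrapes Singer-style METRIC lines out of a Meltano log stream and turns them into values a monitoring system could ingest. The exact log format depends on your taps and logging config, so treat the regex as an assumption to adapt.
```python
# Minimal sketch: extract Singer METRIC payloads from a log stream and
# yield (name, value, tags) tuples for a downstream metrics backend.
import json
import re
import sys

METRIC_RE = re.compile(r"METRIC: (\{.*\})")

def parse_metrics(lines):
    for line in lines:
        match = METRIC_RE.search(line)
        if not match:
            continue
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # not a well-formed metric line; skip it
        yield payload.get("metric"), payload.get("value"), payload.get("tags", {})

# e.g. meltano elt tap-x target-y 2>&1 | python parse_metrics.py
for name, value, tags in parse_metrics(sys.stdin):
    print(name, value, tags)  # replace with a push to your metrics backend
```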
j
maybe add some dbt tests as part of the transformation?
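e.g. a small wrapper could run the tests after each pipeline run and alert on failure. Rough sketch only; the webhook URL and project dir are hypothetical, swap in whatever alerting channel you already use:
```python
# Minimal sketch: run dbt tests and post to a Slack webhook on failure.
import subprocess
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # hypothetical

result = subprocess.run(
    ["dbt", "test", "--project-dir", "transform"],  # adjust to your layout
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    requests.post(SLACK_WEBHOOK, json={
        "text": f"dbt tests failed:\n```{result.stdout[-1500:]}```",
    })
```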
m
I would love a way to use the Singer metrics in monitoring. As long as that's not possible, I create my own metrics using the Prometheus Pushgateway, and Grafana to display them.
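Roughly like this, using the prometheus_client package; the gateway address, job name, and values here are illustrative:
```python
# Minimal sketch of the Pushgateway approach: push one gauge per run,
# then let Grafana read the series back out of Prometheus.
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
records_loaded = Gauge(
    "pipeline_records_loaded",
    "Rows loaded by the latest pipeline run",
    ["table"],
    registry=registry,
)
records_loaded.labels(table="analytics.positions").set(1_234_567)

# Push once at the end of each pipeline run.
push_to_gateway("pushgateway:9091", job="meltano_elt", registry=registry)
```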
r
would it make sense to push logs and metrics to a loader?
a
Regarding dbt tests: I see tests and monitoring as complementary quality-assurance strategies, analogous to unit/functional tests and observability tools like Datadog in app development. Having one does not automatically exclude the other. So I'd like to have both dbt tests and live production monitoring that, for instance, notifies on a sudden drop in a table's row count. Capturing such a drop is not possible with dbt tests, in my understanding, as it requires knowledge of the historical row count.
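For illustration, this is the kind of check I mean. A rough sketch assuming a Postgres warehouse and a self-maintained `row_count_history` metadata table; the DSN, table names, and the 50% threshold are all hypothetical:
```python
# Rough sketch: compare the current row count to the average of the
# last 7 recorded counts and flag a suspicious drop.
import psycopg2

DSN = "postgresql://user:pass@warehouse:5432/analytics"  # hypothetical

def row_count_dropped(conn, table: str, threshold: float = 0.5) -> bool:
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {table}")  # trusted table name only
        current = cur.fetchone()[0]
        # row_count_history is a metadata table we would maintain ourselves
        cur.execute(
            "SELECT AVG(row_count) FROM ("
            "  SELECT row_count FROM row_count_history"
            "  WHERE table_name = %s ORDER BY checked_at DESC LIMIT 7"
            ") recent",
            (table,),
        )
        baseline = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO row_count_history (table_name, row_count, checked_at)"
            " VALUES (%s, %s, now())",
            (table, current),
        )
        conn.commit()
    return baseline is not None and current < float(baseline) * threshold

with psycopg2.connect(DSN) as conn:
    if row_count_dropped(conn, "analytics.positions"):
        print("ALERT: suspicious row count drop")  # hook up real alerting here
```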