# best-practices
Does anyone have some best practices for implementing a metrics layer? I'm trying to keep track of all my metrics and their definitions so that I can slice and dice them and serve them back to BI. I investigated dbt's metrics solution, but it seems a bit wonky and not that intuitive to implement in a master table. My concerns are:
• How easy is it to implement
• How easy is it to share with other team members
• Does it require hosting
• What is the cost (free?)
Any input would be greatly appreciated 😄
My expert opinion: don't implement one 😉
How do you manage a common definition for metrics?
So this is definitely a more edge-case opinion from my side, so please ask other people. But I enjoy what Maxime at Preset talks about (dataset-centric visualization), and I've been implementing a similar approach. The idea is simple:
1. Don't have a (technologically) SEPARATE common metrics definition.
2. Instead, keep your metrics where you keep the rest of your business logic & transformations (e.g. just put them into your dbt models).
That will come with a few challenges, but IMHO also with a vast set of benefits (simplicity and robustness above all). A rough sketch of what that can look like is below.
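For illustration only, a minimal sketch of "metrics live with the rest of the transformations" as a dbt model; the source table (`stg_orders`) and all column names are hypothetical:

```sql
-- models/fct_revenue.sql (hypothetical dbt model; table and column names are invented)
-- The metric definition (net_revenue) sits next to the rest of the business logic,
-- version-controlled with the model, instead of living in a separate metrics tool.
select
    date_trunc('day', order_date)             as order_day,
    customer_segment,
    count(distinct order_id)                  as order_count,
    sum(gross_amount) - sum(discount_amount)  as net_revenue
from {{ ref('stg_orders') }}
where status = 'completed'
group by 1, 2
```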
In a previous project I computed metrics with Python jobs and pushed them into an all-in-one, hosted (and costly 😉) Prometheus/Grafana environment. As far as I know, it is the best stack for metrics and time-series storage and visualization, even if the learning curve is steep at the beginning. As I am integrating Meltano for my current project, I was looking forward to using some tools in the ecosystem to compute metrics and store them in Prometheus, but those seem to be lacking. I would be very grateful if some of you have a little more experience on that specific topic (i.e. integrating Prometheus with Meltano pipelines) 😀.
@Sven Balnojan thanks for the article. So the synopsis is: "For a dataset-centric approach, one should aim to define metrics in their transformation layer (as part of their models). This allows for better visibility, reusability and version control. It also removes the need for purchasing and maintaining additional tools." In this new approach, would this imply re-defining metrics at every granularity (ex: median), or would we want to use some tools that integrate into your transformation layer but give more context to your BI layer (ex: dbt metrics)?
@Sven Balnojan great article, thanks! I think we arrived at a similar 'dataset-centric' conclusion in our product: https://www.matatika.com/docs/dataml/datasetml/ We investigated building a metrics layer, then delayed that feature to look at Cube, Looker, and later dbt metrics. I still fundamentally like the metrics-layer approach, but everything I like about it is achievable with a 'model' / 'dataset' layer and minimal dynamic queries. Current view: dynamic query building on top of a dataset is sufficiently powerful for most use cases, e.g. Sisense (Periscope) queries.
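For context, this is roughly the kind of dynamic query a BI tool can generate against such a dataset model. It reuses the hypothetical `fct_revenue` sketch from above and is a generic illustration, not the query syntax of any particular product:

```sql
-- Hypothetical BI-generated query: the grouping column and the date filter
-- are whatever the user picks in the UI; the metric itself is not redefined here.
select
    customer_segment,
    sum(net_revenue) as net_revenue
from analytics.fct_revenue
where order_day >= date '2023-01-01'   -- placeholder filter value
group by customer_segment
order by net_revenue desc
```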
@Stéphane Burwash, you would usually try to get away without additional tools, although dbt metrics do take some of that off your shoulders. As @aaron_phethean points out, you're usually okay with a model/dataset layer and possibly something that allows for more dynamic queries (that's the granularity you're talking about). But honestly, I've never seen the precomputation of metrics become a severe problem. I have, however, seen a flood of dashboards using 1000s of different granularities become a problem. 🙂 If you only precompute, you don't even need dbt metrics. And by the way, "recomputing metrics" at every granularity isn't strictly necessary. What I've seen in the past is (see the sketch below):
1. Compute the metric once at the lowest granularity.
2. Use an aggregation function to auto-create all other granularities.
Or even: compute the metric once in a separate model, then use appropriate functions to map it into its final models. If you have a problematic example, I'm happy to take a look; I would love to finally see an alarming instance of precomputation. (Fwiw, Max doesn't argue in favor of pre-computation; he's OK with views/dynamic queries as described by @aaron_phethean.)
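To make the "compute once at the lowest granularity, then roll up" pattern concrete, here is a rough sketch as two dbt models; all table and column names are invented for illustration, and it assumes an additive metric:

```sql
-- models/daily_revenue.sql — the metric is computed exactly once, at the lowest granularity (day)
select
    date_trunc('day', order_date)             as day,
    sum(gross_amount) - sum(discount_amount)  as net_revenue
from {{ ref('stg_orders') }}
group by 1
```

```sql
-- models/monthly_revenue.sql — coarser granularities are plain aggregations of the base model,
-- so the metric definition is never repeated
select
    date_trunc('month', day) as month,
    sum(net_revenue)         as net_revenue
from {{ ref('daily_revenue') }}
group by 1
```

Note that this simple roll-up only works for additive metrics like sums and counts; something like a median would need to be recomputed from the grain-level data at each granularity.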