# best-practices
What are your KPIs / North Star metrics for Data Engineering? How do you establish the productivity of your team? I know this is hard to quantify, but we're trying to set a few north star metrics that we could display to the rest of the company
I already have a blog draft on this inside a folder somewhere. IMHO you shouldn't try to reinvent the wheel; use what software engineers use. Track and display:
1. Lead Time (how long it takes from request to having the final thing in production, and working)
2. Deployment Frequency (how often you push something out)
3. Mean Time to Restore (how much time passes between a bug happening and production being restored)
4. % of failed deployments to prod
For data teams these metrics are just as telling IMHO, but a bit different in their behavior, because a full "deployment" usually also involves having data run through things at least once. Your dbt model isn't "deployed successfully" until it has run on old and new data, etc.... https://www.thoughtworks.com/en-de/radar/techniques/four-key-metrics
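The four metrics above can be computed from a simple log of deployments. A minimal sketch; the record shape and dates are hypothetical, not from this thread:

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment log: when work was requested, when it reached
# production, whether the deploy failed, and when service was restored.
deployments = [
    {"requested": datetime(2023, 5, 1), "deployed": datetime(2023, 5, 4),
     "failed": False, "restored": None},
    {"requested": datetime(2023, 5, 2), "deployed": datetime(2023, 5, 6),
     "failed": True, "restored": datetime(2023, 5, 6, 6)},
    {"requested": datetime(2023, 5, 8), "deployed": datetime(2023, 5, 9),
     "failed": False, "restored": None},
]

# 1. Lead time: request -> working in production (days)
lead_time_days = mean((d["deployed"] - d["requested"]).days for d in deployments)

# 2. Deployment frequency: deploys per week over the observed window
window_days = (max(d["deployed"] for d in deployments)
               - min(d["deployed"] for d in deployments)).days or 1
deploys_per_week = len(deployments) / (window_days / 7)

# 3. Mean time to restore: failure -> production restored (hours)
failed = [d for d in deployments if d["failed"]]
mttr_hours = mean((d["restored"] - d["deployed"]).total_seconds() / 3600
                  for d in failed)

# 4. Change failure rate: % of deployments that failed
failure_rate = 100 * len(failed) / len(deployments)

print(lead_time_days, deploys_per_week, mttr_hours, failure_rate)
```

For a data team, "deployed" here should mean the model has run on real data at least once, as described above.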
Awesome @Sven Balnojan, thank you so much! I think we also want to introduce a metric for costs, but I've yet to phrase it properly. Something like "query costs / TB stored"
Hm, so I simply did cost/month and that was received quite well 🙂 Especially if you can group cost/month by business-related things, but I used 1) storage 2) ingestion 3) transformation tooling 4) serving.
Costs/month would probably work, but as we're integrating more and more data, it seems unreasonable to set objectives that costs will remain flat or go down. So I'm trying to establish whether there's a better way to illustrate costs relative to the amount of data. But as a first metric, costs/month should probably be sufficient 😄
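A volume-normalized cost metric like the one discussed here is a one-liner once the inputs exist. A minimal sketch; the figures and category names are hypothetical:

```python
# Hypothetical monthly spend, broken down by the categories mentioned in
# the thread. Normalizing by TB stored keeps the metric comparable as
# data volume grows, unlike a raw cost/month objective.
monthly_cost_usd = {
    "storage": 1200.0,
    "ingestion": 800.0,
    "transformation": 2500.0,
    "serving": 1500.0,
}
tb_stored = 40.0  # hypothetical warehouse size

total_cost = sum(monthly_cost_usd.values())  # the bulk number to show first
cost_per_tb = total_cost / tb_stored         # the volume-normalized metric

print(f"total: ${total_cost:,.0f}/month, ${cost_per_tb:,.2f}/TB stored")
```

Showing both numbers matches the advice in the thread: lead with the transparent bulk figure, then add the breakdown and normalization.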
Yes, I suggest simply going with the bulk numbers (+ a little breakdown) first, because data costs are high compared to what people expect (in my experience) and to what software engineering teams spend. Just making that part transparent is the important first step IMHO.
Hey @Sven Balnojan, I ended up using your metrics, and added two more:
• Integrity coverage: how many of your columns are tested for integrity
• Time to validate: once a model is labelled as ready to validate, how long it takes to establish what impact the change will have (and whether it is acceptable)
Seems fair?
I'm trying to establish what an "all data in the warehouse is useful" KR would look like
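The integrity coverage metric mentioned above reduces to tested columns over total columns. A minimal sketch; the model names and which columns count as "tested" (e.g. covered by a dbt not_null/unique test) are hypothetical:

```python
# Hypothetical warehouse metadata: all columns per model, and the subset
# covered by an integrity test.
columns = {
    "orders": ["id", "customer_id", "amount", "created_at"],
    "customers": ["id", "email", "created_at"],
}
tested = {
    "orders": ["id", "customer_id"],
    "customers": ["id", "email"],
}

total_columns = sum(len(cols) for cols in columns.values())
tested_columns = sum(len(cols) for cols in tested.values())
coverage_pct = 100 * tested_columns / total_columns

print(f"integrity coverage: {coverage_pct:.0f}%")  # 4 of 7 columns tested
```

In practice this metadata could come from the warehouse information schema plus the test definitions, rather than being hard-coded.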
@Stéphane Burwash sure! These sound like fair additions.