Hey, I wasn't sure where to put this question. I've been tasked with researching a few different data cataloging tools, and I've looked into Google Data Catalog and Amundsen so far. Has Meltano considered integrating with a data catalog, or do you have any guidance on how people have done that?
taylor
05/17/2021, 9:18 PM
We’ve been thinking about the metadata layer a bit, but it’s not an immediate priority. What sort of data about Meltano would you want to see?
casey_mau
05/17/2021, 9:25 PM
dbt generates a lot of good documentation around the transformations it is doing. There are also schemas being generated or curated when data is ingested from third parties, and that metadata could be captured during pipeline creation. I'd like to be able to document what fields mean, and to search for where a given field lives (which tables and such). The goal is to reduce the time to effectiveness for new data scientists and data analysts.
casey_mau
05/17/2021, 9:26 PM
I'm figuring this is somewhat outside of scope right now, but I didn't want to work toward a custom solution if others were already thinking about this or doing a better job of it in a different way.
Hi Casey and Taylor, I am the co-creator of Amundsen and co-founder of Stemma. I'd be happy to support - let me know how I can best help.
Amundsen/Stemma already has an integration with dbt, where we ingest certain metadata from dbt (descriptions, tags, lineage) and surface it in the data catalog. It's just a start - would love your feedback, @casey, on whether this solves the problem you were looking to solve. If not, would love to learn more.
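(For illustration only, not Amundsen's or Stemma's actual extractor: a minimal Python sketch of how descriptions, tags, and lineage might be pulled out of a dbt `manifest.json`, which dbt writes to its `target/` directory. The function names `load_manifest`, `extract_dbt_metadata`, and `find_column`, the record layout, and the sample data are all assumptions made up for this example.)

```python
import json
from typing import Any


def load_manifest(path: str) -> dict[str, Any]:
    """Read a dbt manifest file, e.g. target/manifest.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def extract_dbt_metadata(manifest: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect description, tags, column docs, and upstream lineage per dbt model."""
    records = []
    for unique_id, node in manifest.get("nodes", {}).items():
        # The manifest also lists tests, seeds, etc.; keep only models here.
        if node.get("resource_type") != "model":
            continue
        records.append(
            {
                "id": unique_id,
                "name": node.get("name", ""),
                "description": node.get("description", ""),
                "tags": list(node.get("tags", [])),
                # Map each documented column name to its description.
                "columns": {
                    col: meta.get("description", "")
                    for col, meta in node.get("columns", {}).items()
                },
                # Upstream node ids give table-level lineage.
                "upstream": list(node.get("depends_on", {}).get("nodes", [])),
            }
        )
    return records


def find_column(records: list[dict[str, Any]], column_name: str) -> list[str]:
    """Return the names of models that document the given column."""
    return [r["name"] for r in records if column_name in r["columns"]]


# Hypothetical in-memory manifest standing in for a real target/manifest.json.
sample_manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "description": "One row per customer order.",
            "tags": ["finance"],
            "columns": {"customer_id": {"description": "Foreign key to customers."}},
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "name": "not_null_orders_id",
        },
    }
}

records = extract_dbt_metadata(sample_manifest)
```

Calling `find_column(records, "customer_id")` on these records answers the "which tables have this field" question from earlier in the thread; a real catalog would index the same records for search.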
@taylor, while this isn't an integration with Meltano per se, I'm open to collaborating there as and when the time is right.
Current mapping: https://docs.google.com/presentation/d/1N_1CYEFw2BC5G1elh4-351Pu9ZMUjQfnkW6MmJ_A4mM/edit#slide=id.gd427af01a8_0_5
Video: