Howdy Meltano community melty bouncy We care about your know Meltano #best-practices

Howdy Meltano community! :melty-bouncy: We care ab...

user

11/28/2022, 1:43 PM

Howdy Meltano community! melty bouncy We care about your knowledge & opinions. What are you data warehouse (the thing for raw and modelled analytical data!) NAMING practices? Let's brainstorm!

Sven Balnojan

11/28/2022, 1:46 PM

This poll is sparked by something I watched @aaronsteers discuss somewhere, sometime. Go ahead and add more things!

Sven Balnojan

11/28/2022, 1:48 PM

Fwiw., IMHO "fact" vs. "dim" is quite a misleading naming convention (I think even Kimball mentions that "fact tables" hold "event data", and dims "do not".); But conventions are very valuable to have, so we do need some 😄

Henning Holgersen

11/28/2022, 2:34 PM

I think I’m missing the gene that lets me understand fact tables. But they are there, and apparently some people are able to make use of them.

taylor

11/28/2022, 3:15 PM

Not sure how you want to put this into the poll, but for slightly cleaned up source data I like <source_name>_<table>. Past that I like prefixes for everything <prep|stg|staging|fct|dim|mart>_<table>

Sven Balnojan

11/28/2022, 3:50 PM

@taylor just put them in in any form you like. That's the point of a brainstorm 🙂

aaronsteers

11/28/2022, 8:29 PM

I added 7️⃣ above based on a DataOps training doc I wrote up in a past life. I really like that

list_

"makes sense" as a 'list of items', while also alpha-sorting close to

hist_

which is the same thing except needs to be joined with a date or version reference to avoid duplicates. This also automatically reserves a dedicated namespace for both type 1 and type 2 (or type n) versions of all dimensions, and no users of the table will have to wonder 'is dim_customers a type 1 or type 2 table?' - since

list_customers

is explicitly a list of all customers and

hist_customers

is always a history of customer dim changes

Sven Balnojan

11/28/2022, 9:06 PM

Haha "map = factless fact table" 😉 I like! Keep it coming!

Sven Balnojan

11/29/2022, 9:18 AM

Over sleep I got a few new ones! (1) First of all, having conventions (naming conventions) and checking them automatically, not just writing them down. I think that's key to keeping things organized. (2) Having "one per output" tables, e.g. "report_customerchurn" and "dashboard_whatever" kind of tables. I've way too often "reused" the same table for a different report, only to change it later and forgetting that I broke like 5 more dependent reports.

Sven Balnojan

11/29/2022, 9:19 AM

And FWIW, following @aaronsteers its worth pointing out, that naming only works if you also think about the types of tables. (names for types). I missed to put that into the description.

aaronsteers

11/29/2022, 4:57 PM

(2) Having "one per output" tables, e.g. "report_customerchurn" and "dashboard_whatever" kind of tables. I've way too often "reused" the same table for a different report, only to change it later and forgetting that I broke like 5 more dependent reports.

Agreed! I didn't list it above but I use

rpt_

for dedicated 'reports' or 'contracts' with other systems. Those are often views on top of the core dim/fact tables, and they may knowingly violate conventions about "proper" column naming and/or normalization patterns - since the point is to provide the downstream system exactly the set of columns they require. And per your point, those can't change as easily, since donwstream consumers will create a hard dependency on namings/etc.

Open in Slack

Previous Next