Howdy Meltano community! :melty-bouncy: We care ab...
# best-practices
u
Howdy Meltano community! melty bouncy We care about your knowledge & opinions. What are you data warehouse (the thing for raw and modelled analytical data!) NAMING practices? Let's brainstorm!
s
This poll is sparked by something I watched @aaronsteers discuss somewhere, sometime. Go ahead and add more things!
Fwiw., IMHO "fact" vs. "dim" is quite a misleading naming convention (I think even Kimball mentions that "fact tables" hold "event data", and dims "do not".); But conventions are very valuable to have, so we do need some šŸ˜„
h
I think I’m missing the gene that lets me understand fact tables. But they are there, and apparently some people are able to make use of them.
t
Not sure how you want to put this into the poll, but for slightly cleaned up source data I like <source_name>_<table>. Past that I like prefixes for everything <prep|stg|staging|fct|dim|mart>_<table>
s
@taylor just put them in in any form you like. That's the point of a brainstorm šŸ™‚
a
I added 7ļøāƒ£ above based on a DataOps training doc I wrote up in a past life. I really like that
list_
"makes sense" as a 'list of items', while also alpha-sorting close to
hist_
which is the same thing except needs to be joined with a date or version reference to avoid duplicates. This also automatically reserves a dedicated namespace for both type 1 and type 2 (or type n) versions of all dimensions, and no users of the table will have to wonder 'is dim_customers a type 1 or type 2 table?' - since
list_customers
is explicitly a list of all customers and
hist_customers
is always a history of customer dim changes
s
Haha "map = factless fact table" šŸ˜‰ I like! Keep it coming!
Over sleep I got a few new ones! (1) First of all, having conventions (naming conventions) and checking them automatically, not just writing them down. I think that's key to keeping things organized. (2) Having "one per output" tables, e.g. "report_customerchurn" and "dashboard_whatever" kind of tables. I've way too often "reused" the same table for a different report, only to change it later and forgetting that I broke like 5 more dependent reports.
And FWIW, following @aaronsteers its worth pointing out, that naming only works if you also think about the types of tables. (names for types). I missed to put that into the description.
a
(2) Having "one per output" tables, e.g. "report_customerchurn" and "dashboard_whatever" kind of tables. I've way too often "reused" the same table for a different report, only to change it later and forgetting that I broke like 5 more dependent reports.
Agreed! I didn't list it above but I use
rpt_
for dedicated 'reports' or 'contracts' with other systems. Those are often views on top of the core dim/fact tables, and they may knowingly violate conventions about "proper" column naming and/or normalization patterns - since the point is to provide the downstream system exactly the set of columns they require. And per your point, those can't change as easily, since donwstream consumers will create a hard dependency on namings/etc.