will_musgrave
01/12/2023, 8:43 PMtransform
step that I can customize with DBT. (I know it's not strictly a Meltano question, but I figure someone here might be able to advise me).
I've already extracted documents from MongoDB and loaded them into my Postgres data warehouse. I'm looking to create a staging model in DBT where I flatten the schema for the documents that have been put into the tables. That being said, each document has an array of transactions
in it, in addition to global document-specific information. I'm wondering if anyone knows any best practices for dealing with data of that kind. Should I create a separate staging model for the transactions or should I keep it all as a single staging model?christoph
01/12/2023, 8:45 PMwill_musgrave
01/12/2023, 8:45 PMchristoph
01/12/2023, 8:56 PMjan_soubusta
01/13/2023, 8:56 AMSELECT sum(fact), attribute FROM fact_table GROUP BY attribute
Assume that cardinality of the attribute is 2 (boolean).
Columnar databases can encode/compress such column extremely efficiently, so the query finish even with subsecond latency on top of billions of rows.
And now imagine you would have to run this query like this:
SELECT SUM(fact), dimx.attribute
FROM fact_table JOIN dim1 ..... JOIN dimX
GROUP BY dimX.attribute
This will always be much slower in these clustered databases.will_musgrave
01/13/2023, 8:05 PM