Hey Team I am little new to Meltano so stuck at one place | Meltano #troubleshooting

yash_sharma

10/20/2023, 2:02 PM

Hey team, I am a little new to Meltano and stuck at one place. I was able to successfully extract data from MongoDB and load it into Postgres along with some basic transformations. But I want to use a dbt transformation layer before dumping data into Postgres, and I keep running into errors. Can someone help here or point me to some docs I can refer to for this? Appreciate any help, thanks.

Andy Carter

10/20/2023, 2:04 PM

Can you share the errors you are getting? Typically you would do only very light (or no) in-flight transformation from the tap before landing in target postgres in a dedicated `raw` schema or similar. Then build your dbt models from that raw schema as a source.

yash_sharma

10/20/2023, 2:05 PM

Hmm, what I am trying to do right now is basically flatten a few nested JSON fields from Mongo before loading into Postgres, so I wanted to use dbt there. Are you suggesting I first load everything into Postgres as raw and then use dbt to create views and go forward from there?

Andy Carter

10/20/2023, 2:06 PM

Yes, ideally retain as close as possible to your original DB data unless you have good reason not to.

Andy Carter

10/20/2023, 2:07 PM

Some taps implement JSON flattening because that helps you get to the primary keys you want to use for replication. But if you don't need the contents of the JSON to act as primary keys, then load it without flattening and do the rest in SQL/dbt to form your staging model.

Andy Carter

10/20/2023, 2:10 PM

Sorry, should be 'some targets implement flattening' above, not taps

Andy Carter

10/20/2023, 2:10 PM

But I don't think you can be selective about which columns get flattened in that case.

Andy Carter

10/20/2023, 2:11 PM

https://hub.meltano.com/loaders/target-postgres#flattening_enabled-setting for example
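To make the flattening discussion above concrete, here is a minimal Python sketch of what depth-limited flattening does to a nested document. It is not the target's actual implementation: the function name `flatten_record`, the `__` separator, and the sample document are assumptions for illustration only. It does show why flattening is all-or-nothing, since every nested object within the depth limit gets expanded.

```python
from typing import Any


def flatten_record(
    record: dict[str, Any], max_depth: int, separator: str = "__"
) -> dict[str, Any]:
    """Flatten nested dicts up to max_depth levels; deeper values are kept as-is."""
    flat: dict[str, Any] = {}

    def walk(value: dict[str, Any], prefix: str, depth: int) -> None:
        for key, child in value.items():
            name = f"{prefix}{separator}{key}" if prefix else key
            if isinstance(child, dict) and depth < max_depth:
                # Still within the depth limit: expand this nested object.
                walk(child, name, depth + 1)
            else:
                # Scalar, list, or too deep: store the value unchanged.
                flat[name] = child

    walk(record, "", 0)
    return flat


# Hypothetical MongoDB-style document.
doc = {"_id": "abc", "customer": {"name": "Ada", "address": {"city": "Pune"}}}
flattened = flatten_record(doc, max_depth=1)
```

With `max_depth=1`, `customer` is expanded into `customer__name` and `customer__address`, while the inner `address` object stays nested.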

yash_sharma

10/20/2023, 2:11 PM

Hmm, interesting. Out of curiosity, why is this approach preferred? Apologies for asking noob questions, but I am still new to all this. My thinking was that since none of the downstream consumers we are planning for will need most of the data points, why load the complete nested JSON into Postgres? Why not keep the DB light?

Andy Carter

10/20/2023, 2:15 PM

I would say that most of the time goes into replication from your sources, so if it turns out you are missing a field you need, you don't have to re-ingest from your tap every time; you just alter your dbt staging model slightly to grab that new field. Obviously you also need to consider the size of the data extracted, among other factors. It's a subjective thing in the end.
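The trade-off described above can be sketched in Python rather than dbt. This is only an analogy under stated assumptions: `RAW_ROWS`, `get_path`, and `build_staging` are hypothetical names, and the list of JSON strings stands in for a raw table. The point is that when the raw documents are kept, adding a field later means changing the column mapping, not re-running the extraction.

```python
import json
from typing import Any

# Hypothetical rows as they might sit in a raw table: one JSON document per row.
RAW_ROWS = [
    '{"_id": "1", "customer": {"name": "Ada", "address": {"city": "Pune"}}}',
    '{"_id": "2", "customer": {"name": "Lin"}}',
]


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if a step is missing."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def build_staging(raw_rows: list[str], columns: dict[str, str]) -> list[dict[str, Any]]:
    """Project each raw document onto the requested columns (output name -> path)."""
    return [
        {name: get_path(json.loads(row), path) for name, path in columns.items()}
        for row in raw_rows
    ]


# First version of the staging model.
staging_v1 = build_staging(RAW_ROWS, {"id": "_id", "customer_name": "customer.name"})

# A new field is needed later: only the mapping changes, the raw rows are reused.
staging_v2 = build_staging(
    RAW_ROWS,
    {"id": "_id", "customer_name": "customer.name", "city": "customer.address.city"},
)
```

Documents that lack the new field simply yield `None` for it, much as a missing JSON key would come back as NULL in a SQL staging model.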

yash_sharma

10/20/2023, 4:51 PM

Makes sense, thanks a lot @Andy Carter
