Ellis Valentiner (07/11/2025, 2:34 PM):
…meltano.yml file? Specifically ours is very verbose and contains a lot of duplication. For instance, we have an inline stream map on every table to add a source database identifier. If we try to define that with a `*` to apply to all tables, it errors because it doesn't respect the contents of `select`. This means for each extractor we have the same table identified in the `stream_maps`, `select`, and `metadata` blocks. So we're constantly jumping around the YAML to make updates, and it's very easy for devs to miss 1 of the 3 places that need to be updated.

visch (07/11/2025, 3:18 PM): [no text captured]
Ellis Valentiner (07/11/2025, 3:22 PM): [no text captured]
Ellis Valentiner (07/11/2025, 3:23 PM): [no text captured]
Ellis Valentiner (07/11/2025, 3:25 PM): [no text captured]
visch (07/11/2025, 4:33 PM): [no text captured]
visch (07/11/2025, 4:34 PM): [no text captured]
Ellis Valentiner (07/11/2025, 4:35 PM): [no text captured]
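(Editor's sketch) The duplication Ellis describes might look like this hypothetical `meltano.yml` excerpt — plugin, table, and column names are invented, and the replication settings are only illustrative — showing the three places the same stream has to be repeated:

```yaml
plugins:
  extractors:
    - name: tap-postgres--customer1
      inherit_from: tap-postgres
      select:
        - public-orders.*                # 1) stream selection
      metadata:
        public-orders:                   # 2) per-stream metadata
          replication-method: INCREMENTAL
          replication-key: updated_at
      config:
        stream_maps:
          public-orders:                 # 3) inline stream map
            source_db: "'db_customer1'"  # source identifier added as a string literal
```

Renaming or adding a table means touching all three blocks, which is the maintenance burden being described.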
07/11/2025, 4:41 PMtable1.*
work?
Both of those would reduce your line count pretty quick, stream maps, making a small repeatable example is what I"d go for locally with a PG db ie run docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432 -h postgres -v /home/visch/postgres_data:/var/lib/postgresql/data -d <http://docker.io/library/postgres:16|docker.io/library/postgres:16>
Setup 1-2 tables (no metadata etc) and try to do the same, then post the full troubleshooting stuff sepeartely, there's no logs in your example
I can't right now as I'm busy busy getting ready for stuff myself.visch
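(Editor's sketch) A minimal repro along the lines visch suggests might look like the following — the connection URL and table names are hypothetical, and the `"*"` key is the wildcard Ellis reports erroring when combined with `select`:

```yaml
plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      config:
        sqlalchemy_url: postgresql://postgres:postgres@localhost:5432/postgres
        stream_maps:
          "*":                         # wildcard map intended to apply to every table
            source_db: "'local-test'"  # same identifier-injection pattern as above
      select:
        - public-table1.*
        - public-table2.*
```

Running this against the dockerized Postgres (e.g. with `meltano run tap-postgres target-jsonl`) should either reproduce the error with full logs to share, or show the wildcard working on a clean setup.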
visch (07/11/2025, 4:42 PM):
`meltano.yml`

visch (07/11/2025, 4:43 PM): [no text captured]
Ellis Valentiner (07/11/2025, 4:51 PM):
The `customer_id` column is actually one of our inline stream maps. We have single-tenant databases that get replicated to a single warehouse, and we add the customer/database identifier on the fly. If we rely on Postgres's PKs then we end up clobbering and writing over data from different customers.
We list each column individually because we had some issues with wildcards, particularly with schema changes (new columns not getting added to the target).
I'll see if I can whip up a reproducible example.
visch (07/11/2025, 5:12 PM):
> new columns not getting added to the target

Is a separate issue that shouldn't happen, and you're trading a more maintainable meltano.yml file for it.
visch (07/11/2025, 5:16 PM):
> If we rely on Postgres's PKs then we end up clobbering and writing over data from different customers.

Hmm, I'd probably write each customer's data to their own schema in my target (at a minimum — but since the goal is to combine them, that makes some sense), then I'd combine them systematically during the transformation step with dbt. Again, all of this depends on what the numbers and things actually are; stream maps make a lot of sense for a few tables, but if it's 1000 tables, maybe splitting by customer makes more sense. Just ideas!
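(Editor's sketch) One way to express the per-customer-schema idea is one inherited loader per tenant — the loader names are invented, and the `default_target_schema` setting assumes a target-postgres variant that supports it:

```yaml
plugins:
  loaders:
    - name: target-postgres--customer1
      inherit_from: target-postgres
      config:
        default_target_schema: customer1  # each tenant lands in its own schema
    - name: target-postgres--customer2
      inherit_from: target-postgres
      config:
        default_target_schema: customer2
```

A downstream dbt model would then union the per-customer schemas during transformation, so no stream maps are needed to tag rows at extraction time.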
mark_estey (07/14/2025, 6:16 PM):
```sql
select 'server1' as server, * from server1.table
union all
select 'server2' as server, * from server2.table
...
```

mark_estey (07/14/2025, 6:21 PM): [no text captured]
Ellis Valentiner (07/14/2025, 6:31 PM): [no text captured]
mark_estey (07/14/2025, 6:48 PM): [no text captured]
Ellis Valentiner (07/14/2025, 6:52 PM): [no text captured]