# best-practices
glenn_de_munck
Hi all. How would you handle the following in Meltano? I have a source database where the same tables exist in multiple schemas. It's important to know which data comes from which schema after extracting.
v
Use the tap that corresponds to that DB. I'd recommend messing with a local Postgres container with the `tap-postgres` setup to get used to it (or just go for a dev/test setup). Yes, schemas are separated; check out the Select feature in Meltano, e.g. the sketch below.
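For example, to grab the same table from specific schemas, something roughly like this in meltano.yml (a sketch; the schema and table names are made up):

```yaml
plugins:
  extractors:
  - name: tap-postgres
    select:
    # Stream names are <schema>-<table>, so the schema is part of the selection.
    - ops-customers.*
    - monitoring-customers.*
```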
thomas_briggs
The stream name produced by the tap will be `<schema>-<table>`, so in a lot of ways this is handled automatically. What the target will do is a different question. The jsonl target uses the stream name as the file name, so it'd be automatic there too. All the DB targets I've used (PG, Redshift, Snowflake, etc.) use only the table name, though. In that case maybe you'd need to rename the stream using a mapper to retain the schema name somehow, e.g. as sketched below. It's an interesting question.
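For example, meltano-map-transform can alias a stream so the schema name survives in the target table name. A rough sketch (the stream name is made up, and I haven't tested this):

```yaml
plugins:
  mappers:
  - name: meltano-map-transform
    variant: meltano
    pip_url: meltano-map-transform
    mappings:
    - name: keep-schema-in-table-name
      config:
        stream_maps:
          finance-orders:              # incoming stream: <schema>-<table>
            __alias__: finance_orders  # rename so the target table keeps the schema
```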
aaronsteers
Hi, @glenn_de_munck. To @thomas_briggs's point above, it will really depend on your target and how you've configured it. Some targets ignore/scrub the `<schema>-` prefix by default (which would create a collision for you) and some also have config options for you to control this behavior. Another option, as @thomas_briggs mentioned, is to use a mapper for aliasing.
glenn_de_munck
@aaronsteers Thank you for the explanation. I'm using Postgres as the target. Are config options available to control this behavior? I'm unable to find them in the documentation...
thomas_briggs
I don't see anything in the pipelinewise-target-postgres code that would handle this. I don't think it would be hard to add, though. That aside, I think I would handle this by putting each source schema into its own schema in the destination. Even if you find some way to get all the tables into one schema, that feels like it'll create a lot of confusion. Practically, what's the difference between all tables having a given prefix and all tables being in a separate schema? Not much, I don't think. šŸ¤·šŸ»ā€ā™‚ļø
aaronsteers
> I think I would handle this by putting each source schema into its own schema in the destination
@thomas_briggs - this is what I did in a past life also. We had a SQL Server DB with two schemas we were selecting from and some table name collisions. So instead of landing them all in one job to the same "tap_financedb" schema (for example), we broke it into two jobs landing into "tap_financedb_ops" and "tap_financedb_monitoring".
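In meltano.yml that looks roughly like this, using plugin inheritance (a sketch; the plugin names and schemas are examples, and I'm assuming a default_target_schema setting on the Postgres target):

```yaml
plugins:
  extractors:
  - name: tap-mssql--ops
    inherit_from: tap-mssql
    select:
    - ops-*.*                  # only streams from the "ops" source schema
  - name: tap-mssql--monitoring
    inherit_from: tap-mssql
    select:
    - monitoring-*.*
  loaders:
  - name: target-postgres--ops
    inherit_from: target-postgres
    config:
      default_target_schema: tap_financedb_ops
  - name: target-postgres--monitoring
    inherit_from: target-postgres
    config:
      default_target_schema: tap_financedb_monitoring
```

Then each pair runs as its own job, e.g. `meltano run tap-mssql--ops target-postgres--ops`.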
glenn_de_munck
I guess this will be the way to go. In my case I will have to create more than 70 schemas. šŸ™ƒ What I did in the past with ETL was to create a loop over all the schemas in the source DB. All tables with the same name were merged into one by adding an extra column containing the schema name. Now I will have to do this in dbt after extraction.
aaronsteers
@glenn_de_munck - This šŸ‘† would make a nice feature in meltano-map-transform. You'd just need (1) a way to append a new column with the (full) stream name and (2) a way to add a column to every stream at once, which I don't think is possible today.
If you end up looping through the schemas anyway, you might be able to pass the schema name to the mapper to seed the column's value. And for a small number of streams, you could manually add the new column for each, but this is clearly a use case where it'd be best to add that to all streams, using a "*" stream qualifier or similar.
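For the manual per-stream version, something like this should work today (untested sketch; the stream and schema names are examples):

```yaml
mappings:
- name: add-source-schema
  config:
    stream_maps:
      ops-customers:
        source_schema: "'ops'"          # quoted literal -> constant column value
      monitoring-customers:
        source_schema: "'monitoring'"
```

Each entry seeds the new source_schema column with the schema that stream came from.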