# random
b
I'm playing around with selecting tables from multiple source schemas in a postgres database, and loading all of them into the same schema in the target postgres. Is there any way to insert the source schema as a new column value in the target tables, so that I can tell which records came from which source schemas?
a
Cool idea, @bob_swain! To confirm, this may consume a bunch of space but would give a per-record audit of the table's json schema at the time of running?
In theory, this would be an easy addition to the meltano-map-transform plugin or upstream in the SDK. Something like this could be a possible path forward:
1. Add a special variable or function to the collection of special variables available in the `simpleeval` expression evaluator. (For instance, `schema()` could return the stringified JSON schema.)
2. Create a custom mapping like `add-shopify-schema-audits` for the tap stream(s) that you want to add this column to. Something like `{"stream_maps": {"stream_a": {"_schema_audit": "schema()"}, ...}}` (see the sketch just below).
3. When invoking, you'd put the mapping between the tap and target: `meltano run tap-shopify add-shopify-schema-audits target-snowflake`
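Roughly, the mapping from step 2 could live in meltano.yml like this. Just a sketch: `schema()` is the hypothetical function from step 1 and doesn't exist yet, and the mapping, stream, and property names are placeholders.
```yaml
# meltano.yml (sketch): the mapping from step 2.
# NOTE: schema() is hypothetical (step 1 would add it); names are placeholders.
plugins:
  mappers:
  - name: meltano-map-transformer
    pip_url: meltano-map-transform
    mappings:
    - name: add-shopify-schema-audits
      config:
        stream_maps:
          stream_a:
            _schema_audit: "schema()"  # would emit the stringified JSON schema
```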
b
Oh, sorry, I meant the postgres schema namespace (https://www.postgresqltutorial.com/postgresql-administration/postgresql-schema/#:~:text=What%20is%20a%20PostgreSQL%20schema,schema_name.object_name). Basically I have a bunch of schema namespaces, each with identical tables, and I'm syncing all those together (unioning all the tables) into one schema in the target. That all works great, but at the end, I don't know which schema namespace the original rows came from.
a
Ahhh. Oops. Yes, thanks for clarifying...
b
haha, overloaded terms!
I do like your idea of a custom mapping though
a
Source schema names are in theory available within the metadata, but to my knowledge not generally accessible in the stream data itself. The same general approach as above could surface and expose that part of the metadata (the source schema name) as a property in the record.
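As a sketch of something that works today, assuming you know which streams come from which schema (e.g. when the tap prefixes stream names like `schema_a-orders`), you could hard-code the schema name as a string-literal expression per stream. All names here are illustrative:
```yaml
# meltano.yml (sketch): stream, mapping, and property names are illustrative.
plugins:
  mappers:
  - name: meltano-map-transformer
    pip_url: meltano-map-transform
    mappings:
    - name: add-source-schema
      config:
        stream_maps:
          schema_a-orders:
            _source_schema: "'schema_a'"  # quoted literal -> the string "schema_a"
          schema_b-orders:
            _source_schema: "'schema_b'"
```
Then you'd run it in between the tap and target, same as above: `meltano run tap-postgres add-source-schema target-postgres` (assuming those plugin names).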
e
Yeah, a custom mapping is probably the way. I created a toy example for merging multi-tenant singer streams: https://github.com/edgarrmondragon/singer-playground/tree/main/merge_streams
b
this is great!! thank you both so much!
I think this should get me started 🙂
a
Here's the part of the mapper SDK where functions can be grafted in: https://github.com/meltano/sdk/blob/main/singer_sdk/mapper.py#L287-L295
Also, I think I misspoke above: after the data leaves the tap, the schema name is not directly accessible within the stream, because the messages emitted from the tap generally only contain the stream name (sometimes, but not always, prefixed with the schema name, as in `myschema-mytable`) and the JSON schema definition. This means a stream map operation in the tap could append the schema name as a property, but a stream map operation in the target or in an intermediate mapper plugin might not be able to. Just a heads up.
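If the schema name does turn out to be recoverable only while the tap still knows it, one option is to attach the same kind of map to the extractor itself rather than to an intermediate mapper; SDK-based taps (like the MeltanoLabs tap-postgres) accept `stream_maps` as a built-in setting. Again just a sketch with illustrative names:
```yaml
# meltano.yml (sketch): applying the map at the tap instead of a mapper plugin.
plugins:
  extractors:
  - name: tap-postgres
    config:
      stream_maps:
        schema_a-orders:
          _source_schema: "'schema_a'"  # literal string, added before data leaves the tap
```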
b
ah, that's a very good thing to keep a lookout for! and makes sense
thanks again!