# random
b
I'm playing around with selecting tables from multiple source schemas in a postgres database, and loading all of them into the same schema in the target postgres. Is there any way to insert the source schema as a new column value in the target tables, so that I can tell which records came from which source schemas?
a
Cool idea, @bob_swain! To confirm, this may consume a bunch of space but would give a per-record audit of the table's json schema at the time of running?
In theory, this would be an easy addition to the meltano-map-transform plugin or upstream in the SDK. Something like this could be a possible path forward:
1. Add a special variable or function to the collection of special variables available in the `simpleeval` expression evaluator. (For instance, `schema()` could return the stringified JSON schema.)
2. Create a custom mapping like `add-shopify-schema-audits` for the tap stream(s) that you want to add this column to. Something like `{"stream_maps": {"stream_a": {"_schema_audit": "schema()"}, ...}}` (see the sketch just below).
3. When invoking, you'd put the mapping between the tap and target: `meltano run tap-shopify add-shopify-schema-audits target-snowflake`
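Roughly, the mapping from step 2 could live in meltano.yml like this. Just a sketch: `schema()` is the hypothetical function from step 1 and doesn't exist yet, and the mapping, stream, and property names are placeholders.
```yaml
# meltano.yml (sketch): the mapping from step 2.
# NOTE: schema() is hypothetical (step 1 would add it); names are placeholders.
plugins:
  mappers:
  - name: meltano-map-transformer
    pip_url: meltano-map-transform
    mappings:
    - name: add-shopify-schema-audits
      config:
        stream_maps:
          stream_a:
            _schema_audit: "schema()"  # would emit the stringified JSON schema
```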
b
Oh, sorry, I meant the postgres schema namespace (https://www.postgresqltutorial.com/postgresql-administration/postgresql-schema/#:~:text=What%20is%20a%20PostgreSQL%20schema,schema_name.object_name). Basically I have a bunch of schema namespaces, each with identical tables, and I'm syncing all those together (unioning all the tables) into one schema in the target. That all works great, but at the end, I don't know which schema namespace the original rows came from.
a
Ahhh. Oops. Yes, thanks for clarifying...
b
haha, overloaded terms!
I do like your idea of a custom mapping though
a
Source schema names are in theory available within the metadata, but to my knowledge not generally accessible in the stream data itself. The same general approach as above could surface and expose that part of the metadata (the source schema name) as a property in the record.
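As a sketch of something that works today, assuming you know which streams come from which schema (e.g. when the tap prefixes stream names like `schema_a-orders`), you could hard-code the schema name as a string-literal expression per stream. All names here are illustrative:
```yaml
# meltano.yml (sketch): stream, mapping, and property names are illustrative.
plugins:
  mappers:
  - name: meltano-map-transformer
    pip_url: meltano-map-transform
    mappings:
    - name: add-source-schema
      config:
        stream_maps:
          schema_a-orders:
            _source_schema: "'schema_a'"  # quoted literal -> the string "schema_a"
          schema_b-orders:
            _source_schema: "'schema_b'"
```
Then you'd run it in between the tap and target, same as above: `meltano run tap-postgres add-source-schema target-postgres` (assuming those plugin names).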
e
Yeah, a custom mapping is probably the way. I created a toy example for merging multi-tenant singer streams: https://github.com/edgarrmondragon/singer-playground/tree/main/merge_streams
b
this is great!! thank you both so much!
I think this should get me started 🙂
a
Here's the part of the mapper SDK where functions can be grafted in: https://github.com/meltano/sdk/blob/main/singer_sdk/mapper.py#L287-L295
Also, I think I misspoke above: after the data leaves the tap, the schema name is not directly accessible within the stream, because the messages emitted from the tap generally only contain the stream name (sometimes, but not always, prefixed with the schema name, as in `myschema-mytable`) and the JSON schema definition. This means a stream map operation in the tap could append the schema name as a property, but a stream map operation in the target or in an intermediate mapper plugin might not be able to. Just a heads up.
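If the schema name does turn out to be recoverable only while the tap still knows it, one option is to attach the same kind of map to the extractor itself rather than to an intermediate mapper; SDK-based taps (like the MeltanoLabs tap-postgres) accept `stream_maps` as a built-in setting. Again just a sketch with illustrative names:
```yaml
# meltano.yml (sketch): applying the map at the tap instead of a mapper plugin.
plugins:
  extractors:
  - name: tap-postgres
    config:
      stream_maps:
        schema_a-orders:
          _source_schema: "'schema_a'"  # literal string, added before data leaves the tap
```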
b
ah, that's a very good thing to keep a lookout for! and makes sense
thanks again!