# troubleshooting
q
I’m trying to use the meltano-map-transform plugin to override the primary key for a table. It looks like it’s picking up the override and then ignoring it. I suspect the issue may just be that you can’t override `__key_properties__` with a mapping plugin; that functionality only works with an actual SDK tap. Does that sound right? Or should I be able to do this with the meltano-map-transform plugin?
I’m able to use other meltano-map-transform functionality; it’s just the `__key_properties__` override part that isn’t behaving as expected. I tried a few different things and then it dawned on me that this plugin is a rough port of SDK functionality and might not actually be able to do everything the SDK can… It does look like Meltano is noticing the override. In the logs it says `Found stream map override for 'public-mytable' key properties: ['new_key']`. It just goes on to ignore that setting when the tap runs its select statement, and instead it does `SELECT ... FROM "public"."mytable" WHERE "old_key" >= '123'::bigint`.
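For context, a minimal sketch of the kind of mapper config being described here. The stream name, column names, and plugin layout are assumptions based on the log line above, not the actual project config:

```yaml
# meltano.yml (sketch): a mapper that tries to repoint the primary key
plugins:
  mappers:
  - name: meltano-map-transform
    pip_url: meltano-map-transform
    mappings:
    - name: override-pk
      config:
        stream_maps:
          public-mytable:
            # ask the stream map to treat new_key as the key property
            __key_properties__:
            - new_key
```

Note that a mapper runs downstream of the tap, so even if the override is picked up it can change what the loader sees, but not the query the tap itself issues.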
Hmm, I wonder if it might have to do with the tap not respecting catalog overrides (described in this PR: https://github.com/transferwise/pipelinewise-tap-postgres/pull/129).
q
Hm, my scenario is slightly different since the primary key override is happening in the stream map rather than in the extractor block
c
Yeah. I was wondering what the different use cases are for either using the metadata extra on the extractor vs using a stream mapper for this functionality.
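For comparison, the extractor-side approach would use Meltano’s metadata extra to override catalog metadata before the tap runs. A sketch with a made-up stream and column name, and with the caveat (per the PR linked above) that whether the tap honors it depends on the tap:

```yaml
# meltano.yml (sketch): overriding key properties on the extractor itself
plugins:
  extractors:
  - name: tap-postgres
    metadata:
      public-mytable:
        # Singer catalog metadata overrides; these can only reference
        # columns the tap already knows about
        table-key-properties: [new_key]
        replication-method: INCREMENTAL
        replication-key: new_key
```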
q
Oh gotcha, I’m creating a new column via the stream mapper, and I want to set that as the new primary key. So I think I have to do this override in the stream-map config, because (AIUI) the tap block won’t even know that this column exists.

More broadly, I have a table with two timestamp columns, created_at and deleted_at (and there’s no “updated_at” column). We’re currently loading on created_at as the replication key, but that means we can’t tell when a delete happens. So I’m making a column that does coalesce(deleted_at, created_at) and setting that as the replication key, so that when a delete happens (and deleted_at becomes non-null), that’s picked up as a new row to load.
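A rough sketch of what that derived column might look like in a stream map. The expression syntax is an assumption (SDK-style stream maps evaluate simpleeval expressions, where `record` is the incoming row), and as the rest of the thread concludes, the tap’s incremental WHERE clause is still built from the original column, so this only changes what flows to the loader:

```yaml
# meltano.yml (sketch): deriving an "effective timestamp" column in a stream map
mappers:
- name: meltano-map-transform
  pip_url: meltano-map-transform
  mappings:
  - name: add-effective-ts
    config:
      stream_maps:
        public-mytable:
          # falls back to created_at while deleted_at is null
          effective_at: record['deleted_at'] or record['created_at']
```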
c
Ah. Interesting. Yeah. I have some pretty bad source systems like that too, some of them have absolutely ZERO create or update timestamp columns at all even though they are somewhat timeseries based data ... right now I'm just doing FULL loads on daily batch for them and then I deduplicate in dbt ... pretty ugly and over time my extract process will certainly fail as the number of rows in the source keeps growing.
q
Yeah, totally feel that… exactly my position here haha
w
@quinn_batten I know this is old, but what did you end up doing here? I’m trying to set a new primary key for a source using 3 columns from the source plus a fixed value (like location1) that will be set for the entire connection. I was thinking solving this in the mapping layer would be good, but I’m not sure how I should approach it.
The use case is several different servers that house the same data but are physically in different locations, and I’m combining the data into 1 destination table.
q
Sooo IIRC you actually cannot construct a col in stream-maps and then go and use that as a pkey
I had assumed that was an option, but in fact it just doesn’t work like that; this applies to SDK-based plugins too. I haven’t thought about this in months though, so obviously I could be remembering incorrectly.
This is one of the major issues I’ve been having with Meltano, tbh. No flexibility about loading on anything more complex than [one single column]
w
@quinn_batten so I think I found a way to do this
```yaml
- name: dw1__snowflake
  config:
    stream_maps:
      dbo-order:
        source_system: str('location_one')
        __alias__: dbo-orderv2
        __key_properties__:
        - ordnum
        - location_id
        - client_id
        - source_system
```
I didn’t make a new single-column PK per se, but I added a column and now have a composite key with 4 columns.
e
@christoph did you ever get a solution to the BigQuery duplication? We are using this target with merge:true but data is getting hung up in temp tables: https://github.com/z3z1ma/target-bigquery