# troubleshooting
q
I’m trying to use the meltano-map-transform plugin to override the primary key for a table. It looks like it’s picking up the override and then ignoring it. I suspect the issue may just be that you can’t override `__key_properties__` with a mapping plugin; that functionality only works with an actual SDK tap. Does that sound right? Or should I be able to do this with the meltano-map-transform plugin?
I’m able to use other meltano-map-transform functionality; it’s just the `__key_properties__` override part that isn’t behaving as expected. I tried a few different things and then it dawned on me that this plugin is a rough port of SDK functionality and might not actually be able to do everything the SDK can… It does look like Meltano is noticing the override. In the logs it says `Found stream map override for 'public-mytable' key properties: ['new_key']`. It just goes on to ignore that setting when the tap runs its select statement, and instead it does `SELECT ... FROM "public"."mytable" WHERE "old_key" >= '123'::bigint`.
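For context, a minimal sketch of the kind of mapper config being described here. The stream name, column names, and plugin layout are assumptions based on the log line above, not the actual project config:

```yaml
# meltano.yml (sketch): a mapper that tries to repoint the primary key
plugins:
  mappers:
  - name: meltano-map-transform
    pip_url: meltano-map-transform
    mappings:
    - name: override-pk
      config:
        stream_maps:
          public-mytable:
            # ask the stream map to treat new_key as the key property
            __key_properties__:
            - new_key
```

Note that a mapper runs downstream of the tap, so even if the override is picked up it can change what the loader sees, but not the query the tap itself issues.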
Hmm, I wonder if it might have to do with the tap not respecting catalog overrides (described in this PR: https://github.com/transferwise/pipelinewise-tap-postgres/pull/129).
q
Hm, my scenario is slightly different since the primary key override is happening in the stream map rather than in the extractor block
c
Yeah. I was wondering what the different use cases are for either using the metadata extra on the extractor vs using a stream mapper for this functionality.
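For comparison, the extractor-side approach would use Meltano’s metadata extra to override catalog metadata before the tap runs. A sketch with a made-up stream and column name, and with the caveat (per the PR linked above) that whether the tap honors it depends on the tap:

```yaml
# meltano.yml (sketch): overriding key properties on the extractor itself
plugins:
  extractors:
  - name: tap-postgres
    metadata:
      public-mytable:
        # Singer catalog metadata overrides; these can only reference
        # columns the tap already knows about
        table-key-properties: [new_key]
        replication-method: INCREMENTAL
        replication-key: new_key
```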
q
Oh gotcha, I’m creating a new column via the stream mapper, and I want to set that as the new primary key. So I think I have to do this override in the stream-map config, because (AIUI) the tap block won’t even know that this column exists.

More broadly, I have a table with two timestamp columns, created_at and deleted_at (and there’s no “updated_at” column). We’re currently loading on created_at as the replication key, but that means we can’t tell when a delete happens. So I’m making a column that does coalesce(deleted_at, created_at) and setting that as the replication key, so that when a delete happens (and deleted_at becomes non-null), that’s picked up as a new row to load.
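A rough sketch of what that derived column might look like in a stream map. The expression syntax is an assumption (SDK-style stream maps evaluate simpleeval expressions, where `record` is the incoming row), and as the rest of the thread concludes, the tap’s incremental WHERE clause is still built from the original column, so this only changes what flows to the loader:

```yaml
# meltano.yml (sketch): deriving an "effective timestamp" column in a stream map
mappers:
- name: meltano-map-transform
  pip_url: meltano-map-transform
  mappings:
  - name: add-effective-ts
    config:
      stream_maps:
        public-mytable:
          # falls back to created_at while deleted_at is null
          effective_at: record['deleted_at'] or record['created_at']
```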
c
Ah. Interesting. Yeah. I have some pretty bad source systems like that too, some of them have absolutely ZERO create or update timestamp columns at all even though they are somewhat timeseries based data ... right now I'm just doing FULL loads on daily batch for them and then I deduplicate in dbt ... pretty ugly and over time my extract process will certainly fail as the number of rows in the source keeps growing.
q
Yeah, totally feel that… exactly my position here haha
w
@quinn_batten I know this is old, but what did you end up doing here? I’m trying to set a new primary key for a source using 3 columns from the source plus a fixed value (like location1) that will be set for the entire connection. I was thinking solving this in the mapping layer would be good, but I’m not sure how I should approach it.
The use case is several different servers that house the same data but are physically in different locations, and I’m combining the data into 1 destination table.
q
Sooo IIRC you actually cannot construct a col in stream-maps and then go and use that as a pkey
I had assumed that was an option, but in fact it just doesn’t work like that; this applies to SDK-based plugins too. I haven’t thought about this in months though, so obviously I could be remembering incorrectly.
This is one of the major issues I’ve been having with Meltano, tbh. No flexibility about loading on anything more complex than [one single column]
w
@quinn_batten so I think I found a way to do this
```yaml
- name: dw1__snowflake
  config:
    stream_maps:
      dbo-order:
        source_system: str('location_one')
        __alias__: dbo-orderv2
        __key_properties__:
        - ordnum
        - location_id
        - client_id
        - source_system
```
I didn’t make a new single-column PK per se, but I added a column and now have a composite key with 4 columns.
e
@christoph did you ever get a solution to the BigQuery duplication? We are using this target with merge:true but data is getting hung up in temp tables: https://github.com/z3z1ma/target-bigquery