# troubleshooting
d
Hello, I’m trying to change a data type using the transferwise postgres tap, and I’m having some trouble. I have a `jsonb` field that I want to be treated as a `['null', 'string']` type rather than `['null', 'object', 'array']`. I’ve added this to my `meltano.yml`:
```yaml
schema:
  public-a:
    event_data:
      type: ["null", "string"]
```
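For context, Meltano supports this kind of override via the extractor’s `schema` extra in `meltano.yml`. A fuller sketch of where the snippet above would typically sit (the plugin name and variant here match the thread; the exact surrounding keys are an assumption about this particular project’s config):

```yaml
plugins:
  extractors:
    - name: tap-postgres
      variant: transferwise
      # The `schema` extra overrides discovered column schemas,
      # keyed by stream ID, then by property name.
      schema:
        public-a:
          event_data:
            type: ["null", "string"]
```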
I saw this message, which helped me find where the issue is happening in the tap: https://meltano.slack.com/archives/C01TCRBBJD7/p1631295595178100. The tap runs discovery every time the job runs and simply overwrites the provided schema with the discovered one. How does this work in other taps? Is discovery normally done on every run?
v
End goal is to get the field to be a String in `target-postgres`?
Discovery is chosen by the orchestrator; running "every run" is not normal. But if you run with `meltano elt`, I'm pretty certain discovery does run every time (you want to keep up with source schema changes). `meltano invoke` does not, due to wanting to cache the schema to make calls to things like `meltano select` quick.
Maybe you're saying that when you pass a catalog into the tap it doesn't get respected, i.e. `tap-abc -c config.json --catalog catalog.json` just gets ignored? If we can start from the issue you're trying to solve, it'll help me focus 😄 Also, what tap and what target?
d
Hey @visch, I’m using `tap-postgres` & `target-snowflake` (both transferwise variants, latest releases). The end goal is for the field to be a `varchar` in Snowflake rather than a `variant`. At least, that’s the immediate problem I’m trying to solve, although it may not fix the larger issue: I have a postgres table with json blobs > 16MB, and the `variant` type in Snowflake can only be <= 16MB. I’ve now realized varchar has the same limitation, but maybe there’s a way to truncate larger values. Anyway, I digress. I’m running `meltano elt`, but the catalog it passes to the tap isn’t respected, and I think I see why now. I’m pretty sure it only happens with LOG_BASED replication, but I’m still experimenting a little. The tap calls this line https://github.com/transferwise/pipelinewise-tap-postgres/blob/master/tap_postgres/sync_strategies/logical_replication.py#L411, which runs discovery and squashes the schema in the provided catalog.
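To make the failure mode concrete, here is a toy Python sketch of the behavior described above. The function names and dict shapes are hypothetical (this is not the tap's real API): per-run discovery replaces the catalog entry's schema wholesale, so any user override is lost; a fix would need to re-apply the overrides after discovery.

```python
def apply_discovered_schema(catalog_entry: dict, discovered: dict) -> dict:
    """Mimics the problematic behavior: the freshly discovered schema
    simply replaces whatever schema the provided catalog contained."""
    catalog_entry["schema"] = discovered
    return catalog_entry


def apply_discovered_schema_preserving_overrides(
    catalog_entry: dict, discovered: dict, overrides: dict
) -> dict:
    """One possible fix: take the discovered schema, then re-apply
    per-column user overrides on top of it."""
    merged = dict(discovered)
    merged["properties"] = dict(discovered.get("properties", {}))
    for column, schema in overrides.items():
        merged["properties"][column] = schema
    catalog_entry["schema"] = merged
    return catalog_entry


# The user asked for event_data to be a plain string...
provided = {"schema": {"properties": {"event_data": {"type": ["null", "string"]}}}}
# ...but discovery sees a jsonb column and emits object/array.
discovered = {"properties": {"event_data": {"type": ["null", "object", "array"]}}}

entry = apply_discovered_schema(dict(provided), discovered)
# The override is gone: the type is back to ["null", "object", "array"].
print(entry["schema"]["properties"]["event_data"]["type"])

entry = apply_discovered_schema_preserving_overrides(
    dict(provided),
    discovered,
    overrides={"event_data": {"type": ["null", "string"]}},
)
# The override survives discovery: ["null", "string"].
print(entry["schema"]["properties"]["event_data"]["type"])
```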
t
@dean_morin it seems like you’ve confirmed this is a problem with the tap and not Meltano, correct? Based on https://github.com/transferwise/pipelinewise-tap-postgres/issues/128
d
Yes, sorry, I should have followed up here as well.
m
Is the active workaround for jsonb columns still to use the branch from https://github.com/transferwise/pipelinewise-tap-postgres/issues/128?