# singer-targets
j
I'm trying to replicate a Postgres DB using FULL_TABLE replication (for now, just want to get something working). Seems like my JSON fields don't survive. I have a largeish object looking like this:
```json
{
    "header": "just-a-plain-old-string",
    "sections": [
        {"name": "another string", "questions": [ .... ]},
        {"name": "another string 2", "questions": [ .... ]},
        {"name": "another string 3", "questions": [ .... ]},
        {"name": "another string 4", "questions": [ .... ]}
    ]
}
```
... you get the gist... What I end up with in my warehouse is just this string:
```
{"\"header\"","\"text\"","\"sections\""}
```
So that's the two roots in the object, `header` and `sections`, and ... I don't know where `text` comes from. Also there's some quotation mark business that needs my attention. Digging into it, it kinda looks like I'm out of luck w.r.t. having the JSON Postgres data type supported in the target, and I suppose I could live with a string. But I'd kinda just expect the JSON to survive in a string field. But maybe the problems start already at the tap?
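For what it's worth, that garbled string is exactly how Postgres renders a `jsonb[]` (array of `jsonb`) value as text when the array holds the object's top-level key names as JSON strings. A small Python sketch - my reconstruction of the rendering, not anything the tap or target actually runs - reproduces the string from the three names that appear in the output:

```python
import json

# The three names that appear in the warehouse output.
keys = ["header", "text", "sections"]

# Each element of a jsonb[] is itself a JSON value; a bare key stored as a
# JSON string renders with its own double quotes, e.g. "header".
elems = [json.dumps(k) for k in keys]

# Postgres renders an array literal as {...}, double-quoting each element
# and backslash-escaping any quotes inside it.
literal = "{" + ",".join('"' + e.replace('"', '\\"') + '"' for e in elems) + "}"

print(literal)  # {"\"header\"","\"text\"","\"sections\""}
```

If that matches what's in the warehouse, it suggests the column was created as `jsonb[]` and only the key names survived the load.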
c
I ran into a similar problem recently going from tap-postgres to target-postgres, where the target schema was aggressively choosing `jsonb[]` even though the tap column was `jsonb`, and I ended up in a similar situation where I just had root-level attribute names with no associated JSON hashes. The fix was in target-postgres, where `jsonb` became the chosen default via a code change. What target are you using? (It’d be weird if the behavior of your target and target-postgres were exactly the same, because I believe it was the conversion during `INSERT` from `jsonb` to `jsonb[]` where data was lost.)
j
I'm also using target-postgres, the meltanolabs variant
brilliant.. I wonder if I can somehow force the data type on a column level in the project `yml` file....
c
The fix for me was to update to target-postgres 0.0.14 and change my column types in staging - didn’t care about the data loss there - and I manually defined my tables in production prior to running the pipeline the first time, to ensure `jsonb` was chosen over `jsonb[]`. I went over the source code for 0.0.14 and it seems like it will work as long as you drop your tables and let it recreate them.
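A minimal sketch of what pre-defining a table might look like - the table and column names here are made up, and the point is only that the payload column is declared `jsonb` (not `jsonb[]`) before the loader ever creates it itself:

```sql
-- Hypothetical target table, created manually before the first pipeline run.
-- Declaring the payload column as jsonb up front means the loader
-- never gets the chance to infer jsonb[] for it.
CREATE TABLE IF NOT EXISTS public.surveys (
    id                bigint PRIMARY KEY,
    payload           jsonb,        -- the nested header/sections document
    _sdc_extracted_at timestamptz   -- Singer SDK metadata column
);
```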
j
Fantastic, thanks a lot
c
```yaml
- name: target-postgres
  variant: meltanolabs
  pip_url: meltanolabs-target-postgres~=0.0.14
```
the version change should be enough, I was just extra cautious with production prior to having a job that runs for 24 hours give or take 🙂
j
oh wow, a full refresh takes about 20 minutes for me..
c
yeah similar situation for the staging environment on tiny itty bitty AWS instances for me, but production… has a lot of audit records
you can use this query to identify any `jsonb[]` columns still hanging around - note, my version produces a bunch of `ALTER TABLE` statements so I could just correct staging and move on:
```sql
SELECT DISTINCT
    'alter table ' || c.table_schema || '.' || c.table_name ||
    ' alter column ' || c.column_name ||
    ' type jsonb using to_jsonb(' || c.column_name || ');' AS ddl
FROM
    information_schema.columns c
JOIN
    pg_catalog.pg_class t ON c.table_name = t.relname
JOIN
    pg_catalog.pg_attribute a ON a.attname = c.column_name AND a.attrelid = t.oid
JOIN
    pg_catalog.pg_type tp ON a.atttypid = tp.oid
WHERE
    c.data_type = 'ARRAY' AND tp.typname = '_jsonb';
```
🙌 1
j
Thank you so much for all this! I really need to leave now, but I'll check back on it.
Worked great. Did have to nuke my DB though, but no biggie. Thanks again!
🙌 1