Hey everyone :wave: Happy tuesday! I was wonderin...
# singer-tap-development
s
Hey everyone 👋 Happy tuesday! I was wondering if anyone knew how I could only select specific columns to sync using tap-postgres in a json payload. We have a
payload
column which contains WAY too much PII, and I was hoping to be able to sync only the datapoints I needed (only grab values from
payload.xyz
if
payload.xyz
exists) Thanks! cc @visch since you're my tap-postgres guru
👋 1
e
Hi @Stéphane Burwash! I think you could enrich the schema of the subfields using https://docs.meltano.com/concepts/plugins/#schema-extra and then try selecting only the fields you want with https://docs.meltano.com/concepts/plugins/#select-extra.
s
Ouh I like that, thanks @Edgar Ramírez (Arch.dev)!
d
Another option is to use a view in Postgres to transform the data, then point Meltano at that view
2
s
🤔 @dylan_just that's a wonderful alternative, thanks!
@Edgar Ramírez (Arch.dev) I can't seem to access the nested values of a json object, would we have an example lying around on how to use the schema extra?
Copy code
schema:
        public-respondents:
          foo:
            type: ["object", "null"]
          foo.bar:
            type: ["string", "null"]

      select:
        - public-respondents.*
In this case, I'm trying to access the nested value inside the
payload
JSON object:
Copy code
[selected   ] public-respondents.checksum
        [selected   ] public-respondents.created_at
        [selected   ] public-respondents.deleted_at
        [automatic  ] public-respondents.foo
        [selected   ] public-respondents.payload
        [automatic  ] public-respondents.survey_id
        [selected   ] public-respondents.updated_at
        [automatic  ] public-respondents.uuid
v
View in Postgres is probably the easiest way and it will ensure no data leakage as it'll never make it to the tap https://sdk.meltano.com/en/v0.45.0/stream_maps.html has a chance at working, but my bet is nested records aren't going to serialize nicely
e
Can you try
Copy code
schema:
        public-respondents:
          foo:
            type: ["object", "null"]
            properties:
              bar: type: ["string", "null"]
💯 1
s
That's added it to the schema, but still is not grabbing the value. I'll look into it 🤔
e
Oh, this might be another case requiring #2924
👀 1
s
@Edgar Ramírez (Arch.dev) when would you plan on releasing this change?
e
I'm actually thinking that PR just might not be enough. You actually need the tap to emit the object with
"additionalProperties": True
, and for some reason I'm investigating at the moment, it doesn't work if you set it with the schema override. https://github.com/MeltanoLabs/tap-postgres/pull/601
👍 1
Ok, made some time to test with tap-postgres itself. I seeded this data into postgres:
Copy code
CREATE TABLE has_jsonb (
  id INTEGER PRIMARY KEY,
  my_jsonb JSONB
);

INSERT INTO has_jsonb (id, my_jsonb) VALUES ('1', '{"name": "John", "age": 30, "email": "<mailto:john@example.com|john@example.com>"}');
...
Then my extras look like this:
Copy code
select:
    - 'public-*.*'
    - 'public-has_jsonb.my_jsonb.name'
    - 'public-has_jsonb.my_jsonb.age'
    - '!public-has_jsonb.my_jsonb.email'
    schema:
      public-has_jsonb:
        my_jsonb:
          type: [object, "null"]
          properties:
            name:
              type: string
            age:
              type: integer
            email:
              type: string
and invoking it shows the right data:
Copy code
{"type":"SCHEMA","stream":"public-has_jsonb","schema":{"properties":{"id":{"type":["integer"]},"my_jsonb":{"properties":{"name":{"type":"string"},"age":{"type":"integer"}},"type":["object","null"]}},"type":"object","required":["id"],"$schema":"<https://json-schema.org/draft/2020-12/schema>"},"key_properties":["id"]}
{"type":"RECORD","stream":"public-has_jsonb","record":{"id":1,"my_jsonb":{"age":30,"name":"John"}},"time_extracted":"2025-03-27T15:39:25.534987+00:00"}
If I don't include
email
in the
schema
, it's still selected.
Ok, made some time to test with tap-postgres itself.
Forgot to mention, this is on tap-postgres
main
which doesn't have any functional differences with https://pypi.org/project/meltanolabs-tap-postgres/0.1.1/