Stéphane Burwash
03/25/2025, 5:51 PMpayload
column which contains WAY too much PII, and I was hoping to be able to sync only the datapoints I needed (only grab values from payload.xyz
if payload.xyz
exists)
Thanks!
cc @visch since you're my tap-postgres guruEdgar Ramírez (Arch.dev)
03/25/2025, 8:13 PMStéphane Burwash
03/25/2025, 8:20 PMdylan_just
03/25/2025, 8:46 PMStéphane Burwash
03/25/2025, 8:46 PMStéphane Burwash
03/26/2025, 2:00 PMschema:
public-respondents:
foo:
type: ["object", "null"]
foo.bar:
type: ["string", "null"]
select:
- public-respondents.*
In this case, I'm trying to access the nested value inside the payload
JSON object:
[selected ] public-respondents.checksum
[selected ] public-respondents.created_at
[selected ] public-respondents.deleted_at
[automatic ] public-respondents.foo
[selected ] public-respondents.payload
[automatic ] public-respondents.survey_id
[selected ] public-respondents.updated_at
[automatic ] public-respondents.uuid
visch
03/26/2025, 2:20 PMEdgar Ramírez (Arch.dev)
03/26/2025, 2:23 PMschema:
public-respondents:
foo:
type: ["object", "null"]
properties:
bar: type: ["string", "null"]
Stéphane Burwash
03/26/2025, 3:48 PMEdgar Ramírez (Arch.dev)
03/26/2025, 3:58 PMStéphane Burwash
03/26/2025, 5:39 PMEdgar Ramírez (Arch.dev)
03/26/2025, 9:44 PM"additionalProperties": True
, and for some reason I'm investigating at the moment, it doesn't work if you set it with the schema override.
https://github.com/MeltanoLabs/tap-postgres/pull/601Edgar Ramírez (Arch.dev)
03/27/2025, 3:40 PMCREATE TABLE has_jsonb (
id INTEGER PRIMARY KEY,
my_jsonb JSONB
);
INSERT INTO has_jsonb (id, my_jsonb) VALUES ('1', '{"name": "John", "age": 30, "email": "<mailto:john@example.com|john@example.com>"}');
...
Then my extras look like this:
select:
- 'public-*.*'
- 'public-has_jsonb.my_jsonb.name'
- 'public-has_jsonb.my_jsonb.age'
- '!public-has_jsonb.my_jsonb.email'
schema:
public-has_jsonb:
my_jsonb:
type: [object, "null"]
properties:
name:
type: string
age:
type: integer
email:
type: string
and invoking it shows the right data:
{"type":"SCHEMA","stream":"public-has_jsonb","schema":{"properties":{"id":{"type":["integer"]},"my_jsonb":{"properties":{"name":{"type":"string"},"age":{"type":"integer"}},"type":["object","null"]}},"type":"object","required":["id"],"$schema":"<https://json-schema.org/draft/2020-12/schema>"},"key_properties":["id"]}
{"type":"RECORD","stream":"public-has_jsonb","record":{"id":1,"my_jsonb":{"age":30,"name":"John"}},"time_extracted":"2025-03-27T15:39:25.534987+00:00"}
If I don't include email
in the schema
, it's still selected.Edgar Ramírez (Arch.dev)
03/27/2025, 3:41 PMOk, made some time to test with tap-postgres itself.Forgot to mention, this is on tap-postgres
main
which doesn't have any functional differences with https://pypi.org/project/meltanolabs-tap-postgres/0.1.1/