I'm experiencing some weirdness when doing a postg...
# troubleshooting
j
I'm experiencing some weirdness when doing a postgres-to-postgres elt. in my
meltano.yml
I have specified a small list of selected entities (tables) and attributes (columns) to extract. only the entities selected are being loaded ✔️ but, among these entities, all columns (including excluded ones) are being loaded this is my relevant config:
Copy code
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-postgres--custom
        select:
        - public-app_users.id
        - public-app_users.email
        - public-app_users.username
        - public-app_users.date_joined
        - public-app_users.last_login
        - public-app_users.is_staff
        - public-app_users.is_suspended
        - public-app_users.is_nuked
        metadata:
          '*':
            replication-method: FULL_TABLE
if I do
meltano select tap-postgres--custom --list
, only desired attributes appear as selected ✔️ and
--list --all
conveniently shows the rest as excluded ✔️ more details in thread 👇
the
catalog.json
coming from
meltano invoke --dump=catalog tap-postgres--custom
contains seemingly correct data:
Copy code
"table_name": "app_users",
      "stream": "app_users",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "table-key-properties": [
              "id"
            ],
            "row-count": 1714,
            "is-view": false,
            "selected": true,
            "replication-method": "FULL_TABLE"
          }
        },
        {
          "breadcrumb": [
            "properties",
            "id"
          ],
          "metadata": {
            "sql-datatype": "integer",
            "inclusion": "automatic",
            "selected-by-default": true,
            "selected": true
          }
        },
        {
          "breadcrumb": [
            "properties",
            "password"
          ],
          "metadata": {
            "sql-datatype": "character varying",
            "inclusion": "available",
            "selected-by-default": true,
            "selected": false
          }
        },
i.e.
id
appears as
selected: true
whereas
password
appeas as
selected: false
✔️
I additionally have a mapper that aliases these two entities:
Copy code
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
  - name: tap-postgres--custom
    inherit_from: tap-postgres
  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
  mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+<https://github.com/MeltanoLabs/meltano-map-transform.git>
    mappings:
    - name: alias-tables
      config:
        stream_maps:
          public-app_users:
            __alias__: custom_users
          public-app_instances:
            __alias__: custom_instances
          __else__: null
and the table aliasing is happening correctly ✔️
but then, when I call
meltano run tap-postgres--custom alias-tables target-postgres
, I see in the logs that the tables are being loaded in the target with all the columns I didn't select
I'd note that
meltano run tap-postgres--custom target-postgres
(without mapping) also loads the selected entities ✔️ with the original names ✔️ but all the attributes
the debugging continues: I did
meltano invoke tap-postgres--custom --dump {config,catalog}
and then directly
./meltano/extractors/tap-postgres--custom/venv/bin/tap-postgres --catalog catalog.json --config config.json
and I still see the non desired attributes
There's no current way to select individual columns to sync.
https://github.com/transferwise/pipelinewise-tap-postgres/pull/129 that's... disappointing
ok, I seem to be affected by a bug that has been mentioned in this Slack already. I left a long comment at https://github.com/transferwise/pipelinewise-tap-postgres/pull/129#issuecomment-1123338635, but basically I'm echoing @taylor when he said
ideally the tap would respect the catalog!
v
@juan_luis_cano_rodriguez you can use Mappers to fake this so your target doesn't get any of the data as well. Not sure if that would help you here. That is a silly limitation 😕 I haven't tried the singer variant for tap-postgres, you could give it a go https://hub.meltano.com/taps/postgres--singer-io 🤷
j
thanks for the suggestions @visch - I'll give the singer variant a try in a few days. about achieving this with mappers, are you thinking of creating 1 alias per column and then add
"__else__": null
so the rest don't get picked up?
t
One other creative solution: create views that select only the columns you want and replicate those. That won't work for LOG_BASED mode but if you don't happen to be using that then... should work.
j
meh, singer-io tap-postgres can't be installed on Python 3.8, seems unmaintained https://github.com/singer-io/tap-postgres/issues/134