Hello everyone! I I am stuck with basic `select ta...
# troubleshooting
r
Hello everyone! I I am stuck with basic
select tap-postgres --exclude
and can’t understand if I’m doing it wrong or if I have the wrong expectations. I have the following setup:
Copy code
...
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      default_replication_method: FULL_TABLE
      filter_schemas: public
    select:
    - '!*.first_name'
    - public-admins.*
    - '!*.last_name'
  loaders:
  - name: target-csv
    variant: singer-io
    pip_url: target-csv
    config:
      destination_path: output/csv
...
with this I want to exclude first and last name from the destination. But unfortunately they are included in the final csv files. With meltano select
Copy code
meltano select tap-postgres --list
I get selected attributes w/o name columns
Copy code
...
Legend:
	selected
	excluded
	automatic

Enabled patterns:
	!*.first_name
	public-admins.*
	!*.last_name

Selected attributes:
	[selected ] public-admins.created_at
	[selected ] public-admins.deleted_at
	[automatic] public-admins.id
	[selected ] public-admins.permissions
	[selected ] public-admins.updated_at
	[selected ] public-admins.uuid
I just reproduced this with a completely fresh setup (meltano --version meltano, version 1.67.0) Can you please help understand how to exclude columns from the export?
v
If you dump the catalog
meltano invoke tap-postgress --dump catalog > catalog.json
does that contain your firstname / lastname as expected / not expected
Have to determine if this is meltano generating the wrong catalog, or if it's tap-postgressignoring your catlog (seems unlikely it's tap-postgres)
Sorry I clobered your question with mine, we both posted at the same time 😄
r
when I run
meltano invoke --dump=catalog tap-postgres > catalog.json
in catalog I see:
Copy code
"metadata": [
... {
          "breadcrumb": [
            "properties",
            "first_name"
          ],
          "metadata": {
            "sql-datatype": "character varying",
            "inclusion": "available",
            "selected-by-default": true,
            "selected": false
          }
        },
        {
          "breadcrumb": [
            "properties",
            "last_name"
          ],
          "metadata": {
            "sql-datatype": "character varying",
            "inclusion": "available",
            "selected-by-default": true,
            "selected": false
          }
        },
...
and then in “schema”
Copy code
"schema": {
        "type": "object",
        "properties": {
          "id": {
            "type": [
              "integer"
            ],
            "minimum": -9223372036854775808,
            "maximum": 9223372036854775807
          },
          "first_name": {
            "type": [
              "null",
              "string"
            ]
          },
          "last_name": {
            "type": [
              "null",
              "string"
            ]
so they have
Copy code
"selected-by-default": true,
"selected": false
though I’m not sure what is exactly expected
Sorry I clobered your question with mine, we both posted at the same time
😅 no worries. Meltano is a hot topic 🙂
v
I am guessing at this point, but here it goes maybe tap-postgres is ignoring the column level selected: false ? I've done ignores by stream before with different taps and haven't had an issue. Haven't done column level so I'm not sure
It should work, and I want the feature for myself for a few things. I'm messing around if I get a chance I'll give it a shot on my end
I'm not using tap-postgres though 😕
r
Derek, thanks for you input. I think maybe
transferwise
variant uses different syntax or really doesn’t support filtering of columns. I’ll try to play with catalog file manually and will check docs for this tap deeper
v
Yeah let us know!
j
For what is worth I’ve already had problems with the
select
command in Meltano. It ignores what I asked I wrote an issue about it Furthermore, I just had a problem with the
catalog
too. I re-installed a tap and meltano kept using the old catalog with the new tap so I just had to manually override it
v
d
@roman_gusev pipelinewise-tap-postgres does have logic to determine if a column should be synced based on its
selected
metadata: https://github.com/transferwise/pipelinewise-tap-postgres/blob/8c4732ee398f6124a616f929c6f85c5499366149/tap_postgres/sync_strategies/common.py#L[…]2, which is used here: https://github.com/transferwise/pipelinewise-tap-postgres/blob/8c4732ee398f6124a616f929c6f85c5499366149/tap_postgres/__init__.py#L110 and here: https://github.com/transferwise/pipelinewise-tap-postgres/blob/8c4732ee398f6124a616f929c6f85c5499366149/tap_postgres/__init__.py#L160. My next step would be to add some debug logging statements there (e.g.
LOGGER.warning(desired_columns)
) to see if the columns in question are correctly excluded there.
I’ve already had problems with the 
select
command in Meltano. It ignores what I asked
I wrote an issue about it
@juan_sebastian_suarez_valencia Sorry to hear that! Which issue was that again?
Furthermore, I just had a problem with the 
catalog
too. I re-installed a tap and meltano kept using the old catalog with the new tap so I just had to manually override it
We have an issue for that one: https://gitlab.com/meltano/meltano/-/issues/2627. We haven't gotten to it yet, but it's on our radar. Contributions are welcome 🙂
j
@douwe_maan That was the issue that I was talking about but I vividly remember that I had a problem with the
select
. I didn’t write an issue and I haven’t tried it since then. If I stumble upon the problem again (let’s hope not🤞) I’ll write the issue this time
r
I have an update: pipelinewise-tap-postgres updates the streams’ metadata before sync and discards columns’ metadata. I created a fix for it https://github.com/transferwise/pipelinewise-tap-postgres/pull/88