# troubleshooting
j
A question about the `tap-postgres` extractor: I'm trying to exclude a few columns from a table but can't figure out how. I have the following config in my meltano.yml file, which I think should select only 4 columns from the `messages` table in the public schema. However, when I actually run Meltano it syncs all columns for the table. What am I doing wrong here?
- name: tap-postgres
    variant: transferwise
    config:
      host: ${DB_HOST}
      user: ${DB_USER}
      port: ${DB_PORT}
      password: ${DB_PASSWORD}
      dbname: ${DB_NAME}
      default_replication_method: INCREMENTAL
      filter_schemas: public
      ssl: true

    select:
    - public-messages.id
    - public-messages.message_channel_id
    - public-messages.sender_id
    - public-messages.sender_type
When I run `meltano select tap-postgres --list --all` it looks like this should be working as expected.
However, when I run `meltano invoke tap-postgres > messages.json`, the schema generated by Meltano includes all columns:
{
  "type": "SCHEMA",
  "stream": "public-messages",
  "schema": {
    "type": "object",
    "properties": {
      "id": {
        "type": [
          "string"
        ]
      },
      "body": {
        "type": [
          "null",
          "string"
        ]
      },
      "message_channel_id": {
        "type": [
          "null",
          "string"
        ]
      },
      "sender_id": {
        "type": [
          "null",
          "string"
        ]
      },
      "sender_type": {
        "type": [
          "null",
          "string"
        ]
      },
      "original_body": {
        "type": [
          "null",
          "object",
          "array"
        ]
      },
      "external_id": {
        "type": [
          "null",
          "string"
        ]
      },
      "read_at": {
        "type": [
          "null",
          "string"
        ],
        "format": "date-time"
      },
      "record_file": {
        "type": [
          "null",
          "string"
        ]
      },
      "created_at": {
        "type": [
          "null",
          "string"
        ],
        "format": "date-time"
      },
      "updated_at": {
        "type": [
          "null",
          "string"
        ],
        "format": "date-time"
      },
      "delivered_at": {
        "type": [
          "null",
          "string"
        ],
        "format": "date-time"
      },
      "failed_at": {
        "type": [
          "null",
          "string"
        ],
        "format": "date-time"
      },
      "failed_error_code": {
        "type": [
          "null",
          "string"
        ]
      },
      "failed_description": {
        "type": [
          "null",
          "string"
        ]
      },
      "scheduled_at": {
        "type": [
          "null",
          "string"
        ],
        "format": "date-time"
      },
      "sent_at": {
        "type": [
          "null",
          "string"
        ],
        "format": "date-time"
      },
      "comms_service": {
        "type": [
          "null",
          "string"
        ]
      }
    },
    "definitions": {
      "sdc_recursive_integer_array": {
        "type": [
          "null",
          "integer",
          "array"
        ],
        "items": {
          "$ref": "#/definitions/sdc_recursive_integer_array"
        }
      },
      "sdc_recursive_number_array": {
        "type": [
          "null",
          "number",
          "array"
        ],
        "items": {
          "$ref": "#/definitions/sdc_recursive_number_array"
        }
      },
      "sdc_recursive_string_array": {
        "type": [
          "null",
          "string",
          "array"
        ],
        "items": {
          "$ref": "#/definitions/sdc_recursive_string_array"
        }
      },
      "sdc_recursive_boolean_array": {
        "type": [
          "null",
          "boolean",
          "array"
        ],
        "items": {
          "$ref": "#/definitions/sdc_recursive_boolean_array"
        }
      },
      "sdc_recursive_timestamp_array": {
        "type": [
          "null",
          "string",
          "array"
        ],
        "format": "date-time",
        "items": {
          "$ref": "#/definitions/sdc_recursive_timestamp_array"
        }
      },
      "sdc_recursive_object_array": {
        "type": [
          "null",
          "object",
          "array"
        ],
        "items": {
          "$ref": "#/definitions/sdc_recursive_object_array"
        }
      }
    }
  },
  "key_properties": [
    "id"
  ],
  "bookmark_properties": [
    "updated_at"
  ]
}
And here's the SQL query that Meltano runs which also uses all columns, not just the ones I selected in the meltano.yml file:
SELECT
  "body",
  "comms_service",
  CASE WHEN "created_at" < '0001-01-01 00:00:00.000' OR "created_at" > '9999-12-31 23:59:59.999'
       THEN '9999-12-31 23:59:59.999' ELSE "created_at" END AS "created_at",
  CASE WHEN "delivered_at" < '0001-01-01 00:00:00.000' OR "delivered_at" > '9999-12-31 23:59:59.999'
       THEN '9999-12-31 23:59:59.999' ELSE "delivered_at" END AS "delivered_at",
  "external_id",
  CASE WHEN "failed_at" < '0001-01-01 00:00:00.000' OR "failed_at" > '9999-12-31 23:59:59.999'
       THEN '9999-12-31 23:59:59.999' ELSE "failed_at" END AS "failed_at",
  "failed_description",
  "failed_error_code",
  "id",
  "message_channel_id",
  "original_body",
  CASE WHEN "read_at" < '0001-01-01 00:00:00.000' OR "read_at" > '9999-12-31 23:59:59.999'
       THEN '9999-12-31 23:59:59.999' ELSE "read_at" END AS "read_at",
  "record_file",
  CASE WHEN "scheduled_at" < '0001-01-01 00:00:00.000' OR "scheduled_at" > '9999-12-31 23:59:59.999'
       THEN '9999-12-31 23:59:59.999' ELSE "scheduled_at" END AS "scheduled_at",
  "sender_id",
  "sender_type",
  CASE WHEN "sent_at" < '0001-01-01 00:00:00.000' OR "sent_at" > '9999-12-31 23:59:59.999'
       THEN '9999-12-31 23:59:59.999' ELSE "sent_at" END AS "sent_at",
  CASE WHEN "updated_at" < '0001-01-01 00:00:00.000' OR "updated_at" > '9999-12-31 23:59:59.999'
       THEN '9999-12-31 23:59:59.999' ELSE "updated_at" END AS "updated_at"
FROM "public"."messages"
with itersize 20000
v
Try the MeltanoLabs variant! I know this works there. IIRC the Wise variant has a long-standing issue which I think is causing your problem: https://github.com/transferwise/pipelinewise-tap-postgres/pull/129
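Also, in case it's useful once selection works: Meltano select rules support exclusion with a `!` prefix, so instead of enumerating the columns to keep you can carve out the ones to drop. A sketch using column names from your schema (note that rules starting with `!` have to be quoted in YAML, since `!` is a tag marker):

```yaml
select:
- public-messages.*              # take every column in the stream...
- '!public-messages.body'        # ...then exclude the ones you don't want
- '!public-messages.original_body'
```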
j
Ooo, thanks for the pointer! I'll try shifting to the MeltanoLabs variant
I think I'm stuck on the SSL setup. Here's the config I have right now:
extractors:
  - name: tap-postgres-arc-incremental
    inherit_from: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      host: ${ARC_DB_HOST}
      user: ${ARC_DB_USER}
      port: ${ARC_DB_PORT}
      password: ${ARC_DB_PASSWORD}
      database: ${ARC_DB_NAME}
      default_replication_method: INCREMENTAL
      filter_schemas:
      - public
      ssl_enable: true
      ssl_mode: require
and I'm getting this error:
Cannot list the selected attributes: Catalog discovery failed: command ['/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/bin/tap-postgres', '--config', '/Users/jacob/code/pairteam/analytics-etl/.meltano/run/tap-postgres-arc-incremental/tap.bc22fdb4-0d0b-45b1-98c8-51dbb324bda4.config.json', '--discover'] returned 1 with stderr:
 Traceback (most recent call last):
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/bin/tap-postgres", line 8, in <module>
    sys.exit(TapPostgres.cli())
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1077, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 943, in make_context
    self.parse_args(ctx, args)
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1408, in parse_args
    value, args = param.handle_parse_result(ctx, opts, args)
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 2400, in handle_parse_result
    value = self.process_value(ctx, value)
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 2362, in process_value
    value = self.callback(ctx, self, value)
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 529, in cb_discover
    tap.run_discovery()
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 289, in run_discovery
    catalog_text = self.catalog_json_text
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 309, in catalog_json_text
    return json.dumps(self.catalog_dict, indent=2)
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 512, in catalog_dict
    result["streams"].extend(self.connector.discover_catalog_entries())
  File "/Users/jacob/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 408, in connector
    url = make_url(self.get_sqlalchemy_url(config=self.config))
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 329, in get_sqlalchemy_url
    query=self.get_sqlalchemy_query(config=config),
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 348, in get_sqlalchemy_query
    query["sslrootcert"] = self.filepath_or_certificate(
  File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 394, in filepath_or_certificate
    with open(alternative_name, "wb") as alternative_file:
FileNotFoundError: [Errno 2] No such file or directory: '.secrets/root.crt'
This is the key piece, I think:
with open(alternative_name, "wb") as alternative_file:
FileNotFoundError: [Errno 2] No such file or directory: '.secrets/root.crt'
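If I'm reading it right, the tap is trying to write a fallback cert into `.secrets/`, and `open(path, "wb")` creates the file but not any missing parent directories, which is exactly this error. A minimal repro of the failure mode:

```python
import os
import tempfile

# Reproduce the traceback's failure: opening a path for writing does not
# create missing parent directories, so ".secrets/root.crt" fails when
# the ".secrets" directory doesn't exist yet.
tmp = tempfile.mkdtemp()
missing = os.path.join(tmp, ".secrets", "root.crt")
try:
    with open(missing, "wb") as f:
        f.write(b"")
    result = "wrote file"
except FileNotFoundError:
    result = "FileNotFoundError"
print(result)  # FileNotFoundError
```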
With the transferwise variant I had `ssl: true`.
v
Yeah, looks like there's a bug with the `require` ssl_mode; we shouldn't require a cert authority to be provided in that case. The tests just optimized for the most secure SSL method. Could you put an issue into the tap? Options:
1. Put a certificate named `root.crt` in that directory. I don't think it'd even need to be the right one, although pulling the right one wouldn't be so hard.
2. Update the code here https://github.com/MeltanoLabs/tap-postgres/blob/3593d2e21675d4970cf784ea3a464a4d4cdec962/tap_postgres/tap.py#L345-L351 to not send an `sslrootcert` for the ssl modes that don't need one (sounds like the right solution).
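A standalone sketch of what option 2 might look like, with a hypothetical `build_ssl_query` helper rather than the tap's real method signature (and `ssl_certificate_authority` as the assumed config key): only attach `sslrootcert` when the chosen mode actually verifies the server certificate.

```python
def build_ssl_query(config: dict) -> dict:
    """Build libpq-style query params from tap config (illustrative only)."""
    ssl_mode = config.get("ssl_mode", "prefer")
    query = {"sslmode": ssl_mode}
    # Per libpq semantics, the CA bundle is only consulted for
    # verify-ca / verify-full; allow / prefer / require don't need one,
    # so don't force ".secrets/root.crt" into existence for them.
    if ssl_mode in ("verify-ca", "verify-full"):
        query["sslrootcert"] = config.get(
            "ssl_certificate_authority", ".secrets/root.crt"
        )
    return query
```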
j
I can try opening a PR to not send an `sslrootcert` for certain ssl_modes... although, in which ssl_modes should the cert not be required? I'm not so familiar with this, and this would be my first time contributing to a Meltano package, so a few new things here. Appreciate the help! 🙂
v
If you start with making an issue, that's a good step!
j
And yes, adding an empty cert file "fixed" the issue, although I agree with you that an empty-cert-file hack isn't a great solution. Transparently, I'm going to just do this for now since it unblocks getting some data into our DW.
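Concretely, the stub-file workaround (run from the project root, matching the `.secrets/root.crt` path in the traceback) was just:

```shell
# Create the directory and an empty stand-in cert; the tap only needs
# the file to be openable, not to be a valid CA certificate.
mkdir -p .secrets
touch .secrets/root.crt
```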
v
Glad you're up and running, and thanks for putting the issue in. If you could add the workaround to the issue too, it'll help some folks if they hit this 😄