jacob_mulligan
01/04/2024, 1:07 PM
Using the tap-postgres extractor, I'm trying to exclude a few columns from a table but can't figure out how to. I have the following config in my meltano.yml file, which I think should only select 4 columns from the messages table in the public schema. However, when I actually run Meltano it's syncing all columns for the table. What am I doing wrong here?
- name: tap-postgres
  variant: transferwise
  config:
    host: ${DB_HOST}
    user: ${DB_USER}
    port: ${DB_PORT}
    password: ${DB_PASSWORD}
    dbname: ${DB_NAME}
    default_replication_method: INCREMENTAL
    filter_schemas: public
    ssl: true
  select:
  - public-messages.id
  - public-messages.message_channel_id
  - public-messages.sender_id
  - public-messages.sender_type
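As an aside, a possible alternative approach, assuming Meltano's standard select pattern syntax (entity.attribute globs, with a leading ! for exclusion): the same goal can be written as exclusion rules, so new columns stay selected by default and only the unwanted ones are dropped. A sketch, using column names from the schema further down; the ! rules must be quoted so YAML doesn't parse them as tags:

  select:
  - public-messages.*
  - '!public-messages.body'
  - '!public-messages.original_body'
  - '!public-messages.comms_service'
  # one quoted '!' rule per column to exclude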
jacob_mulligan
01/04/2024, 1:08 PM
From the output of meltano select tap-postgres --list --all it looks like this should be working as expected:
jacob_mulligan
01/04/2024, 1:10 PM
When I run meltano invoke tap-postgres > messages.json, in the schema generated by Meltano I see it's including all columns:
{
"type": "SCHEMA",
"stream": "public-messages",
"schema": {
"type": "object",
"properties": {
"id": {
"type": [
"string"
]
},
"body": {
"type": [
"null",
"string"
]
},
"message_channel_id": {
"type": [
"null",
"string"
]
},
"sender_id": {
"type": [
"null",
"string"
]
},
"sender_type": {
"type": [
"null",
"string"
]
},
"original_body": {
"type": [
"null",
"object",
"array"
]
},
"external_id": {
"type": [
"null",
"string"
]
},
"read_at": {
"type": [
"null",
"string"
],
"format": "date-time"
},
"record_file": {
"type": [
"null",
"string"
]
},
"created_at": {
"type": [
"null",
"string"
],
"format": "date-time"
},
"updated_at": {
"type": [
"null",
"string"
],
"format": "date-time"
},
"delivered_at": {
"type": [
"null",
"string"
],
"format": "date-time"
},
"failed_at": {
"type": [
"null",
"string"
],
"format": "date-time"
},
"failed_error_code": {
"type": [
"null",
"string"
]
},
"failed_description": {
"type": [
"null",
"string"
]
},
"scheduled_at": {
"type": [
"null",
"string"
],
"format": "date-time"
},
"sent_at": {
"type": [
"null",
"string"
],
"format": "date-time"
},
"comms_service": {
"type": [
"null",
"string"
]
}
},
"definitions": {
"sdc_recursive_integer_array": {
"type": [
"null",
"integer",
"array"
],
"items": {
"$ref": "#/definitions/sdc_recursive_integer_array"
}
},
"sdc_recursive_number_array": {
"type": [
"null",
"number",
"array"
],
"items": {
"$ref": "#/definitions/sdc_recursive_number_array"
}
},
"sdc_recursive_string_array": {
"type": [
"null",
"string",
"array"
],
"items": {
"$ref": "#/definitions/sdc_recursive_string_array"
}
},
"sdc_recursive_boolean_array": {
"type": [
"null",
"boolean",
"array"
],
"items": {
"$ref": "#/definitions/sdc_recursive_boolean_array"
}
},
"sdc_recursive_timestamp_array": {
"type": [
"null",
"string",
"array"
],
"format": "date-time",
"items": {
"$ref": "#/definitions/sdc_recursive_timestamp_array"
}
},
"sdc_recursive_object_array": {
"type": [
"null",
"object",
"array"
],
"items": {
"$ref": "#/definitions/sdc_recursive_object_array"
}
}
}
},
"key_properties": [
"id"
],
"bookmark_properties": [
"updated_at"
]
}
And here's the SQL query that Meltano runs which also uses all columns, not just the ones I selected in the meltano.yml file:
SELECT "body" , "comms_service" ,CASE WHEN "created_at" < '0001-01-01 00:00:00.000' OR "created_at" > '9999-12-31 23:59:59.999' THEN '9999-12-31 23:59:59.999' ELSE "created_at" END AS "created_at" ,CASE WHEN "delivered_at" < '0001-01-01 00:00:00.000' OR "delivered_at" > '9999-12-31 23:59:59.999' THEN '9999-12-31 23:59:59.999' ELSE "delivered_at" END AS "delivered_at" , "external_id" ,CASE WHEN "failed_at" < '0001-01-01 00:00:00.000' OR "failed_at" > '9999-12-31 23:59:59.999' THEN '9999-12-31 23:59:59.999' ELSE "failed_at" END AS "failed_at" , "failed_description" , "failed_error_code" , "id" , "message_channel_id" , "original_body" ,CASE WHEN "read_at" < '0001-01-01 00:00:00.000' OR "read_at" > '9999-12-31 23:59:59.999' THEN '9999-12-31 23:59:59.999' ELSE "read_at" END AS "read_at" , "record_file" ,CASE WHEN "scheduled_at" < '0001-01-01 00:00:00.000' OR "scheduled_at" > '9999-12-31 23:59:59.999' THEN '9999-12-31 23:59:59.999' ELSE "scheduled_at" END AS "scheduled_at" , "sender_id" , "sender_type" ,CASE WHEN "sent_at" < '0001-01-01 00:00:00.000' OR "sent_at" > '9999-12-31 23:59:59.999' THEN '9999-12-31 23:59:59.999' ELSE "sent_at" END AS "sent_at" ,CASE WHEN "updated_at" < '0001-01-01 00:00:00.000' OR "updated_at" > '9999-12-31 23:59:59.999' THEN '9999-12-31 23:59:59.999' ELSE "updated_at" END AS "updated_at"
FROM "public"."messages"
with itersize 20000
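One way to narrow down where the selection is getting lost, sketched with standard Meltano commands: dump the catalog Meltano actually passes to the tap and check whether the excluded columns carry selected: false metadata. If they do, the rules are being applied and the tap is ignoring them; if they don't, the select rules never make it into the catalog.

# list what Meltano considers selected
meltano select tap-postgres --list
# dump the generated catalog and inspect the selection metadata for public-messages
meltano invoke --dump=catalog tap-postgres > catalog.json
grep -n '"selected"' catalog.json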
visch
01/04/2024, 1:25 PM
jacob_mulligan
01/04/2024, 1:27 PM
jacob_mulligan
01/04/2024, 1:53 PM
extractors:
- name: tap-postgres-arc-incremental
  inherit_from: tap-postgres
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
  config:
    host: ${ARC_DB_HOST}
    user: ${ARC_DB_USER}
    port: ${ARC_DB_PORT}
    password: ${ARC_DB_PASSWORD}
    database: ${ARC_DB_NAME}
    default_replication_method: INCREMENTAL
    filter_schemas:
    - public
    ssl_enable: true
    ssl_mode: require
and I'm getting this error:
Cannot list the selected attributes: Catalog discovery failed: command ['/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/bin/tap-postgres', '--config', '/Users/jacob/code/pairteam/analytics-etl/.meltano/run/tap-postgres-arc-incremental/tap.bc22fdb4-0d0b-45b1-98c8-51dbb324bda4.config.json', '--discover'] returned 1 with stderr:
Traceback (most recent call last):
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/bin/tap-postgres", line 8, in <module>
sys.exit(TapPostgres.cli())
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1077, in main
with self.make_context(prog_name, args, **extra) as ctx:
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 943, in make_context
self.parse_args(ctx, args)
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1408, in parse_args
value, args = param.handle_parse_result(ctx, opts, args)
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 2400, in handle_parse_result
value = self.process_value(ctx, value)
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 2362, in process_value
value = self.callback(ctx, self, value)
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 529, in cb_discover
tap.run_discovery()
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 289, in run_discovery
catalog_text = self.catalog_json_text
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 309, in catalog_json_text
return json.dumps(self.catalog_dict, indent=2)
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 512, in catalog_dict
result["streams"].extend(self.connector.discover_catalog_entries())
File "/Users/jacob/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 981, in __get__
val = self.func(instance)
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 408, in connector
url = make_url(self.get_sqlalchemy_url(config=self.config))
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 329, in get_sqlalchemy_url
query=self.get_sqlalchemy_query(config=config),
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 348, in get_sqlalchemy_query
query["sslrootcert"] = self.filepath_or_certificate(
File "/Users/jacob/code/pairteam/analytics-etl/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/tap_postgres/tap.py", line 394, in filepath_or_certificate
with open(alternative_name, "wb") as alternative_file:
FileNotFoundError: [Errno 2] No such file or directory: '.secrets/root.crt'
jacob_mulligan
01/04/2024, 1:53 PM
with open(alternative_name, "wb") as alternative_file:
FileNotFoundError: [Errno 2] No such file or directory: '.secrets/root.crt'
jacob_mulligan
01/04/2024, 1:54 PM
ssl: true
visch
01/04/2024, 2:57 PM
That looks like a bug with the require ssl_mode, as we shouldn't require a cert authority to be provided in that case. The tests just optimized for the most secure ssl method.
Could you put an issue into the tap?
Options:
1. Put a certificate named root.crt in that directory. I don't think it'd even need to be the right one, although pulling the right one wouldn't be so hard.
2. Update the code here https://github.com/MeltanoLabs/tap-postgres/blob/3593d2e21675d4970cf784ea3a464a4d4cdec962/tap_postgres/tap.py#L345-L351 to not send an sslrootcert for different ssl modes. (Sounds like the right solution.)
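A rough sketch of option 1, with the path taken from the traceback above: note that open(alternative_name, "wb") opens the file for writing, so the FileNotFoundError likely means the .secrets directory itself is missing, and simply creating it may already be enough. The CA bundle path below is a placeholder.

# create the directory the tap expects, relative to the project root
mkdir -p .secrets
# optionally place a CA bundle there under the expected name
# (per the note above it may not even need to be the correct one)
cp /path/to/any-ca-bundle.pem .secrets/root.crt
# re-run discovery to confirm
meltano select tap-postgres-arc-incremental --list --all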
jacob_mulligan
01/04/2024, 3:58 PM
Happy to open an issue and look at option 2, not sending sslrootcert for certain ssl modes.. although, in which ssl_modes should the cert not be required?
I'm not so familiar with this + this would be my first time contributing to a meltano package. So a few new things here and appreciate the help! 🙂
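For reference, going by libpq's documented sslmode semantics rather than anything specific to the tap: only verify-ca and verify-full actually verify the server certificate against a CA, so those are the modes that genuinely need sslrootcert; disable, allow, prefer, and require can connect without one. A sketch illustrating the difference with plain psql connection strings:

# works without a root certificate
psql "host=$ARC_DB_HOST port=$ARC_DB_PORT user=$ARC_DB_USER dbname=$ARC_DB_NAME sslmode=require"
# these modes verify the server certificate and therefore need sslrootcert
psql "host=$ARC_DB_HOST port=$ARC_DB_PORT user=$ARC_DB_USER dbname=$ARC_DB_NAME sslmode=verify-full sslrootcert=.secrets/root.crt"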
visch
01/04/2024, 4:11 PM
jacob_mulligan
01/04/2024, 4:32 PM
jacob_mulligan
01/04/2024, 4:33 PM
visch
01/04/2024, 4:34 PM