simon_walter
06/28/2023, 6:01 PMtimestamp data type to Postgres’s timezone-aware timestamptz fails
I have a MySQL tap which extracts several tables that all have a created column, holding a UTC timestamp. This is also my replication key for incremental replication. My target is Postgres. The problem is that Postgres stores timezone-ware timestamps in its custom timestamptz data type. I need to use timestamptz in order to let my frontend application automatically display the UTC timestamp in the local timezone. When running the tap to target job without any modifications, the target-postgres plugin maps MySQL timestamp to Postgres timestamp , which is wrong since that is Postgres’s timezone-unaware data type.
Since the raw format of the stored data is the same, I had hoped that manually altering the tables in my target Postgres DB from timestamp to timestamptz might solve the problem. However, doing that causes the next run to detect an anomaly between input data type and target datatype (timestamp vs timestamptz ) and causes the plugin to alter the existing column and versioning it (created->created_202306281600 for example) and adding the created column (again with data type timestamp ) for only the incrementally replicated rows. So clearly that is not working, Meltano logs of what’s happening below, as well as the example result table which corrupted created data.
Has anyone run into this issue by chance? The only solve I can think of is a bit hacky, by using the Meltano SDK version of the Postgres target and trying to use a stream mapper to add a timezone offset to the timestamp manually, which hopefully the plugin then correctly detects as timestamptz and uses the correct data type in my Postgres target.
╔═════╤═════╤══════════════════╤══════════════════════╗
║ id │ ... │ created │ created_202306281600 ║
╠═════╪═════╪══════════════════╪══════════════════════╣
║ 1 │ ... │ │ 2023-06-27 18:00 ║
╟─────┼─────┼──────────────────┼──────────────────────╢
║ 2 │ ... │ 2023-06-28 15:00 │ ║
╟─────┼─────┼──────────────────┼──────────────────────╢
║ ... │ │ │ ║
╚═════╧═════╧══════════════════╧══════════════════════╝
```INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Table \'"my_table"\' exists', 'level': 'info', 'timestamp': '2023-06-28T173310.444618Z'}
INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Versioning column: ALTER TABLE myschema."my_table" RENAME COLUMN "created" TO "created_20230628_1133"', 'level': 'info', 'timestamp': '2023-06-28T173310.466170Z'}
INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Adding column: ALTER TABLE myschema."my_table" ADD COLUMN "created" timestamp without time zone', 'level': 'info', 'timestamp': '2023-06-28T17:33…visch
06/28/2023, 7:14 PMsimon_walter
06/28/2023, 7:15 PMvisch
06/28/2023, 7:18 PMsimon_walter
06/28/2023, 7:38 PMvisch
06/28/2023, 8:10 PMtarget-postgres wouldn't default to using timestampz instead of timestamp we could probably just move the default over in meltanolabs unless i'm missing somethingsimon_walter
06/28/2023, 11:15 PMtarget-postgres , but am I correct to assume it would come down to changing this line to return TIMESTAMP(timezone=True) ?simon_walter
06/28/2023, 11:16 PMmainvisch
06/29/2023, 10:37 AM