simon_walter
06/28/2023, 6:01 PMtimestamp
data type to Postgres’s timezone-aware timestamptz
fails
I have a MySQL tap which extracts several tables that all have a created
column, holding a UTC timestamp. This is also my replication key for incremental replication. My target is Postgres. The problem is that Postgres stores timezone-ware timestamps in its custom timestamptz
data type. I need to use timestamptz
in order to let my frontend application automatically display the UTC timestamp in the local timezone. When running the tap to target job without any modifications, the target-postgres
plugin maps MySQL timestamp
to Postgres timestamp
, which is wrong since that is Postgres’s timezone-unaware data type.
Since the raw format of the stored data is the same, I had hoped that manually altering the tables in my target Postgres DB from timestamp
to timestamptz
might solve the problem. However, doing that causes the next run to detect an anomaly between input data type and target datatype (timestamp
vs timestamptz
) and causes the plugin to alter the existing column and versioning it (created->created_202306281600
for example) and adding the created
column (again with data type timestamp
) for only the incrementally replicated rows. So clearly that is not working, Meltano logs of what’s happening below, as well as the example result table which corrupted created
data.
Has anyone run into this issue by chance? The only solve I can think of is a bit hacky, by using the Meltano SDK version of the Postgres target and trying to use a stream mapper to add a timezone offset to the timestamp manually, which hopefully the plugin then correctly detects as timestamptz
and uses the correct data type in my Postgres target.
╔═════╤═════╤══════════════════╤══════════════════════╗
║ id │ ... │ created │ created_202306281600 ║
╠═════╪═════╪══════════════════╪══════════════════════╣
║ 1 │ ... │ │ 2023-06-27 18:00 ║
╟─────┼─────┼──────────────────┼──────────────────────╢
║ 2 │ ... │ 2023-06-28 15:00 │ ║
╟─────┼─────┼──────────────────┼──────────────────────╢
║ ... │ │ │ ║
╚═════╧═════╧══════════════════╧══════════════════════╝
```INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Table \'"my_table"\' exists', 'level': 'info', 'timestamp': '2023-06-28T173310.444618Z'}
INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Versioning column: ALTER TABLE myschema."my_table" RENAME COLUMN "created" TO "created_20230628_1133"', 'level': 'info', 'timestamp': '2023-06-28T173310.466170Z'}
INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Adding column: ALTER TABLE myschema."my_table" ADD COLUMN "created" timestamp without time zone', 'level': 'info', 'timestamp': '2023-06-28T17:33…visch
06/28/2023, 7:14 PMsimon_walter
06/28/2023, 7:15 PMvisch
06/28/2023, 7:18 PMsimon_walter
06/28/2023, 7:38 PMvisch
06/28/2023, 8:10 PMtarget-postgres
wouldn't default to using timestampz
instead of timestamp
we could probably just move the default over in meltanolabs
unless i'm missing somethingsimon_walter
06/28/2023, 11:15 PMtarget-postgres
, but am I correct to assume it would come down to changing this line to return TIMESTAMP(timezone=True)
?simon_walter
06/28/2023, 11:16 PMmain
visch
06/29/2023, 10:37 AM