Mapping of MySQL’s `timestamp` data type to Postgr...
# troubleshooting
s
Mapping of MySQL’s
timestamp
data type to Postgres’s timezone-aware
timestamptz
fails I have a MySQL tap which extracts several tables that all have a
created
column, holding a UTC timestamp. This is also my replication key for incremental replication. My target is Postgres. The problem is that Postgres stores timezone-ware timestamps in its custom
timestamptz
data type. I need to use
timestamptz
in order to let my frontend application automatically display the UTC timestamp in the local timezone. When running the tap to target job without any modifications, the
target-postgres
plugin maps MySQL
timestamp
to Postgres
timestamp
, which is wrong since that is Postgres’s timezone-unaware data type. Since the raw format of the stored data is the same, I had hoped that manually altering the tables in my target Postgres DB from
timestamp
to
timestamptz
might solve the problem. However, doing that causes the next run to detect an anomaly between input data type and target datatype (
timestamp
vs
timestamptz
) and causes the plugin to alter the existing column and versioning it (
created->created_202306281600
for example) and adding the
created
column (again with data type
timestamp
) for only the incrementally replicated rows. So clearly that is not working, Meltano logs of what’s happening below, as well as the example result table which corrupted
created
data. Has anyone run into this issue by chance? The only solve I can think of is a bit hacky, by using the Meltano SDK version of the Postgres target and trying to use a stream mapper to add a timezone offset to the timestamp manually, which hopefully the plugin then correctly detects as
timestamptz
and uses the correct data type in my Postgres target.
Copy code
╔═════╤═════╤══════════════════╤══════════════════════╗
║ id  │ ... │ created          │ created_202306281600 ║
╠═════╪═════╪══════════════════╪══════════════════════╣
║ 1   │ ... │                  │ 2023-06-27 18:00     ║
╟─────┼─────┼──────────────────┼──────────────────────╢
║ 2   │ ... │ 2023-06-28 15:00 │                      ║
╟─────┼─────┼──────────────────┼──────────────────────╢
║ ... │     │                  │                      ║
╚═════╧═════╧══════════════════╧══════════════════════╝
```INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Table \'"my_table"\' exists', 'level': 'info', 'timestamp': '2023-06-28T173310.444618Z'} INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Versioning column: ALTER TABLE myschema."my_table" RENAME COLUMN "created" TO "created_20230628_1133"', 'level': 'info', 'timestamp': '2023-06-28T173310.466170Z'} INFO {'consumer': True, 'producer': False, 'string_id': 'target-my-postgres-target', 'cmd_type': 'elb', 'stdio': 'stderr', 'name': 'target-my-postgres-target', 'event': 'time=2023-06-28 113310 name=target_postgres level=INFO message=Adding column: ALTER TABLE myschema."my_table" ADD COLUMN "created" timestamp without time zone', 'level': 'info', 'timestamp': '2023-06-28T17:33…
v
which postgres target are you using?
s
transferwise
v
Got it, I like the use case (working on target-postgres the meltanolabs version today) I've had some thoughts about this (data type mappings here https://github.com/MeltanoLabs/target-postgres/issues/106 ) . Have to think about an easier fix for you, I try to get folks to swap to the meltanolabs variant since it's so close to having all the features of the pipelinewise version (and it's actively developed with the sdk) I'll think about this a bit and get back
s
Thanks Derek, I appreciate your input!
v
This one honestly looks fairly staright forward, I don't see why
target-postgres
wouldn't default to using
timestampz
instead of
timestamp
we could probably just move the default over in
meltanolabs
unless i'm missing something
s
I have been bogged down with some other work, so I have not looked too deep into the source for
target-postgres
, but am I correct to assume it would come down to changing this line to
return TIMESTAMP(timezone=True)
?
I suppose if that works out I could fork the target and install my own version until this potentially makes it into MeltabnoLabs
main
v
Yeah that should be it! At a minimum if you could throw an issue in (the initial text above is good enough)