# plugins-general
m
When using the Postgres tap and target (transferwise) with FULL_TABLE sync, I'm realizing it's keeping all the data every time the sync happens. Is there some best practice to keep around the last few syncs or remove old data? I can't use log-based replication because of GCP CloudSQL, and I'd need to get all the columns in for incremental. I tried looking through the Meltano docs and stitchdata https://www.stitchdata.com/docs/replication/replication-methods/full-table#limitations but couldn't find anything.
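One possible pruning sketch, assuming your target writes a Singer metadata load-timestamp column such as `_sdc_batched_at` (that column name and the table name are assumptions; check what your target actually adds):

```sql
-- Hypothetical cleanup: keep only the rows from the three most recent
-- sync batches, assuming each batch shares one _sdc_batched_at value.
DELETE FROM my_schema.my_table
WHERE _sdc_batched_at NOT IN (
    SELECT DISTINCT _sdc_batched_at
    FROM my_schema.my_table
    ORDER BY _sdc_batched_at DESC
    LIMIT 3
);
```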
d
@mark_poole Do your source tables have primary keys that could be used in the target to UPDATE existing rows instead of always INSERTing new ones?
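For reference, if the primary keys do make it into the destination tables, a Postgres-style upsert behaves like this (table and column names here are placeholders, not taken from the actual pipeline):

```sql
-- Hypothetical upsert: with a primary key on id, re-syncing the same row
-- updates it in place instead of inserting a duplicate.
INSERT INTO my_schema.my_table (id, name, updated_at)
VALUES (1, 'example', now())
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
    updated_at = EXCLUDED.updated_at;
```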
m
@douwe_maan the majority if not all of the tables have primary keys
d
Are those primary keys making it into the tables created in the destination DB? Or do we end up with duplicate records there as the graph suggests?
m
```sql
SELECT
    pg_database.datname,
    pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database;
```
shows something completely different
I'm going to raise a GCP support ticket; their UI says 1 TB in use, but the psql server (via that command) shows <50 GB
sorry about bringing that here first, I couldn't find duplicate rows and the tables match closely in size
d
No worries, glad it doesn't appear to be a real issue with the tap or target!
m
In case anyone else is seeing this: GCP uses WAL for recovery, which takes up a ton of space if you are re-writing your largest tables to the DB once an hour 🙂
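One way to check whether WAL is the culprit (PostgreSQL 10+; note that managed instances like Cloud SQL may restrict this function to certain roles):

```sql
-- Sum the sizes of the files currently in the WAL directory (pg_wal).
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```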
a
@mark_poole - I haven’t with GCP specifically, but in other platforms, yes, this definitely comes up often. For Snowflake, for instance, we would change the retention time (aka “time travel”) to zero or 24 hours in order to reduce the redundant disk space consumption. Do you know if GCP has any similar configurability?
m
Yes, I think that would cover it. I turned it off because I'm happy with daily backups only in development, and in production there is a dedicated analytics server that can be rebuilt from other data automatically (removing the need for point-in-time backups).
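For anyone finding this later, the Cloud SQL setting in question can be toggled roughly like this (the instance name is a placeholder; double-check the flag against the current gcloud docs before running it):

```shell
# Disable point-in-time recovery (WAL log retention) on a Cloud SQL
# for PostgreSQL instance; daily automated backups stay configurable
# independently of this setting.
gcloud sql instances patch my-instance --no-enable-point-in-time-recovery
```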
Appreciate your help
a
Happy to help! And I wanted to better understand the GCP side anyway, so thank you for circling back to confirm.