# troubleshooting
m
I need help and some bright ideas. 😅 It's about the tap-postgres `LOG_BASED` replication method. The target is another PostgreSQL database. Everything was working fine; then I added another table to the config file. This table contains around 16 million rows. The first job started with a full table sync:
```
[2023-07-18, 09:38:26 UTC] {subprocess.py:92} INFO - 2023-07-18T09:38:26.414445Z [info     ] time=2023-07-18 09:38:26 name=tap_postgres level=INFO message=Beginning sync of stream(public-table) with sync method(logical_initial) cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stderr string_id=tap-postgres
[2023-07-18, 09:38:26 UTC] {subprocess.py:92} INFO - 2023-07-18T09:38:26.460792Z [info     ] time=2023-07-18 09:38:26 name=tap_postgres level=INFO message=Performing initial full table sync cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stderr string_id=tap-postgres

....

[2023-07-18, 08:04:49 UTC] {subprocess.py:92} INFO - 2023-07-18T08:04:49.404981Z [info     ] Incremental state has been updated at 2023-07-18 08:04:49.404844.

....

[2023-07-18, 08:30:20 UTC] {subprocess.py:92} INFO - 2023-07-18T08:30:20.142742Z [info     ] time=2023-07-18 08:30:20 name=target_postgres level=INFO message=Loading 66838 rows into 'public."table"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
[2023-07-18, 08:30:22 UTC] {subprocess.py:92} INFO - 2023-07-18T08:30:22.181746Z [info     ] time=2023-07-18 08:30:22 name=target_postgres level=INFO message=Loading into public."table": {"inserts": 0, "updates": 66838, "size_bytes": 15638185} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
```

Job ends successfully. The second job doesn't do log-based replication but again starts as an `initial full table sync`. It finds the table already exists in the target database, then starts updating all the records in batches. All consecutive jobs do the same: `initial full table sync`, updating all records.

Question 1: Why is this the case? Same configuration, same database, and every other table (with thousands of rows, not even a million) has been running on log-based replication for months. In the meantime, the log size on the source grows; pg_wal was huge (50 GB), and judging by the disk usage over time, it happened right after I included the above table in the tap config. I am no expert here, but I assume Postgres didn't flush the logs because they haven't been consumed by any client yet. When we create a replication slot on PostgreSQL we use `select pg_create_logical_replication_slot('pipelinewise_<database_name>', 'wal2json');`. It's defined per database, not per table, right?

Question 2: Before I included the table in the tap config, wasn't Postgres already generating WAL for that table? Or did it start writing WAL entries for that table only after some client (tap-postgres) tried to read them?
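A sketch of how one could check how much WAL the slot is holding back, using the standard `pg_replication_slots` view (the `LIKE` filter assumes the slot naming above):

```sql
-- Sketch: how far behind is each logical slot? PostgreSQL retains WAL back to
-- restart_lsn for every slot, so a large gap here explains pg_wal growth.
SELECT slot_name,
       plugin,
       active,
       restart_lsn,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_name LIKE 'pipelinewise_%';  -- assumption: slots named as in the post
```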
j
It has been a while since I've done anything with logical replication in PGSQL. Logical replication is per database, but you need to "publish" tables:
1. Create the table
2. Publish it
3. Initial replication (retrieves the LSN)
4. Incremental replication should use the LSN

As for the log growing, you are correct: the logs have not been consumed, hence they pile up.
I have little to no idea how `tap-postgres` utilises the publications, hence I could be way off here :3
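On your Question 2: if memory serves, WAL for a table is written as soon as `wal_level=logical` is set, not when a client first reads it; the slot only pins how long it is retained. For reference, the vanilla publication-based flow I described would look roughly like this (the names are made up, and a wal2json-based tap may read the slot directly without any publication):

```sql
-- Sketch of the native logical replication flow (hypothetical names).
CREATE TABLE public.table_x (id int PRIMARY KEY, payload text);  -- 1. create
CREATE PUBLICATION my_pub FOR TABLE public.table_x;              -- 2. publish
-- 3./4. a slot tracks the LSN the consumer has confirmed so far:
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');
```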
m
But I haven't published any table before, so were the previous tables published automatically? Could it be that tap-postgres' initial read of the log triggers the table publication under the hood?
I was trying to figure out why the tap performs an initial full table sync at each run. I'm still not sure, but now I suspect it's related to Meltano. Here is what's happening:
```yaml
plugins:
  extractors:
  - name: tap-postgres--view-01
    inherit_from: tap-postgres--select-schema
    config:
      ssl: false
      filter_schemas: public
      default_replication_method: LOG_BASED
      logical_poll_total_seconds: 600
      max_run_seconds: 1500
      break_at_end_lsn: true

  - name: tap-postgres--select-schema
    inherit_from: tap-postgres
    select:
    - public-table1.*
    - public-table2.*
    - public-table3.*
```

This was my configuration. The select list had hundreds of tables, and it had been running `LOG_BASED` replication for months. Then I added another table, `table-x`, and the new table never started running log-based replication; it always repeated the `initial full table sync` as I explained in the first post. Today I figured out that every new table (I tried different tables and different source dbs) shows the same behaviour: at each run, it logs `Beginning sync of stream(public-table1) with sync method(logical_initial)`. Then I removed all the existing tables (`table-1`, `table-2`, `table-3`) from the select statement and added only `table-x`. After performing the full table sync once, it started running log-based replication on the next runs. Then I added back the previously removed tables (`table-1`, `table-2`, `table-3`); now they are stuck performing the initial full table sync while `table-x` works fine. I think there is a problem with updating the state. I don't know enough about Meltano to figure this out alone.
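One way I can think of to narrow it down: if the slot's `confirmed_flush_lsn` keeps advancing between runs while those streams still report `logical_initial`, the slot itself is fine and the stale part is the per-stream bookmark in the saved state (which, if I recall correctly, Meltano can dump with `meltano state get <state_id>`). A sketch, reusing the slot naming from earlier:

```sql
-- Sketch: does the tap ever advance the slot? If confirmed_flush_lsn moves
-- between runs, logical decoding works and the saved stream state is stale.
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_lsn() AS current_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'pipelinewise_%';  -- assumption: naming from earlier posts
```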