# troubleshooting
d
I have a job using `pipelinewise-tap-postgres==1.8.0` & `pipelinewise-target-snowflake==1.12.0`. I’m using log-based replication. All except one table end up doing a full table extract every time before moving on to log-based. In between runs, `payload` in the `run` table in the meltano db looks like:
```json
{
  "singer_state": {
    "currently_syncing": null,
    "bookmarks": {
      "public-table_one": {
        "last_replication_method": "LOG_BASED",
        "lsn": 1286946685864,
        "version": 1622778419625
      }
    }
  }
}
```
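(For reference, I’m pulling that payload with a quick script along these lines; the connection string and job id here are placeholders for ours, and it assumes the default Meltano system db layout:)
```python
# Rough sketch of how I pull the persisted state; assumes the default
# Meltano system db layout (a job table with job_id/state/payload columns)
# and psycopg2 installed. Connection string and job id are placeholders.
import json
import psycopg2

conn = psycopg2.connect("postgresql://meltano:***@localhost:5432/meltano")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT payload FROM job
        WHERE job_id = %s AND state = 'SUCCESS'
        ORDER BY ended_at DESC
        LIMIT 1
        """,
        ("compliance-to-snowflake",),  # placeholder job id
    )
    row = cur.fetchone()

print(json.dumps(row[0], indent=2) if row else "no successful runs yet")
```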
When the job starts, the first thing it does is (logs for the tap):
```
Beginning sync of stream(public-table_two) with sync method(logical_initial)
Performing initial full table sync
Current Server Encoding: UTF8
Current Client Encoding: UTF8
hstore is UNavailable
Beginning new Full Table replication 1625783275746
```
After it writes the first batch to snowflake, the target logs:
```
Emitting state
{
    "currently_syncing": "public-discrepancy",
    "bookmarks": {
        "public-table_one": {
            "last_replication_method": "LOG_BASED",
            "lsn": 1286946685864,
            "version": 1622778419625
        },
        "public-table_two": {
            "last_replication_method": "LOG_BASED",
            "lsn": 1287349339280,
            "version": 1625783275746,
            "xmin": 31435
        },
        "public-table_three": {
            "last_replication_method": "LOG_BASED"
        },
        "public-table_four": {
            "last_replication_method": "LOG_BASED"
        }
    }
}
```
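To rule out something on my side mangling the state, I’ve been diffing what the target logs against what ends up persisted, roughly like this (the file names are just where I pasted each blob):
```python
# Crude diff between the state the target emits and the payload that ends
# up in the meltano db; both files are hand-copied blobs, names are mine.
import json

with open("emitted_state.json") as f:
    emitted = json.load(f)
with open("persisted_state.json") as f:
    persisted = json.load(f)["singer_state"]  # payload wraps the state

for stream, bookmark in emitted["bookmarks"].items():
    if persisted.get("bookmarks", {}).get(stream) != bookmark:
        print("mismatch for", stream)
```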
However it continues to do a `sync method(logical_initial)` for every table in turn. Only `table_one` avoids this. Any ideas?
a
This is very odd. It seems like either a bug in the tap, the target modifying the state (it shouldn’t), or a mismatch of meltano job ids. Assuming it’s not the second or third, can you try syncing only table_two and/or only table_three?
The other tables might be corrupting their own state, failing silently in some internal operation, or colliding with each other in some bizarre way.
If they’re failing silently in some internal operation, running the failing ones one at a time would hopefully prove that.
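For context, the tap’s per-stream choice of sync method boils down to roughly this (my paraphrase of the pipelinewise logic, not the actual source):
```python
# Paraphrase of how pipelinewise-tap-postgres appears to pick a sync
# method for a LOG_BASED stream; illustrative only, not the tap's code.
from typing import Optional


def choose_sync_method(bookmark: Optional[dict]) -> str:
    if not bookmark or "lsn" not in bookmark:
        # No confirmed LSN yet: do the initial full-table extract.
        return "logical_initial"
    if "xmin" in bookmark:
        # An initial extract was interrupted part-way: resume it.
        return "logical_initial"
    # Healthy bookmark: replicate from the saved LSN.
    return "logical"
```
So if a stream’s bookmark never gets an `lsn` persisted, you’d see exactly what you’re seeing: `logical_initial` on every run.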
d
I’ll give that a shot
While that’s getting deployed, here’s the relevant part of my `meltano.yml`:
```yaml
version: 1
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres==1.8.0
    config:
      filter_schemas: public
      default_replication_method: LOG_BASED

  - name: tap-compliance
    inherit_from: tap-postgres
    config:
      dbname: compliance
    select_filter:
    - _meltano.heartbeat
    - public-table_one
    - public-table_two
    - public-table_three
    - public-table_four

  loaders:
  - name: target-snowflake
    variant: transferwise
    pip_url: pipelinewise-target-snowflake==1.12.0
    config:
      dbname: MELTANO
      warehouse: MELTANO
      user: MELTANO_ROBOT
      file_format: FILE_FORMATS.CSV
      role: MELTANO
      schema_mapping:
        _meltano:
          target_schema: _MELTANO

  - name: target-snowflake-compliance
    inherit_from: target-snowflake
    config:
      schema_mapping:
        public:
          target_schema: COMPLIANCE
```
Oh, actually there’s something else I should try. The target schema `_MELTANO` never actually gets created in Snowflake. Let me try removing the leading underscore.
I simplified my meltano config, deleted the relevant jobs from the meltano metadata db, and deleted everything from Snowflake. It’s now working as expected (there is a bookmark for every table after the job finishes, rather than just the one). Now I’m adding things back into my meltano config; hopefully that will show what’s causing the issue.
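(For the record, the reset on the meltano side was just deleting the job rows, along these lines; same placeholder connection and job id as before:)
```python
# Destructive reset of the persisted state between experiments; assumes
# the same system db layout as the earlier script. Job id is a placeholder.
import psycopg2

conn = psycopg2.connect("postgresql://meltano:***@localhost:5432/meltano")
with conn, conn.cursor() as cur:
    cur.execute("DELETE FROM job WHERE job_id = %s", ("compliance-to-snowflake",))
    print("deleted", cur.rowcount, "runs")
```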
I’m running the original config and not having any issues now, so I don’t think my config was the problem after all (I deleted rows from the meltano job table and dropped the destination in Snowflake first). I’m seeing the same issue with other jobs, however. The only thing I can think of is that I’ve mishandled some upgrades. Since creating these jobs, I’ve upgraded `pipelinewise-tap-postgres` from `1.7.1` to `1.8.0`, and `meltano` from `1.74.0` to `1.77.0`.
And I don’t think I’ve run `meltano upgrade` after updating the base Meltano Docker image, at least for some of the upgrades.
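Since meltano tracks its system db migrations with alembic, I can at least check whether the revision looks stale; a minimal check, assuming the standard `alembic_version` table:
```python
# Quick look at the system db's migration revision; assumes Meltano's
# standard alembic_version table. Compare against a freshly-upgraded env.
import psycopg2

conn = psycopg2.connect("postgresql://meltano:***@localhost:5432/meltano")
with conn, conn.cursor() as cur:
    cur.execute("SELECT version_num FROM alembic_version")
    print(cur.fetchone()[0])
```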
@aaronsteers Okay, this is not working as I thought. If I clear everything out and start a fresh new job, it’ll do an initial sync of every table, then use the log for subsequent runs. However, if I add any new tables to the config, they’ll have this issue: every single time the job runs, those tables run with `sync method(logical_initial)`.
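For reference, what I’m checking after each attempt is that every selected table ends up with a bookmark shaped like `table_one`’s (a confirmed `lsn`); a quick sanity check over the payload:
```python
# Sanity check over the persisted payload: every selected stream should
# have a confirmed lsn once its initial sync completed. Table names are ours.
EXPECTED = {
    "public-table_one",
    "public-table_two",
    "public-table_three",
    "public-table_four",
}


def bookmarks_healthy(payload: dict) -> bool:
    bookmarks = payload["singer_state"]["bookmarks"]
    return all(
        stream in bookmarks and "lsn" in bookmarks[stream]
        for stream in EXPECTED
    )
```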