# troubleshooting
d
I have a job using `pipelinewise-tap-postgres==1.8.0` & `pipelinewise-target-snowflake==1.12.0`. I’m using log-based replication. All except one table end up doing a full table extract every time before moving on to log-based. In between runs, `payload` in the `run` table in the meltano db looks like:
```json
{
  "singer_state": {
    "currently_syncing": null,
    "bookmarks": {
      "public-table_one": {
        "last_replication_method": "LOG_BASED",
        "lsn": 1286946685864,
        "version": 1622778419625
      }
    }
  }
}
```
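(For reference, I’m pulling that payload with a quick script along these lines; the connection string and job id here are placeholders for ours, and it assumes the default Meltano system db layout:)
```python
# Rough sketch of how I pull the persisted state; assumes the default
# Meltano system db layout (a job table with job_id/state/payload columns)
# and psycopg2 installed. Connection string and job id are placeholders.
import json
import psycopg2

conn = psycopg2.connect("postgresql://meltano:***@localhost:5432/meltano")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT payload FROM job
        WHERE job_id = %s AND state = 'SUCCESS'
        ORDER BY ended_at DESC
        LIMIT 1
        """,
        ("compliance-to-snowflake",),  # placeholder job id
    )
    row = cur.fetchone()

print(json.dumps(row[0], indent=2) if row else "no successful runs yet")
```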
When the job starts, the first thing it does is (logs for the tap):
```
Beginning sync of stream(public-table_two) with sync method(logical_initial)
Performing initial full table sync
Current Server Encoding: UTF8
Current Client Encoding: UTF8
hstore is UNavailable
Beginning new Full Table replication 1625783275746
```
After it writes the first batch to snowflake, the target logs:
```
Emitting state
{
    "currently_syncing": "public-discrepancy",
    "bookmarks": {
        "public-table_one": {
            "last_replication_method": "LOG_BASED",
            "lsn": 1286946685864,
            "version": 1622778419625
        },
        "public-table_two": {
            "last_replication_method": "LOG_BASED",
            "lsn": 1287349339280,
            "version": 1625783275746,
            "xmin": 31435
        },
        "public-table_three": {
            "last_replication_method": "LOG_BASED"
        },
        "public-table_four": {
            "last_replication_method": "LOG_BASED"
        }
    }
}
```
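To rule out something on my side mangling the state, I’ve been diffing what the target logs against what ends up persisted, roughly like this (the file names are just where I pasted each blob):
```python
# Crude diff between the state the target emits and the payload that ends
# up in the meltano db; both files are hand-copied blobs, names are mine.
import json

with open("emitted_state.json") as f:
    emitted = json.load(f)
with open("persisted_state.json") as f:
    persisted = json.load(f)["singer_state"]  # payload wraps the state

for stream, bookmark in emitted["bookmarks"].items():
    if persisted.get("bookmarks", {}).get(stream) != bookmark:
        print("mismatch for", stream)
```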
However it continues to do a `sync method(logical_initial)` for every table in turn. Only `table_one` avoids this. Any ideas?
a
This is very odd. It seems like either a bug in the tap, the target modifying the state (it shouldn’t), or a mismatch of meltano job ids. Assuming it’s not the second or third, can you try syncing only table_two and/or only table_three?
The other tables might be corrupting their own state, failing silently in some internal operation, or colliding with each other in some bizarre way.
If they’re failing silently in some internal operation, running the failing ones one at a time would hopefully prove that.
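For context, the tap’s per-stream choice of sync method boils down to roughly this (my paraphrase of the pipelinewise logic, not the actual source):
```python
# Paraphrase of how pipelinewise-tap-postgres appears to pick a sync
# method for a LOG_BASED stream; illustrative only, not the tap's code.
from typing import Optional


def choose_sync_method(bookmark: Optional[dict]) -> str:
    if not bookmark or "lsn" not in bookmark:
        # No confirmed LSN yet: do the initial full-table extract.
        return "logical_initial"
    if "xmin" in bookmark:
        # An initial extract was interrupted part-way: resume it.
        return "logical_initial"
    # Healthy bookmark: replicate from the saved LSN.
    return "logical"
```
So if a stream’s bookmark never gets an `lsn` persisted, you’d see exactly what you’re seeing: `logical_initial` on every run.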
d
I’ll give that a shot
While that’s getting deployed, here’s the relevant part of my `meltano.yml`:
```yaml
version: 1
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres==1.8.0
    config:
      filter_schemas: public
      default_replication_method: LOG_BASED

  - name: tap-compliance
    inherit_from: tap-postgres
    config:
      dbname: compliance
    select_filter:
    - _meltano.heartbeat
    - public-table_one
    - public-table_two
    - public-table_three
    - public-table_four

  loaders:
  - name: target-snowflake
    variant: transferwise
    pip_url: pipelinewise-target-snowflake==1.12.0
    config:
      dbname: MELTANO
      warehouse: MELTANO
      user: MELTANO_ROBOT
      file_format: FILE_FORMATS.CSV
      role: MELTANO
      schema_mapping:
        _meltano:
          target_schema: _MELTANO

  - name: target-snowflake-compliance
    inherit_from: target-snowflake
    config:
      schema_mapping:
        public:
          target_schema: COMPLIANCE
```
Oh, actually there’s something else I should try. The target schema `_MELTANO` never actually gets created in Snowflake. Let me try removing the leading underscore.
I simplified my meltano config, deleted the relevant jobs from the meltano metadata db, and deleted everything from Snowflake. It’s now working as expected (there is a bookmark for every table after the job finishes, rather than just the one). Now I’m adding things back into my meltano config; hopefully that will show what’s causing the issue.
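(For the record, the reset on the meltano side was just deleting the job rows, along these lines; same placeholder connection and job id as before:)
```python
# Destructive reset of the persisted state between experiments; assumes
# the same system db layout as the earlier script. Job id is a placeholder.
import psycopg2

conn = psycopg2.connect("postgresql://meltano:***@localhost:5432/meltano")
with conn, conn.cursor() as cur:
    cur.execute("DELETE FROM job WHERE job_id = %s", ("compliance-to-snowflake",))
    print("deleted", cur.rowcount, "runs")
```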
I’m running the original config and not having any issues now, so I don’t think my config was the problem after all (I deleted rows from the meltano job table and dropped the destination in Snowflake first). I’m seeing the same issue with other jobs, however. The only thing I can think of is that I’ve mishandled some upgrades. Since creating these jobs, I’ve upgraded `pipelinewise-tap-postgres` from `1.7.1` to `1.8.0`, and `meltano` from `1.74.0` to `1.77.0`.
And I don’t think I’ve run `meltano upgrade` after updating the base Meltano Docker image, at least for some of the upgrades.
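Since meltano tracks its system db migrations with alembic, I can at least check whether the revision looks stale; a minimal check, assuming the standard `alembic_version` table:
```python
# Quick look at the system db's migration revision; assumes Meltano's
# standard alembic_version table. Compare against a freshly-upgraded env.
import psycopg2

conn = psycopg2.connect("postgresql://meltano:***@localhost:5432/meltano")
with conn, conn.cursor() as cur:
    cur.execute("SELECT version_num FROM alembic_version")
    print(cur.fetchone()[0])
```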
@aaronsteers Okay, this is not working as I thought. If I clear everything out and start a fresh new job, it’ll do an initial sync of every table, then use the log for subsequent runs. However, if I add any new tables to the config, they’ll have this issue: every single time the job runs, those tables run with `sync method(logical_initial)`.
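For reference, what I’m checking after each attempt is that every selected table ends up with a bookmark shaped like `table_one`’s (a confirmed `lsn`); a quick sanity check over the payload:
```python
# Sanity check over the persisted payload: every selected stream should
# have a confirmed lsn once its initial sync completed. Table names are ours.
EXPECTED = {
    "public-table_one",
    "public-table_two",
    "public-table_three",
    "public-table_four",
}


def bookmarks_healthy(payload: dict) -> bool:
    bookmarks = payload["singer_state"]["bookmarks"]
    return all(
        stream in bookmarks and "lsn" in bookmarks[stream]
        for stream in EXPECTED
    )
```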