# troubleshooting
a
Hi everyone, I'm not entirely certain if this is the most appropriate channel for my question, but I thought I'd give it a try. I'm currently attempting to perform a simple extract and load operation from a PostgreSQL source to a PostgreSQL target, and I'm facing an issue with hard deletes not working as expected. Below you can find my meltano.yml configuration:
```yaml
...
plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      filter_schemas:
      - public
    select:
    - public-users.*
    metadata:
      '*':
        replication-method: FULL_TABLE

  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
    config:
      hard_delete: true
      add_metadata_columns: true
```
The issue I'm encountering is that the `_sdc_deleted_at` column is not being updated on the `users` table in the target database. Both of my databases are running with the following docker compose entries:
```yaml
  source-db:
    image: postgres:14.9-alpine3.18
    container_name: source_postgres
    environment:
      POSTGRES_DB: db
      POSTGRES_USER: dbuser
      POSTGRES_PASSWORD: dbpassword
    ports:
      - "15432:5432"

  target-db:
    image: postgres:14.9-alpine3.18
    command: ["postgres", "-c", "log_statement=all"]
    container_name: target_db
    environment:
      POSTGRES_DB: meltano_db
      POSTGRES_USER: meltano_user
      POSTGRES_PASSWORD: password
    ports:
      - "25432:5432"
```
For reference, here's my .env file:
```bash
# SOURCE DB
TAP_POSTGRES_DATABASE='db'
TAP_POSTGRES_HOST='localhost'
TAP_POSTGRES_USER='dbuser'
TAP_POSTGRES_PASSWORD='dbpassword'
TAP_POSTGRES_PORT=15432

# TARGET DB
TARGET_POSTGRES_DBNAME='meltano_db'
TARGET_POSTGRES_HOST='localhost'
TARGET_POSTGRES_USER='meltano_user'
TARGET_POSTGRES_PASSWORD='password'
TARGET_POSTGRES_DEFAULT_TARGET_SCHEMA='loaded'
TARGET_POSTGRES_PORT=25432
```
a
I think I'm missing something here, but if you're setting `hard_delete: true`, then aren't rows deleted from your source table being deleted from your target table, rather than updated? You wouldn't see `_sdc_deleted_at` updating, as those rows should be deleted completely from your target.
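If you actually want `_sdc_deleted_at` to be stamped instead of rows disappearing, I believe flipping that flag does it. A minimal sketch based on the transferwise target's docs, untested:
```yaml
# Minimal sketch, untested: with hard_delete: false (the transferwise
# target's default, per its docs) deleted source rows are kept in the
# target and flagged via _sdc_deleted_at instead of being removed.
loaders:
- name: target-postgres
  variant: transferwise
  pip_url: pipelinewise-target-postgres
  config:
    hard_delete: false
    add_metadata_columns: true
```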
a
Hello Andy. The problem is that the rows are not deleted. I mentioned `_sdc_deleted_at` to make it clear that the deleted rows (from source) are being ignored in the target db.
a
Do you have the option to use log-based replication? I'm not sure `hard_delete` and `FULL_TABLE` replication will work together. Check out the last paragraph of the link below: https://transferwise.github.io/pipelinewise/user_guide/metadata_columns.html
But I'm definitely not an expert here, someone else will chime in.
a
Thank you very much. It's quite clear from the link you sent:
> Please also note that only the Log Based (https://transferwise.github.io/pipelinewise/concept/replication_methods.html#log-based) replication method detects delete row events.
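If I go that route, I suppose the tap metadata would change to something like this (just a sketch; I assume the source Postgres would also need logical replication, i.e. wal_level=logical, enabled):
```yaml
# Sketch only: switching the extractor to log-based replication.
# Assumes logical replication (wal_level=logical) is enabled on the
# source Postgres; everything else stays as in the config above.
metadata:
  '*':
    replication-method: LOG_BASED
```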
v
Simple answer is https://github.com/meltano/sdk/issues/18. Deletes do work for full table refresh via Activate Version. The tap you chose doesn't support activate version. You could flip and go with tap-postgres from PipelineWise and target-postgres from MeltanoLabs, and it should work, as I believe PipelineWise has activate version.
Log based does deletes, yes, but if you can do full refreshes I'd highly recommend going with full refreshes. If you really want log based then give the meltanolabs target-postgres a shot, as it has log based now.
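Roughly this swap, as a sketch (the pip package names are the ones published on PyPI, so double-check them; connection config omitted):
```yaml
# Rough sketch of the suggested variant swap: the PipelineWise tap
# (which supports ACTIVATE_VERSION) paired with the MeltanoLabs target.
# Package names are assumptions from PyPI; connection config omitted.
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
```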
a
I don't mind picking other variants. I just started experimenting. Btw is there a specific reason you recommend full refresh?
v
Someday I should make a video / write-up about it. But everyone has different opinions, and the reality is it depends on how much data you have. If you have <100 million rows I'd do full refresh for everything, as it's just much easier to think about and debug for everyone.
a
OK so if I want to do a full table replication that supports deletes do I have to opt in to an experimental feature (activate_version)?
v
Activate version doesn't exist for the meltanolabs version of tap-postgres. There are easy ways to get there though:
`meltano run dbt:drop_source_schema tap-postgres target-postgres dbt:run`
This is basically saying: 1. Drop the table. 2. Run the sync. 3. Store the successful sync in another table. Boom, deletes.
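Purely as a hypothetical sketch, wiring up a command like that in meltano.yml could look like this (assumes you've written a dbt macro named drop_source_schema; the plugin and macro names here are placeholders):
```yaml
# Hypothetical sketch: exposing a custom dbt command in meltano.yml.
# Assumes a dbt macro named drop_source_schema exists in your dbt
# project. With this plugin name the invocation would be
# meltano run dbt-postgres:drop_source_schema ...
plugins:
  utilities:
  - name: dbt-postgres
    commands:
      drop_source_schema:
        args: run-operation drop_source_schema
```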
a
Makes sense. Thank you. Just one last question. I guess full table replication doesn't work for the meltano variant of tap-postgres as well, right?
v
No, it works
a
🤔
Sorry, I meant the deletes with full table replication
u
> There are easy ways to get there though: meltano run dbt:drop_source_schema tap-postgres target-postgres dbt:run
@visch I'd add that as of SDK 0.32.0 there's a new feature that allows you to configure `load_method` on the target to `overwrite`, which will do this drop table for you before starting the load
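Something like this, as a sketch (assumes a target built on SDK >= 0.32.0, e.g. the meltanolabs target-postgres):
```yaml
# Sketch: the SDK 0.32.0 load_method setting. Documented values are
# append-only, upsert, and overwrite; overwrite replaces the table
# contents before each load. Requires an SDK >= 0.32.0 based target.
loaders:
- name: target-postgres
  variant: meltanolabs
  config:
    load_method: overwrite
```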
a
Sounds cool. Trying to figure out how this plays with replication / primary keys though 🤯
a
@visch what is 'drop source schema' in this context? A dbt command? How do I tell it which schema to drop? Feel like I'm missing something here.