# troubleshooting
a
Hi everyone, I'm not entirely certain if this is the most appropriate channel for my question, but I thought I'd give it a try. I'm currently attempting to perform a simple extract and load operation from a PostgreSQL source to a PostgreSQL target, and I'm facing an issue with hard deletes not working as expected. Below you can find my meltano.yml configuration:
```yaml
...
plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      filter_schemas:
      - public
    select:
    - public-users.*
    metadata:
      '*':
        replication-method: FULL_TABLE

  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
    config:
      hard_delete: true
      add_metadata_columns: true
```
The issue I'm encountering is that the `_sdc_deleted_at` column is not being updated on the `users` table in the target database. Both of my databases are running with the following docker compose entries:
```yaml
  source-db:
    image: postgres:14.9-alpine3.18
    container_name: source_postgres
    environment:
      POSTGRES_DB: db
      POSTGRES_USER: dbuser
      POSTGRES_PASSWORD: dbpassword
    ports:
      - "15432:5432"

  target-db:
    image: postgres:14.9-alpine3.18
    command: ["postgres", "-c", "log_statement=all"]
    container_name: target_db
    environment:
      POSTGRES_DB: meltano_db
      POSTGRES_USER: meltano_user
      POSTGRES_PASSWORD: password
    ports:
      - "25432:5432"
```
For reference, here's my .env file:
```bash
# SOURCE DB
TAP_POSTGRES_DATABASE='db'
TAP_POSTGRES_HOST='localhost'
TAP_POSTGRES_USER='dbuser'
TAP_POSTGRES_PASSWORD='dbpassword'
TAP_POSTGRES_PORT=15432

# TARGET DB
TARGET_POSTGRES_DBNAME='meltano_db'
TARGET_POSTGRES_HOST='localhost'
TARGET_POSTGRES_USER='meltano_user'
TARGET_POSTGRES_PASSWORD='password'
TARGET_POSTGRES_DEFAULT_TARGET_SCHEMA='loaded'
TARGET_POSTGRES_PORT=25432
```
a
I think I'm missing something here, but if you're setting `hard_delete: true`, then aren't rows deleted from your source table being deleted from your target table, rather than updated? You wouldn't see `_sdc_deleted_at` updating, as those rows should be deleted completely from your target.
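If you actually want `_sdc_deleted_at` to be stamped instead of rows disappearing, I believe flipping that flag does it. A minimal sketch based on the transferwise target's docs, untested:
```yaml
# Minimal sketch, untested: with hard_delete: false (the transferwise
# target's default, per its docs) deleted source rows are kept in the
# target and flagged via _sdc_deleted_at instead of being removed.
loaders:
- name: target-postgres
  variant: transferwise
  pip_url: pipelinewise-target-postgres
  config:
    hard_delete: false
    add_metadata_columns: true
```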
a
Hello Andy. The problem is that the rows are not deleted. I mentioned `_sdc_deleted_at` to make it clear that the deleted rows (from source) are being ignored in the target db.
a
Do you have the option to use log-based replication? I'm not sure `hard_delete` and `FULL_TABLE` replication will work together. Check out the last paragraph of the link below: https://transferwise.github.io/pipelinewise/user_guide/metadata_columns.html
But I'm definitely not an expert here, someone else will chime in.
a
Thank you very much. It's quite clear from the link you sent:
> Please also note that only the Log Based (https://transferwise.github.io/pipelinewise/concept/replication_methods.html#log-based) replication method detects delete row events.
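If I go that route, I suppose the tap metadata would change to something like this (just a sketch; I assume the source Postgres would also need logical replication, i.e. wal_level=logical, enabled):
```yaml
# Sketch only: switching the extractor to log-based replication.
# Assumes logical replication (wal_level=logical) is enabled on the
# source Postgres; everything else stays as in the config above.
metadata:
  '*':
    replication-method: LOG_BASED
```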
v
Simple answer is https://github.com/meltano/sdk/issues/18. Deletes do work for full table refresh via Activate Version. The tap you chose doesn't support activate version. You could flip and go with tap-postgres from PipelineWise and target-postgres from MeltanoLabs, and it should work, as I believe PipelineWise has activate version.
Log based does deletes, yes, but if you can do full refreshes I'd highly recommend going with full refreshes. If you really want log based then give the meltanolabs target-postgres a shot, as it has log based now.
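Roughly this swap, as a sketch (the pip package names are the ones published on PyPI, so double-check them; connection config omitted):
```yaml
# Rough sketch of the suggested variant swap: the PipelineWise tap
# (which supports ACTIVATE_VERSION) paired with the MeltanoLabs target.
# Package names are assumptions from PyPI; connection config omitted.
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
```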
a
I don't mind picking other variants. I just started experimenting. Btw is there a specific reason you recommend full refresh?
v
Someday I should make a video / write-up about it. But everyone has different opinions, and the reality is it depends on how much data you have. If you have <100 million rows I'd do full refresh for everything, as it's just much easier to think about and debug for everyone.
a
OK so if I want to do a full table replication that supports deletes do I have to opt in to an experimental feature (activate_version)?
v
Activate version doesn't exist for the meltanolabs version of tap-postgres. There are easy ways to get there though:
`meltano run dbt:drop_source_schema tap-postgres target-postgres dbt:run`
This is basically saying: 1. Drop the table. 2. Run the sync. 3. Store the successful sync in another table. Boom, deletes.
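Purely as a hypothetical sketch, wiring up a command like that in meltano.yml could look like this (assumes you've written a dbt macro named drop_source_schema; the plugin and macro names here are placeholders):
```yaml
# Hypothetical sketch: exposing a custom dbt command in meltano.yml.
# Assumes a dbt macro named drop_source_schema exists in your dbt
# project. With this plugin name the invocation would be
# meltano run dbt-postgres:drop_source_schema ...
plugins:
  utilities:
  - name: dbt-postgres
    commands:
      drop_source_schema:
        args: run-operation drop_source_schema
```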
a
Makes sense. Thank you. Just one last question. I guess full table replication doesn't work for the meltano variant of tap-postgres as well, right?
v
No, it works
a
🤔
Sorry, I meant the deletes with full table replication
u
> There are easy ways to get there though: meltano run dbt:drop_source_schema tap-postgres target-postgres dbt:run
@visch I'd add that as of SDK 0.32.0 there's a new feature that allows you to configure `load_method` on the target to `overwrite`, which will do this drop table for you before starting the load
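Something like this, as a sketch (assumes a target built on SDK >= 0.32.0, e.g. the meltanolabs target-postgres):
```yaml
# Sketch: the SDK 0.32.0 load_method setting. Documented values are
# append-only, upsert, and overwrite; overwrite replaces the table
# contents before each load. Requires an SDK >= 0.32.0 based target.
loaders:
- name: target-postgres
  variant: meltanolabs
  config:
    load_method: overwrite
```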
a
Sounds cool. Trying to figure out how this plays with replication / primary keys though 🤯
a
@visch what is 'drop source schema' in this context? A dbt command? How do I tell it which schema to drop? Feel like I'm missing something here.