erkan_ciftci
10/13/2022, 11:55 AM
meltano run tap-postgres target-postgres
It works perfectly.
I hard-deleted some rows (210 records) from the source DB table and re-ran the same pipeline, expecting it to truncate or delete all records in the target and insert everything from scratch, because replication_method=FULL_TABLE (and the hard_delete, add_metadata_columns, and primary_key_required configs are all true).
meltano --log-level=debug run tap-postgres target-postgres
but it replicates the 2290 rows again and then tries to delete the records whose _sdc_deleted_at column is not null.
Unfortunately, none of the records has any value in its _sdc_deleted_at column, so nothing can be deleted (DELETE 0) from the target table and the total row count stays at 2500 instead of 2290.
Can you please guide or help me with this issue, or am I missing something?
btw, I also tried the --full-refresh parameter and it doesn't change anything either.
meltano run tap-postgres target-postgres --full-refresh
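The mismatch (2500 rows in the target vs 2290 in the source) follows directly from FULL_TABLE loads behaving as primary-key upserts: rows deleted at the source are simply never sent again, so nothing ever touches them in the target. A minimal sketch of that arithmetic, assuming only the upsert behavior (illustration only, not pipelinewise code):

```python
# Simulate a FULL_TABLE re-sync as a primary-key upsert (assumed model).
source = {pk: f"row-{pk}" for pk in range(2290)}   # 210 rows were hard-deleted
target = {pk: f"row-{pk}" for pk in range(2500)}   # loaded before the delete

for pk, row in source.items():   # the tap emits only surviving rows
    target[pk] = row             # matches the log: inserts=0, updates=2290

stale = len(target) - len(source)   # deleted rows survive in the target
```

The 210 stale rows are exactly the gap described above; no _sdc_deleted_at value is ever written, so the subsequent hard-delete pass removes nothing.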
meltano.yaml
=============
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      host: localhost
      port: 54321
      dbname: aipg_test_db
      user: postgres
      default_replication_method: FULL_TABLE
      filter_schemas: cus
      max_run_seconds: 43200
      ssl: false
      break_at_end_lsn: true
    select:
    - cus-customer.*
  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
    config:
      host: localhost
      port: 54322
      dbname: pg2pg_meltano
      user: postgres
      schema_mapping:
        cus:
          target_schema: raw_cus
      default_target_schema: test
      # to catch hard-deleted records
      hard_delete: true
      add_metadata_columns: true
      primary_key_required: true
      validate_records: true
      temp_dir:
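Given the replies further down the thread (hard deletes are only detectable with LOG_BASED replication), the tap-side change would look roughly like this. This is a sketch only: pipelinewise-tap-postgres's LOG_BASED mode also requires wal_level=logical and the wal2json logical-decoding plugin on the source server, which may not fit every setup:

```yaml
# Sketch: switch the tap to log-based replication so DELETEs are read
# from the WAL. Requires wal2json and wal_level=logical on the source.
extractors:
- name: tap-postgres
  variant: transferwise
  config:
    default_replication_method: LOG_BASED
```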
etl.log (after deleting rows from the source table)
=======
[info ] {"type": "ACTIVATE_VERSION", "stream": "cus-customer", "version": 1665658359861} cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stdout string_id=tap-postgres
[info ] {"type": "STATE", "value": {"bookmarks": {"cus-customer": {"last_replication_method": "FULL_TABLE", "version": 1665658359861, "xmin": null}}, "currently_syncing": null}} cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stdout string_id=tap-postgres
[info ] time=2022-10-13 135241 name=singer level=INFO message=METRIC: {"type": "counter", "metric": "record_count", "value": 2290, "tags": {}} cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stderr string_id=tap-postgres
[debug ] head producer completed first as expected name=tap-postgres
[debug ] tail consumer is next block, wrapping up
[info ] time=2022-10-13 135241 name=target_postgres level=INFO message=Loading 2290 rows into 'raw_cus."customer"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
[info ] time=2022-10-13 135242 name=target_postgres level=INFO message=Loading into raw_cus."customer": {"inserts": 0, "updates": 2290, "size_bytes": 632142} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
[info ] time=2022-10-13 135242 name=target_postgres level=INFO message=Creating index on 'raw_cus."customer"' table on '_sdc_deleted_at' column(s)... CREATE INDEX IF NOT EXISTS i_customer__sdc_deleted_at ON raw_cus."customer" (_sdc_deleted_at) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target…
thomas_briggs
10/13/2022, 12:45 PM
The --full-refresh option resets Meltano's internal state for the pipeline, which causes the tap to retrieve all data from the source. It has no impact on the target, however. If you want all the rows in the destination DB to be removed, you'll have to remove them somehow first.
1. Think of it this way: if the target were JSON files, what would you expect to happen? Nothing, right? Because there's no way to remove data from JSON files.
2. Detecting hard-deleted rows requires using LOG_BASED replication mode. Rows that have been hard deleted no longer exist in the database, so there's no way for INCREMENTAL or FULL_TABLE modes to know they were ever there.
erkan_ciftci
10/13/2022, 1:32 PM
thomas_briggs
10/13/2022, 2:06 PM
thomas_briggs
10/13/2022, 2:07 PM
hard_delete option is enabled. That will remove any rows marked as deleted. They'll only be marked as deleted if the tap says they were deleted, though, and that can only happen when using LOG_BASED replication.
erkan_ciftci
10/13/2022, 3:19 PM
thomas_briggs
10/13/2022, 3:26 PM
felix_kenton
11/02/2022, 3:25 PM
tap-postgres and target-snowflake.
The documentation for ACTIVATE_VERSION states precisely that it is designed to be used with the FULL_TABLE option for this use case:
"This is where the ACTIVATE_VERSION message comes in! The ACTIVATE_VERSION message type is a mechanism that is used in full table sync mode which tells the target to disregard all previously received records and only consider the current version as 'active'. By using this mechanism the sync between the tap and target can properly delete inactive versions of the data in the destination to clean up any stale records that were hard deleted."
In theory the target should be able to handle "delete everything that wasn't sent in this run", but after trying various combinations of parameters I've given up trying to get deletes working (either soft or hard!) 😞
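The quoted behavior can be sketched as version pruning: each full-table run stamps its records with a version, and ACTIVATE_VERSION tells the target to drop anything stamped with an older one. A toy model of those assumed semantics, reusing the version number visible in the etl.log above:

```python
# Toy model of ACTIVATE_VERSION pruning (assumed semantics).
old_version = 1665000000000          # illustrative id from a previous run
new_version = 1665658359861          # version seen in the etl.log above

target = {pk: old_version for pk in range(2500)}   # state before this run

for pk in range(2290):               # rows emitted in the current run
    target[pk] = new_version         # restamped with the active version

# ACTIVATE_VERSION: purge rows that did not arrive in the active version
target = {pk: v for pk, v in target.items() if v == new_version}
```

If the target honors the message, the 210 stale rows disappear; felix_kenton's report suggests that in practice this didn't happen with the parameter combinations he tried.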
thomas_briggs
11/02/2022, 4:04 PM
felix_kenton
11/02/2022, 4:20 PM
erkan_ciftci
11/02/2022, 5:28 PM
felix_kenton
11/03/2022, 2:48 PM
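To close the loop on thomas_briggs's explanation above: hard-delete handling has two halves, the tap marking rows from WAL DELETE events and the target sweeping the marked rows. A self-contained sketch of both halves (an assumed, simplified model, not pipelinewise internals):

```python
from datetime import datetime, timezone

# 1) A LOG_BASED tap sees DELETE events in the WAL and re-emits the row
#    with _sdc_deleted_at set; a FULL_TABLE scan would never see them.
wal_events = [
    {"op": "INSERT", "pk": 1},
    {"op": "INSERT", "pk": 2},
    {"op": "DELETE", "pk": 2},  # visible only in the log
]

target = {}
for ev in wal_events:
    if ev["op"] == "INSERT":
        target[ev["pk"]] = {"_sdc_deleted_at": None}
    else:  # DELETE: row re-emitted with a deletion timestamp
        target[ev["pk"]] = {
            "_sdc_deleted_at": datetime.now(timezone.utc).isoformat()
        }

# 2) hard_delete pass, effectively:
#    DELETE FROM target WHERE _sdc_deleted_at IS NOT NULL
target = {pk: r for pk, r in target.items() if r["_sdc_deleted_at"] is None}
```

With FULL_TABLE replication step 1 never happens, which is why the log in this thread shows DELETE 0.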