justin_cohen
10/26/2023, 12:56 PM
version: 1
default_environment: dev
project_id: 9a761bbb-6e95-4c10-9e31-cac2f6bb74eb
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-rest-api-msdk--avp
    inherit_from: tap-rest-api-msdk
    pip_url: git+https://github.com/Widen/tap-rest-api-msdk.git
    config:
      api_url: http://127.0.0.1:8000
      streams:
      - name: AccCust
        params:
          table: test."raw_AccCust"
        api_key: ******
        path: /tables
        records_path: $.records[*]
    metadata:
      AccCust:
        replication-method: INCREMENTAL
        replication-key: RecLockedDate
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-postgres.git
    config:
      database: ******
      host: ******
      user: ******
      port: ****
      default_target_schema: avp
      add_record_metadata: false
My understanding is that on the first execution of the pipeline, the whole table is extracted and the saved replication-key value is set to the maximum value found in the stream's replication-key column. Then, on the next pipeline run, the tap extracts all records but passes only those with a value greater than the saved replication-key value on to the target for insertion. Am I wrong here?
Given that assumption, I tested incremental replication by running an initial ingestion into Postgres. I then went to the source and changed one record's RecLockedDate to a value greater than the saved replication-key value. Running the pipeline again, I obviously expected only that one record to be inserted into the target table; instead, the whole source table was inserted again. I'm not sure what I'm missing here.
Let me know if more code snippets would help.
One other strange thing: one of the last lines of Meltano's output is the emission of the new target state, yet .meltano/run/tap-rest-api-msdk--avp/state.json still contains the previous state emitted by the last pipeline run, not this new state.
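As an aside on the state question (an assumption based on how recent Meltano versions behave, not something confirmed in this thread): the file under .meltano/run/ is only a working copy, and the state that Meltano actually consults on the next run is kept in its system database. That stored state can be inspected with the meltano state subcommands; the state ID below is a guess derived from the environment and plugin names above:

```shell
# List the state IDs Meltano knows about
# (typically <environment>:<extractor>-to-<loader>)
meltano state list

# Show the stored state for this pipeline
# (state ID is an assumption based on the plugin names above)
meltano state get dev:tap-rest-api-msdk--avp-to-target-postgres
```

If the stored state already holds the latest replication-key value, the stale state.json file is cosmetic rather than the cause of the full reloads.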
Thanks guys, hope you can help.
*This was posted in troubleshooting originally, but didn't get any attention, so I thought it might be better suited for getting started.

edgar_ramirez_mondragon
10/30/2023, 5:18 PM
> My understanding is that on the first execution of the pipeline, the whole table is extracted and the replication key value is changed to be the maximum value in the replication key column of the stream. Then, upon the next pipeline run, the tap extracts all records, but only passes those with a value greater than the saved replication key value to the target for insertion. Am I wrong here?

That's all correct. I think you have to use the replication_key tap setting instead of overriding metadata: https://github.com/Widen/tap-rest-api-msdk#stream-level-config-options
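Applying that suggestion to the config above would mean moving the key into the stream definition itself. A sketch only, with the names copied from the earlier config and the option name taken from the tap's stream-level config documentation:

```yaml
plugins:
  extractors:
  - name: tap-rest-api-msdk--avp
    inherit_from: tap-rest-api-msdk
    config:
      api_url: http://127.0.0.1:8000
      streams:
      - name: AccCust
        path: /tables
        records_path: $.records[*]
        # Stream-level option from the tap itself, replacing the
        # Meltano-level metadata override:
        replication_key: RecLockedDate
```

With the key declared at the stream level, the tap itself should mark the stream INCREMENTAL and track the bookmark, rather than relying on the catalog metadata override.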