justin_cohen
10/26/2023, 12:56 PM
version: 1
default_environment: dev
project_id: 9a761bbb-6e95-4c10-9e31-cac2f6bb74eb
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-rest-api-msdk--avp
    inherit_from: tap-rest-api-msdk
    pip_url: git+https://github.com/Widen/tap-rest-api-msdk.git
    config:
      api_url: http://127.0.0.1:8000
      streams:
      - name: AccCust
        params:
          table: test."raw_AccCust"
        api_key: ******
        path: /tables
        records_path: $.records[*]
    metadata:
      AccCust:
        replication-method: INCREMENTAL
        replication-key: RecLockedDate
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-postgres.git
    config:
      database: ******
      host: ******
      user: ******
      port: ****
      default_target_schema: avp
      add_record_metadata: false
My understanding is that on the first execution of the pipeline, the whole table is extracted and the saved replication-key value is set to the maximum value found in the stream's replication-key column. Then, on the next pipeline run, the tap extracts all records but passes only those with a value greater than the saved replication-key value on to the target for insertion. Am I wrong here?
Given that assumption, I tested incremental replication by running an initial ingestion into Postgres. I then went to the source and changed one record's RecLockedDate to a value greater than the saved replication-key value. Running the pipeline again, I obviously expected only that one record to be inserted into the target table; instead, the whole source table was inserted again. I'm not sure what I'm missing here.
Let me know if more code snippets would help.
One other strange thing: one of the last lines of Meltano's output is the emission of the new target state, yet .meltano/run/tap-rest-api-msdk--avp/state.json still contains the previous state emitted by the last pipeline run, not this new state.
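As an aside on the state question (an assumption based on how recent Meltano versions behave, not something confirmed in this thread): the file under .meltano/run/ is only a working copy, and the state that Meltano actually consults on the next run is kept in its system database. That stored state can be inspected with the meltano state subcommands; the state ID below is a guess derived from the environment and plugin names above:

```shell
# List the state IDs Meltano knows about
# (typically <environment>:<extractor>-to-<loader>)
meltano state list

# Show the stored state for this pipeline
# (state ID is an assumption based on the plugin names above)
meltano state get dev:tap-rest-api-msdk--avp-to-target-postgres
```

If the stored state already holds the latest replication-key value, the stale state.json file is cosmetic rather than the cause of the full reloads.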
Thanks guys, hope you can help.
*This was posted in troubleshooting originally, but didn't get any attention, so I thought it might be better suited for getting started.

edgar_ramirez_mondragon
10/30/2023, 5:18 PM
> My understanding is that on the first execution of the pipeline, the whole table is extracted and the replication key value is changed to be the maximum value in the replication key column of the stream. Then, upon the next pipeline run, the tap extracts all records, but only passes those with a value greater than the saved replication key value to the target for insertion. Am I wrong here?

That's all correct. I think you have to use the replication_key tap setting instead of overriding metadata: https://github.com/Widen/tap-rest-api-msdk#stream-level-config-options
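Applying that suggestion to the config above would mean moving the key into the stream definition itself. A sketch only, with the names copied from the earlier config and the option name taken from the tap's stream-level config documentation:

```yaml
plugins:
  extractors:
  - name: tap-rest-api-msdk--avp
    inherit_from: tap-rest-api-msdk
    config:
      api_url: http://127.0.0.1:8000
      streams:
      - name: AccCust
        path: /tables
        records_path: $.records[*]
        # Stream-level option from the tap itself, replacing the
        # Meltano-level metadata override:
        replication_key: RecLockedDate
```

With the key declared at the stream level, the tap itself should mark the stream INCREMENTAL and track the bookmark, rather than relying on the catalog metadata override.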