# singer-tap-development
v
Writing a test today for `replication_key`s for a SQL-based target built on the SDK. There's some behavior I didn't realize Meltano has around setting metadata in the catalog. 🧵 for more details so I don't spam the channel
meltano.yml
version: 1
send_anonymous_usage_stats: true
project_id: "tap-postgres"
plugins:
  extractors:
  - name: "tap-postgres"
    namespace: "tap_postgres"
    pip_url: -e .
    capabilities:
    - state
    - catalog
    - discover
    config:
      sqlalchemy_url: "postgresql://postgres:postgres@localhost:5432/postgres"
    settings:
    - name: sqlalchemy_url
      kind: password
    select:
      - "public-test_replication_key.*"
    metadata:
      public-test_replication_key:
        replication-key: "updated_at"
        replication-method: "INCREMENTAL"
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
Outputs a catalog with
meltano invoke --dump=catalog tap-postgres > catalog.json
It puts `replication-key: "updated_at"` and `replication-method: "INCREMENTAL"` in the `breadcrumb=[]` metadata object of the stream `public-test_replication_key`, and it also adds a key to the top-level object of the stream: `"replication_key":"updated_at"` (I didn't instruct this to happen, which throws me a bit). Without that `replication_key` at the top level, the tap doesn't realize it should be doing an incremental sync, which is odd to me. Maybe this is a bug with the SDK? It also seems interesting that Meltano adds this key to the catalog. Maybe Meltano does this on purpose and the SDK doesn't? Maybe I'm mistaken in some way. PR incoming with tap-postgres, which should show this in a form that's a bit easier to read.
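To make the two locations concrete, here is a rough Python sketch that reports where a catalog stream entry carries its replication key. The `replication_key_locations` helper and the trimmed-down `sample_catalog` are made up for illustration (the real `catalog.json` from the command above has more fields); only the key names mentioned in this thread are assumed.

```python
def replication_key_locations(stream: dict) -> dict:
    """Report where a catalog stream entry carries its replication key.

    Hypothetical helper for illustration only; not part of Meltano or the SDK.
    """
    root_metadata = {}
    for entry in stream.get("metadata", []):
        # The stream-level metadata entry is the one with an empty breadcrumb.
        if entry.get("breadcrumb") == []:
            root_metadata = entry.get("metadata", {})
            break
    return {
        "top_level": stream.get("replication_key"),
        "metadata": root_metadata.get("replication-key"),
        "method": root_metadata.get("replication-method"),
    }


# Hand-written, trimmed-down stand-in for the dumped catalog (an assumption,
# not the actual output of `meltano invoke --dump=catalog`).
sample_catalog = {
    "streams": [
        {
            "tap_stream_id": "public-test_replication_key",
            "replication_key": "updated_at",  # the top-level key Meltano added
            "metadata": [
                {
                    "breadcrumb": [],
                    "metadata": {
                        "replication-key": "updated_at",
                        "replication-method": "INCREMENTAL",
                    },
                },
            ],
        }
    ]
}

for stream in sample_catalog["streams"]:
    print(stream["tap_stream_id"], replication_key_locations(stream))
```

Swapping `sample_catalog` for the result of `json.load` on the dumped `catalog.json` should show the same two places being populated, if I'm reading the output correctly.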
a
Thanks for raising. Recapping from office hours... At first glance, I think we should probably get this from either the stream definition or the metadata `[]` breadcrumb. I'd have to look a bit deeper into the spec to confirm. cc @edgar_ramirez_mondragon
e
v
Yes, and it looks like the Singer SDK requires you to set it at the stream level as well. Hmm, as @pat_nadolny pointed out, maybe we should add this to the hub docs for the Singer spec. There's a really good section there for replication-key, and this portion isn't there.
Is it a bug that the SDK requires it to be at the stream level as well? Or do we want to just leave it for now?
a
The source docs seem to confirm that the metadata level should be authoritative. https://github.com/singer-io/getting-started/blob/ca5c56f16e67e0c2da49e9bb94d6f52fb19c9de6/docs/DISCOVERY_MODE.md#example-2
I'd suggest we coalesce from one to the other in the SDK when parsing - so one or the other would be sufficient.
@edgar_ramirez_mondragon - what do you think?
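A minimal sketch of what that coalescing could look like, assuming the metadata `[]` breadcrumb wins and the top-level `replication_key` is only a fallback. `coalesce_replication_key` is a hypothetical name, not an existing SDK function, and the dict shapes are simplified catalog entries.

```python
from typing import Optional


def coalesce_replication_key(stream: dict) -> Optional[str]:
    """Return the replication key from root metadata, else from the top level.

    Hypothetical helper sketching the proposal; not the SDK's actual parsing code.
    """
    for entry in stream.get("metadata", []):
        # Stream-level metadata lives under the empty breadcrumb.
        if entry.get("breadcrumb") == []:
            key = entry.get("metadata", {}).get("replication-key")
            if key:
                return key
    # Fall back to the top-level key when the metadata does not carry one.
    return stream.get("replication_key")


metadata_only = {
    "metadata": [{"breadcrumb": [], "metadata": {"replication-key": "updated_at"}}]
}
top_level_only = {"replication_key": "updated_at", "metadata": []}
neither = {"metadata": []}

print(coalesce_replication_key(metadata_only))
print(coalesce_replication_key(top_level_only))
print(coalesce_replication_key(neither))
```

With this ordering, a catalog that sets the key in only one of the two places would still resolve, which is the "one or the other would be sufficient" behavior.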
e
> I’d suggest we coalesce from one to the other in the SDK when parsing - so one or the other would be sufficient.
I agree. Although after diving in a bit, I can’t tell where in the SDK that’s currently being read from the catalog entry’s top level 🤔