# getting-started
p
Hey Everyone! I wrote my first simple extractor for a JSON REST API and am struggling to figure out how to approach incremental replication / state based on a `modified_at` field for certain streams. Is there a simple example that exists anywhere to model mine from? I've scoured the docs but haven't been able to find what I'm looking for. Apologies if this has been covered already!
a
Have you found the `Stream.replication_key` property? Generally, you can set that in your Stream definition, and then reference the starting value with a call to `Stream.get_starting_timestamp()` when you call out to the API. Normally, that call to `get_starting_timestamp()` will happen within a custom `get_url_params()` method.
I'm sure others may have simpler examples. Sorry these are more complex examples than are really needed for demonstration.
To your question of managing state, that will happen automatically by setting `replication_key` on the stream class. And the value of `get_starting_timestamp()` will automatically consider the bookmark value, as well as a `start_date` input config, if provided by the user.
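To make the bookmark / `start_date` fallback concrete, here is a minimal plain-Python sketch of the idea. It is not the SDK's implementation (in a real tap, `get_starting_timestamp()` does this for you). The helper names `resolve_start` and `build_url_params` and the `modified_since` query parameter are hypothetical, for illustration only.

```python
from datetime import datetime
from typing import Optional


def resolve_start(state: dict, stream_name: str, config: dict) -> Optional[datetime]:
    """Prefer the stream's bookmark; otherwise fall back to config['start_date']."""
    bookmark = state.get("bookmarks", {}).get(stream_name, {})
    value = bookmark.get("replication_key_value") or config.get("start_date")
    return datetime.fromisoformat(value) if value else None


def build_url_params(start: Optional[datetime]) -> dict:
    """Build query params for the API request.

    "modified_since" is a hypothetical parameter name; use whatever
    filter your API actually exposes for the replication key.
    """
    params = {}
    if start is not None:
        params["modified_since"] = start.isoformat()
    return params
```

With no bookmark and no `start_date`, the params come back empty, which corresponds to a full sync from the beginning.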
c
`tap-shopify` probably has a good variety of examples for replication key: https://github.com/Matatika/tap-shopify
p
thank you both! will take a look and let you know if I get stuck : )
i'm already setting replication keys, so hopefully not too far off. is it also necessary to set `replication_method` in the stream class or meltano config?
the shopify example is actually very analogous to what i'm working on, so that's helpful already
a
> is it also necessary to set `replication_method` in the stream class or meltano config?
Nope. Setting the `replication_key` is sufficient and tells the Tap that you can support key-based incremental replication. The user can then select from INCREMENTAL or FULL_TABLE, as needed.
p
sweet, I think I have what I need now - much appreciated!
a
Great! Feel free to reach out in #C01PKLU5D1R if you need additional assistance. And when you have something working well that you want to share back to the community, you can post into #C01UGBSJNG5 (or open an issue) to start the process of publishing your tap to hub.meltano.com.
p
will do! this is for a client, but might as well publish when I'm done
last qq - is there a good way to validate that the state bits are working locally before creating a full pipeline?
not sure if a `state.json` file gets saved or anything like that
i do see this at the end of the run which looks promising
{"type": "STATE", "value": {"bookmarks": {"orders-received": {"replication_key": "modified", "replication_key_value": "2022-08-24T18:01:19.104156-06:00"}}}}
a
If running with Meltano, you can use `meltano state list` and `meltano state get` to validate the values generated. You can also seed/override values with `meltano state set`. If not using Meltano, i.e. if invoking the tap directly, you can pass in a state file using the `--state` CLI arg.
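For the direct-invocation route, one way to get a state file is to capture the tap's output and keep the value of the last STATE message. A rough sketch, assuming the output is newline-delimited JSON like the STATE line pasted above. The helper names `last_state` and `save_state` are made up for illustration.

```python
import json
from typing import Iterable, Optional


def last_state(lines: Iterable[str]) -> Optional[dict]:
    """Return the value of the last STATE message found in the tap's output."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not JSON, e.g. stray log lines
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state


def save_state(lines: Iterable[str], path: str = "state.json") -> Optional[dict]:
    """Write the last STATE value to `path` so it can be passed back via --state."""
    state = last_state(lines)
    if state is not None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2)
    return state
```

You could feed it the captured output of one run, then pass the resulting file with `--state` on the next run and check that only newer records come back.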
> will do! this is for a client, but might as well publish when I'm done
Totally your call. While we of course advocate for community contributions and open source, we want to support all kinds of business models. A nice benefit of putting on the hub (apart from attribution) is that the community can help with bug fixes and future iterations. 🙂
p
no skin off my back! vendor might be upset since they sell a BI suite, but it's garbage : )