# getting-started
m
Is there something I’m missing here? I’m implementing a custom tap using the Meltano SDK. I’m using the `state` capability and setting a `replication_key`. I use that key as a watermark in my query after digging it out of my state context. However, if I return zero records (because my source has no new data since my last extraction), Meltano updates my state to `{}`, which means the next time I call `self.get_starting_replication_key_value(context)` I get `None` and end up doing a full import… Have I missed something? I don’t see how I can adapt the iterable returned by `get_records` to avoid this when the iterable is empty.
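To make the failure mode concrete, here is a minimal sketch of the watermark pattern described above. It is not the tap's actual code: `build_incremental_query`, the table name, and the `?` placeholder style are all hypothetical. It only shows why a `None` starting value turns into a full import.

```python
from typing import Optional, Tuple


def build_incremental_query(
    table: str, replication_key: str, watermark: Optional[str]
) -> Tuple[str, tuple]:
    """Build a (sql, params) pair for an incremental extract.

    Hypothetical helper: `watermark` stands in for whatever
    get_starting_replication_key_value(context) returned.
    """
    if watermark is None:
        # No stored bookmark (e.g. state was reset to {}): full import.
        return f"SELECT * FROM {table} ORDER BY {replication_key}", ()
    # Stored bookmark present: only fetch rows newer than the watermark.
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {replication_key} > ? ORDER BY {replication_key}"
    )
    return sql, (watermark,)
```

With a stored bookmark the query is bounded. Once the state has been wiped, the same call silently falls back to the unbounded query.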
Ah, the plot thickens. It appears the target is controlling this. Time to go read more code 🤔. I see this behavior with `target-bigquery` but not with `target-jsonl`. I must be missing a setting somewhere. Quite the footgun, though.
v
Check that your source has `state` listed as a capability. I've missed this so many times, and spent so long hunting for it, that you should just start there. After that, I'd make sure the tap is properly sending the state messages you'd expect, i.e. `meltano invoke tap-name > out`. Once that's true, the target needs to echo them back, so look at that next: `poetry run tap-name --config config.json --catalog catalog.json --state state.json | meltano invoke target-name > target_out`. Some targets are bad about sending a state message back as the data is "committed" to the target, and they may wait until all data is sent before sending a state message back. Looking at that file while it's running should make it pretty obvious.
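For the first check, a small filter over the captured tap output can help. The sketch below assumes the `out` file holds one Singer message per line, with STATE messages shaped like `{"type": "STATE", "value": {...}}`. The helper name `state_messages` is made up for illustration.

```python
import json
from typing import Iterable, Iterator


def state_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the `value` payload of every Singer STATE message in `lines`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines or other non-JSON output
        if isinstance(message, dict) and message.get("type") == "STATE":
            yield message.get("value", {})
```

Something like `for value in state_messages(open("out")): print(value)` then lists every state the tap emitted. If it prints nothing for a run with zero new records, the tap never sent a state message. Targets usually echo only the state payload rather than a full STATE message, so `target_out` is easier to read directly.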
m
I’ll take a look. My source definitely has state listed; if it doesn’t, no state gets set. It definitely seems like `target-bigquery` is the culprit here. Thanks for the tips on investigating what is going on, that is helpful. I will go spelunking today.
I can see the problem but I don’t think I can see the solution. I can’t decide if the Meltano SDK is wrong, or if `target-bigquery` is wrong.
If `get_records` (https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L1269) returns an empty iterable, because there were no new messages to consume based on the `replication_key` being used, then no state message is emitted. `target-bigquery` doesn’t check whether the state object it is consuming is empty before writing the state: if it doesn’t process at least one message, it writes an empty state message (which wipes out the stored state). This happens here: https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L529. So the question is:
• Should the Meltano SDK emit a state message if it produces no records (i.e. should it propagate the existing state)? Some non-Meltano taps, such as `tap-prometheus`, take this approach: they always emit a state message, even if they have incrementally processed 0 messages. See https://github.com/meshcloud/tap-prometheus/blob/master/tap_prometheus/__init__.py#L204
• Is `target-bigquery` at fault because it assumes it is getting at least one new message on every run, since it doesn’t check what it is writing out as state (it assumes a state message always exists)?
• Is everyone at fault 😆
• As far as I can see, there is no Meltano spec for how the state message should behave when an incremental run produces no new records. I have to take the code at face value (a system is what it does) for the behavior.
Can anyone comment here?
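The first two options can be sketched side by side. This is plain Python, not the Meltano SDK's or `target-bigquery`'s actual code: `tap_messages`, `state_to_persist`, and the bookmark layout are assumptions made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def tap_messages(
    stream: str,
    replication_key: str,
    records: Iterable[Dict[str, Any]],
    previous_bookmark: Optional[str],
) -> Iterator[Dict[str, Any]]:
    """Tap-side option: always finish with a STATE message."""
    bookmark = previous_bookmark
    for record in records:
        yield {"type": "RECORD", "stream": stream, "record": record}
        value = record[replication_key]
        if bookmark is None or value > bookmark:
            bookmark = value
    bookmarks: Dict[str, Any] = {}
    if bookmark is not None:
        # With zero records this carries the previous bookmark forward.
        bookmarks[stream] = {
            "replication_key": replication_key,
            "replication_key_value": bookmark,
        }
    yield {"type": "STATE", "value": {"bookmarks": bookmarks}}


def state_to_persist(
    messages: Iterable[Dict[str, Any]], stored_state: Dict[str, Any]
) -> Dict[str, Any]:
    """Target-side option: never overwrite stored state with nothing."""
    latest: Optional[Dict[str, Any]] = None
    for message in messages:
        if message.get("type") == "STATE":
            latest = message.get("value")
    # No STATE seen, or an empty one: keep what was already stored.
    return latest if latest else stored_state
```

Either guard on its own would avoid the wipe in this toy model. With both in place, an empty incremental run round-trips the existing bookmark unchanged.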
v
Did this get addressed? I had it in my Slack reminders. It looks like a pretty big issue with the target implementation if state gets cleared.
m
Apologies for the late reply.
v
np, glad it's addressed in the SDK now!