mathew_fournier
06/01/2023, 10:24 PM
state capability and setting a replication_key. I’m using that key as a watermark in my query after digging it out of my state context; however, if I return zero events (because my source has no new data since my last extraction), meltano updates my state to {}, which means the next time I ask for self.get_starting_replication_key_value(context) I get None, which means I end up doing a full import… Have I missed something? I don’t see what I can do to adapt the iterable returned in get_records to avoid this if the iterable is empty.
mathew_fournier
06/01/2023, 10:29 PM
target-bigquery but not with target-jsonl. Must be missing a setting somewhere. Quite the footgun though.
visch
06/02/2023, 12:38 PM
state listed as a capability. I've missed this so many times and hunted for it that you should just start there.
After you do that, I'd make sure the tap is properly sending the state messages you'd expect, i.e. meltano invoke tap-name > out. Once that's true, the target needs to echo them back, so look at that with poetry run tap-name --config config.json --catalog catalog.json --state state.json | meltano invoke target-name > target_out. Some targets are bad about sending a state message back as the data is "committed" to the target, and they may wait until all data is sent before sending a state message back. Looking at that file as it's running should make it pretty obvious.
mathew_fournier
06/02/2023, 2:43 PMmathew_fournier
06/02/2023, 6:21 PM
meltanoSDK is wrong, or if target-bigquery is wrong.
mathew_fournier
06/02/2023, 6:26 PM
get_records returns an empty iterable, because there were no new messages to consume based on the replication_key being used, then no state message is emitted. target-bigquery doesn’t check whether the state object it is consuming is empty before writing the state: if it doesn’t process at least one message, it writes an empty state message (which wipes out the stored state). This happens here: https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L529. So the question is:
• Should the meltano SDK emit a state message if it produces no records (i.e. should it propagate the existing state)? Some non-meltano taps, such as tap-prometheus, take this approach: they always emit a state message, even if they have incrementally processed 0 messages. See https://github.com/meshcloud/tap-prometheus/blob/master/tap_prometheus/__init__.py#L204
• Is target-bigquery at fault because it assumes it is getting at least one new message on every run and doesn’t check when it writes out the state (it assumes a state message always exists)?
• Is everyone at fault 😆
• There is no meltano spec around this, as far as I can see, for what the behavior of the state message should be when an incremental run produces no new records. I have to take the code at face value (a system is what it does) for the behavior. Can anyone comment here?
visch
06/28/2023, 1:03 PMmathew_fournier
10/25/2023, 4:20 PMmathew_fournier
10/25/2023, 4:20 PMvisch
10/25/2023, 4:22 PM
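The empty-run case discussed in this thread can be checked offline against the files produced by the debugging commands above (out and target_out). The following is only a minimal sketch, not code from the meltano SDK or target-bigquery: state_values and state_to_persist are hypothetical helper names, the bookmark layout in the usage note is illustrative, and the only thing assumed about the wire format is the Singer convention that a state message is a JSON line with "type": "STATE" and a "value" field. It lists every state value seen in a message stream and shows one possible guard, where an empty state never replaces a previously stored bookmark.

```python
import json
from typing import Iterable, Iterator, Optional


def state_values(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the value of every Singer STATE message found in the lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON log lines mixed into the output are ignored
        if isinstance(message, dict) and message.get("type") == "STATE":
            # A missing or null value is reported as an empty state.
            yield message.get("value") or {}


def state_to_persist(
    lines: Iterable[str], previous: Optional[dict] = None
) -> Optional[dict]:
    """Return the state to store after a run (hypothetical guard).

    Assumption: an empty state ({}) means "nothing new", so the
    previously stored state is kept instead of being overwritten.
    """
    state = previous
    for value in state_values(lines):
        if value:  # skip {} so an empty run keeps the old bookmark
            state = value
    return state
```

For example, with a previous state of {"bookmarks": {"events": {"replication_key_value": "2023-06-01T00:00:00Z"}}} and a run whose only state message is {"type": "STATE", "value": {}}, state_to_persist returns the previous state unchanged. Passing an open file handle for target_out to state_values lists what the target echoed back, which makes an empty state easy to spot.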