Meltano #getting-started: Is there something I’m missing here?


mathew_fournier

06/01/2023, 10:24 PM

Is there something I’m missing here? I’m implementing a custom tap using the Meltano SDK. I’m using the `state` capability and setting a `replication_key`. I use that key as a watermark in my query after digging it out of my state context. However, if I return zero events (because my source has no new data since I last extracted), Meltano updates my state to `{}`. The next time I call `self.get_starting_replication_key_value(context)` I get `None`, which means I end up doing a full import… Have I missed something? I don’t see how I can adapt the iterable returned by `get_records` to avoid this when the iterable is empty.
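A minimal, self-contained sketch of the watermark pattern in the question, assuming ISO-8601 timestamp strings as the replication key. `SOURCE`, `starting_replication_value`, and the `replication_key_value` state key are hypothetical stand-ins, not the Meltano SDK's API. It only shows why a state reset to `{}` turns the next run into a full import.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory source standing in for the real system being queried.
SOURCE: List[Dict[str, object]] = [
    {"id": 1, "updated_at": "2023-06-01T10:00:00Z"},
    {"id": 2, "updated_at": "2023-06-01T11:00:00Z"},
]


def starting_replication_value(state: Dict[str, str]) -> Optional[str]:
    """Stand-in for self.get_starting_replication_key_value(context).

    Returns the stored bookmark, or None when the state has been reset to {}.
    """
    return state.get("replication_key_value")


def get_records(state: Dict[str, str]) -> Iterator[Dict[str, object]]:
    """Yield only rows newer than the watermark taken from state."""
    watermark = starting_replication_value(state)
    for row in SOURCE:
        # With no watermark every row qualifies: this is the full import.
        # Same-format ISO-8601 strings compare correctly as plain strings.
        if watermark is None or str(row["updated_at"]) > watermark:
            yield row
```

With a bookmark of `"2023-06-01T11:00:00Z"` the generator yields nothing, which is the "zero events" case. If the state is then wiped to `{}`, the next call yields every row again.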

mathew_fournier

06/01/2023, 10:29 PM

Ah. The plot thickens. It appears the target is controlling this. Time to go read more code 🤔. I see this behavior with `target-bigquery` but not with `target-jsonl`. I must be missing a setting somewhere. Quite the footgun, though.

visch

06/02/2023, 12:38 PM

Check that your source has `state` listed as a capability. I've missed this so many times, and hunted for it, that you should just start there. After that, I'd make sure the tap is actually sending the state messages you'd expect, i.e. `meltano invoke tap-name > out`. Once that's true, the target needs to echo them back, so look at that next: `poetry run tap-name --config config.json --catalog catalog.json --state state.json | meltano invoke target-name > target_out`. Some targets are bad about sending state messages back as data is "committed" to the target; they may wait until all data is sent before sending a state message back. Looking at that file while it's running should make it pretty obvious.
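To make that inspection quicker, a small helper can filter captured output down to its STATE messages. This sketch assumes only the standard Singer message shape (one JSON object per line, with a `type` field and, for state, a `value` field); `state_messages` and `print_state_file` are hypothetical helper names. Seeing an empty `{}` state in `target_out` after a run with no new records is the symptom described at the top of this thread.

```python
import json
from typing import Iterable, Iterator


def state_messages(lines: Iterable[str]) -> Iterator[object]:
    """Yield the value of every Singer STATE message found in JSON-lines output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log noise that is not a JSON Singer message
        if isinstance(message, dict) and message.get("type") == "STATE":
            yield message.get("value")


def print_state_file(path: str) -> None:
    """Print each STATE value in a captured file such as `out` or `target_out`."""
    with open(path, encoding="utf-8") as handle:
        for value in state_messages(handle):
            print(json.dumps(value))
```

Running `print_state_file("out")` and `print_state_file("target_out")` side by side shows whether the tap emitted a bookmark and whether the target echoed it back unchanged.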

mathew_fournier

06/02/2023, 2:43 PM

I’ll take a look. My source definitely has state listed; if it doesn’t, no state gets set. It definitely seems like target-bigquery is the culprit here. Thanks for the tips on investigating what is going on, that is helpful. I will go spelunking today.

mathew_fournier

06/02/2023, 6:21 PM

I can see the problem, but I don’t think I can see the solution. I can’t decide whether the Meltano SDK is wrong or `target-bigquery` is wrong.

mathew_fournier

06/02/2023, 6:26 PM

If `get_records` (https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L1269) returns an empty iterable, because there are no new messages to consume based on the `replication_key` being used, then no state message is emitted.
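A simplified model of the tap-side behavior described above, assuming a STATE message is produced only as a side effect of writing a record. This is an illustration of the reported behavior, not the SDK's actual implementation; `emit_messages` and the exact bookmark layout are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator


def emit_messages(
    stream: str, records: Iterable[Dict[str, object]], replication_key: str
) -> Iterator[str]:
    """Model of the reported behavior: STATE only follows a RECORD.

    Because state is written inside the loop, an empty `records`
    iterable produces no STATE message at all.
    """
    for record in records:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": record})
        bookmark = {"replication_key_value": record[replication_key]}
        yield json.dumps({"type": "STATE", "value": {"bookmarks": {stream: bookmark}}})
```

An incremental run with nothing new therefore sends the target zero STATE messages, leaving the target to decide what state to write.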

`target-bigquery` doesn’t check whether the state object it is consuming is empty before writing the state: if it doesn’t process at least one message, it writes an empty state message (which wipes out the stored state). This happens here: https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L529. So the questions are:
• Should the Meltano SDK emit a state message if it produces no records (i.e. propagate the existing state)? Some non-Meltano taps, such as `tap-prometheus`, take this approach: they always emit a state message, even if they have incrementally processed 0 messages. See https://github.com/meshcloud/tap-prometheus/blob/master/tap_prometheus/__init__.py#L204
• Is `target-bigquery` at fault because it assumes it is getting at least one new message on every run? It doesn’t check when it writes out the state (it assumes a state message exists 100% of the time).
• Is everyone at fault 😆
• As far as I can see, there is no Meltano spec defining what the state message should be when an incremental run produces no new records. I have to take the code at face value (a system is what it does) for the behavior.
Can anyone comment here?
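The two options in the list above can be sketched side by side. `final_state_message` models the tap-prometheus-style fix (always emit the current state). `target_state_to_emit` models a target that starts from `{}`: with `skip_empty=False` it reproduces the reported wipe, and with `skip_empty=True` it applies a hypothetical guard that echoes nothing. Both names and the guard are illustrative assumptions. The guard only helps if the orchestrator keeps the last stored state when no new state arrives, which is also an assumption here.

```python
import json
from typing import Dict, Iterable, Optional


def final_state_message(current_state: Dict[str, object]) -> str:
    """Tap-side option (tap-prometheus approach): always emit the current
    state at the end of a run, even if zero records were synced."""
    return json.dumps({"type": "STATE", "value": current_state})


def target_state_to_emit(lines: Iterable[str], skip_empty: bool = True) -> Optional[dict]:
    """Target-side model: which state a target would echo after consuming `lines`.

    skip_empty=False reproduces the reported bug: the target starts from {}
    and echoes it even when no STATE arrived, wiping the stored bookmark.
    skip_empty=True is a hypothetical guard: return None (echo nothing).
    """
    latest: dict = {}
    for line in lines:
        message = json.loads(line)
        if message.get("type") == "STATE":
            latest = message.get("value") or {}
        # RECORD/SCHEMA handling (loading rows into the warehouse) is omitted.
    if skip_empty and not latest:
        return None
    return latest
```

Either fix alone prevents the wipe. If the tap always re-emits its bookmark, even the unguarded target echoes the existing state instead of `{}`.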

visch

06/28/2023, 1:03 PM

Did this get addressed? I had it in my Slack reminders. It looks like a pretty big issue with the target implementation if state gets cleared.

mathew_fournier

10/25/2023, 4:20 PM

see: https://github.com/meltano/sdk/issues/1750

mathew_fournier

10/25/2023, 4:20 PM

apologies for the late reply

visch

10/25/2023, 4:22 PM

np, glad it's addressed in the SDK now!

