Meltano #singer-target-development

pnadolny

07/22/2021, 9:35 PM

hey all - I'm using the target SDK for a new internal connector and I'm running into an issue: I'm trying to process in batches, but every time a state message comes through it drains my batch. I'm using tap snowflake, which emits state after 1000 records, but on the target side I want batches larger than that. I found the _DRAIN_AFTER_STATE setting here but noticed it's not a property that's available for overriding. Was this intentional for a reason that I'm overlooking, or would it be safe to add as a public property we can override?

pnadolny

07/22/2021, 9:44 PM

also by default it might make sense to compare the new state to current state before draining. In this case I get a message like

{"type": "STATE", "value": {"currently_syncing": "table_name"}}

after every 1000 messages, so it's not even useful state to output for bookmarking purposes. Any thoughts?
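
To make the suggestion above concrete, here is a minimal Python sketch of "compare the new state to the current state before draining". It is not the SDK's implementation: `StateAwareBuffer`, `handle_line`, and `drain` are hypothetical names invented for illustration, and the buffer only counts drains instead of writing anywhere.

```python
import json


class StateAwareBuffer:
    """Hypothetical record buffer (not part of the Meltano SDK)."""

    def __init__(self):
        self.records = []       # records held since the last drain
        self.last_state = None  # value of the last STATE message acted on
        self.drain_count = 0    # how many times the buffer was drained

    def handle_line(self, line: str) -> None:
        """Process one Singer message given as a JSON string."""
        message = json.loads(line)
        if message["type"] == "RECORD":
            self.records.append(message["record"])
        elif message["type"] == "STATE":
            # Only drain when the state value actually changed.
            if message["value"] != self.last_state:
                self.last_state = message["value"]
                self.drain()

    def drain(self) -> None:
        """Stand-in for writing the batch to the destination."""
        self.drain_count += 1
        self.records = []


buffer = StateAwareBuffer()
repeated_state = json.dumps(
    {"type": "STATE", "value": {"currently_syncing": "table_name"}}
)
buffer.handle_line(repeated_state)  # first occurrence: treated as a change
buffer.handle_line(json.dumps({"type": "RECORD", "stream": "t", "record": {"id": 1}}))
buffer.handle_line(repeated_state)  # identical value: the record stays buffered
```

With this check, a tap that repeats the same `currently_syncing` value every 1000 records would no longer force a drain each time. A real target would still need to decide when the held state is safe to emit downstream.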

aaronsteers

07/22/2021, 10:58 PM

Hi, @pnadolny. I think this relates to this issue: "Target sink - optimization strategies for when to flush batches" (#135) in the Meltano SDK for Singer Taps and Targets project on GitLab.

aaronsteers

07/22/2021, 11:00 PM

Could you open an issue for specifically what you'd like to see in terms of control? I like your idea of checking for some difference in the state message, and the plan has always been that we would need to expand the level of control for this drain behavior.

aaronsteers

07/22/2021, 11:04 PM

For instance, we could add a min desired record count, but there are other things to consider also, like should we combine this with a max hold-time, so if 9999 records are held for over 4 hours, we eventually flush them anyway. And the other complexity is that we need more advanced tracking on which STATE messages are safe to send downstream and when, if we are not forcing all streams to drain with each STATE message. All solvable problems, but would love to talk more in an issue, and/or MR on the topic.
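
As a rough illustration of the two controls mentioned above (a minimum desired record count combined with a maximum hold time), here is a small Python sketch. `FlushPolicy` and its parameter names are assumptions made up for this example, not an existing SDK API, and it deliberately ignores the harder problem of tracking which STATE messages are safe to send downstream.

```python
import time


class FlushPolicy:
    """Hypothetical flush decision helper (not part of the Meltano SDK)."""

    def __init__(self, min_records, max_hold_seconds, clock=time.monotonic):
        self.min_records = min_records
        self.max_hold_seconds = max_hold_seconds
        self._clock = clock            # injectable so tests can fake time
        self._first_record_at = None   # when the oldest held record arrived
        self.count = 0                 # number of records currently held

    def record_added(self) -> None:
        """Call once for every record placed in the batch."""
        if self.count == 0:
            self._first_record_at = self._clock()
        self.count += 1

    def should_flush(self) -> bool:
        """True if the batch is big enough or has been held too long."""
        if self.count == 0:
            return False
        if self.count >= self.min_records:
            return True
        held_for = self._clock() - self._first_record_at
        return held_for >= self.max_hold_seconds

    def reset(self) -> None:
        """Call after the batch has been written out."""
        self.count = 0
        self._first_record_at = None


# Example values taken from the message above: 9999 held records, 4 hours.
policy = FlushPolicy(min_records=10_000, max_hold_seconds=4 * 60 * 60)
for _ in range(9_999):
    policy.record_added()
```

In this sketch the 9999 held records do not trigger a flush on count alone; they would be flushed once the oldest one has been held for four hours, or as soon as one more record arrives.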

pnadolny

07/23/2021, 4:27 PM

@aaronsteers awesome thanks - I created an issue. It feels like 2 topics: what controls to expose and what optimizations/behaviors should be the default. I'll join the issue discussions
