# singer-target-development
hey all - I'm using the target SDK for a new internal connector and I'm running into an issue: I'm trying to process records in batches, but every time a STATE message comes through it drains my batch. I'm using tap-snowflake, which emits state after every 1000 records, but on the target side I want batches larger than that. I found the `_DRAIN_AFTER_STATE` setting here but noticed it's not a property that's available for overriding. Was this intentional for a reason I'm overlooking, or would it be safe to add it as a public property we can override?
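To make the effect concrete, here's a toy simulation in plain Python. It is not SDK code: `simulate_batch_sizes` and its parameters are made up for illustration. It models a target that buffers records up to a max batch size and, when drain-after-state is on, also flushes on every STATE message.

```python
def simulate_batch_sizes(total_records, state_every, max_batch, drain_after_state):
    """Toy model returning the sizes of the batches a target would flush.

    Hypothetical helper, not part of the Singer SDK. The tap is assumed to
    emit a STATE message after every `state_every` records.
    """
    sizes = []
    buffered = 0
    for i in range(1, total_records + 1):
        buffered += 1
        if buffered >= max_batch:
            # Batch is full: flush regardless of STATE messages.
            sizes.append(buffered)
            buffered = 0
        if drain_after_state and i % state_every == 0 and buffered:
            # A STATE message arrived: drain whatever is buffered.
            sizes.append(buffered)
            buffered = 0
    if buffered:
        sizes.append(buffered)  # final drain at end of stream
    return sizes


print(simulate_batch_sizes(5000, 1000, 10000, drain_after_state=True))
# [1000, 1000, 1000, 1000, 1000]
print(simulate_batch_sizes(5000, 1000, 10000, drain_after_state=False))
# [5000]
```

With a STATE every 1000 records, the configured batch size of 10000 is never reached while drain-after-state is on.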
also, by default it might make sense to compare the new state to the current state before draining. In this case I get a message like
`{"type": "STATE", "value": {"currently_syncing": "table_name"}}`
after every 1000 records, so it's not even useful state to output for bookmarking purposes. Any thoughts?
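A minimal sketch of the "compare before draining" idea, assuming the only uninteresting key is the top-level `currently_syncing`. The helper name `should_drain_on_state` and the `bookmarks` key in the usage example are hypothetical; this is not existing SDK behavior.

```python
from typing import Iterable, Mapping, Optional


def should_drain_on_state(
    previous: Optional[Mapping],
    new: Mapping,
    ignored_keys: Iterable[str] = ("currently_syncing",),
) -> bool:
    """Return True only if `new` differs from `previous` in a way that matters.

    Hypothetical helper: top-level keys in `ignored_keys` are treated as
    non-bookmark information and ignored in the comparison.
    """
    ignored = set(ignored_keys)

    def strip(state: Optional[Mapping]) -> dict:
        # A missing previous state is treated as empty.
        return {k: v for k, v in (state or {}).items() if k not in ignored}

    return strip(new) != strip(previous)


msg = {"type": "STATE", "value": {"currently_syncing": "table_name"}}
print(should_drain_on_state(None, msg["value"]))  # False: nothing to bookmark
print(should_drain_on_state(
    msg["value"],
    {"currently_syncing": "table_name",
     "bookmarks": {"table_name": {"replication_key_value": "2021-01-01"}}},
))  # True: a bookmark changed
```

Skipping the drain doesn't mean discarding the message: the latest state would still need to be emitted after the next real flush.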
Could you open an issue for specifically what you'd like to see in terms of control? I like your idea of checking for some difference in the state message, and the plan has always been that we would need to expand the level of control for this drain behavior.
For instance, we could add a min desired record count, but there are other things to consider too, like whether we should combine this with a max hold-time, so that if 9999 records are held for over 4 hours, we eventually flush them anyway. The other complexity is that we need more advanced tracking of which STATE messages are safe to send downstream and when, if we are not forcing all streams to drain with each STATE message. All solvable problems, but would love to talk more in an issue and/or MR on the topic.
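A rough sketch of what a combined policy could look like: drain once `min_records` are held or once the oldest held record has waited `max_hold_seconds`, whichever comes first, and hold back the latest STATE until that drain. The class and field names are hypothetical, not SDK settings, and the sketch sidesteps the per-stream tracking problem by assuming everything is drained at once.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DrainPolicy:
    """Hypothetical flush policy: min record count plus max hold time."""

    min_records: int = 10_000
    max_hold_seconds: float = 4 * 60 * 60  # e.g. 4 hours
    clock: Callable[[], float] = time.monotonic
    held: int = field(default=0, init=False)
    oldest_held_at: Optional[float] = field(default=None, init=False)
    pending_state: Optional[dict] = field(default=None, init=False)

    def on_record(self) -> None:
        # Remember when the oldest unflushed record arrived.
        if self.held == 0:
            self.oldest_held_at = self.clock()
        self.held += 1

    def on_state(self, state: dict) -> None:
        # Don't emit yet: the state is only safe to send downstream once
        # every record received before it has been written.
        self.pending_state = state

    def should_drain(self) -> bool:
        if self.held == 0:
            return False
        if self.held >= self.min_records:
            return True
        return self.clock() - self.oldest_held_at >= self.max_hold_seconds

    def drain(self) -> Optional[dict]:
        """Reset counters and return the state that is now safe to emit."""
        safe_state, self.pending_state = self.pending_state, None
        self.held = 0
        self.oldest_held_at = None
        return safe_state
```

Because all held records are flushed together, the most recent STATE received before the drain is safe to emit afterwards; draining streams independently would require tracking that per stream.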
@aaronsteers awesome, thanks - I created an issue. It feels like two topics: which controls to expose, and which optimizations/behaviors should be the default. I'll join the issue discussions.