# troubleshooting
a
Hi all, I have developed a tap to extract all databases, schemas, and tables from Redshift (just the catalog, not the rows of the tables). I use the `get_child_context` method to recursively discover the Redshift catalog. Everything works, but the tap generates a lot of `STATE` messages that I don't need, because every run is a full scan of the catalog to generate soft deletions. Is there a way to disable the `STATE` messages?
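A plain-Python model of the parent/child recursion described above, for illustration only. `CATALOG`, the `sync_*` functions, and `discover` are hypothetical, and `get_child_context` only borrows the SDK method's name (the real one is a method on the SDK's stream class). It shows that every database and every schema becomes its own child context, so a full catalog scan creates one context per database plus one per schema. Assuming each context carries its own state bookmark, that is also roughly how many state entries accumulate.

```python
from __future__ import annotations

from typing import Any, Iterator

# Hypothetical stand-in for the Redshift catalog (database -> schema -> tables).
CATALOG: dict[str, dict[str, list[str]]] = {
    "dev": {"public": ["users", "orders"], "staging": ["events"]},
    "analytics": {"marts": ["daily_revenue"]},
}


def get_child_context(record: dict[str, Any]) -> dict[str, Any]:
    """Pass the parent's identifying keys down to the child stream."""
    return dict(record)


def sync_databases() -> Iterator[dict[str, Any]]:
    for db in CATALOG:
        yield {"database": db}


def sync_schemas(context: dict[str, Any]) -> Iterator[dict[str, Any]]:
    for schema in CATALOG[context["database"]]:
        yield {**context, "schema": schema}


def sync_tables(context: dict[str, Any]) -> Iterator[dict[str, Any]]:
    for table in CATALOG[context["database"]][context["schema"]]:
        yield {**context, "table": table}


def discover() -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Walk parent -> child, collecting table records and every child context."""
    records: list[dict[str, Any]] = []
    child_contexts: list[dict[str, Any]] = []
    for db_record in sync_databases():
        db_context = get_child_context(db_record)
        child_contexts.append(db_context)
        for schema_record in sync_schemas(db_context):
            schema_context = get_child_context(schema_record)
            child_contexts.append(schema_context)
            records.extend(sync_tables(schema_context))
    return records, child_contexts


records, contexts = discover()
print(len(records))   # 4
print(len(contexts))  # 5
```

With a real warehouse the context count grows with the number of schemas, which is why the per-context state traffic becomes noticeable even though no table rows are synced.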
t
Hi, I'm new here, so I'm not sure this is the proper approach, but I would try changing `Stream.STATE_MSG_FREQUENCY`.
a
It didn't work. 😞
Adding @douwe_maan for visibility.
e
I’m guessing `Stream.STATE_MSG_FREQUENCY` didn’t work because there’s a lot of context switching between parent and child streams (which resets the record count). There’s no way to completely disable state messages at the moment, but can you try setting the streams’ `state_partitioning_keys = {}` attribute? That should at least produce smaller state messages. I’m trying to think whether it’d be safe to omit writing state messages if the state hasn’t changed since a previous iteration.
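The reasoning above can be made concrete with a small, self-contained model. This is a sketch under stated assumptions, not the SDK's code: `StateEmitter`, `run`, and the `partition_by_context` / `dedupe` switches are hypothetical. `partition_by_context=False` is only a rough stand-in for `state_partitioning_keys = {}`, and `dedupe` models the "skip unchanged state" idea, which the thread only floats rather than an existing SDK option. The model assumes the record counter restarts for each child context and that state is flushed at the end of every context.

```python
from __future__ import annotations

import json
from typing import Any


class StateEmitter:
    """Hypothetical model of per-context STATE emission (not the SDK's code)."""

    def __init__(self, frequency: int, partition_by_context: bool, dedupe: bool) -> None:
        self.frequency = frequency
        self.partition_by_context = partition_by_context
        self.dedupe = dedupe
        # One entry per context when partitioned; a constant empty dict otherwise.
        self.state: dict[str, Any] = {"partitions": []} if partition_by_context else {}
        self.emitted: list[str] = []
        self._last_sent: str | None = None

    def _write_state(self) -> None:
        message = json.dumps({"type": "STATE", "value": self.state}, sort_keys=True)
        # The idea from the thread: skip the write if nothing changed.
        if self.dedupe and message == self._last_sent:
            return
        self.emitted.append(message)
        self._last_sent = message

    def sync_context(self, context: dict[str, str], n_records: int) -> None:
        count = 0  # restarts for every context, so a large frequency rarely triggers
        for _ in range(n_records):
            count += 1
            if count % self.frequency == 0:
                self._write_state()
        if self.partition_by_context:
            self.state["partitions"].append({"context": context})
        self._write_state()  # end-of-context flush happens regardless of frequency


def run(frequency: int, partition_by_context: bool, dedupe: bool) -> int:
    """Sync 50 small child contexts and return how many STATE messages were written."""
    emitter = StateEmitter(frequency, partition_by_context, dedupe)
    for i in range(50):
        emitter.sync_context({"schema": f"s{i}"}, n_records=3)
    return len(emitter.emitted)


print(run(10_000, partition_by_context=True, dedupe=False))  # 50
print(run(10_000, partition_by_context=False, dedupe=True))  # 1
```

In this model, raising the frequency alone changes nothing because each context is too small to reach it, and deduplication only pays off once the state stops changing between contexts, which is what dropping the per-context partitions achieves here.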
a
@alberto_miorin - The issue below describes what you are seeing, I think, along with a path we could follow to mitigate it. Since child streams create a new instance of the stream class, they don't have visibility into previously sent SCHEMA messages. However, there are some ways we could work around this, using a global cache of last-sent SCHEMA messages and/or class methods to cache, check, and send schema messages. https://github.com/meltano/sdk/issues/1061 This is Accepting Pull Requests, if it is something you have time to help contribute.
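One way the class-level caching idea could look, sketched in plain Python rather than against the SDK. `CatalogStream`, `_last_sent_schema`, `_should_send_schema`, and `write_schema_message` are hypothetical names, not the SDK's API; the actual design is whatever the linked issue settles on. Because the cache lives on the class, a new instance created for each child context still sees which SCHEMA message was last sent for its stream.

```python
from __future__ import annotations

import json
from typing import Any, ClassVar


class CatalogStream:
    """Hypothetical stream base class, not the Meltano SDK's Stream."""

    # Lives on the class, so it is shared by every instance, including the
    # fresh instances created for each child context.
    _last_sent_schema: ClassVar[dict[str, str]] = {}

    def __init__(self, name: str, schema: dict[str, Any], output: list[str]) -> None:
        self.name = name
        self.schema = schema
        self.output = output  # stands in for stdout

    @classmethod
    def _should_send_schema(cls, stream_name: str, message: str) -> bool:
        """True unless this exact SCHEMA message was already sent for the stream."""
        return cls._last_sent_schema.get(stream_name) != message

    def write_schema_message(self) -> None:
        message = json.dumps(
            {"type": "SCHEMA", "stream": self.name, "schema": self.schema},
            sort_keys=True,
        )
        if self._should_send_schema(self.name, message):
            self.output.append(message)
            CatalogStream._last_sent_schema[self.name] = message


out: list[str] = []
schema = {"properties": {"table_name": {"type": ["string"]}}}
for _ in range(3):  # three child contexts -> three separate instances
    CatalogStream("tables", schema, out).write_schema_message()
print(len(out))  # 1
```

Comparing the serialized message, rather than just the stream name, means a genuinely changed schema is still sent again.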