silverbullet1
09/18/2023, 6:09 PM
target-postgres (meltanolabs variant)
1. Is the batch size fixed? How do I find out what it is, and how do I change it? I am noticing a different number in my logs every time:
Target sink for foo is full. Draining..
METRIC: {"type": "counter", "metric": "record_count", "value": 28711,..}
Target sink for foo is full. Draining..
METRIC: {"type": "counter", "metric": "record_count", "value": 40006,..}
2. When I do a full refresh, my pod (running the pipeline) crashes due to OOM. If we are batching, that shouldn't happen, right? Or am I missing something?
3. Is the batch size decided by the tap, the target, or both?
4. What strategy is used to write state info? I am noticing that the job writes state very infrequently (once an hour). More details in the 2nd comment.

silverbullet1
09/19/2023, 5:15 AM
MAX_SIZE_DEFAULT = 10000 is mentioned here: https://github.com/meltano/sdk/issues/1626
But I am seeing 28k, 40k, etc. in the logs above. Am I checking the wrong thing?
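
For reference: in the Singer SDK the sink's drain threshold comes from the Sink.max_size property, which defaults to the MAX_SIZE_DEFAULT class constant (10000), so a custom target sink could change it by overriding that property. The sketch below only illustrates that extension point; the subclass name is hypothetical and is not the actual meltanolabs target-postgres sink. (The 28k/40k values above may be the tap's record_count counter metric, which is emitted on its own logging schedule, rather than the target's drain size.)

```python
# Hedged sketch of where the "sink is full" threshold comes from in the Singer SDK.
# Assumption: Sink.max_size returns the MAX_SIZE_DEFAULT class constant (10000).
from singer_sdk.sinks import SQLSink


class MyPostgresSink(SQLSink):
    """Hypothetical sink that drains every 50k records instead of the default 10k."""

    @property
    def max_size(self) -> int:
        # Records buffered per stream before "Target sink for ... is full. Draining.." fires.
        return 50_000
```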

silverbullet1
09/19/2023, 5:15 AM
[2023-09-19, 05:44:09 UTC]
INFO | tap-redshift | Beginning incremental sync of 'foo-bar'.
but the incremental state got updated almost an hour later:
[2023-09-19, 06:31:51 UTC] Incremental state has been updated at 2023-09-19 06:31:51.323767.
So I would lose almost an hour's work if something went wrong?

silverbullet1
09/20/2023, 5:52 AM
also looks like we are not updating state after writing each batch record

user
10/02/2023, 2:50 PM
@silverbullet1 this might be related to https://sdk.meltano.com/en/latest/classes/singer_sdk.Stream.html#singer_sdk.Stream.is_sorted in the tap, though I'm not positive. If the tap doesn't send sorted data, the SDK can't know that all the data up to the new state timestamp has arrived until the run is completed. If the records are sorted, it can bookmark state more frequently, saving progress so that after a failure the run can resume where it left off instead of starting over.
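
A minimal sketch of the is_sorted override being suggested, assuming a Singer SDK Stream subclass; the stream class, schema, and replication key below are hypothetical and not taken from tap-redshift. Declaring a stream sorted is only safe if the underlying query really orders rows by the replication key.

```python
# Hedged sketch: marking a tap stream as sorted so the SDK can emit intermediate
# state bookmarks and resume after a failure. Names here are hypothetical.
from __future__ import annotations

import typing as t

from singer_sdk import Stream
from singer_sdk import typing as th  # JSON Schema typing helpers


class FooBarStream(Stream):
    """Hypothetical incremental stream (tap-redshift's real class may differ)."""

    name = "foo-bar"
    replication_key = "updated_at"  # assumed incremental bookmark column
    schema = th.PropertiesList(
        th.Property("id", th.IntegerType),
        th.Property("updated_at", th.DateTimeType),
    ).to_dict()

    @property
    def is_sorted(self) -> bool:
        # Promise the SDK that records arrive ordered by `updated_at`, so it can
        # advance the bookmark as records stream through instead of only at the end.
        return True

    def get_records(self, context: dict | None) -> t.Iterable[dict]:
        # A real implementation must yield rows ordered by updated_at for the
        # is_sorted promise to hold; otherwise records could be skipped on resume.
        yield from []
```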