andrey_tatarinov
02/25/2023, 6:39 PM
batch_size_rows: 500000; empirically it takes about 4 GB of memory in target-bigquery before a flush.
The issue is in the combination of target-bigquery (I use the pipelinewise variant) and the state store:
Data flows fast: I get 500K rows in ~10 seconds, and about 10 more seconds are spent writing the batch to BigQuery.
Then everything halts on "Writing state to AWS S3" for almost 2 minutes; only after that is the state finally written and the process continues.
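For context, a rough sketch of the kind of meltano.yml configuration being described. The batch_size_rows value comes straight from the thread; the state_backend keys and the S3 path are illustrative assumptions, not copied from Andrey's project:

```yaml
# meltano.yml (sketch; bucket path and exact setting names are illustrative)
plugins:
  loaders:
    - name: target-bigquery
      variant: pipelinewise
      config:
        batch_size_rows: 500000        # ~4 GB held in memory per batch before flush
state_backend:
  uri: s3://my-meltano-bucket/state    # hypothetical bucket; state is written here after each batch
  lock_timeout_seconds: 120            # if the ~2 minute stall tracks this value, as the thread suggests
```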
andrey_tatarinov
02/25/2023, 6:42 PM
BaseFilesystemStateStoreManager does not distinguish between its own lock and someone else's lock.
So it waits for lock_timeout_seconds even if it was the same process that updated the lock 10 seconds ago.
It seems inefficient to me.
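For illustration, here is a minimal standalone sketch of the ownership check being suggested: tag the lock with an owner id so that a process which wrote the lock ten seconds ago does not sit out lock_timeout_seconds waiting on its own lock. The class and field names are hypothetical; this is not Meltano's BaseFilesystemStateStoreManager implementation.

```python
# Toy filesystem lock (not Meltano's actual code) that records who wrote it,
# so the owning process can skip the timeout wait on its own lock.
import json
import time
import uuid
from pathlib import Path


class OwnershipAwareLock:
    def __init__(self, lock_path: Path, lock_timeout_seconds: float = 120.0):
        self.lock_path = lock_path
        self.lock_timeout_seconds = lock_timeout_seconds
        self.owner_id = str(uuid.uuid4())  # unique per process/run

    def acquire(self) -> None:
        while self.lock_path.exists():
            lock = json.loads(self.lock_path.read_text())
            if lock["owner_id"] == self.owner_id:
                break  # our own lock from a previous batch: no need to wait
            if time.time() - lock["written_at"] > self.lock_timeout_seconds:
                break  # someone else's lock, but it has expired
            time.sleep(1)  # someone else holds a fresh lock: wait and retry
        self.lock_path.write_text(
            json.dumps({"owner_id": self.owner_id, "written_at": time.time()})
        )

    def release(self) -> None:
        if self.lock_path.exists():
            self.lock_path.unlink()
```

The only point here is the owner-id comparison before waiting; in Meltano the equivalent check would live inside the state store manager itself.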
andrey_tatarinov
02/25/2023, 6:45 PM
alexander_butler
02/25/2023, 6:50 PM
edgar_ramirez_mondragon
02/25/2023, 6:51 PM
> I did a little digging and it turns out that BaseFilesystemStateStoreManager does not distinguish between its own lock and someone else's lock.
Yeah, that sounds like a 🐛 Would you mind logging an issue with the problem and your proposed solution? I can do that later too.
andrey_tatarinov
02/25/2023, 6:51 PM
aaronsteers
02/25/2023, 7:10 PM
aaronsteers
02/25/2023, 7:13 PM
andrey_tatarinov
02/25/2023, 7:15 PM
andrey_tatarinov
02/25/2023, 7:16 PM
alexander_butler
02/25/2023, 7:18 PM
alexander_butler
02/25/2023, 7:19 PM
andrey_tatarinov
02/25/2023, 7:19 PM
alexander_butler
02/25/2023, 7:20 PM