silverbullet1
02/21/2024, 10:31 AM
I'm running into statement timeouts with target-postgres (SDK version) while Meltano attempts inserting into the temp table. I have already increased the statement timeout to 30 minutes and bumped my RDS instance, but to no avail. I have also tried reducing the batch size defined in the SDK from 10k to 5k.
Feeling lost, any pointers?
Thanks!
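For context, a minimal sketch of how the statement timeout can be inspected and raised on the target Postgres database (plain Postgres, nothing Meltano-specific; the role name is a placeholder, and on RDS the same setting can also go in the parameter group):

-- timeout in effect for the current session
SHOW statement_timeout;
-- raise it for the role the loader connects as (role name is hypothetical)
ALTER ROLE loader_role SET statement_timeout = '30min';
-- or just for the current session
SET statement_timeout = '30min';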
Edgar Ramírez (Arch.dev)
02/21/2024, 4:42 PM
silverbullet1
02/21/2024, 6:04 PM
INSERT INTO public.cbs (ifi_id, coa_id, node_id, parent_node_id, code, type, node_depth, cycle_id, period_sequence_number, opening_balance, closing_balance, created_at, updated_at, _sdc_extracted_at, _sdc_deleted_at, _sdc_batched_at, _sdc_received_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) SELECT "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_depth, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".opening_balance, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".closing_balance, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".created_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".updated_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_extracted_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_deleted_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_batched_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_received_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_sequence, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_table_version, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_sync_started_at
FROM "479e4fba_e18d_4f4c_81f0_5f12bea82ec7" LEFT OUTER JOIN public.cbs ON ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id = public.cbs.ifi_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id IS NULL AND public.cbs.ifi_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id = public.cbs.coa_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id IS NULL AND public.cbs.coa_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id = public.cbs.node_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id IS NULL AND public.cbs.node_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id = public.cbs.parent_node_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id IS NULL AND public.cbs.parent_node_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type = public.cbs.type OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type IS NULL AND public.cbs.type IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code = public.cbs.code OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code IS NULL AND public.cbs.code IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id = public.cbs.cycle_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id IS NULL AND public.cbs.cycle_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number = public.cbs.period_sequence_number OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number IS NULL AND public.cbs.period_sequence_number IS NULL)
WHERE public.cbs.ifi_id IS NULL AND public.cbs.coa_id IS NULL AND public.cbs.node_id IS NULL AND public.cbs.parent_node_id IS NULL AND public.cbs.type IS NULL AND public.cbs.code IS NULL AND public.cbs.cycle_id IS NULL AND public.cbs.period_sequence_number IS NULL
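The statement above is an anti-join upsert: insert every row from the temp table that has no match in the target, comparing each key column with a NULL-safe predicate. A condensed sketch of the same shape, reduced to two key columns for readability (table and column names here are illustrative, not the generated SQL):

INSERT INTO target_table (k1, k2, payload)
SELECT t.k1, t.k2, t.payload
FROM temp_table t
LEFT OUTER JOIN target_table c
  ON  (t.k1 = c.k1 OR (t.k1 IS NULL AND c.k1 IS NULL))
  AND (t.k2 = c.k2 OR (t.k2 IS NULL AND c.k2 IS NULL))
WHERE c.k1 IS NULL AND c.k2 IS NULL;

The OR ... IS NULL branches make each comparison NULL-safe, but they can also keep the planner from driving the join off the composite primary-key index, which is one plausible reason this crawls against a large target table.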
silverbullet1
02/21/2024, 6:07 PM
target-postgres | Target 'target-postgres' is listening for input from tap.
target-postgres | Initializing 'target-postgres' target sink...
target-postgres | Initializing custom target sink for stream 'reports-coa_balance_summary'...
singer_sdk.metrics | METRIC: {"type": "counter", "metric": "record_count", "value": 34569, "tags": {"stream": "reports-cbs", "context": {}}}
target-postgres | Target sink for 'reports-cbs' is full. Draining...
target-postgres | Inserting with SQL: INSERT INTO "479e4fba_e18d_4f4c_81f0_5f12bea82ec7" (ifi_id, coa_id, node_id, parent_node_id, code, type, node_depth, cycle_id, period_sequence_number, opening_balance, closing_balance, created_at, updated_at, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, cbs) VALUES (:ifi_id, :coa_id, :node_id, :parent_node_id, :code, :type, :node_depth, :cycle_id, :period_sequence_number, :opening_balance, :closing_balance, :created_at, :updated_at, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :cbs
I increased the timeout to infinite; it's been running for 6 hours! I don't think it's progressing 😕
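One way to confirm whether that INSERT is actually still executing, rather than blocked or idle, is to look at pg_stat_activity from a second session (plain Postgres, not Meltano-specific):

-- long-running and waiting statements, oldest first
SELECT pid,
       now() - query_start AS running_for,
       state,
       wait_event_type,
       wait_event,
       left(query, 80) AS query_snippet
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;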
Edgar Ramírez (Arch.dev)
02/21/2024, 6:18 PM
visch
02/21/2024, 6:20 PM
Edgar Ramírez (Arch.dev)
02/21/2024, 6:21 PM
(The LEFT OUTER JOIN public.cbs in there tells me this is probably not created by the SDK)
visch
02/21/2024, 6:22 PM
silverbullet1
02/21/2024, 6:24 PM
silverbullet1
02/21/2024, 6:28 PM
visch
02/21/2024, 6:31 PM
How big is public.cbs?
I'm trying to walk through in my head what could be getting stuck as we should time out somewhere and make it clear what exactly we're stuck on. It seems like it's the insert with the big left join. If that table is super large maybe it's taking a really long time to run that query?
I'm guessing though.
What I'd try to do on your end is manually run a sync
1. meltano invoke tap-redshift > test_output
2. meltano invoke target-postgres < test_output
3. Does it happen again?
Does it happen if you restrict this down to a few rows? Like 10 or 100.
If our core issue is the table being massive, you'd see this issue even when only inserting a few rows into the table.
I think doing those steps will help us get closer.
visch
02/21/2024, 6:32 PM
silverbullet1
02/21/2024, 6:32 PM
public.cbs table size is 11 GB
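For reference, a quick way to measure that on the Postgres side (standard catalog functions, nothing specific to this setup):

SELECT pg_size_pretty(pg_total_relation_size('public.cbs')) AS table_plus_indexes,
       pg_size_pretty(pg_relation_size('public.cbs'))       AS heap_only;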
visch
02/21/2024, 6:33 PM
visch
02/21/2024, 6:34 PM
silverbullet1
02/21/2024, 6:35 PM
METRIC: {"type": "counter", "metric": "record_count", "value": 34569, "tags": {"stream": "reports-cbs", "context": {}}}
?
visch
02/21/2024, 6:36 PM
Edgar Ramírez (Arch.dev)
02/21/2024, 6:44 PM
Quoting "Also, when i set batch size in SDK to 5k": @silverbullet1 what's the name of the setting you're using?
Edgar Ramírez (Arch.dev)
02/21/2024, 6:51 PM
That's batch_config (https://sdk.meltano.com/en/latest/batch.html), which I believe has an unfortunate and very overloaded name in retrospect 🫤. You may really want https://github.com/meltano/sdk/pull/2248/ shipped and brought into target-postgres.
Edgar Ramírez (Arch.dev)
02/21/2024, 6:51 PM
silverbullet1
02/22/2024, 3:42 AM
I had changed DEFAULT_BATCH_SIZE in _batch.py.
Indeed it's very confusing with the names 😅
Would love to have #2248 merged!
silverbullet1
02/24/2024, 3:26 PM
I found target_batch_size. I have set it to 50k.
But my insert statements are still taking long! I have a composite primary key on Postgres; it seems like the join is taking the time.
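If the join keys are the primary-key columns, they are implicitly NOT NULL in Postgres, so the IS NULL branches of the generated join can never match real rows. A quick check that none of the joined columns is nullable (column names taken from the generated SQL above; this is a general catalog query, not part of the target):

SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'cbs'
  AND column_name IN ('ifi_id', 'coa_id', 'node_id', 'parent_node_id',
                      'code', 'type', 'cycle_id', 'period_sequence_number');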
silverbullet1
02/24/2024, 3:28 PM
visch
02/24/2024, 3:29 PM
silverbullet1
02/24/2024, 3:35 PM
Indexes:
"cbs_pkey" PRIMARY KEY, btree (ifi_id, coa_id, node_id, parent_node_id, code, type, cycle_id, period_sequence_number)
"cycle_coa_type_code_idx" btree (cycle_id, coa_id, type, code)
Access method: heap
I cannot run EXPLAIN ANALYZE on the insert query because of the temporary tables, since they only exist in the session Meltano opens!
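A workaround, sketched under the assumption that you can create your own scratch copy of the staging table: build a stand-in temp table in your own psql session, load a sample of rows, and EXPLAIN the join there. The join condition is shortened to two of the key columns for brevity; the real one covers all eight.

-- stand-in for Meltano's session-local temp table (name is illustrative)
CREATE TEMP TABLE staging_cbs (LIKE public.cbs INCLUDING ALL);
INSERT INTO staging_cbs SELECT * FROM public.cbs LIMIT 10000;  -- any representative sample

EXPLAIN
SELECT s.*
FROM staging_cbs s
LEFT OUTER JOIN public.cbs c
  ON  (s.ifi_id = c.ifi_id OR (s.ifi_id IS NULL AND c.ifi_id IS NULL))
  AND (s.coa_id = c.coa_id OR (s.coa_id IS NULL AND c.coa_id IS NULL))
WHERE c.ifi_id IS NULL AND c.coa_id IS NULL;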
visch
02/24/2024, 3:35 PM
silverbullet1
02/24/2024, 3:36 PM
visch
02/24/2024, 8:16 PM
visch
04/22/2024, 5:59 PM
visch
04/22/2024, 6:06 PM