# troubleshooting
silverbullet1
Hi all, I am getting a statement timeout when using target-postgres (SDK version) while Meltano attempts to insert into the temp table. I have already increased the statement timeout to 30 minutes and bumped up my RDS instance, but to no avail. I have also tried reducing the batch size defined in the SDK from 10k to 5k. Feeling lost, any pointers? Thanks!
👀 1
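For context on the timeout itself, statement_timeout can be inspected and raised per session or per role; a minimal sketch, where meltano_loader is a hypothetical role name:

-- Check the timeout currently in effect for this session
SHOW statement_timeout;

-- Raise it for the current session only
SET statement_timeout = '30min';

-- Or persist it for the role the loader connects as (role name is hypothetical)
ALTER ROLE meltano_loader SET statement_timeout = '30min';

On RDS the same setting can typically also be changed through the parameter group.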
Edgar Ramírez (Arch.dev)
Hi @silverbullet1! Do you happen to know the query that's timing out?
silverbullet1
Yes @Edgar Ramírez (Arch.dev), this is the query:
INSERT INTO public.cbs (ifi_id, coa_id, node_id, parent_node_id, code, type, node_depth, cycle_id, period_sequence_number, opening_balance, closing_balance, created_at, updated_at, _sdc_extracted_at, _sdc_deleted_at, _sdc_batched_at, _sdc_received_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) SELECT "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_depth, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".opening_balance, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".closing_balance, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".created_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".updated_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_extracted_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_deleted_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_batched_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_received_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_sequence, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_table_version, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_sync_started_at
FROM "479e4fba_e18d_4f4c_81f0_5f12bea82ec7" LEFT OUTER JOIN public.cbs ON ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id = public.cbs.ifi_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id IS NULL AND public.cbs.ifi_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id = public.cbs.coa_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id IS NULL AND public.cbs.coa_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id = public.cbs.node_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id IS NULL AND public.cbs.node_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id = public.cbs.parent_node_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id IS NULL AND public.cbs.parent_node_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type = public.cbs.type OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type IS NULL AND public.cbs.type IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code = public.cbs.code OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code IS NULL AND public.cbs.code IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id = public.cbs.cycle_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id IS NULL AND public.cbs.cycle_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number = public.cbs.period_sequence_number OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number IS NULL AND public.cbs.period_sequence_number IS NULL)
WHERE public.cbs.ifi_id IS NULL AND public.cbs.coa_id IS NULL AND public.cbs.node_id IS NULL AND public.cbs.parent_node_id IS NULL AND public.cbs.type IS NULL AND public.cbs.code IS NULL AND public.cbs.cycle_id IS NULL AND public.cbs.period_sequence_number IS NULL
And here is the target-postgres log output:
target-postgres      | Target 'target-postgres' is listening for input from tap.
target-postgres      | Initializing 'target-postgres' target sink...
target-postgres      | Initializing custom target sink for stream 'reports-coa_balance_summary'...
singer_sdk.metrics   | METRIC: {"type": "counter", "metric": "record_count", "value": 34569, "tags": {"stream": "reports-cbs", "context": {}}}
target-postgres      | Target sink for 'reports-cbs' is full. Draining...
target-postgres      | Inserting with SQL: INSERT INTO "479e4fba_e18d_4f4c_81f0_5f12bea82ec7" (ifi_id, coa_id, node_id, parent_node_id, code, type, node_depth, cycle_id, period_sequence_number, opening_balance, closing_balance, created_at, updated_at, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, cbs) VALUES (:ifi_id, :coa_id, :node_id, :parent_node_id, :code, :type, :node_depth, :cycle_id, :period_sequence_number, :opening_balance, :closing_balance, :created_at, :updated_at, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :cbs
I increased the timeout to infinite; it's been running for 6 hours! I don't think it's progressing 😕
Edgar Ramírez (Arch.dev)
Thanks @silverbullet1! (Tagging @visch in case he's familiar with this issue)
visch
I haven't seen it get stuck. I'm curious about the logging times of those inserts. Has there been no new logging at all, and it's just the INSERT call? Is the tap blocking? How are you running this in production? Can you replicate it?
Edgar Ramírez (Arch.dev)
(fwiw, the LEFT OUTER JOIN public.cbs in there tells me this is probably not created by the SDK)
visch
@Edgar Ramírez (Arch.dev) https://github.com/MeltanoLabs/target-postgres/blob/main/target_postgres/sinks.py#L223 there is an outer join in here. Not the SDK, though, you're right.
👍 1
👀 1
silverbullet1
@visch nothing is getting logged, which is strange. This has been running fine in production for the last few months. The tap used here is tap-redshift; I don't think it's blocking, because from the logs I can see it's able to extract, and then it gets stuck after the "Draining streams" step. This is being orchestrated via Airflow.
🤔 1
not sure how to replicate this
visch
Yeah, I'm not sure either. How many rows are in public.cbs? I'm trying to walk through in my head what could be getting stuck, since we should time out somewhere and make it clear exactly what we're stuck on. It seems like it's the insert with the big left join. If that table is super large, maybe it's just taking a really long time to run that query? I'm guessing, though. What I'd try on your end is running a sync manually:
1. meltano invoke tap-redshift > test_output
2. meltano invoke target-postgres < test_output
3. Does it happen again? Does it happen if you restrict this down to a few rows, like 10 or 100?
If our core issue is the table being massive, you'd see it even when inserting only a few rows into the table. I think doing those steps will help us get closer.
Other things that would be helpful for us: the full output logs (timestamps included), with all the gory details!
silverbullet1
The public.cbs table size is 11 GB.
visch
That seems fine
Maybe we're doing full scans for that join, or something extremely slow is happening for each column in that primary key list (there seem to be a lot of them). If you verify with those steps that the same issue happens with only a few records to insert, then I think we'll be able to pinpoint this. The other thing you could try, from the other side, is to look from Postgres and see what the running SQL query is doing: is it taking up a ton of resources trying to do that insert, and is it happening the entire 1-3 hours you're waiting (6+, I know!)?
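For example, a minimal sketch of watching the insert from the Postgres side using the standard pg_stat_activity view (the 80-character truncation is arbitrary):

-- Show active backends, how long their current query has been running, and what they wait on
SELECT pid,
       now() - query_start AS runtime,
       state,
       wait_event_type,
       wait_event,
       left(query, 80) AS query_preview
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;

If wait_event_type stays NULL while runtime keeps climbing, the backend is busy executing rather than blocked.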
silverbullet1
Let me try a manual run... I think it should be helpful; I will also increase verbosity. Also, when I set the batch size in the SDK to 5k, I expect the sink to drain after 5k records are extracted, so why is it logging
METRIC: {"type": "counter", "metric": "record_count", "value": 34569, "tags": {"stream": "reports-cbs", "context": {}}}
?
visch
@Edgar Ramírez (Arch.dev) the batch size parameter isn't listed in target-postgres's README, so I didn't go through and test that it works. Maybe it's something to do with our custom inserts (it shouldn't matter, though).
Edgar Ramírez (Arch.dev)
"Also, when I set the batch size in the SDK to 5k"
@silverbullet1 what's the name of the setting you're using?
I think there might be some confusion with batch_config (https://sdk.meltano.com/en/latest/batch.html), which I believe has an unfortunate and very overloaded name in retrospect 🫤. You may really want https://github.com/meltano/sdk/pull/2248/ shipped and brought into target-postgres.
🙏 1
(and fwiw I plan to do a new minor release of the SDK next week)
dancingpenguin 1
silverbullet1
@Edgar Ramírez (Arch.dev) I was trying to modify DEFAULT_BATCH_SIZE in _batch.py. Indeed, it's very confusing with the names 😅 Would love to have #2248 merged!
@Edgar Ramírez (Arch.dev) I have taken the latest release from yesterday and am able to use target_batch_size. I have set it to 50k, but my insert statements are still taking long! I have a composite primary key on Postgres; it seems like the join is taking time.
Some inserts are taking 30 mins, some are even taking 1 hr.
visch
Hmm, I wonder if the temp table should also get indexes added; not sure that it would help much, though. Something is doing a full table scan, I'd think, maybe the target table has some indexes missing, but I don't know databases well enough to guess. I'd have to look at the execution plan.
silverbullet1
@visch I do have indexes on the target table:
Indexes:
    "cbs_pkey" PRIMARY KEY, btree (ifi_id, coa_id, node_id, parent_node_id, code, type, cycle_id, period_sequence_number)
    "cycle_coa_type_code_idx" btree (cycle_id, coa_id, type, code)
Access method: heap
I cannot do EXPLAIN ANALYZE on the insert query because of the temporary tables, since they only exist in the session opened by Meltano!
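One check that works even without access to the temp table is the cumulative index-usage statistics for the target table; a minimal sketch using the standard pg_stat_user_indexes view:

-- How often each index on public.cbs has actually been used
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'cbs';

If idx_scan on cbs_pkey isn't increasing while the insert runs, the join is most likely falling back to sequential scans.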
visch
On second thought, it could be a lot of things; there could be locks on the table as well.
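A quick way to rule locks in or out is the standard pg_blocking_pids() function against pg_stat_activity; a minimal sketch:

-- List any backends that are currently blocked by another session's locks
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 80) AS query_preview
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

An empty result means nothing is waiting on a lock.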
silverbullet1
there is no query running on this db other than this insert ^^
visch
The key is to look into why the query takes so long to run. You can analyze this with postgres
There isn't a silver bullet here, unfortunately. auto_explain could help: https://www.postgresql.org/docs/current/auto-explain.html
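A minimal sketch of turning auto_explain on for a session, assuming you have the privileges to load the module (on RDS it is usually enabled through the parameter group instead, via session_preload_libraries plus the auto_explain.* parameters):

-- Load the module and log plans for anything slower than five minutes
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5min';
SET auto_explain.log_analyze = on;  -- include actual row counts and timings

Note that these session-level settings would have to apply to the session Meltano opens, so enabling them globally through the parameter group is more practical here.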
I'm not sure exactly what is going on. I would create a table from your main table and try to run the same query target-postgres runs. It's going to take some debugging for sure; something is holding us up.
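A minimal sketch of that approach, with cbs_stage as a hypothetical stand-in for the session-local staging table the target creates:

-- Clone the structure of the target table for a throwaway staging copy
CREATE TABLE cbs_stage (LIKE public.cbs INCLUDING DEFAULTS);

-- ...load a handful of representative rows into cbs_stage, then inspect the plan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT cbs_stage.*
FROM cbs_stage
LEFT OUTER JOIN public.cbs
  ON (cbs_stage.ifi_id = public.cbs.ifi_id
      OR (cbs_stage.ifi_id IS NULL AND public.cbs.ifi_id IS NULL))
 AND (cbs_stage.coa_id = public.cbs.coa_id
      OR (cbs_stage.coa_id IS NULL AND public.cbs.coa_id IS NULL))
 -- ...repeat the same pattern for the remaining key columns, as in the real query...
WHERE public.cbs.ifi_id IS NULL;

The "= ... OR both IS NULL" pattern in each join condition is what tends to keep the planner off the composite primary-key index, so the plan output should make the bottleneck visible.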