# troubleshooting
silverbullet1
Hi all, I am getting a statement timeout when using target-postgres (SDK version) while Meltano attempts to insert into the temp table. I have already increased the statement timeout to 30 minutes and bumped up my RDS instance, but to no avail. I have also tried reducing the batch size defined in the SDK from 10k to 5k. Feeling lost, any pointers? Thanks!
👀 1
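For context on the timeout itself, statement_timeout can be inspected and raised per session or per role; a minimal sketch, where meltano_loader is a hypothetical role name:

-- Check the timeout currently in effect for this session
SHOW statement_timeout;

-- Raise it for the current session only
SET statement_timeout = '30min';

-- Or persist it for the role the loader connects as (role name is hypothetical)
ALTER ROLE meltano_loader SET statement_timeout = '30min';

On RDS the same setting can typically also be changed through the parameter group.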
Edgar Ramírez (Arch.dev)
Hi @silverbullet1! Do you happen to know the query that's timing out?
silverbullet1
Yes @Edgar Ramírez (Arch.dev), this is the query:
INSERT INTO public.cbs (ifi_id, coa_id, node_id, parent_node_id, code, type, node_depth, cycle_id, period_sequence_number, opening_balance, closing_balance, created_at, updated_at, _sdc_extracted_at, _sdc_deleted_at, _sdc_batched_at, _sdc_received_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) SELECT "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_depth, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".opening_balance, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".closing_balance, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".created_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".updated_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_extracted_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_deleted_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_batched_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_received_at, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_sequence, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_table_version, "479e4fba_e18d_4f4c_81f0_5f12bea82ec7"._sdc_sync_started_at
FROM "479e4fba_e18d_4f4c_81f0_5f12bea82ec7" LEFT OUTER JOIN public.cbs ON ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id = public.cbs.ifi_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".ifi_id IS NULL AND public.cbs.ifi_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id = public.cbs.coa_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".coa_id IS NULL AND public.cbs.coa_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id = public.cbs.node_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".node_id IS NULL AND public.cbs.node_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id = public.cbs.parent_node_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".parent_node_id IS NULL AND public.cbs.parent_node_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type = public.cbs.type OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".type IS NULL AND public.cbs.type IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code = public.cbs.code OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".code IS NULL AND public.cbs.code IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id = public.cbs.cycle_id OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".cycle_id IS NULL AND public.cbs.cycle_id IS NULL) AND ("479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number = public.cbs.period_sequence_number OR "479e4fba_e18d_4f4c_81f0_5f12bea82ec7".period_sequence_number IS NULL AND public.cbs.period_sequence_number IS NULL)
WHERE public.cbs.ifi_id IS NULL AND public.cbs.coa_id IS NULL AND public.cbs.node_id IS NULL AND public.cbs.parent_node_id IS NULL AND public.cbs.type IS NULL AND public.cbs.code IS NULL AND public.cbs.cycle_id IS NULL AND public.cbs.period_sequence_number IS NULL
And here is the target-postgres log output:
target-postgres      | Target 'target-postgres' is listening for input from tap.
target-postgres      | Initializing 'target-postgres' target sink...
target-postgres      | Initializing custom target sink for stream 'reports-coa_balance_summary'...
singer_sdk.metrics   | METRIC: {"type": "counter", "metric": "record_count", "value": 34569, "tags": {"stream": "reports-cbs", "context": {}}}
target-postgres      | Target sink for 'reports-cbs' is full. Draining...
target-postgres      | Inserting with SQL: INSERT INTO "479e4fba_e18d_4f4c_81f0_5f12bea82ec7" (ifi_id, coa_id, node_id, parent_node_id, code, type, node_depth, cycle_id, period_sequence_number, opening_balance, closing_balance, created_at, updated_at, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, cbs) VALUES (:ifi_id, :coa_id, :node_id, :parent_node_id, :code, :type, :node_depth, :cycle_id, :period_sequence_number, :opening_balance, :closing_balance, :created_at, :updated_at, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :cbs
I increased the timeout to infinite; it's been running for 6 hours! I don't think it's progressing 😕
Edgar Ramírez (Arch.dev)
Thanks @silverbullet1! (Tagging @visch in case he's familiar with this issue)
visch
I haven't seen it get stuck. I'm curious about the logging times of those inserts. Has there been no new logging at all, and it's just the INSERT call? Is the tap blocking? How are you running this in production? Can you replicate it?
Edgar Ramírez (Arch.dev)
(fwiw, the LEFT OUTER JOIN public.cbs in there tells me this is probably not created by the SDK)
visch
@Edgar Ramírez (Arch.dev) https://github.com/MeltanoLabs/target-postgres/blob/main/target_postgres/sinks.py#L223 there is an outer join in here. Not the SDK, though, you're right.
👍 1
👀 1
silverbullet1
@visch nothing is getting logged, which is strange. This has been running fine in production for the last few months. The tap used here is tap-redshift; I don't think it's blocking, because from the logs I can see it's able to extract, and then it gets stuck after the "Draining streams" step. This is being orchestrated via Airflow.
🤔 1
not sure how to replicate this
visch
Yeah, I'm not sure either. How many rows are in public.cbs? I'm trying to walk through in my head what could be getting stuck, since we should time out somewhere and make it clear exactly what we're stuck on. It seems like it's the insert with the big left join. If that table is super large, maybe it's just taking a really long time to run that query? I'm guessing, though. What I'd try on your end is running a sync manually:
1. meltano invoke tap-redshift > test_output
2. meltano invoke target-postgres < test_output
3. Does it happen again? Does it happen if you restrict this down to a few rows, like 10 or 100?
If our core issue is the table being massive, you'd see it even when inserting only a few rows into the table. I think doing those steps will help us get closer.
Other things that would be helpful for us: the full output logs (timestamps included), with all the gory details!
silverbullet1
The public.cbs table size is 11 GB.
visch
That seems fine
Maybe we're doing full scans for that join, or something extremely slow is happening for each column in that primary key list (there seem to be a lot of them). If you verify with those steps that the same issue happens with only a few records to insert, then I think we'll be able to pinpoint this. The other thing you could try, from the other side, is to look from Postgres and see what the running SQL query is doing: is it taking up a ton of resources trying to do that insert, and is it happening the entire 1-3 hours you're waiting (6+, I know!)?
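For example, a minimal sketch of watching the insert from the Postgres side using the standard pg_stat_activity view (the 80-character truncation is arbitrary):

-- Show active backends, how long their current query has been running, and what they wait on
SELECT pid,
       now() - query_start AS runtime,
       state,
       wait_event_type,
       wait_event,
       left(query, 80) AS query_preview
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;

If wait_event_type stays NULL while runtime keeps climbing, the backend is busy executing rather than blocked.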
silverbullet1
Let me try a manual run... I think it should be helpful; I will also increase verbosity. Also, when I set the batch size in the SDK to 5k, I expect the sink to drain after 5k records are extracted, so why is it logging
METRIC: {"type": "counter", "metric": "record_count", "value": 34569, "tags": {"stream": "reports-cbs", "context": {}}}
?
visch
@Edgar Ramírez (Arch.dev) the batch size parameter isn't listed in target-postgres's README, so I didn't go through and test that it works. Maybe it's something to do with our custom inserts (it shouldn't matter, though).
Edgar Ramírez (Arch.dev)
"Also, when I set the batch size in the SDK to 5k"
@silverbullet1 what's the name of the setting you're using?
I think there might be some confusion with batch_config (https://sdk.meltano.com/en/latest/batch.html), which I believe has an unfortunate and very overloaded name in retrospect 🫤. You may really want https://github.com/meltano/sdk/pull/2248/ shipped and brought into target-postgres.
🙏 1
(and fwiw I plan to do a new minor release of the SDK next week)
dancingpenguin 1
silverbullet1
@Edgar Ramírez (Arch.dev) I was trying to modify DEFAULT_BATCH_SIZE in _batch.py. Indeed, it's very confusing with the names 😅 Would love to have #2248 merged!
@Edgar Ramírez (Arch.dev) I have taken the latest release from yesterday and am able to use target_batch_size. I have set it to 50k, but my insert statements are still taking long! I have a composite primary key on Postgres; it seems like the join is taking time.
Some inserts are taking 30 mins, some are even taking 1 hr.
visch
Hmm, I wonder if the temp table should also get indexes added; not sure that it would help much, though. Something is doing a full table scan, I'd think, maybe the target table has some indexes missing, but I don't know databases well enough to guess. I'd have to look at the execution plan.
silverbullet1
@visch I do have indexes on the target table:
Indexes:
    "cbs_pkey" PRIMARY KEY, btree (ifi_id, coa_id, node_id, parent_node_id, code, type, cycle_id, period_sequence_number)
    "cycle_coa_type_code_idx" btree (cycle_id, coa_id, type, code)
Access method: heap
I cannot do EXPLAIN ANALYZE on the insert query because of the temporary tables, since they only exist in the session opened by Meltano!
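One check that works even without access to the temp table is the cumulative index-usage statistics for the target table; a minimal sketch using the standard pg_stat_user_indexes view:

-- How often each index on public.cbs has actually been used
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'cbs';

If idx_scan on cbs_pkey isn't increasing while the insert runs, the join is most likely falling back to sequential scans.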
visch
On second thought, it could be a lot of things; there could be locks on the table as well.
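A quick way to rule locks in or out is the standard pg_blocking_pids() function against pg_stat_activity; a minimal sketch:

-- List any backends that are currently blocked by another session's locks
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 80) AS query_preview
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

An empty result means nothing is waiting on a lock.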
silverbullet1
there is no query running on this db other than this insert ^^
visch
The key is to look into why the query takes so long to run. You can analyze this with postgres
There isn't a silver bullet here, unfortunately. auto_explain could help: https://www.postgresql.org/docs/current/auto-explain.html
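A minimal sketch of turning auto_explain on for a session, assuming you have the privileges to load the module (on RDS it is usually enabled through the parameter group instead, via session_preload_libraries plus the auto_explain.* parameters):

-- Load the module and log plans for anything slower than five minutes
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5min';
SET auto_explain.log_analyze = on;  -- include actual row counts and timings

Note that these session-level settings would have to apply to the session Meltano opens, so enabling them globally through the parameter group is more practical here.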
I'm not sure exactly what is going on. I would create a table from your main table and try to run the same query target-postgres runs. It's going to take some debugging for sure; something is holding us up.
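A minimal sketch of that approach, with cbs_stage as a hypothetical stand-in for the session-local staging table the target creates:

-- Clone the structure of the target table for a throwaway staging copy
CREATE TABLE cbs_stage (LIKE public.cbs INCLUDING DEFAULTS);

-- ...load a handful of representative rows into cbs_stage, then inspect the plan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT cbs_stage.*
FROM cbs_stage
LEFT OUTER JOIN public.cbs
  ON (cbs_stage.ifi_id = public.cbs.ifi_id
      OR (cbs_stage.ifi_id IS NULL AND public.cbs.ifi_id IS NULL))
 AND (cbs_stage.coa_id = public.cbs.coa_id
      OR (cbs_stage.coa_id IS NULL AND public.cbs.coa_id IS NULL))
 -- ...repeat the same pattern for the remaining key columns, as in the real query...
WHERE public.cbs.ifi_id IS NULL;

The "= ... OR both IS NULL" pattern in each join condition is what tends to keep the planner off the composite primary-key index, so the plan output should make the bottleneck visible.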