# singer-tap-development
h
Hey everyone! I'm finding that my pipelines that involve parent-child streams loading to a DB are incredibly slow (2+ hours for a couple thousand entities per day), and I believe this is because each entity and its children are being written to the database one by one rather than being batched. Anyone know where I should start to address this? I'm seeing this with both target-redshift and target-duckdb (though DuckDB is not as slow as Redshift).
e
Hi @hawkar_mahmod! Are you using https://github.com/TicketSwap/target-redshift?
(and this may be #1025 rearing its head)
h
@Edgar Ramírez (Arch.dev) I am using our own fork of the pipelinewise target-redshift - https://github.com/hrm13/pipelinewise-target-redshift/tree/sso-credential-provider-support
I didn't realise it could be down to the SCHEMA messages. Do you know if the SDK-built target would perform better? I had a cursory look and it does similar batching. So perhaps it's a tap issue?
e
> So perhaps it's a tap issue?
It might be, since I see your target flushes streams whenever a new SCHEMA message is received. There's a comment in that issue:
> I believe we dealt with this recently by deduping SCHEMA messages. Will try and find the exact PR.
but I don't think that PR ever got merged. The target could also be updated to check whether the schema has actually changed, i.e. by storing a mapping of known stream -> schema and adding that as another condition here: https://github.com/hrm13/pipelinewise-target-redshift/blob/9e141f1d33df784c70593c652909e6b1273aa03c/target_redshift/__init__.py#L209-L213
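A minimal sketch of that dedup idea (this is not the actual pipelinewise-target-redshift code; the function names `should_flush_on_schema` and `count_schema_flushes` are hypothetical, assuming Singer messages have already been parsed into dicts):

```python
def should_flush_on_schema(known_schemas, stream, schema):
    """Return True only if this SCHEMA message actually changes the
    stream's schema. known_schemas maps stream name -> last-seen schema.
    Duplicate SCHEMA messages (same schema, same stream) do not flush."""
    if known_schemas.get(stream) == schema:
        return False  # schema unchanged: skip the flush
    known_schemas[stream] = schema
    return True


def count_schema_flushes(messages):
    """Count how many SCHEMA messages in a parsed Singer message
    stream would trigger a flush under the dedup rule above."""
    known = {}
    flushes = 0
    for msg in messages:
        if msg.get("type") == "SCHEMA":
            if should_flush_on_schema(known, msg["stream"], msg["schema"]):
                flushes += 1
    return flushes
```

With a parent-child tap that re-emits the same SCHEMA before every child batch, this turns N flushes into one per genuine schema change, which is where the batching should come back.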
h
OK, I guess the easiest thing to do is switch to the SDK-based target and see if that helps.
Will report back
e
Cool, thanks!