# troubleshooting
How do I get my target-redshift plugin (PipelineWise variant) to perform better? At the moment it seems to write out each parent and its children as a separate load to Redshift. E.g. in the log below you can see it loads only one campaign and its children, which makes the overall load very slow. Is this because a flush occurs after each STATE message is emitted?
2025-05-23T07:58:27.671301Z [info     ] time=2025-05-23 08:58:27 name=target_redshift level=INFO message=Target S3 bucket: cleo-data-science, local file: /var/folders/dk/ykbgcs_n1ll169x550jb901r0000gp/T/campaigns__7c0z0pn.csv.1, S3 key: data-ingestion-services/dev/pipelinewise_campaigns_20250523-085827-668554.csv.1 cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:27.678182Z [info     ] time=2025-05-23 08:58:27 name=target_redshift level=INFO message=Uploading 12 rows to S3 cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:27.678379Z [info     ] time=2025-05-23 08:58:27 name=target_redshift level=INFO message=Target S3 bucket: cleo-data-science, local file: /var/folders/dk/ykbgcs_n1ll169x550jb901r0000gp/T/campaign_actions_lvbcyl4b.csv.1, S3 key: data-ingestion-services/dev/pipelinewise_campaign_actions_20250523-085827-668678.csv.1 cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:28.068307Z [info     ] time=2025-05-23 08:58:28 name=target_redshift level=INFO message=Loading 1 rows into 'cio_dev."STG_CAMPAIGNS"' cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:29.337212Z [info     ] time=2025-05-23 08:58:29 name=target_redshift level=INFO message=Loading 12 rows into 'cio_dev."STG_CAMPAIGN_ACTIONS"' cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:32.203898Z [info     ] time=2025-05-23 08:58:32 name=target_redshift level=INFO message=Loading into cio_dev."CAMPAIGNS": {"inserts": 1, "updates": 0, "size_bytes": 706} cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:33.950961Z [info     ] time=2025-05-23 08:58:33 name=target_redshift level=INFO message=Deleting data-ingestion-services/dev/pipelinewise_campaigns_20250523-085827-668554.csv.1 from S3 cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:34.771988Z [info     ] time=2025-05-23 08:58:34 name=target_redshift level=INFO message=Loading into cio_dev."CAMPAIGN_ACTIONS": {"inserts": 12, "updates": 0, "size_bytes": 268318} cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
2025-05-23T07:58:35.884576Z [info     ] time=2025-05-23 08:58:35 name=target_redshift level=INFO message=Deleting data-ingestion-services/dev/pipelinewise_campaign_actions_20250523-085827-668678.csv.1 from S3 cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift
> Is this because a flush occurs after each STATE message is emitted?
More likely it's because of the interspersed SCHEMA messages, which trigger a flush of whatever is buffered for that stream: https://github.com/transferwise/pipelinewise-target-redshift/blob/6e118664fd74d6fa1fd54a44903a2434316978ed/target_redshift/__init__.py#L200-L213
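To make the effect concrete, here is a minimal sketch (not the actual target code, just a simplified model of the linked `persist_lines` logic) of why interleaved SCHEMA messages force small loads: a repeated SCHEMA for a stream flushes that stream's buffer regardless of how few rows it holds, so a tap that re-emits SCHEMA before each parent record produces one tiny COPY per parent. The `process` function and message shapes below are illustrative assumptions, not the library's API.

```python
def process(messages, batch_size_rows=100_000):
    """Buffer RECORD messages per stream; flush a stream's buffer when a
    new SCHEMA arrives for it, or when it reaches batch_size_rows.

    Returns a list of (stream, row_count) tuples, one per flush, so the
    flush pattern is easy to inspect.
    """
    buffers = {}   # stream name -> list of buffered records
    flushes = []   # (stream, row_count) per flush, for illustration

    for msg in messages:
        stream = msg["stream"]
        if msg["type"] == "SCHEMA":
            if buffers.get(stream):
                # The behavior in question: a repeated SCHEMA flushes
                # whatever is buffered for this stream, however small.
                flushes.append((stream, len(buffers.pop(stream))))
        elif msg["type"] == "RECORD":
            buffers.setdefault(stream, []).append(msg["record"])
            if len(buffers[stream]) >= batch_size_rows:
                flushes.append((stream, len(buffers.pop(stream))))

    # End of input: flush any remaining buffers.
    for stream, rows in buffers.items():
        flushes.append((stream, len(rows)))
    return flushes
```

With a tap that emits SCHEMA → RECORD → SCHEMA → RECORD for the same stream, this yields a string of 1-row flushes rather than one big batch. If the real target behaves this way, the usual levers in the PipelineWise targets are the `batch_size_rows` and `flush_all_streams` settings, though whether they help here depends on why the tap re-emits SCHEMA in the first place.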