Anthony Shook
03/31/2025, 9:34 PM
I recently switched target-snowflake from the transferwise variant to the meltanolabs variant, and am getting massively decreased performance. A little about the job: it is an upsert on an updated_at key for a table that, for every run of Meltano, adds or updates around 1.5 million rows (it's one of many tables in the config, but it is by far the long pole in the tent).
• In the transferwise variant, with batch_size_rows set to 500000, I'd typically see three merge statements, each taking about 8 minutes, for a total of about 24 minutes.
  ◦ I could look at the query execution and see that the number of rows inserted/updated added up to around 500k each time, reflecting the batch output.
• Before adding some auto-clustering to the table, the meltanolabs variant, with batch_size_rows still set to 500000, resulted in hundreds of merge statements of exactly 10,000 rows each.
  ◦ Each of these took about 2 minutes, so the total is anywhere from 260 to 300 minutes: a massive increase in processing time on the server (and in cost, in Snowflake).
  ◦ This makes me think the batch_size_rows parameter is somehow being ignored, but I'm not sure how or why.
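Worth noting: 10,000 is exactly the Singer SDK's default sink batch size, which fits a scenario where the configured batch_size_rows never reaches the sink. A minimal way to see that default, assuming singer_sdk is installed (this inspects the generic SDK constant, not target-snowflake's own code path):

from singer_sdk.sinks import Sink

# The SDK drains a sink's record buffer at MAX_SIZE_DEFAULT rows when no
# other batch size takes effect.
print(Sink.MAX_SIZE_DEFAULT)  # 10000, matching the observed merge size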
Anthony Shook
03/31/2025, 9:38 PM
For versions: I'm running meltano/meltano:v3.5.4-python3.12, and my target version is target-snowflake v0.15.1.post5+20e07d0, Meltano SDK v0.44.3.
Edgar Ramírez (Arch.dev)
03/31/2025, 10:24 PM
batch_size_rows is seemingly being ignored. How are you configuring this for the target?
Anthony Shook
03/31/2025, 10:29 PM
I run it with meltano run tap-source target-snowflake-trg1, and the target is configured like this:
- name: target-snowflake
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/target-snowflake
  config:
    account: XXXXX
    database: XXXXX
    user: XXXXX
    private_key_path: XXXXXX
    warehouse: XXXXX
    role: XXXXX
    add_record_metadata: true
    batch_size_rows: 500000
    load_method: upsert
- name: target-snowflake-trg1
  inherit_from: target-snowflake
  config:
    default_target_schema: XXXXX
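A quick sanity check at this point (standard Meltano CLI, nothing specific to this target): listing the resolved settings for the inheriting plugin shows whether batch_size_rows actually survives inherit_from:

meltano config target-snowflake-trg1 list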
Anthony Shook
04/02/2025, 2:04 PM
BuzzCutNorman
04/02/2025, 2:49 PM
Anthony Shook
04/02/2025, 2:54 PM
Anthony Shook
04/02/2025, 4:06 PM
For reference, here's the updated config that got batching working; the new batch_config section is marked:
- name: target-snowflake
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/target-snowflake
  config:
    account: XXXXX
    database: XXXXX
    user: XXXXX
    private_key_path: XXXXXX
    warehouse: XXXXX
    role: XXXXX
    add_record_metadata: true
    batch_size_rows: 500000
    #----# BEGIN NEW #----#
    batch_config:
      encoding:
        format: jsonl
        compression: gzip
      storage:
        root: file://
      batch_size: 500000
    #----# END NEW #----#
    load_method: upsert
- name: target-snowflake-trg1
  inherit_from: target-snowflake
  config:
    default_target_schema: XXXXX
You do have to explicitly set the encoding and storage, too, but just using the defaults is fine.
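For context on the shape above: it mirrors the Singer SDK's BatchConfig helper, where batch_size (rows per batch file) sits alongside encoding and storage and appears to default to 10,000 when omitted. A rough sketch against that helper; it lives in a private module, so treat the import path and field names as version-dependent assumptions:

from singer_sdk.helpers._batch import BatchConfig

# Parse a batch_config mapping the way the SDK would. Note that batch_size
# is a sibling of encoding/storage, not nested under storage.
cfg = BatchConfig.from_dict({
    "encoding": {"format": "jsonl", "compression": "gzip"},
    "storage": {"root": "file://"},
})
print(cfg.batch_size)  # expected default: 10000 unless set explicitly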
BuzzCutNorman
04/02/2025, 4:09 PM
Edgar Ramírez (Arch.dev)
04/02/2025, 4:25 PM
Matt Menzenski
04/25/2025, 9:34 PM
batch_size_rows is being respected 🙂