# troubleshooting
a
Another question about the MeltanoLabs variant of `target-snowflake` — I recently switched from the transferwise variant, and am getting massively decreased performance. A little about the job — it is an upsert on an `updated_at` key for a table that, for every run of Meltano, adds or updates around 1.5 million rows (it's one of many tables in the config, but is by far the long pole in the tent).
• In the transferwise variant, with `batch_size_rows` set to 500000, I'd typically see three merge statements, each taking about 8 minutes, for a total of 24 minutes.
  ◦ I could look at the query execution and see that the number of rows inserted/updated added up to around 500k each time, reflecting the batch output.
• Before adding some auto-clustering to the table, the meltanolabs variant, with `batch_size_rows` still set to 500000, results in hundreds of merge statements, each of exactly 10,000 rows.
  ◦ Each of these took about 2 minutes, so the total is anywhere from 260-300 minutes, a massive increase in processing time on the server (and in cost, in Snowflake).
  ◦ This makes me think the `batch_size_rows` parameter is somehow being ignored, but I'm not sure how or why.
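A quick sanity check of the batch arithmetic, using only the numbers reported above (1.5M rows per run, a configured batch of 500,000, and the 10,000-row batches actually observed):

```python
# Back-of-envelope check of the merge-statement counts described above.
rows_per_run = 1_500_000
configured_batch = 500_000  # batch_size_rows as configured
observed_batch = 10_000     # batch size actually seen in Snowflake query history

merges_expected = rows_per_run // configured_batch
merges_observed = rows_per_run // observed_batch

print(merges_expected)  # 3 merge statements, as seen with the transferwise variant
print(merges_observed)  # 150 merge statements, i.e. "hundreds" per run
```

So if the configured batch size were honored, three merges per run would be expected; at 10,000 rows per batch, the count balloons by 50x, matching the observed slowdown.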
For environment context, two things to note:
1. I did not change taps; in both cases I was using the MeltanoLabs variant of `tap-postgres`.
2. I'm using the Meltano-hosted Docker image `meltano/meltano:v3.5.4-python3.12`, and my target version is `target-snowflake v0.15.1.post5+20e07d0, Meltano SDK v0.44.3`.
e
Hi Anthony! 10,000 is indeed the default, and `batch_size_rows` is seemingly being ignored. How are you configuring this for the target?
a
Here's the relevant section of the config, private stuff redacted. The command is `meltano run tap-source target-snowflake-trg1`:
  - name: target-snowflake
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-snowflake
    config:
      account: XXXXX
      database: XXXXX
      user: XXXXX
      private_key_path: XXXXXX
      warehouse: XXXXX
      role: XXXXX
      add_record_metadata: true
      batch_size_rows: 500000
      load_method: upsert
  - name: target-snowflake-trg1
    inherit_from: target-snowflake
    config:
      default_target_schema: XXXXX
I'm going to try setting it in the child loader that inherits. (Note: that didn't change the effect.)
Hey Edgar! Any other thoughts here? Should I just drop a bug report in the git issues?
b
I found this thread in Meltano #singer-targets in which someone resolved the batch size issue. It is older, but it might still work.
a
Nice, thanks, I’ll take a look at this!
Okay, @BuzzCutNorman — great find! So, I can get meltano to respect that batch size with the following update to the config:
  - name: target-snowflake
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-snowflake
    config:
      account: XXXXX
      database: XXXXX
      user: XXXXX
      private_key_path: XXXXXX
      warehouse: XXXXX
      role: XXXXX
      add_record_metadata: true
      batch_size_rows: 500000
      #----# BEGIN NEW #----#
      batch_config:
        encoding:
          format: jsonl
          compression: gzip
        storage:
          root: file://
        batch_size: 500000
      #----# END NEW #----#
      load_method: upsert
  - name: target-snowflake-trg1
    inherit_from: target-snowflake
    config:
      default_target_schema: XXXXX
You do have to explicitly set the encoding and storage, too, but just using the defaults is fine.
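For what it's worth, the behavior in this thread is consistent with a setting precedence where `batch_config.batch_size` takes priority and a bare `batch_size_rows` fell back to the 10,000-row default. A hypothetical sketch of that pre-fix behavior (this is not the actual SDK code, just a model of what was observed):

```python
# Hypothetical model (NOT the real Meltano SDK implementation) of the
# batch-size resolution this thread observed before the fix:
#   - batch_config.batch_size, when present, is honored
#   - batch_size_rows on its own appeared to be ignored
#   - otherwise the 10,000-row default applies
DEFAULT_BATCH_SIZE = 10_000

def effective_batch_size(config: dict) -> int:
    batch_config = config.get("batch_config") or {}
    if "batch_size" in batch_config:
        return batch_config["batch_size"]
    # Pre-fix behavior: batch_size_rows never consulted, default wins.
    return DEFAULT_BATCH_SIZE

print(effective_batch_size({"batch_size_rows": 500_000}))                # 10000
print(effective_batch_size({"batch_config": {"batch_size": 500_000}}))   # 500000
```

This is why adding the `batch_config` block (with its required `encoding` and `storage` keys) made the 500,000-row batches take effect even though `batch_size_rows` was already set.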
b
That is great news that config combo worked 🥳. Thanks for clarifying that you need to set the encoding and storage as well.
e
Thanks for sharing that @BuzzCutNorman! I opened https://github.com/MeltanoLabs/target-snowflake/pull/340 to fix the behavior.
m
Just chiming in to say that this week we updated to the latest target-snowflake and can confirm that `batch_size_rows` is being respected 🙂