# getting-started
b
My Meltano ELT is taking a long time and it's only running batches of 10,000 records at a time. Can I increase the batch size of my Meltano fetch?
e
Hi @binoy_shah! Which extractor and loader are you using?
b
I am using the MySQL tap (extractor) and the Snowflake loader
```yaml
extractors:
  - name: acadia
    inherit_from: tap-mysql
    variant: transferwise
    pip_url: pipelinewise-tap-mysql
    select:
    - '*.*'
    metadata:
      '*':
        replication-method: LOG_BASED
      'acadia-user_event':
        replication-method: INCREMENTAL
        replication-key: id
```
```yaml
loaders:
  - name: data-warehouse
    inherit_from: target-snowflake
    variant: transferwise
    pip_url: git+https://github.com/transferwise/pipelinewise-target-snowflake.git
    config:
      parallelism: 4
      add_metadata_columns: true
      hard_delete: true
```
It's draining Snowflake warehouse credits like anything
e
b
Correct, but from what I read, tap-mysql is also required to do batching, and only then does it work. Am I correct?
e
> from what I read
don't know what you read 😅
b
So I am on Meltano 2.20 still, and since I couldn't find historic docs, I asked ChatGPT 🤷
When I searched the docs for "batch", it only led to some Meltano 3.x reference to BATCH mode and a target-postgres example.
e
BATCH mode is available to Singer SDK-based taps and targets, but you're using the Wise variants, so that's not an option. I'm not sure if tap-mysql does any batching.
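For reference, on SDK-based connectors BATCH mode is enabled via a `batch_config` block in the plugin's config. A rough sketch (the plugin name is made up, and again this does nothing for the Wise variants):
```yaml
extractors:
  - name: tap-something-sdk-based   # hypothetical SDK-based extractor
    config:
      batch_config:
        encoding:
          format: jsonl             # stage records as gzipped JSONL files
          compression: gzip
        storage:
          root: file:///tmp/meltano-batches   # local staging area for batch files
          prefix: batch-
```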
b
So if I just set the Snowflake batch size, will it work?
e
It should help at least. If you have a lot of different streams coming from the extractor, the target may be flushing even if the max batch size hasn't been reached.
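If you do try bumping it, the knob on the Wise target is `batch_size_rows` (default 100,000, if I remember right). A sketch of what that could look like in your loader config; the 500000 value is just an illustration:
```yaml
loaders:
  - name: data-warehouse
    inherit_from: target-snowflake
    config:
      # flush to Snowflake in larger chunks: fewer, bigger COPYs
      batch_size_rows: 500000
      # flush_all_streams: true  # optional: flush every stream whenever any one
      #                          # stream's batch fills (can mean smaller COPYs)
```
Fewer flushes generally means fewer COPY statements and less warehouse wake-up time, which is where the credits tend to go.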
b
Is there a concurrency control flag?
e
You mean something like `parallelism` for target-snowflake?
b
Well, to clarify that we're talking about the same thing: you mentioned "a lot of streams coming from the extractor", which I interpreted as concurrent data sync streams in the same job. So yes, can I control that, e.g. specify that no more than 7 tables should be synced concurrently?
e
Oh, you could `select` a subset of streams
b
That I am already doing; I am explicitly `select`ing streams for syncing
e
OK then. The above was just a thought about cases in which the target may still be flushing too frequently despite a configured large batch size.
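If you ever do need a hard cap on how many tables a single run touches, one hypothetical workaround is to split the selection across inherited extractors and schedule them as separate pipelines (the stream names below are made up):
```yaml
extractors:
  - name: acadia-group-1
    inherit_from: acadia          # inherits the tap-mysql config above
    select:
      - 'acadia-orders.*'         # hypothetical subset of tables
      - 'acadia-customers.*'
  - name: acadia-group-2
    inherit_from: acadia
    select:
      - 'acadia-user_event.*'
```
Each inherited extractor keeps its own state, so the groups can be run, scheduled, and retried independently.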