How do I make meltano use BATCH mode, rather than ...
# getting-started
a
How do I make meltano use BATCH mode, rather than RECORD? Specifically, tap-salesforce to target-snowflake could be very efficient if the data was passed as CSV since it comes from salesforce Bulk API as csv, and snowflake can load CSVs directly... Or am I going to be building BATCH mode tap-salesforce and target-snowflake? 🙂
j
We use:
plugins:
  extractors:
  - name: tap-salesforce
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-salesforce.git
    config:
      api_type: BULK
and it works well
Well, if I remember correctly, there are some exceptional API endpoints which do not support BULK mode, so we explicitly declare the streams we want to discover and the fields we want to tap.
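For reference, declaring streams and fields explicitly is usually done with Meltano's `select` rules on the extractor. A minimal sketch, assuming the streams are named after the Salesforce objects; the specific objects and fields below are illustrative, not the ones actually used here:

```yaml
plugins:
  extractors:
  - name: tap-salesforce
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-salesforce.git
    config:
      api_type: BULK
    # Select rules use the pattern <stream>.<field>.
    # The stream and field names below are illustrative assumptions.
    select:
    - Task.Id
    - Task.Subject
    - Task.LastModifiedDate
    - Account.*
```

Streams not matched by any rule are skipped, which keeps objects that cannot be queried in BULK mode out of the run.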
a
@jan_soubusta Thanks for the reply. What volume of data are you moving? The issue for us is that we have these 45M rows to move initially, and it will take at least 1-2 days overall, which is getting on towards unacceptable. We expect to have to do the full refresh again for some kinds of changes in Salesforce (e.g. new column to be included with historical data that we don't already have)
j
I see. Well, our SFDC tables are much smaller: the biggest one is TASK and it contains 2.5M rows. A full refresh takes circa 1h, which is acceptable so far. There are discussions in the community regarding performance in general, e.g. here: https://meltano.slack.com/archives/C013Z450LCD/p1685703921187269 It would be great if the community defined an ultimate target and the iterations needed to get there. I would love to contribute then 😉
a
This may be a specific case (Salesforce to Snowflake) where the source can produce output that is directly consumable by the target (compressed CSV files), so it would be possible to implement a tap and target that do no row manipulation at all. I can get the 2.7M rows of data out of Salesforce in 35 seconds (excluding the query time), but it takes tap-salesforce ~55 minutes (direct to JSON on disk). I think that https://sdk.meltano.com/en/latest/batch.html is the answer, but I find it hard to understand from the page what state it is in, how to build the tap and target I need, etc. (I am a rather novice Python programmer, which doesn't help either). We will be looking at options...
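For context, the SDK page linked above describes enabling BATCH messages through a `batch_config` setting on the tap. A rough sketch of how that might look in `meltano.yml`, assuming a tap built on the Meltano Singer SDK; the plugin name `tap-example-sdk` and the storage path are hypothetical, and nothing in this thread establishes that tap-salesforce accepts this setting:

```yaml
plugins:
  extractors:
  - name: tap-example-sdk        # hypothetical SDK-based tap, not tap-salesforce
    config:
      batch_config:
        encoding:
          format: jsonl          # the format shown on the SDK batch page
          compression: gzip
        storage:
          root: file:///tmp/meltano-batches   # hypothetical local path
          prefix: batch-
```

With this, the tap writes records to files and emits BATCH messages that point at them instead of one RECORD message per row. Note that the example uses gzipped JSONL, not CSV, so a pure CSV pass-through from the Bulk API to Snowflake would still need custom work on both the tap and the target.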