daniel_luftspring
07/26/2023, 11:00 PM
config:
  batch_config:
    batch_size: <number>
but that doesn't seem to do it
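
For context, the SDK batch docs linked later in this thread describe a fuller batch_config shape than the snippet above. A minimal sketch of how it might look in meltano.yml, assuming an SDK-based tap; the tap name, storage root, and prefix are placeholders, and whether batch_size is honoured depends on the SDK version:

plugins:
  extractors:
    - name: tap-example              # placeholder tap name
      config:
        batch_config:
          encoding:
            format: jsonl            # gzipped JSONL is the only encoding defined at present
            compression: gzip
          storage:
            root: "file:///tmp/batches"   # where batch files get written (local filesystem by default)
            prefix: "batch-"
          batch_size: 10000          # rows per batch file; support depends on SDK version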

taylor
07/27/2023, 2:16 AM

daniel_luftspring
07/27/2023, 4:00 PM

taylor
07/27/2023, 4:10 PM

edgar_ramirez_mondragon
07/27/2023, 4:12 PM

daniel_luftspring
07/27/2023, 5:40 PM
batch_config from the project yaml? It seems like it should. It's just not clear why overriding the default config in that file is going to behave differently than just supplying it from meltano.yml

user
07/27/2023, 6:03 PM

andrew_merton
07/31/2023, 1:56 AMBATCH messages."
Please correct me if I'm wrong :)

daniel_luftspring
07/31/2023, 7:34 PM
batch_size_rows? A tap shouldn't be writing anything to a file system, as its only job is to emit state, record, and schema messages to stdout. The target, on the other hand, may have to write to a file system as an intermediate step before loading into a sink. In the case of target-snowflake, I think the default behaviour is to create gzipped JSON line files in batches of 10k rows, put them into a Snowflake stage, then copy them into the target table. You can actually see this happen in real time on your local machine if you run any tap and load to target-snowflake
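
To make that division of labour concrete, here is roughly what a tap's stdout looks like per the Singer spec; the stream and fields below are made up for illustration:

{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}}}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "users", "record": {"id": 1}}
{"type": "STATE", "value": {"bookmarks": {"users": {"replication_key_value": 1}}}}

Everything file-related (staging, PUT, COPY INTO) happens on the target side.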

andrew_merton
07/31/2023, 8:45 PM
target-snowflake (Transferwise variant). The loader section is
loaders:
  - name: target-snowflake
    config:
      account: <account id>
      default_target_schema: <schema>
      file_format: <MELTANO_CSV>
      add_metadata_columns: false
      batch_size_rows: 100000
But this defines the size of the files that are staged to S3 and loaded into Snowflake. Note that the default for this parameter is 100,000 rows rather than 10,000.
Plugin:
loaders:
  - name: target-snowflake
    variant: transferwise
    pip_url: pipelinewise-target-snowflake
The batch_config.batch_size you are referring to is, I believe, related to the BATCH message, which is a new feature still in preview (according to https://meltano-sdk--1876.org.readthedocs.build/en/1876/batch.html). The BATCH message allows a tap to write a file or files to disk (locally by default), then send a list of file names to the target. For example, if the tap receives data in compressed CSV format by default (as tap-salesforce does), it could dump the file directly to disk, and the target (Snowflake) could then send the file directly to the stage and load it, avoiding all the row-by-row overhead of translating CSV to JSON and back (noting that only gzipped JSONL is defined at present).
I am considering learning enough Python to build tap-salesforce and target-snowflake that do this, but it's a journey :)
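
For reference, the BATCH message described in those docs is itself just another JSON line on stdout, carrying an encoding and a manifest of file URIs, roughly like this (the stream name and file paths are illustrative):

{"type": "BATCH", "stream": "users", "encoding": {"format": "jsonl", "compression": "gzip"}, "manifest": ["file:///tmp/users-0001.jsonl.gz", "file:///tmp/users-0002.jsonl.gz"]}

A target that understands BATCH then reads each file in the manifest directly instead of consuming RECORD messages row by row.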

daniel_luftspring
08/01/2023, 9:41 PM
target-snowflake because it's not built with the modern version of the singer_sdk

andrew_merton
08/01/2023, 10:07 PM