# getting-started
t
I'm running Meltano inside Dagster. For a CSV file with 99k rows to be loaded into Iceberg, it's taking a long time. I'm using `tap-csv` and a custom loader, `target-iceberg`. Is there any way to enable parallel processing to speed up the integration process? I read in some thread that loaders that deal with large volumes of data usually slow down the process. It was also suggested to change the batch size, but I don't know how I can change the batch size in my custom target. Currently, `max_size` is 10000. Will it help if I increase it?
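For context, a minimal sketch of how `max_size` and `process_batch` typically fit together in a singer-sdk `BatchSink`; the class name and `batch_size` config key below are illustrative, not target-iceberg's actual code:

```python
# Illustrative only: assumes the target's sink follows the standard
# singer-sdk BatchSink pattern; names are hypothetical.
from singer_sdk.sinks import BatchSink


class IcebergSink(BatchSink):
    """Buffers records per stream and writes them to Iceberg in batches."""

    @property
    def max_size(self) -> int:
        # Number of buffered records that triggers a flush via process_batch().
        # Reading it from config makes it tunable per run; the SDK default is 10000.
        return int(self.config.get("batch_size", 10_000))

    def process_batch(self, context: dict) -> None:
        # BatchSink's default process_record() accumulates rows in context["records"];
        # write the whole buffer in one call rather than one write per record.
        records = context["records"]
        self.logger.info("Flushing %d records to Iceberg", len(records))
        # ... append `records` to the Iceberg table here ...
```

Raising `max_size` means fewer, larger flushes, so it mostly helps when the overhead is per batch rather than per record.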
e
If you bump your custom target to singer-sdk 0.36.0, you should get a new `batch_size_rows` setting for free.
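Once the target is on that SDK version, the setting could then be tuned from Meltano config; a hypothetical `meltano.yml` excerpt (the value is just an example):

```yaml
plugins:
  loaders:
    - name: target-iceberg
      config:
        batch_size_rows: 50000  # example value; tune for your workload
```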
t
Hi @Edgar Ramírez (Arch.dev), I have to somehow lock my version of Meltano to an older one, unfortunately. Apart from that, I just realized that changing the batch size will not actually make much of a difference, because no matter how many records I write per batch, I'm seeing a delay on each record. Maybe enabling parallel processing will help. Does Meltano support parallel processing for `process_batch`? Just curious.
e
The SDK already drains records from streams in parallel: https://github.com/meltano/sdk/blob/6070c58c1393a76923bf9e59c475bd66f259cf14/singer_sdk/target_base.py#L515-L525. I'm also happy to fix a bug or review a PR if we're missing something there.
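Roughly, the linked code drains each stream's sink concurrently; a simplified sketch of the idea (not the SDK's actual implementation):

```python
# Conceptual sketch only; the real logic lives in singer_sdk/target_base.py (link above).
from concurrent.futures import ThreadPoolExecutor


def drain_all(sinks, drain_one, parallelism: int = 8) -> None:
    """Flush every stream's sink, several at a time.

    `sinks` holds one sink per incoming stream; `drain_one` is whatever
    flushes a single sink's buffered records (the SDK has its own helper).
    """
    if parallelism <= 1:
        for sink in sinks:
            drain_one(sink)
        return
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(drain_one, sinks))
```

In this sketch the parallelism is across sinks (one per stream), so a single input stream would still flush its batches sequentially.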
t
Great, thanks! I'll log the time taken to process each record locally and check a few other things. I'll give you an update if the increased time is caused by Meltano.
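One way that timing could be captured in the custom sink (again hypothetical; `IcebergSink` stands in for whatever target-iceberg's sink class is actually called):

```python
import time

from singer_sdk.sinks import BatchSink


class IcebergSink(BatchSink):
    """Hypothetical sink instrumented to see where the per-record delay comes from."""

    def process_record(self, record: dict, context: dict) -> None:
        start = time.perf_counter()
        super().process_record(record, context)  # default implementation just buffers the row
        self.logger.debug("process_record: %.6fs", time.perf_counter() - start)

    def process_batch(self, context: dict) -> None:
        start = time.perf_counter()
        # ... the actual Iceberg write for context["records"] goes here ...
        self.logger.info(
            "process_batch: %d records in %.3fs",
            len(context.get("records", [])),
            time.perf_counter() - start,
        )
```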
e
Thanks!