abhinav_prakash
03/17/2023, 10:58 PM
I have a question about the batch_size_rows parameter that's defined under the loader section in meltano.yml.
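(For reference, a minimal sketch of where this setting lives; the loader name target-snowflake is assumed here, not stated in the thread:)

```yaml
# meltano.yml (sketch)
plugins:
  loaders:
    - name: target-snowflake      # assumed loader; any variant exposing batch_size_rows
      config:
        batch_size_rows: 100000   # rows buffered per batch before loading
```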
Context: I am currently reading from our MongoDB and writing to Snowflake. I am running Meltano on our k8s cluster, using Airflow for orchestration and dbt for transformation.
Question: The documents in different MongoDB collections differ in size. For instance, I have one collection with an average document size of 1 kB and another with an average document size of 100 kB. While running the sync, I have specified batch_size_rows as 100000. And while this works, the resource utilization for my pod can vary pretty drastically (by a factor of roughly 100, give or take). This is obviously because of the different document sizes in these collections.
So my question: is there a way to specify batch_size_rows differently for each stream? So maybe I can have batch_size_rows = 100000 when the doc size is 1 kB and batch_size_rows = 1000 when the doc size is 100 kB?
This is basically so that I can set proper limits on the Meltano deployment in the k8s cluster and not have Meltano monopolize all the resources of the underlying node, booting the other pods running on that node.
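(One way this can be done, as a sketch rather than the answer given in the thread, since the reply isn't captured here: Meltano's plugin inheritance via `inherit_from` lets you define a second loader with its own `batch_size_rows`, paired with tap definitions that each select only the matching collections. The collection names below are hypothetical:)

```yaml
# meltano.yml (sketch; collection/stream names are made up)
plugins:
  extractors:
    - name: tap-mongodb
      select:
        - small_docs_collection.*        # ~1 kB documents
    - name: tap-mongodb--large
      inherit_from: tap-mongodb
      select:
        - large_docs_collection.*        # ~100 kB documents
  loaders:
    - name: target-snowflake
      config:
        batch_size_rows: 100000          # many small rows per batch
    - name: target-snowflake--large
      inherit_from: target-snowflake
      config:
        batch_size_rows: 1000            # fewer large rows per batch
```

(Each pairing would then run as its own pipeline, e.g. `meltano run tap-mongodb target-snowflake` and `meltano run tap-mongodb--large target-snowflake--large`, so each pod buffers a roughly comparable amount of data.)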
tia 🙏
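(For illustration, the kind of pod resource limits being referred to above; the values are placeholders, not from the thread:)

```yaml
# Container spec fragment for the Meltano pod (sketch)
containers:
  - name: meltano
    image: meltano/meltano:latest
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: "2"
        memory: 4Gi
```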
taylor
03/17/2023, 11:31 PM

abhinav_prakash
03/17/2023, 11:38 PM