# troubleshooting
a
Hey team, I have a question regarding the `batch_size_rows` parameter that's defined under the loader section in meltano.yml.

Context: I am currently reading from our MongoDB and writing to Snowflake. I am running Meltano on our k8s cluster, with Airflow for orchestration and dbt for transformation.

Question: The documents in different MongoDB collections differ in size. For instance, I have one collection with an average document size of 1kB and another with an average document size of 100kB. While running the sync, I have specified `batch_size_rows` as 100000. This works, but the resource utilization of my pod can vary pretty drastically (by a factor of almost 100, give or take), which is obviously because of the different document sizes in these collections. So my question: is there a way to specify `batch_size_rows` differently for each stream? For example, `batch_size_rows = 100000` when the doc size is 1kB and `batch_size_rows = 1000` when the doc size is 100kB. This is basically so that I can set proper limits on the Meltano deployment on the k8s cluster and not have Meltano monopolize all the resources of the underlying node and end up booting the other pods running on that node. tia 🙏
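For reference, the setup described above presumably boils down to a single loader-level setting in meltano.yml, roughly like the sketch below (the plugin names `tap-mongodb` and `target-snowflake` are assumptions, not taken from the thread):

```yaml
# Sketch of the current setup: one global batch size shared by all streams.
plugins:
  extractors:
    - name: tap-mongodb        # assumed extractor name
  loaders:
    - name: target-snowflake   # assumed loader name
      config:
        # Applies to every stream, whether its documents average 1kB or 100kB.
        batch_size_rows: 100000
```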
taylor
Check out plugin inheritance (https://docs.meltano.com/concepts/plugins#plugin-inheritance). You can have an inherited plugin for each stream and then configure bespoke settings that way.
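A rough sketch of what that could look like in meltano.yml, assuming `tap-mongodb` and `target-snowflake` as the plugin names; the two collections and the `select` patterns are placeholders for illustration:

```yaml
plugins:
  extractors:
    - name: tap-mongodb
    # One inherited extractor per stream/collection.
    - name: tap-mongodb--small-docs
      inherit_from: tap-mongodb
      select:
        - "small_collection.*"
    - name: tap-mongodb--large-docs
      inherit_from: tap-mongodb
      select:
        - "large_collection.*"
  loaders:
    - name: target-snowflake
      config:
        batch_size_rows: 100000   # fine for the ~1kB documents
    # Inherited loader with a smaller batch size for the ~100kB documents.
    - name: target-snowflake--large-docs
      inherit_from: target-snowflake
      config:
        batch_size_rows: 1000
```

Each pairing could then be scheduled as its own pipeline, e.g. `meltano run tap-mongodb--small-docs target-snowflake` and `meltano run tap-mongodb--large-docs target-snowflake--large-docs`, so the pod resource limits can be sized per stream rather than for the worst case.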
a
Thank you @taylor. I'll try to see if I can make this work with my current Meltano setup and will report back if I face any issues :)