abhinav_prakash
03/17/2023, 10:58 PM
I have a question about the batch_size_rows parameter that's defined under the loader section in meltano.yml.
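(For reference, a minimal sketch of where this setting lives; the loader name target-snowflake is assumed here, not stated in the thread:)

```yaml
# meltano.yml (sketch)
plugins:
  loaders:
    - name: target-snowflake      # assumed loader; any variant exposing batch_size_rows
      config:
        batch_size_rows: 100000   # rows buffered per batch before loading
```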
Context: I am currently reading from our MongoDB and writing to Snowflake. I am running Meltano on our k8s cluster, using Airflow for orchestration and dbt for transformation.
Question: The documents in different MongoDB collections differ in size. For instance, I have one collection with an average document size of 1 kB and another with an average document size of 100 kB. While running the sync, I have specified batch_size_rows as 100000. And while this works, the resource utilization for my pod can vary pretty drastically (by a factor of roughly 100, give or take). This is obviously because of the different document sizes in these collections.
So my question: is there a way to specify batch_size_rows differently for each stream? So maybe I can have batch_size_rows = 100000 when the doc size is 1 kB and batch_size_rows = 1000 when the doc size is 100 kB?
This is basically so that I can set proper limits on the Meltano deployment in the k8s cluster and not have Meltano monopolize all the resources of the underlying node, booting the other pods running on that node.
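(One way this can be done, as a sketch rather than the answer given in the thread, since the reply isn't captured here: Meltano's plugin inheritance via `inherit_from` lets you define a second loader with its own `batch_size_rows`, paired with tap definitions that each select only the matching collections. The collection names below are hypothetical:)

```yaml
# meltano.yml (sketch; collection/stream names are made up)
plugins:
  extractors:
    - name: tap-mongodb
      select:
        - small_docs_collection.*        # ~1 kB documents
    - name: tap-mongodb--large
      inherit_from: tap-mongodb
      select:
        - large_docs_collection.*        # ~100 kB documents
  loaders:
    - name: target-snowflake
      config:
        batch_size_rows: 100000          # many small rows per batch
    - name: target-snowflake--large
      inherit_from: target-snowflake
      config:
        batch_size_rows: 1000            # fewer large rows per batch
```

(Each pairing would then run as its own pipeline, e.g. `meltano run tap-mongodb target-snowflake` and `meltano run tap-mongodb--large target-snowflake--large`, so each pod buffers a roughly comparable amount of data.)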
tia 🙏
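(For illustration, the kind of pod resource limits being referred to above; the values are placeholders, not from the thread:)

```yaml
# Container spec fragment for the Meltano pod (sketch)
containers:
  - name: meltano
    image: meltano/meltano:latest
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: "2"
        memory: 4Gi
```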
taylor
03/17/2023, 11:31 PM

abhinav_prakash
03/17/2023, 11:38 PM