Hi All,
I have a question about the processing time of Meltano vs. a custom Python script I wrote (which does the same job).
I have built a custom tap to read Splunk Windows logs. For the target I'm using target-parquet (available out of the box on Meltano Hub).
My job is simply to parse the logs and convert them to Parquet.
Here are the timing stats (each figure is the average of 3 runs):
----------------------------------------
For 1 GB of Splunk Windows logs:
Meltano pipeline (without batch capability): 8 min 7 secs (avg. of 3 runs)
Meltano pipeline (with batch capability): 3 min 52 secs (avg. of 3 runs)
My custom Python script: 1 min 40 secs (avg. of 3 runs)
For 5 GB of Splunk Windows logs:
Meltano pipeline (without batch capability): 40 min 55 secs (avg. of 3 runs)
Meltano pipeline (with batch capability): 15 min 48 secs (avg. of 3 runs)
My custom Python script: 8 min 25 secs (avg. of 3 runs)
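To put these numbers on a common scale, here is a small sketch that converts each timing to seconds and computes how many times slower each Meltano run is than my custom script. The names (to_seconds, slowdown, timings) are only illustrative and are not part of Meltano or the tap; the values are just the averages listed above.

```python
# Rough comparison of the averaged timings listed above.
# All names here are hypothetical helpers, not Meltano APIs.

def to_seconds(minutes, seconds):
    """Convert a 'min / secs' timing into total seconds."""
    return minutes * 60 + seconds

def slowdown(run_seconds, baseline_seconds):
    """How many times slower a run is than the baseline, rounded to 2 places."""
    return round(run_seconds / baseline_seconds, 2)

timings = {
    "1 GB": {
        "meltano_no_batch": to_seconds(8, 7),
        "meltano_batch": to_seconds(3, 52),
        "custom_script": to_seconds(1, 40),
    },
    "5 GB": {
        "meltano_no_batch": to_seconds(40, 55),
        "meltano_batch": to_seconds(15, 48),
        "custom_script": to_seconds(8, 25),
    },
}

for size, runs in timings.items():
    baseline = runs["custom_script"]
    for name, secs in runs.items():
        print(f"{size:>4}  {name:<17} {secs:>5} s  {slowdown(secs, baseline)}x")
```

By this calculation, the non-batch pipeline is a little under 5x slower than the script at both sizes, and the batch pipeline is roughly 2x slower.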
Now, I have some questions.
1) Is the time taken by Meltano within the expected range?
2) Is there any way to further decrease the processing time for Meltano?
(So far I have found only one article on this, "6X YOUR SPEED USING BATCHING":
https://meltano.com/blog/6x-more-speed-for-your-data-pipelines-with-batch-messages/)
Although that article helped me reduce the processing time, as the stats above show, I still need better performance for my use case.
3) Are there any parameters or config settings in Meltano that can help boost performance?
4) Why is my custom Python script so much faster than Meltano? Where is Meltano spending its time, and can that be changed?
Thanks in advance!