# troubleshooting
Hi, I hope this message finds you well. I've encountered an issue with a custom data extraction process that I'd like to bring to your attention. The extraction itself is functioning as expected, but I'm seeing a significant delay in writing records to the target file. Here are the specifics:

Observations:
• The data pull works as expected, but file writing is slow: roughly 100 records are processed every 10 minutes. Each record varies in length from 400 to 3,000 characters.

Endpoint Information:
• Stream Endpoint (no parent/child streams):
  `/queries/run/json?limit=100&apply_formatting=false&apply_vis=false&cache=false&force_production=true&server_table_calcs=false`
• Query Response Time: < 2.5 seconds (see the rough check below)
• REST API Method: `POST` with `--data { ... }`
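For context, the raw endpoint response time can be checked directly with something like the following; the host, auth header, and request body are placeholders rather than the real values.

```bash
# Rough timing of the raw query endpoint, independent of the tap.
# Host, auth header, and request body below are placeholders.
time curl -s -o /dev/null -X POST \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{ ... }' \
  "https://<api-host>/queries/run/json?limit=100&apply_formatting=false&apply_vis=false&cache=false&force_production=true&server_table_calcs=false"
```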
Result Summary:
• Authentication: Successful
• Discovery: Successful (command: `meltano invoke tap-custom --discover`), with a dynamic schema
• Output: Successful (command: `meltano elt tap-custom target-jsonl`)
• Time Taken for Completion: approximately 10 minutes
• Response Status: OK (HTTP 200)
• Response JSON Size: 167 KB
• Number of Records: 100

Issue:
• The main concern is the extended completion time: around 50 minutes to process 500 records. This is significantly longer than expected, especially compared to other taps that handle similar data sizes much more efficiently.
  ◦ After carefully verifying the code with various debugging techniques, I've identified a few points: the total completion time is consistently high.
  ◦ The response status, the contents of Response.json, and the catalog all appear to be in order. Likewise, the final output from `target-jsonl` appears correct.
  ◦ Increasing the buffer size hasn't led to any noticeable improvement.
  ◦ Interestingly, the tap's performance is the same in a containerised environment and when running locally.

Expectation: Our goal is to optimize the ETL completion time for pulling data, particularly when dealing with larger data sizes (in the megabytes).

Sample Records JSON:
```json
[
  {
    "history.created_time": "YYYY-MM-DD hh:mm:ss",
    "user.id": 9999,
    "user.name": "RAMBO CHARLES",
    "user.email": "rcharles@example.com",
    "sql_text.text": null
  }
  // ... other records ...
]
```
I appreciate your attention to this matter and any insights or assistance you can provide in resolving this performance concern. If there's any additional information you require, please don't hesitate to reach out. Thank you!
https://docs.meltano.com/guide/troubleshooting/#:~:text=Problem%3A%20%22My%20runs%20take%20too%20long.%22 and https://github.com/meltano/meltano/issues/6613#issuecomment-1215074973 are a good starting point for performance issues. They should help you isolate where the time is going; if you need more help, come back with data from some of those steps!
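A quick way to start isolating the tap from the target is to time each side separately. This is a minimal sketch, assuming `meltano invoke tap-custom` runs the tap with your existing config and selection; the timings are illustrative only.

```bash
# Time the tap on its own, discarding its Singer output.
# If this alone takes ~10 minutes, the bottleneck is the tap (or the upstream API),
# not target-jsonl.
time meltano invoke tap-custom > /dev/null

# Compare against the full pipeline run for reference.
time meltano elt tap-custom target-jsonl
```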
Once you've figured out whether it's the tap or the target, and you want to pin down exactly what's slow inside it, use a tool like https://sdk.meltano.com/en/latest/dev_guide.html#testing-performance
Based on the data you've listed, I'd run viztracer on your tap. If it were me, I'd read through the code first to see if I smell anything bad, but viztracer gives you an objective answer.
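A minimal sketch of what that could look like; the venv path, config, and catalog file names below are assumptions about a typical Meltano project layout, so adjust them to your setup.

```bash
# Install the profiler (and its viewer) into the environment you run the tap from.
pip install viztracer

# Trace a sync run of the tap executable directly, discarding the Singer output
# and writing a trace file instead.
viztracer -o tap_trace.json -- \
  .meltano/extractors/tap-custom/venv/bin/tap-custom \
  --config config.json \
  --catalog catalog.json > /dev/null

# Open the trace in a browser to see where the time actually goes
# (e.g. waiting on API requests vs. serializing records).
vizviewer tap_trace.json
```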